All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
@ 2019-04-08 16:59 Kairui Song
  2019-04-15 12:38 ` Jiri Olsa
  2019-04-15 15:36 ` Peter Zijlstra
  0 siblings, 2 replies; 14+ messages in thread
From: Kairui Song @ 2019-04-08 16:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Josh Poimboeuf, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young,
	Kairui Song

Currently perf callchain is not working properly with ORC unwinder,
and sampling event from trace point. We'll get useless in kernel
callchain like this:

perf  6429 [000]    22.498450:             kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL
    ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
	7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
	5651468729c1 [unknown] (/usr/bin/perf)
	5651467ee82a main+0x69a (/usr/bin/perf)
	7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
    5541f689495641d7 [unknown] ([unknown])

The root cause is within a trace point perf will try to dump the
caller's register, but without CONFIG_FRAME_POINTER we can't get
caller's BP as the frame pointer, so current frame pointer is returned
instead. We get a register combination of caller IP and current BP,
which confuse the unwinder and end the stacktrace early.

So in such case don't dump BP, and just let the unwinder start directly
and skip until we reached the stack we wanted.

This make the callchain get the full kernel space stacktrace again:

perf  6503 [000]  1567.570191:             kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL
    ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux)
    ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux)
	7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
	55a22960d9c1 [unknown] (/usr/bin/perf)
	55a22958982a main+0x69a (/usr/bin/perf)
	7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
    5541f689495641d7 [unknown] ([unknown])

Signed-off-by: Kairui Song <kasong@redhat.com>
---

Update from V1:
  Get rid of a lot of unneccessary code and just don't dump a inaccurate
  BP, and use SP as the marker for target frame.

 arch/x86/events/core.c            | 18 +++++++++++++++---
 arch/x86/include/asm/stacktrace.h |  9 +++++++--
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e2b1447192a8..6075a4f94376 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
 	cyc2ns_read_end();
 }
 
+static inline int
+valid_perf_registers(struct pt_regs *regs)
+{
+	return (regs->ip && regs->bp && regs->sp);
+}
+
 void
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
@@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		return;
 	}
 
-	if (perf_callchain_store(entry, regs->ip))
+	if (valid_perf_registers(regs)) {
+		if (perf_callchain_store(entry, regs->ip))
+			return;
+		unwind_start(&state, current, regs, NULL);
+	} else if (regs->sp) {
+		unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
+	} else {
 		return;
+	}
 
-	for (unwind_start(&state, current, regs, NULL); !unwind_done(&state);
-	     unwind_next_frame(&state)) {
+	for (; !unwind_done(&state); unwind_next_frame(&state)) {
 		addr = unwind_get_return_address(&state);
 		if (!addr || perf_callchain_store(entry, addr))
 			return;
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index f335aad404a4..226077e20412 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -98,18 +98,23 @@ struct stack_frame_ia32 {
     u32 return_address;
 };
 
+#ifdef CONFIG_FRAME_POINTER
 static inline unsigned long caller_frame_pointer(void)
 {
 	struct stack_frame *frame;
 
 	frame = __builtin_frame_address(0);
 
-#ifdef CONFIG_FRAME_POINTER
 	frame = frame->next_frame;
-#endif
 
 	return (unsigned long)frame;
 }
+#else
+static inline unsigned long caller_frame_pointer(void)
+{
+	return 0;
+}
+#endif
 
 void show_opcodes(struct pt_regs *regs, const char *loglvl);
 void show_ip(struct pt_regs *regs, const char *loglvl);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-08 16:59 [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER Kairui Song
@ 2019-04-15 12:38 ` Jiri Olsa
  2019-04-15 15:36 ` Peter Zijlstra
  1 sibling, 0 replies; 14+ messages in thread
From: Jiri Olsa @ 2019-04-15 12:38 UTC (permalink / raw)
  To: Kairui Song
  Cc: linux-kernel, Peter Zijlstra, Josh Poimboeuf, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	Thomas Gleixner, Borislav Petkov, Dave Young

On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> Currently perf callchain is not working properly with ORC unwinder,
> and sampling event from trace point. We'll get useless in kernel
> callchain like this:
> 
> perf  6429 [000]    22.498450:             kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL
>     ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
> 	7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
> 	5651468729c1 [unknown] (/usr/bin/perf)
> 	5651467ee82a main+0x69a (/usr/bin/perf)
> 	7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
>     5541f689495641d7 [unknown] ([unknown])
> 
> The root cause is within a trace point perf will try to dump the
> caller's register, but without CONFIG_FRAME_POINTER we can't get
> caller's BP as the frame pointer, so current frame pointer is returned
> instead. We get a register combination of caller IP and current BP,
> which confuse the unwinder and end the stacktrace early.
> 
> So in such case don't dump BP, and just let the unwinder start directly
> and skip until we reached the stack we wanted.
> 
> This make the callchain get the full kernel space stacktrace again:
> 
> perf  6503 [000]  1567.570191:             kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL
>     ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux)
>     ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux)
> 	7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so)
> 	55a22960d9c1 [unknown] (/usr/bin/perf)
> 	55a22958982a main+0x69a (/usr/bin/perf)
> 	7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so)
>     5541f689495641d7 [unknown] ([unknown])
> 
> Signed-off-by: Kairui Song <kasong@redhat.com>

Tested-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

> ---
> 
> Update from V1:
>   Get rid of a lot of unneccessary code and just don't dump a inaccurate
>   BP, and use SP as the marker for target frame.
> 
>  arch/x86/events/core.c            | 18 +++++++++++++++---
>  arch/x86/include/asm/stacktrace.h |  9 +++++++--
>  2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index e2b1447192a8..6075a4f94376 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
>  	cyc2ns_read_end();
>  }
>  
> +static inline int
> +valid_perf_registers(struct pt_regs *regs)
> +{
> +	return (regs->ip && regs->bp && regs->sp);
> +}
> +
>  void
>  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
>  {
> @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>  		return;
>  	}
>  
> -	if (perf_callchain_store(entry, regs->ip))
> +	if (valid_perf_registers(regs)) {
> +		if (perf_callchain_store(entry, regs->ip))
> +			return;
> +		unwind_start(&state, current, regs, NULL);
> +	} else if (regs->sp) {
> +		unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> +	} else {
>  		return;
> +	}
>  
> -	for (unwind_start(&state, current, regs, NULL); !unwind_done(&state);
> -	     unwind_next_frame(&state)) {
> +	for (; !unwind_done(&state); unwind_next_frame(&state)) {
>  		addr = unwind_get_return_address(&state);
>  		if (!addr || perf_callchain_store(entry, addr))
>  			return;
> diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
> index f335aad404a4..226077e20412 100644
> --- a/arch/x86/include/asm/stacktrace.h
> +++ b/arch/x86/include/asm/stacktrace.h
> @@ -98,18 +98,23 @@ struct stack_frame_ia32 {
>      u32 return_address;
>  };
>  
> +#ifdef CONFIG_FRAME_POINTER
>  static inline unsigned long caller_frame_pointer(void)
>  {
>  	struct stack_frame *frame;
>  
>  	frame = __builtin_frame_address(0);
>  
> -#ifdef CONFIG_FRAME_POINTER
>  	frame = frame->next_frame;
> -#endif
>  
>  	return (unsigned long)frame;
>  }
> +#else
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +	return 0;
> +}
> +#endif
>  
>  void show_opcodes(struct pt_regs *regs, const char *loglvl);
>  void show_ip(struct pt_regs *regs, const char *loglvl);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-08 16:59 [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER Kairui Song
  2019-04-15 12:38 ` Jiri Olsa
@ 2019-04-15 15:36 ` Peter Zijlstra
  2019-04-15 16:58   ` Josh Poimboeuf
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2019-04-15 15:36 UTC (permalink / raw)
  To: Kairui Song
  Cc: linux-kernel, Josh Poimboeuf, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young


I'll mostly defer to Josh on unwinding, but a few comments below.

On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index e2b1447192a8..6075a4f94376 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
>  	cyc2ns_read_end();
>  }
>  
> +static inline int
> +valid_perf_registers(struct pt_regs *regs)
> +{
> +	return (regs->ip && regs->bp && regs->sp);
> +}

I'm unconvinced by this, with both guess and orc having !bp is perfectly
valid.

>  void
>  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
>  {
> @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>  		return;
>  	}
>  
> -	if (perf_callchain_store(entry, regs->ip))
> +	if (valid_perf_registers(regs)) {
> +		if (perf_callchain_store(entry, regs->ip))
> +			return;
> +		unwind_start(&state, current, regs, NULL);
> +	} else if (regs->sp) {
> +		unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> +	} else {
>  		return;
> +	}

AFAICT if we, by pure accident, end up with !bp for ORC, then we
initialize the unwind wrong.

Note that @regs is mostly trivially correct, except for that tracepoint
case. So I don't think we should magic here.

> -	for (unwind_start(&state, current, regs, NULL); !unwind_done(&state);
> -	     unwind_next_frame(&state)) {
> +	for (; !unwind_done(&state); unwind_next_frame(&state)) {
>  		addr = unwind_get_return_address(&state);
>  		if (!addr || perf_callchain_store(entry, addr))
>  			return;


> diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
> index f335aad404a4..226077e20412 100644
> --- a/arch/x86/include/asm/stacktrace.h
> +++ b/arch/x86/include/asm/stacktrace.h
> @@ -98,18 +98,23 @@ struct stack_frame_ia32 {
>      u32 return_address;
>  };
>  
> +#ifdef CONFIG_FRAME_POINTER
>  static inline unsigned long caller_frame_pointer(void)
>  {
>  	struct stack_frame *frame;
>  
>  	frame = __builtin_frame_address(0);
>  
> -#ifdef CONFIG_FRAME_POINTER
>  	frame = frame->next_frame;
> -#endif
>  
>  	return (unsigned long)frame;
>  }
> +#else
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +	return 0;
> +}
> +#endif

OK, that makes sense I guess.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-15 15:36 ` Peter Zijlstra
@ 2019-04-15 16:58   ` Josh Poimboeuf
  2019-04-16 11:30     ` Kairui Song
  0 siblings, 1 reply; 14+ messages in thread
From: Josh Poimboeuf @ 2019-04-15 16:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kairui Song, linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
	Borislav Petkov, Dave Young

On Mon, Apr 15, 2019 at 05:36:22PM +0200, Peter Zijlstra wrote:
> 
> I'll mostly defer to Josh on unwinding, but a few comments below.
> 
> On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index e2b1447192a8..6075a4f94376 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
> >  	cyc2ns_read_end();
> >  }
> >  
> > +static inline int
> > +valid_perf_registers(struct pt_regs *regs)
> > +{
> > +	return (regs->ip && regs->bp && regs->sp);
> > +}
> 
> I'm unconvinced by this, with both guess and orc having !bp is perfectly
> valid.
> 
> >  void
> >  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> >  {
> > @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
> >  		return;
> >  	}
> >  
> > -	if (perf_callchain_store(entry, regs->ip))
> > +	if (valid_perf_registers(regs)) {
> > +		if (perf_callchain_store(entry, regs->ip))
> > +			return;
> > +		unwind_start(&state, current, regs, NULL);
> > +	} else if (regs->sp) {
> > +		unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> > +	} else {
> >  		return;
> > +	}
> 
> AFAICT if we, by pure accident, end up with !bp for ORC, then we
> initialize the unwind wrong.
> 
> Note that @regs is mostly trivially correct, except for that tracepoint
> case. So I don't think we should magic here.

Ah, I didn't quite understand this code before, and I still don't
really, but I guess the issue is that @regs can be either real or fake.

In the real @regs case, we just want to always unwind starting from
regs->sp.

But in the fake @regs case, we should instead unwind from the current
frame, skipping all frames until we hit the fake regs->sp.  Because
starting from fake/incomplete regs is most likely going to cause
problems with ORC (or DWARF for other arches).

The idea of a fake regs is fragile and confusing.  Is it possible to
just pass in the "skip" stack pointer directly instead?  That should
work for both FP and non-FP.  And I _think_ there's no need to ever
capture regs->bp anyway -- the stack pointer should be sufficient.

In other words, either regs should be "real", and skip_sp is NULL; or
regs should be NULL and skip_sp should have a value.

-- 
Josh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-15 16:58   ` Josh Poimboeuf
@ 2019-04-16 11:30     ` Kairui Song
  2019-04-16 16:16       ` Alexei Starovoitov
  2019-04-16 17:39       ` Kairui Song
  0 siblings, 2 replies; 14+ messages in thread
From: Kairui Song @ 2019-04-16 11:30 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Kairui Song, Linux Kernel Mailing List,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Thomas Gleixner, Borislav Petkov,
	Dave Young

On Tue, Apr 16, 2019 at 12:59 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Mon, Apr 15, 2019 at 05:36:22PM +0200, Peter Zijlstra wrote:
> >
> > I'll mostly defer to Josh on unwinding, but a few comments below.
> >
> > On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > index e2b1447192a8..6075a4f94376 100644
> > > --- a/arch/x86/events/core.c
> > > +++ b/arch/x86/events/core.c
> > > @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
> > >     cyc2ns_read_end();
> > >  }
> > >
> > > +static inline int
> > > +valid_perf_registers(struct pt_regs *regs)
> > > +{
> > > +   return (regs->ip && regs->bp && regs->sp);
> > > +}
> >
> > I'm unconvinced by this, with both guess and orc having !bp is perfectly
> > valid.
> >
> > >  void
> > >  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> > >  {
> > > @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
> > >             return;
> > >     }
> > >
> > > -   if (perf_callchain_store(entry, regs->ip))
> > > +   if (valid_perf_registers(regs)) {
> > > +           if (perf_callchain_store(entry, regs->ip))
> > > +                   return;
> > > +           unwind_start(&state, current, regs, NULL);
> > > +   } else if (regs->sp) {
> > > +           unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> > > +   } else {
> > >             return;
> > > +   }
> >
> > AFAICT if we, by pure accident, end up with !bp for ORC, then we
> > initialize the unwind wrong.
> >
> > Note that @regs is mostly trivially correct, except for that tracepoint
> > case. So I don't think we should magic here.
>
> Ah, I didn't quite understand this code before, and I still don't
> really, but I guess the issue is that @regs can be either real or fake.
>
> In the real @regs case, we just want to always unwind starting from
> regs->sp.
>
> But in the fake @regs case, we should instead unwind from the current
> frame, skipping all frames until we hit the fake regs->sp.  Because
> starting from fake/incomplete regs is most likely going to cause
> problems with ORC (or DWARF for other arches).
>
> The idea of a fake regs is fragile and confusing.  Is it possible to
> just pass in the "skip" stack pointer directly instead?  That should
> work for both FP and non-FP.  And I _think_ there's no need to ever
> capture regs->bp anyway -- the stack pointer should be sufficient.

Hi, that will break some other usage, if perf_callchain_kernel is
called but it won't unwind to the callsite (could be produced by
attach an ebpf call to kprobe), things will also go wrong. It should
start with given registers when the register is valid.
And it's true with omit frame pointer BP value could be anything, so 0
is also valid, I think I need to find a better way to tell if we could
start with the registers value or direct start unwinding and skip
until got the stack.

>
> In other words, either regs should be "real", and skip_sp is NULL; or
> regs should be NULL and skip_sp should have a value.
>
> --
> Josh
--
Best Regards,
Kairui Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-16 11:30     ` Kairui Song
@ 2019-04-16 16:16       ` Alexei Starovoitov
  2019-04-16 17:39       ` Kairui Song
  1 sibling, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2019-04-16 16:16 UTC (permalink / raw)
  To: Kairui Song
  Cc: Josh Poimboeuf, Peter Zijlstra, Linux Kernel Mailing List,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Thomas Gleixner, Borislav Petkov,
	Dave Young, guro, yhs, brendan.d.gregg

On Tue, Apr 16, 2019 at 07:30:07PM +0800, Kairui Song wrote:
> On Tue, Apr 16, 2019 at 12:59 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > On Mon, Apr 15, 2019 at 05:36:22PM +0200, Peter Zijlstra wrote:
> > >
> > > I'll mostly defer to Josh on unwinding, but a few comments below.
> > >
> > > On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > > index e2b1447192a8..6075a4f94376 100644
> > > > --- a/arch/x86/events/core.c
> > > > +++ b/arch/x86/events/core.c
> > > > @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
> > > >     cyc2ns_read_end();
> > > >  }
> > > >
> > > > +static inline int
> > > > +valid_perf_registers(struct pt_regs *regs)
> > > > +{
> > > > +   return (regs->ip && regs->bp && regs->sp);
> > > > +}
> > >
> > > I'm unconvinced by this, with both guess and orc having !bp is perfectly
> > > valid.
> > >
> > > >  void
> > > >  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> > > >  {
> > > > @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
> > > >             return;
> > > >     }
> > > >
> > > > -   if (perf_callchain_store(entry, regs->ip))
> > > > +   if (valid_perf_registers(regs)) {
> > > > +           if (perf_callchain_store(entry, regs->ip))
> > > > +                   return;
> > > > +           unwind_start(&state, current, regs, NULL);
> > > > +   } else if (regs->sp) {
> > > > +           unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> > > > +   } else {
> > > >             return;
> > > > +   }
> > >
> > > AFAICT if we, by pure accident, end up with !bp for ORC, then we
> > > initialize the unwind wrong.
> > >
> > > Note that @regs is mostly trivially correct, except for that tracepoint
> > > case. So I don't think we should magic here.
> >
> > Ah, I didn't quite understand this code before, and I still don't
> > really, but I guess the issue is that @regs can be either real or fake.
> >
> > In the real @regs case, we just want to always unwind starting from
> > regs->sp.
> >
> > But in the fake @regs case, we should instead unwind from the current
> > frame, skipping all frames until we hit the fake regs->sp.  Because
> > starting from fake/incomplete regs is most likely going to cause
> > problems with ORC (or DWARF for other arches).
> >
> > The idea of a fake regs is fragile and confusing.  Is it possible to
> > just pass in the "skip" stack pointer directly instead?  That should
> > work for both FP and non-FP.  And I _think_ there's no need to ever
> > capture regs->bp anyway -- the stack pointer should be sufficient.
> 
> Hi, that will break some other usage, if perf_callchain_kernel is
> called but it won't unwind to the callsite (could be produced by
> attach an ebpf call to kprobe), things will also go wrong. It should
> start with given registers when the register is valid.
> And it's true with omit frame pointer BP value could be anything, so 0
> is also valid, I think I need to find a better way to tell if we could
> start with the registers value or direct start unwinding and skip
> until got the stack.

Hey, what is the status of the fix?
We're hitting it from bpf as well.
kernel stack traces are not working in tracepoints.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-16 11:30     ` Kairui Song
  2019-04-16 16:16       ` Alexei Starovoitov
@ 2019-04-16 17:39       ` Kairui Song
  2019-04-16 17:45         ` Peter Zijlstra
  2019-04-16 20:15         ` Josh Poimboeuf
  1 sibling, 2 replies; 14+ messages in thread
From: Kairui Song @ 2019-04-16 17:39 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Kairui Song, Linux Kernel Mailing List,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Thomas Gleixner, Borislav Petkov,
	Dave Young

On Tue, Apr 16, 2019 at 7:30 PM Kairui Song <kasong@redhat.com> wrote:
>
> On Tue, Apr 16, 2019 at 12:59 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > On Mon, Apr 15, 2019 at 05:36:22PM +0200, Peter Zijlstra wrote:
> > >
> > > I'll mostly defer to Josh on unwinding, but a few comments below.
> > >
> > > On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > > index e2b1447192a8..6075a4f94376 100644
> > > > --- a/arch/x86/events/core.c
> > > > +++ b/arch/x86/events/core.c
> > > > @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
> > > >     cyc2ns_read_end();
> > > >  }
> > > >
> > > > +static inline int
> > > > +valid_perf_registers(struct pt_regs *regs)
> > > > +{
> > > > +   return (regs->ip && regs->bp && regs->sp);
> > > > +}
> > >
> > > I'm unconvinced by this, with both guess and orc having !bp is perfectly
> > > valid.
> > >
> > > >  void
> > > >  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> > > >  {
> > > > @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
> > > >             return;
> > > >     }
> > > >
> > > > -   if (perf_callchain_store(entry, regs->ip))
> > > > +   if (valid_perf_registers(regs)) {
> > > > +           if (perf_callchain_store(entry, regs->ip))
> > > > +                   return;
> > > > +           unwind_start(&state, current, regs, NULL);
> > > > +   } else if (regs->sp) {
> > > > +           unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> > > > +   } else {
> > > >             return;
> > > > +   }
> > >
> > > AFAICT if we, by pure accident, end up with !bp for ORC, then we
> > > initialize the unwind wrong.
> > >
> > > Note that @regs is mostly trivially correct, except for that tracepoint
> > > case. So I don't think we should magic here.
> >
> > Ah, I didn't quite understand this code before, and I still don't
> > really, but I guess the issue is that @regs can be either real or fake.
> >
> > In the real @regs case, we just want to always unwind starting from
> > regs->sp.
> >
> > But in the fake @regs case, we should instead unwind from the current
> > frame, skipping all frames until we hit the fake regs->sp.  Because
> > starting from fake/incomplete regs is most likely going to cause
> > problems with ORC (or DWARF for other arches).
> >
> > The idea of a fake regs is fragile and confusing.  Is it possible to
> > just pass in the "skip" stack pointer directly instead?  That should
> > work for both FP and non-FP.  And I _think_ there's no need to ever
> > capture regs->bp anyway -- the stack pointer should be sufficient.
>
> Hi, that will break some other usage, if perf_callchain_kernel is
> called but it won't unwind to the callsite (could be produced by
> attach an ebpf call to kprobe), things will also go wrong. It should
> start with given registers when the register is valid.
> And it's true with omit frame pointer BP value could be anything, so 0
> is also valid, I think I need to find a better way to tell if we could
> start with the registers value or direct start unwinding and skip
> until got the stack.
>

Hi, sorry I might have some misunderstanding. Adding an extra argument
(eg. skip_sp) to indicate if it should just unwind from the current
frame, and use SP as the "skip mark", should work well.

And I also think the "fake"/"real" reg is fragile, could we abuse
another eflag (just like PERF_EFLAGS_EXACT) to indicate the regs are
partially dumped fake registers? So perf_callchain_kernel just check
if it's a "partial registers", and in such case it can start unwinding
and skip until it get to SP. This make it easier to tell if the
registers are "fake".

-- 
Best Regards,
Kairui Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-16 17:39       ` Kairui Song
@ 2019-04-16 17:45         ` Peter Zijlstra
  2019-04-17 14:41           ` Kairui Song
  2019-04-16 20:15         ` Josh Poimboeuf
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2019-04-16 17:45 UTC (permalink / raw)
  To: Kairui Song
  Cc: Josh Poimboeuf, Linux Kernel Mailing List, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young

On Wed, Apr 17, 2019 at 01:39:19AM +0800, Kairui Song wrote:
> And I also think the "fake"/"real" reg is fragile, could we abuse
> another eflag (just like PERF_EFLAGS_EXACT) to indicate the regs are
> partially dumped fake registers? 

Sure, the SDM seems to suggest bits 1,3,5,15 are 'available'. We've
already used 3 and 5, and I think we can use !X86_EFLAGS_FIXED to
indicate a fake regs set. Any real regs set will always have that set.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-16 17:39       ` Kairui Song
  2019-04-16 17:45         ` Peter Zijlstra
@ 2019-04-16 20:15         ` Josh Poimboeuf
  2019-04-17  7:07           ` Peter Zijlstra
  2019-04-17 14:44           ` Kairui Song
  1 sibling, 2 replies; 14+ messages in thread
From: Josh Poimboeuf @ 2019-04-16 20:15 UTC (permalink / raw)
  To: Kairui Song
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young

On Wed, Apr 17, 2019 at 01:39:19AM +0800, Kairui Song wrote:
> On Tue, Apr 16, 2019 at 7:30 PM Kairui Song <kasong@redhat.com> wrote:
> >
> > On Tue, Apr 16, 2019 at 12:59 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > On Mon, Apr 15, 2019 at 05:36:22PM +0200, Peter Zijlstra wrote:
> > > >
> > > > I'll mostly defer to Josh on unwinding, but a few comments below.
> > > >
> > > > On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> > > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > > > index e2b1447192a8..6075a4f94376 100644
> > > > > --- a/arch/x86/events/core.c
> > > > > +++ b/arch/x86/events/core.c
> > > > > @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
> > > > >     cyc2ns_read_end();
> > > > >  }
> > > > >
> > > > > +static inline int
> > > > > +valid_perf_registers(struct pt_regs *regs)
> > > > > +{
> > > > > +   return (regs->ip && regs->bp && regs->sp);
> > > > > +}
> > > >
> > > > I'm unconvinced by this, with both guess and orc having !bp is perfectly
> > > > valid.
> > > >
> > > > >  void
> > > > >  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> > > > >  {
> > > > > @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
> > > > >             return;
> > > > >     }
> > > > >
> > > > > -   if (perf_callchain_store(entry, regs->ip))
> > > > > +   if (valid_perf_registers(regs)) {
> > > > > +           if (perf_callchain_store(entry, regs->ip))
> > > > > +                   return;
> > > > > +           unwind_start(&state, current, regs, NULL);
> > > > > +   } else if (regs->sp) {
> > > > > +           unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> > > > > +   } else {
> > > > >             return;
> > > > > +   }
> > > >
> > > > AFAICT if we, by pure accident, end up with !bp for ORC, then we
> > > > initialize the unwind wrong.
> > > >
> > > > Note that @regs is mostly trivially correct, except for that tracepoint
> > > > case. So I don't think we should magic here.
> > >
> > > Ah, I didn't quite understand this code before, and I still don't
> > > really, but I guess the issue is that @regs can be either real or fake.
> > >
> > > In the real @regs case, we just want to always unwind starting from
> > > regs->sp.
> > >
> > > But in the fake @regs case, we should instead unwind from the current
> > > frame, skipping all frames until we hit the fake regs->sp.  Because
> > > starting from fake/incomplete regs is most likely going to cause
> > > problems with ORC (or DWARF for other arches).
> > >
> > > The idea of a fake regs is fragile and confusing.  Is it possible to
> > > just pass in the "skip" stack pointer directly instead?  That should
> > > work for both FP and non-FP.  And I _think_ there's no need to ever
> > > capture regs->bp anyway -- the stack pointer should be sufficient.
> >
> > Hi, that will break some other usage, if perf_callchain_kernel is
> > called but it won't unwind to the callsite (could be produced by
> > attach an ebpf call to kprobe), things will also go wrong. It should
> > start with given registers when the register is valid.
> > And it's true with omit frame pointer BP value could be anything, so 0
> > is also valid, I think I need to find a better way to tell if we could
> > start with the registers value or direct start unwinding and skip
> > until got the stack.
> >
> 
> Hi, sorry I might have some misunderstanding. Adding an extra argument
> (eg. skip_sp) to indicate if it should just unwind from the current
> frame, and use SP as the "skip mark", should work well.
> 
> And I also think the "fake"/"real" reg is fragile, could we abuse
> another eflag (just like PERF_EFLAGS_EXACT) to indicate the regs are
> partially dumped fake registers? So perf_callchain_kernel just check
> if it's a "partial registers", and in such case it can start unwinding
> and skip until it get to SP. This make it easier to tell if the
> registers are "fake".

If you do the regs->eflags thing to mark the regs as fake in
(perf_arch_fetch_caller_regs()), then I don't think skip_sp would be
needed, because regs->sp could probably mark the skip point.

Instead I was actually hoping we could get rid of fake regs and
perf_arch_fetch_caller_regs() altogether, because it's a nasty hack.
But I don't know what else those fake regs are used for.

-- 
Josh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-16 20:15         ` Josh Poimboeuf
@ 2019-04-17  7:07           ` Peter Zijlstra
  2019-04-18  2:15             ` Josh Poimboeuf
  2019-04-17 14:44           ` Kairui Song
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2019-04-17  7:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Kairui Song, Linux Kernel Mailing List, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young

On Tue, Apr 16, 2019 at 03:15:59PM -0500, Josh Poimboeuf wrote:
> If you do the regs->eflags thing to mark the regs as fake in
> (perf_arch_fetch_caller_regs()), then I don't think skip_sp would be
> needed, because regs->sp could probably mark the skip point.
> 
> Instead I was actually hoping we could get rid of fake regs and
> perf_arch_fetch_caller_regs() altogether, because it's a nasty hack.
> But I don't know what else those fake regs are used for.

This is the generic perf generate a sample path. It doesn't know the
context. Normally it is an interrupt or exception of sorts and we simply
pass the pt_regs from that down the chain and all is good.

It is just for a few software events, such as SW_CONTEXT_SWITCH,
SW_MIGRATIONS and all the TP muck that we do not have regs to pass down.

These regs are used by:

 - SAMPLE_IP,
 - SAMPLE_CALLCHAIN (this here issue),
 - SAMPLE_REGS_INTR,
 - a few misc bits

There's actually comment on top of perf_fetch_caller_regs() that tries
to explain what is needed -- although I suppose that could do with an
update.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-16 17:45         ` Peter Zijlstra
@ 2019-04-17 14:41           ` Kairui Song
  0 siblings, 0 replies; 14+ messages in thread
From: Kairui Song @ 2019-04-17 14:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Linux Kernel Mailing List, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young

On Wed, Apr 17, 2019 at 1:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 17, 2019 at 01:39:19AM +0800, Kairui Song wrote:
> > And I also think the "fake"/"real" reg is fragile, could we abuse
> > another eflag (just like PERF_EFLAGS_EXACT) to indicate the regs are
> > partially dumped fake registers?
>
> Sure, the SDM seems to suggest bits 1,3,5,15 are 'available'. We've
> already used 3 and 5, and I think we can use !X86_EFLAGS_FIXED to
> indicate a fake regs set. Any real regs set will always have that set.

Thanks! This is a good idea. Will update accordingly in V3 later.





--
Best Regards,
Kairui Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-16 20:15         ` Josh Poimboeuf
  2019-04-17  7:07           ` Peter Zijlstra
@ 2019-04-17 14:44           ` Kairui Song
  2019-04-18  7:54             ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Kairui Song @ 2019-04-17 14:44 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young

On Wed, Apr 17, 2019 at 4:16 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Wed, Apr 17, 2019 at 01:39:19AM +0800, Kairui Song wrote:
> > On Tue, Apr 16, 2019 at 7:30 PM Kairui Song <kasong@redhat.com> wrote:
> > >
> > > On Tue, Apr 16, 2019 at 12:59 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > >
> > > > On Mon, Apr 15, 2019 at 05:36:22PM +0200, Peter Zijlstra wrote:
> > > > >
> > > > > I'll mostly defer to Josh on unwinding, but a few comments below.
> > > > >
> > > > > On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> > > > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > > > > index e2b1447192a8..6075a4f94376 100644
> > > > > > --- a/arch/x86/events/core.c
> > > > > > +++ b/arch/x86/events/core.c
> > > > > > @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct perf_event *event,
> > > > > >     cyc2ns_read_end();
> > > > > >  }
> > > > > >
> > > > > > +static inline int
> > > > > > +valid_perf_registers(struct pt_regs *regs)
> > > > > > +{
> > > > > > +   return (regs->ip && regs->bp && regs->sp);
> > > > > > +}
> > > > >
> > > > > I'm unconvinced by this, with both guess and orc having !bp is perfectly
> > > > > valid.
> > > > >
> > > > > >  void
> > > > > >  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
> > > > > >  {
> > > > > > @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
> > > > > >             return;
> > > > > >     }
> > > > > >
> > > > > > -   if (perf_callchain_store(entry, regs->ip))
> > > > > > +   if (valid_perf_registers(regs)) {
> > > > > > +           if (perf_callchain_store(entry, regs->ip))
> > > > > > +                   return;
> > > > > > +           unwind_start(&state, current, regs, NULL);
> > > > > > +   } else if (regs->sp) {
> > > > > > +           unwind_start(&state, current, NULL, (unsigned long *)regs->sp);
> > > > > > +   } else {
> > > > > >             return;
> > > > > > +   }
> > > > >
> > > > > AFAICT if we, by pure accident, end up with !bp for ORC, then we
> > > > > initialize the unwind wrong.
> > > > >
> > > > > Note that @regs is mostly trivially correct, except for that tracepoint
> > > > > case. So I don't think we should magic here.
> > > >
> > > > Ah, I didn't quite understand this code before, and I still don't
> > > > really, but I guess the issue is that @regs can be either real or fake.
> > > >
> > > > In the real @regs case, we just want to always unwind starting from
> > > > regs->sp.
> > > >
> > > > But in the fake @regs case, we should instead unwind from the current
> > > > frame, skipping all frames until we hit the fake regs->sp.  Because
> > > > starting from fake/incomplete regs is most likely going to cause
> > > > problems with ORC (or DWARF for other arches).
> > > >
> > > > The idea of a fake regs is fragile and confusing.  Is it possible to
> > > > just pass in the "skip" stack pointer directly instead?  That should
> > > > work for both FP and non-FP.  And I _think_ there's no need to ever
> > > > capture regs->bp anyway -- the stack pointer should be sufficient.
> > >
> > > Hi, that will break some other usage, if perf_callchain_kernel is
> > > called but it won't unwind to the callsite (could be produced by
> > > attach an ebpf call to kprobe), things will also go wrong. It should
> > > start with given registers when the register is valid.
> > > And it's true with omit frame pointer BP value could be anything, so 0
> > > is also valid, I think I need to find a better way to tell if we could
> > > start with the registers value or direct start unwinding and skip
> > > until got the stack.
> > >
> >
> > Hi, sorry I might have some misunderstanding. Adding an extra argument
> > (eg. skip_sp) to indicate if it should just unwind from the current
> > frame, and use SP as the "skip mark", should work well.
> >
> > And I also think the "fake"/"real" reg is fragile, could we abuse
> > another eflag (just like PERF_EFLAGS_EXACT) to indicate the regs are
> > partially dumped fake registers? So perf_callchain_kernel just check
> > if it's a "partial registers", and in such case it can start unwinding
> > and skip until it get to SP. This make it easier to tell if the
> > registers are "fake".
>
> If you do the regs->eflags thing to mark the regs as fake in
> (perf_arch_fetch_caller_regs()), then I don't think skip_sp would be
> needed, because regs->sp could probably mark the skip point.
>
> Instead I was actually hoping we could get rid of fake regs and
> perf_arch_fetch_caller_regs() altogether, because it's a nasty hack.
> But I don't know what else those fake regs are used for.
>

Despite it's hacky, it seems not necessary to dump every register. And
is there a straight way to get caller's regs in the trace point? It
seems more trouble some. Or if we just use the regs inside the
tracepoint, but it would need even more workaround (eg. unwind one
frame before do anything).

-- 
Best Regards,
Kairui Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-17  7:07           ` Peter Zijlstra
@ 2019-04-18  2:15             ` Josh Poimboeuf
  0 siblings, 0 replies; 14+ messages in thread
From: Josh Poimboeuf @ 2019-04-18  2:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kairui Song, Linux Kernel Mailing List, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young

On Wed, Apr 17, 2019 at 09:07:35AM +0200, Peter Zijlstra wrote:
> On Tue, Apr 16, 2019 at 03:15:59PM -0500, Josh Poimboeuf wrote:
> > If you do the regs->eflags thing to mark the regs as fake in
> > (perf_arch_fetch_caller_regs()), then I don't think skip_sp would be
> > needed, because regs->sp could probably mark the skip point.
> > 
> > Instead I was actually hoping we could get rid of fake regs and
> > perf_arch_fetch_caller_regs() altogether, because it's a nasty hack.
> > But I don't know what else those fake regs are used for.
> 
> This is the generic perf generate a sample path. It doesn't know the
> context. Normally it is an interrupt or exception of sorts and we simply
> pass the pt_regs from that down the chain and all is good.
> 
> It is just for a few software events, such as SW_CONTEXT_SWITCH,
> SW_MIGRATIONS and all the TP muck that we do not have regs to pass down.
> 
> These regs are used by:
> 
>  - SAMPLE_IP,
>  - SAMPLE_CALLCHAIN (this here issue),
>  - SAMPLE_REGS_INTR,
>  - a few misc bits
> 
> There's actually comment on top of perf_fetch_caller_regs() that tries
> to explain what is needed -- although I suppose that could do with an
> update.

Ok, I guess it makes a little more sense now to my perf-ignorant brain.
An improved comment would definitely help.

Sounds like using a reserved regs->flags bit to indicate fake regs is
the way to go.

-- 
Josh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER
  2019-04-17 14:44           ` Kairui Song
@ 2019-04-18  7:54             ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2019-04-18  7:54 UTC (permalink / raw)
  To: Kairui Song
  Cc: Josh Poimboeuf, Linux Kernel Mailing List, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Young

On Wed, Apr 17, 2019 at 10:44:33PM +0800, Kairui Song wrote:
> Despite it's hacky, it seems not necessary to dump every register. And
> is there a straight way to get caller's regs in the trace point? It
> seems more trouble some. Or if we just use the regs inside the
> tracepoint, but it would need even more workaround (eg. unwind one
> frame before do anything).

Sure, and the current code doesn't fill out every reg either, but given
SAMPLE_REGS_INTR I think we should at least clear the entire regs before
passing it on, or otherwise make sure SAMPLE_REGS_INTR isn't allowed on
these events.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-04-18  7:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 16:59 [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER Kairui Song
2019-04-15 12:38 ` Jiri Olsa
2019-04-15 15:36 ` Peter Zijlstra
2019-04-15 16:58   ` Josh Poimboeuf
2019-04-16 11:30     ` Kairui Song
2019-04-16 16:16       ` Alexei Starovoitov
2019-04-16 17:39       ` Kairui Song
2019-04-16 17:45         ` Peter Zijlstra
2019-04-17 14:41           ` Kairui Song
2019-04-16 20:15         ` Josh Poimboeuf
2019-04-17  7:07           ` Peter Zijlstra
2019-04-18  2:15             ` Josh Poimboeuf
2019-04-17 14:44           ` Kairui Song
2019-04-18  7:54             ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.