linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf/x86/intel: Init early callchain for bts event
@ 2018-11-11 18:16 Jiri Olsa
  2018-11-12  0:26 ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Jiri Olsa @ 2018-11-11 18:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, lkml, Ingo Molnar, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Andi Kleen

Vince reported crash in bts flush code when touching the
callchain data, which was supposed to be initialized
as an 'early' callchain data.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  Call Trace:
   <IRQ>
   intel_pmu_drain_bts_buffer+0x151/0x220
   ? intel_get_event_constraints+0x219/0x360
   ? perf_assign_events+0xe2/0x2a0
   ? select_idle_sibling+0x22/0x3a0
   ? __update_load_avg_se+0x1ec/0x270
   ? enqueue_task_fair+0x377/0xdd0
   ? cpumask_next_and+0x19/0x20
   ? load_balance+0x134/0x950
   ? check_preempt_curr+0x7a/0x90
   ? ttwu_do_wakeup+0x19/0x140
   x86_pmu_stop+0x3b/0x90
   x86_pmu_del+0x57/0x160
   event_sched_out.isra.106+0x81/0x170
   group_sched_out.part.108+0x51/0xc0
   __perf_event_disable+0x7f/0x160
   event_function+0x8c/0xd0
   remote_function+0x3c/0x50
   flush_smp_call_function_queue+0x35/0xe0
   smp_call_function_single_interrupt+0x3a/0xd0
   call_function_single_interrupt+0xf/0x20
   </IRQ>

It was triggered by fuzzer by can be easilt reproduced by:
  # perf record -e cpu/branch-instructions/p -g -c 1

The problem is that bts drain code does not initialize sample's
early callchain data and calls perf_prepare_sample with NULL
sample->callchain, even if it's expected to exist via
__PERF_SAMPLE_CALLCHAIN_EARLY sample type bit.

The fix initializes the callchain data for this case with
empty callchain, because bts does not report callchains.

Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/events/intel/ds.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index b7b01d762d32..1049b547fdfe 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -577,6 +577,8 @@ void intel_pmu_disable_bts(void)
 	update_debugctlmsr(debugctlmsr);
 }
 
+static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
+
 int intel_pmu_drain_bts_buffer(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -612,6 +614,9 @@ int intel_pmu_drain_bts_buffer(void)
 
 	perf_sample_data_init(&data, 0, event->hw.last_period);
 
+	if (event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY)
+		data.callchain = &__empty_callchain;
+
 	/*
 	 * BTS leaks kernel addresses in branches across the cpl boundary,
 	 * such as traps or system calls, so unless the user is asking for
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/x86/intel: Init early callchain for bts event
  2018-11-11 18:16 [PATCH] perf/x86/intel: Init early callchain for bts event Jiri Olsa
@ 2018-11-12  0:26 ` Peter Zijlstra
  2018-11-12  3:28   ` Andi Kleen
  2018-11-12  8:32   ` Jiri Olsa
  0 siblings, 2 replies; 4+ messages in thread
From: Peter Zijlstra @ 2018-11-12  0:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Vince Weaver, lkml, Ingo Molnar, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Andi Kleen

On Sun, Nov 11, 2018 at 07:16:50PM +0100, Jiri Olsa wrote:
> Vince reported crash in bts flush code when touching the
> callchain data, which was supposed to be initialized
> as an 'early' callchain data.
> 
>   BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>   ...

> It was triggered by fuzzer by can be easilt reproduced by:
>   # perf record -e cpu/branch-instructions/p -g -c 1
> 
> The problem is that bts drain code does not initialize sample's
> early callchain data and calls perf_prepare_sample with NULL
> sample->callchain, even if it's expected to exist via
> __PERF_SAMPLE_CALLCHAIN_EARLY sample type bit.

Not sure that is the actual problem, nor that this:

> @@ -612,6 +614,9 @@ int intel_pmu_drain_bts_buffer(void)
>  
>  	perf_sample_data_init(&data, 0, event->hw.last_period);
>  
> +	if (event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY)
> +		data.callchain = &__empty_callchain;
> +
>  	/*
>  	 * BTS leaks kernel addresses in branches across the cpl boundary,
>  	 * such as traps or system calls, so unless the user is asking for

is the right fix.

If you look at commit:

  6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")

Then the right fix would be to do perf_callchain() from the BTS drain
code -- if '/p'.

Because prior to that commit, we would do a perf_callchain() in
intel_pmu_drain_bts_buffer()'s call to perf_prepare_sample(), which
would do an actual stack unwind for a branch entry.

With your patch, we get an empty stack for every entry.

Which is a change in behaviour...

Now arguably, this is really stupid behaviour. Who in his right mind
wants callchain output on BTS entries. And even if they do, BTS +
precise_ip is nonsensical.

So in my mind disallowing precise_ip on BTS would be the simplest fix.

Hmm?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/x86/intel: Init early callchain for bts event
  2018-11-12  0:26 ` Peter Zijlstra
@ 2018-11-12  3:28   ` Andi Kleen
  2018-11-12  8:32   ` Jiri Olsa
  1 sibling, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2018-11-12  3:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, lkml, Ingo Molnar, Alexander Shishkin,
	Arnaldo Carvalho de Melo

> 
> So in my mind disallowing precise_ip on BTS would be the simplest fix.
> 
> Hmm?

Both seem reasonable to me.

-Andi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/x86/intel: Init early callchain for bts event
  2018-11-12  0:26 ` Peter Zijlstra
  2018-11-12  3:28   ` Andi Kleen
@ 2018-11-12  8:32   ` Jiri Olsa
  1 sibling, 0 replies; 4+ messages in thread
From: Jiri Olsa @ 2018-11-12  8:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, lkml, Ingo Molnar, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Andi Kleen

On Mon, Nov 12, 2018 at 01:26:37AM +0100, Peter Zijlstra wrote:
> On Sun, Nov 11, 2018 at 07:16:50PM +0100, Jiri Olsa wrote:
> > Vince reported crash in bts flush code when touching the
> > callchain data, which was supposed to be initialized
> > as an 'early' callchain data.
> > 
> >   BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> >   ...
> 
> > It was triggered by fuzzer by can be easilt reproduced by:
> >   # perf record -e cpu/branch-instructions/p -g -c 1
> > 
> > The problem is that bts drain code does not initialize sample's
> > early callchain data and calls perf_prepare_sample with NULL
> > sample->callchain, even if it's expected to exist via
> > __PERF_SAMPLE_CALLCHAIN_EARLY sample type bit.
> 
> Not sure that is the actual problem, nor that this:
> 
> > @@ -612,6 +614,9 @@ int intel_pmu_drain_bts_buffer(void)
> >  
> >  	perf_sample_data_init(&data, 0, event->hw.last_period);
> >  
> > +	if (event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY)
> > +		data.callchain = &__empty_callchain;
> > +
> >  	/*
> >  	 * BTS leaks kernel addresses in branches across the cpl boundary,
> >  	 * such as traps or system calls, so unless the user is asking for
> 
> is the right fix.
> 
> If you look at commit:
> 
>   6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")
> 
> Then the right fix would be to do perf_callchain() from the BTS drain
> code -- if '/p'.
> 
> Because prior to that commit, we would do a perf_callchain() in
> intel_pmu_drain_bts_buffer()'s call to perf_prepare_sample(), which
> would do an actual stack unwind for a branch entry.
> 
> With your patch, we get an empty stack for every entry.
> 
> Which is a change in behaviour...

I thought there's no callchain anyway, because we use zero-ed regs

> 
> Now arguably, this is really stupid behaviour. Who in his right mind
> wants callchain output on BTS entries. And even if they do, BTS +
> precise_ip is nonsensical.
> 
> So in my mind disallowing precise_ip on BTS would be the simplest fix.
> 
> Hmm?

sounds ok, will post it

thanks,
jirka

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-11-12  8:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-11 18:16 [PATCH] perf/x86/intel: Init early callchain for bts event Jiri Olsa
2018-11-12  0:26 ` Peter Zijlstra
2018-11-12  3:28   ` Andi Kleen
2018-11-12  8:32   ` Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).