All of lore.kernel.org
 help / color / mirror / Atom feed
* perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
@ 2015-04-30 19:08 Vince Weaver
  2015-05-01 12:59 ` Peter Zijlstra
  2015-05-24 19:14 ` Jiri Olsa
  0 siblings, 2 replies; 12+ messages in thread
From: Vince Weaver @ 2015-04-30 19:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Paul Mackerras


So the perf_fuzzer caught this after about a week of fuzzing on a Haswell 
machine running a recent git kernel (pre 4.1-rc1 though).

We've seen this BUG before and various fixes were applied but apparently 
it wasn't enough.

Sadly it doesn't seem to be reproducible.

validate_group() -> x86_pmu.schedule_events() -> ???? -> variable_test_bit()
 (hard to tell which test bit with all the inlining going on).

Vince

[428232.701319] BUG: unable to handle kernel NULL pointer dereference at           (null)
[428232.710197] IP: [<ffffffff8102b3e2>] x86_schedule_events+0x112/0x250
[428232.717470] PGD cdf50067 PUD c610c067 PMD 0 
[428232.722557] Oops: 0000 [#1] SMP 
[428232.726490] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm crct10dif_pclmul snd_hda_codec_hdmi crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek aesni_intel snd_hda_codec_generic aes_x86_64 i915 snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep lrw snd_pcm gf128mul iTCO_wdt iTCO_vendor_support drm_kms_helper glue_helper snd_timer ppdev evdev drm ablk_helper snd cryptd mei_me soundcore xhci_pci tpm_tis psmouse xhci_hcd mei serio_raw lpc_ich tpm mfd_core parport_pc pcspkr parport wmi i2c_algo_bit battery i2c_i801 button processor video sg sr_mod sd_mod cdrom ahci libahci libata ehci_pci ehci_hcd e1000e usbcore ptp crc32c_intel fan scsi_mod pps_core usb_common thermal thermal_sys
[428232.800929] CPU: 0 PID: 31352 Comm: perf_fuzzer Tainted: G        W       4.0.0+ #136
[428232.809912] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[428232.818447] task: ffff8800cef88bd0 ti: ffff8800c56b0000 task.ti: ffff8800c56b0000
[428232.827119] RIP: 0010:[<ffffffff8102b3e2>]  [<ffffffff8102b3e2>] x86_schedule_events+0x112/0x250
[428232.837221] RSP: 0018:ffff8800c56b3cb8  EFLAGS: 00010246
[428232.843575] RAX: 0000000000000000 RBX: ffff8800c54d5000 RCX: 00000000001001b7
[428232.851871] RDX: 0000000000000000 RSI: ffff8800c42e8000 RDI: 0000000000000000
[428232.860128] RBP: ffff8800c56b3d18 R08: 0000000000000000 R09: ffff8800c54d5724
[428232.868362] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000004
[428232.876655] R13: ffff8800366ec000 R14: 0000000000000002 R15: 0000000000000004
[428232.884987] FS:  00007f560c527700(0000) GS:ffff88011ea00000(0000) knlGS:0000000000000000
[428232.894338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[428232.901173] CR2: 0000000000000000 CR3: 00000000c469b000 CR4: 00000000001407f0
[428232.909498] DR0: 0000000000000000 DR1: 0000000002d54000 DR2: 0000000002b4f000
[428232.917831] DR3: 00000000033d8000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[428232.926129] Stack:
[428232.928950]  0000000000000001 0000000000000000 00000002fffffff4 0000000000000002
[428232.937669]  fffffffffffffff4 0000000000000000 ffff8800c56b3d18 ffff8800366ec000
[428232.946332]  0000000000000000 ffff8800c54d5000 ffff8800c42e8000 ffffffff81c1cfc0
[428232.955038] Call Trace:
[428232.958326]  [<ffffffff8102a75e>] x86_pmu_event_init+0x12e/0x3d0
[428232.965464]  [<ffffffff81160030>] ? perf_event_ctx_lock_nested+0x20/0x110
[428232.973486]  [<ffffffff8116023d>] perf_try_init_event+0x4d/0xb0
[428232.980501]  [<ffffffff811683af>] perf_init_event+0x13f/0x170
[428232.987355]  [<ffffffff81168275>] ? perf_init_event+0x5/0x170
[428232.994194]  [<ffffffff8116882b>] perf_event_alloc+0x44b/0x6d0
[428233.001145]  [<ffffffff81168ea3>] SYSC_perf_event_open+0x3f3/0xde0
[428233.008400]  [<ffffffff81169d5e>] SyS_perf_event_open+0xe/0x10
[428233.015325]  [<ffffffff816dd632>] system_call_fastpath+0x16/0x7a
[428233.022430] Code: a0 8d 78 01 74 31 48 8b b4 c3 28 05 00 00 48 83 c0 01 48 63 96 5c 01 00 00 4c 8b 86 98 01 00 00 83 fa ff 0f 84 b4 00 00 00 89 c7 <49> 0f a3 10 45 19 c0 45 85 c0 75 a2 45 31 f6 3b 7d b4 0f 85 9a 
[428233.044645] RIP  [<ffffffff8102b3e2>] x86_schedule_events+0x112/0x250
[428233.052293]  RSP <ffff8800c56b3cb8>
[428233.056712] CR2: 0000000000000000
[428233.061189] [drm:intel_crtc_set_config [i915]] *ERROR* failed to restore config after modeset failure
[428233.076847] ---[ end trace 5679ca0875946dbb ]---

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-04-30 19:08 perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events Vince Weaver
@ 2015-05-01 12:59 ` Peter Zijlstra
  2015-05-04 19:32   ` Stephane Eranian
  2015-05-24 19:14 ` Jiri Olsa
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2015-05-01 12:59 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Paul Mackerras, Stephane Eranian

On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> 
> So the perf_fuzzer caught this after about a week of fuzzing on a Haswell 
> machine running a recent git kernel (pre 4.1-rc1 though).
> 
> We've seen this BUG before and various fixes were applied but apparently 
> it wasn't enough.
> 
> Sadly it doesn't seem to be reproducible.
> 
> validate_group() -> x86_pmu.schedule_events() -> ???? -> variable_test_bit()
>  (hard to tell which test bit with all the inlining going on).

Assuming you build with debug info addr2line -i can help, but I think I
found it by comparing the Code section below with my objdump -D output.

Its:
		/* constraint still honored */
		if (!test_bit(hwc->idx, c->idxmsk))
			break;

Which would seem to suggest c is NULL.

Lemme go figure out how that could happen.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-01 12:59 ` Peter Zijlstra
@ 2015-05-04 19:32   ` Stephane Eranian
  2015-05-07 12:43     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Stephane Eranian @ 2015-05-04 19:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Fri, May 1, 2015 at 5:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> >
> > So the perf_fuzzer caught this after about a week of fuzzing on a Haswell
> > machine running a recent git kernel (pre 4.1-rc1 though).
> >
> > We've seen this BUG before and various fixes were applied but apparently
> > it wasn't enough.
> >
> > Sadly it doesn't seem to be reproducible.
> >
> > validate_group() -> x86_pmu.schedule_events() -> ???? -> variable_test_bit()
> >  (hard to tell which test bit with all the inlining going on).
>
> Assuming you build with debug info addr2line -i can help, but I think I
> found it by comparing the Code section below with my objdump -D output.
>
> Its:
>                 /* constraint still honored */
>                 if (!test_bit(hwc->idx, c->idxmsk))
>                         break;
>
> Which would seem to suggest c is NULL.
>
But then, you'd crash in the previous loop, because after
get_event_contraint(), you touch
c->weight. I think it is more likely related to the bitmask (idxmsk).
But then it is always
allocated with the constraint even with the HT bug workaround.  So
most, likely the index
is bogus and you touch outside the idxmsk[] array.


>
> Lemme go figure out how that could happen.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-04 19:32   ` Stephane Eranian
@ 2015-05-07 12:43     ` Peter Zijlstra
  2015-05-08  4:25       ` Vince Weaver
  2015-05-18 17:40       ` Vince Weaver
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-05-07 12:43 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Vince Weaver, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Mon, May 04, 2015 at 12:32:56PM -0700, Stephane Eranian wrote:
> On Fri, May 1, 2015 at 5:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> > >
> > > So the perf_fuzzer caught this after about a week of fuzzing on a Haswell
> > > machine running a recent git kernel (pre 4.1-rc1 though).
> > >
> > > We've seen this BUG before and various fixes were applied but apparently
> > > it wasn't enough.
> > >
> > > Sadly it doesn't seem to be reproducible.
> > >
> > > validate_group() -> x86_pmu.schedule_events() -> ???? -> variable_test_bit()
> > >  (hard to tell which test bit with all the inlining going on).
> >
> > Assuming you build with debug info addr2line -i can help, but I think I
> > found it by comparing the Code section below with my objdump -D output.
> >
> > Its:
> >                 /* constraint still honored */
> >                 if (!test_bit(hwc->idx, c->idxmsk))
> >                         break;
> >
> > Which would seem to suggest c is NULL.
> >
> But then, you'd crash in the previous loop, because after
> get_event_contraint(), you touch
> c->weight.

Indeed so; and we can make an analogous argument for hwc. However:

> I think it is more likely related to the bitmask (idxmsk).  But then
> it is always allocated with the constraint even with the HT bug
> workaround.  So most, likely the index is bogus and you touch outside
> the idxmsk[] array.

[428232.701319] BUG: unable to handle kernel NULL pointer dereference at           (null)

But the thing really tried to touch NULL, not some random address that
faulted.

As always, Vince has found us a good puzzle ;-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-07 12:43     ` Peter Zijlstra
@ 2015-05-08  4:25       ` Vince Weaver
  2015-05-18 17:40       ` Vince Weaver
  1 sibling, 0 replies; 12+ messages in thread
From: Vince Weaver @ 2015-05-08  4:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Vince Weaver, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Paul Mackerras

On Thu, 7 May 2015, Peter Zijlstra wrote:

> Indeed so; and we can make an analogous argument for hwc. However:
> 
> > I think it is more likely related to the bitmask (idxmsk).  But then
> > it is always allocated with the constraint even with the HT bug
> > workaround.  So most, likely the index is bogus and you touch outside
> > the idxmsk[] array.
> 
> [428232.701319] BUG: unable to handle kernel NULL pointer dereference at           (null)
> 
> But the thing really tried to touch NULL, not some random address that
> faulted.
> 
> As always, Vince has found us a good puzzle ;-)

and sorry I haven't been much help tracking it down.  I'm trying to 
trigger it again, but this particular bug only pops up after a week or so 
of fuzzing.  

Vince

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-07 12:43     ` Peter Zijlstra
  2015-05-08  4:25       ` Vince Weaver
@ 2015-05-18 17:40       ` Vince Weaver
  2015-05-20 13:03         ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Vince Weaver @ 2015-05-18 17:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Vince Weaver, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Paul Mackerras

On Thu, 7 May 2015, Peter Zijlstra wrote:

> On Mon, May 04, 2015 at 12:32:56PM -0700, Stephane Eranian wrote:
> > I think it is more likely related to the bitmask (idxmsk).  But then
> > it is always allocated with the constraint even with the HT bug
> > workaround.  So most, likely the index is bogus and you touch outside
> > the idxmsk[] array.
> 
> [428232.701319] BUG: unable to handle kernel NULL pointer dereference at           (null)
> 
> But the thing really tried to touch NULL, not some random address that
> faulted.
> 
> As always, Vince has found us a good puzzle ;-)

so the Haswell machine turned up the following oops that looks related.

Yet again we are ending up with a NULL pointer in the constraint table 
somehow.

This maps to 

static bool __perf_sched_find_counter(struct perf_sched *sched)

        c = sched->events[sched->state.event]->hw.constraint;

        /* Prefer fixed purpose counters */
--->	if (c->idxmsk64 & (~0ULL << INTEL_PMC_IDX_FIXED)) {

ffffffff81029ce4:       48 8b 55 88             mov    -0x78(%rbp),%rdx
ffffffff81029ce8:       48 8b 04 c2             mov    (%rdx,%rax,8),%rax
ffffffff81029cec:       ba 20 00 00 00          mov    $0x20,%edx
ffffffff81029cf1:       48 8b 98 98 01 00 00    mov    0x198(%rax),%rbx
ffffffff81029cf8:       4c 85 23                test   %r12,(%rbx)


[306672.100641] BUG: unable to handle kernel NULL pointer dereference at           (null)
[306672.109653] IP: [<ffffffff81029cf8>] perf_assign_events+0xa8/0x290
[306672.116829] PGD cea0f067 PUD cea0e067 PMD 0 
[306672.121965] Oops: 0000 [#1] SMP 
[306672.125994] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp hid_generic kvm_intel usbhid hid kvm crct10dif_pclmul snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_controller i915 ppdev iTCO_wdt snd_hda_codec snd_hda_core aesni_intel aes_x86_64 lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support evdev glue_helper drm_kms_helper parport_pc drm pcspkr snd_timer ablk_helper snd cryptd soundcore processor button psmouse xhci_pci serio_raw xhci_hcd mei_me video battery lpc_ich parport mei i2c_i801 i2c_algo_bit tpm_tis tpm mfd_core wmi sg sr_mod sd_mod cdrom ehci_pci ehci_hcd ahci libahci e1000e libata ptp usbcore scsi_mod crc32c_intel usb_common pps_core thermal fan thermal_sys
[306672.203832] CPU: 1 PID: 606 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc2+ #144
[306672.213036] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[306672.221600] task: ffff8800c40b0590 ti: ffff8800c40e0000 task.ti: ffff8800c40e0000
[306672.230293] RIP: 0010:[<ffffffff81029cf8>]  [<ffffffff81029cf8>] perf_assign_events+0xa8/0x290
[306672.240224] RSP: 0018:ffff8800c40e3c28  EFLAGS: 00010293
[306672.246580] RAX: ffff880118dd8800 RBX: 0000000000000000 RCX: 0000000000000000
[306672.254891] RDX: 0000000000000020 RSI: 0000000000000002 RDI: ffff8800c40e3c88
[306672.263220] RBP: ffff8800c40e3ca8 R08: 0000000000000000 R09: ffff880036fcf520
[306672.271541] R10: ffff8800c40e3c28 R11: 0000000000000005 R12: ffffffff00000000
[306672.279874] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000005
[306672.288220] FS:  00007fad66e4e700(0000) GS:ffff88011ea40000(0000) knlGS:0000000000000000
[306672.297573] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[306672.304432] CR2: 0000000000000000 CR3: 0000000036f38000 CR4: 00000000001407e0
[306672.312745] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[306672.321097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[306672.329459] Stack:
[306672.332304]  0000000200000005 ffff880036fcf520 0000000000000004 0000000200000000
[306672.341024]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[306672.349720]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[306672.358431] Call Trace:
[306672.361771]  [<ffffffff8102b4bd>] x86_schedule_events+0x1dd/0x250
[306672.369002]  [<ffffffff8102a76e>] x86_pmu_event_init+0x12e/0x3d0
[306672.376138]  [<ffffffff81160090>] ? perf_event_ctx_lock_nested+0x20/0x110
[306672.384102]  [<ffffffff8116029d>] perf_try_init_event+0x4d/0xb0
[306672.391139]  [<ffffffff8116840f>] perf_init_event+0x13f/0x170
[306672.397977]  [<ffffffff811682d5>] ? perf_init_event+0x5/0x170
[306672.404822]  [<ffffffff8116888b>] perf_event_alloc+0x44b/0x6d0
[306672.411736]  [<ffffffff81168f03>] SYSC_perf_event_open+0x3f3/0xde0
[306672.419063]  [<ffffffff81063051>] ? __do_page_fault+0x1d1/0x460
[306672.426071]  [<ffffffff81169dbe>] SyS_perf_event_open+0xe/0x10
[306672.432987]  [<ffffffff816dd1b2>] system_call_fastpath+0x16/0x7a
[306672.440088] Code: 49 bc 00 00 00 00 ff ff ff ff 85 c0 74 65 48 63 45 94 3b 45 84 7d 5c 48 8b 55 88 48 8b 04 c2 ba 20 00 00 00 48 8b 98 98 01 00 00 <4c> 85 23 0f 85 95 00 00 00 48 63 55 98 eb 20 66 0f 1f 84 00 00 
[306672.462285] RIP  [<ffffffff81029cf8>] perf_assign_events+0xa8/0x290
[306672.469745]  RSP <ffff8800c40e3c28>
[306672.474187] CR2: 0000000000000000


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-18 17:40       ` Vince Weaver
@ 2015-05-20 13:03         ` Peter Zijlstra
  2015-05-20 13:15           ` Peter Zijlstra
  2015-05-20 13:49           ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-05-20 13:03 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Stephane Eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Mon, May 18, 2015 at 01:40:31PM -0400, Vince Weaver wrote:
> On Thu, 7 May 2015, Peter Zijlstra wrote:
> 
> > On Mon, May 04, 2015 at 12:32:56PM -0700, Stephane Eranian wrote:
> > > I think it is more likely related to the bitmask (idxmsk).  But then
> > > it is always allocated with the constraint even with the HT bug
> > > workaround.  So most, likely the index is bogus and you touch outside
> > > the idxmsk[] array.
> > 
> > [428232.701319] BUG: unable to handle kernel NULL pointer dereference at           (null)
> > 
> > But the thing really tried to touch NULL, not some random address that
> > faulted.
> > 
> > As always, Vince has found us a good puzzle ;-)
> 
> so the Haswell machine turned up the following oops that looks related.
> 
> Yet again we are ending up with a NULL pointer in the constraint table 
> somehow.
> 
> This maps to 
> 
> static bool __perf_sched_find_counter(struct perf_sched *sched)
> 
>         c = sched->events[sched->state.event]->hw.constraint;
> 
>         /* Prefer fixed purpose counters */
> --->	if (c->idxmsk64 & (~0ULL << INTEL_PMC_IDX_FIXED)) {
> 
> ffffffff81029ce4:       48 8b 55 88             mov    -0x78(%rbp),%rdx
> ffffffff81029ce8:       48 8b 04 c2             mov    (%rdx,%rax,8),%rax
> ffffffff81029cec:       ba 20 00 00 00          mov    $0x20,%edx
> ffffffff81029cf1:       48 8b 98 98 01 00 00    mov    0x198(%rax),%rbx
> ffffffff81029cf8:       4c 85 23                test   %r12,(%rbx)
> 
> 
> [306672.100641] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [306672.109653] IP: [<ffffffff81029cf8>] perf_assign_events+0xa8/0x290

So new in this release is:

static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
					struct perf_event *event)
{
	...

	/* cleanup dynamic constraint */
	if (c && (c->flags & PERF_X86_EVENT_DYNAMIC))
		event->hw.constraint = NULL;
}

Which is the only place that value is ever cleared...

Now, I've not quite figured out how that can intersect with scheduling,
typically we only call put_event_constraints() when we're done with the
event.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-20 13:03         ` Peter Zijlstra
@ 2015-05-20 13:15           ` Peter Zijlstra
  2015-05-20 13:49           ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-05-20 13:15 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Stephane Eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Wed, May 20, 2015 at 03:03:12PM +0200, Peter Zijlstra wrote:
> So new in this release is:
> 
> static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
> 					struct perf_event *event)
> {
> 	...
> 
> 	/* cleanup dynamic constraint */
> 	if (c && (c->flags & PERF_X86_EVENT_DYNAMIC))
> 		event->hw.constraint = NULL;
> }
> 
> Which is the only place that value is ever cleared...
> 
> Now, I've not quite figured out how that can intersect with scheduling,
> typically we only call put_event_constraints() when we're done with the
> event.

Related to that; intel_commit_scheduling() has:

  c = event->hw.constraint;

  if (!c)
  	return;

Stephane, how can that ever be?

	x86_schedule_events() does:
		hwc->constraint = get_event_constraints()

for all events, and intel_get_event_constraints() will _always_ return a
valid constraint -- &unconstrained if nothing else.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-20 13:03         ` Peter Zijlstra
  2015-05-20 13:15           ` Peter Zijlstra
@ 2015-05-20 13:49           ` Peter Zijlstra
  2015-05-20 15:26             ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2015-05-20 13:49 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Stephane Eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Wed, May 20, 2015 at 03:03:12PM +0200, Peter Zijlstra wrote:

> Now, I've not quite figured out how that can intersect with scheduling,
> typically we only call put_event_constraints() when we're done with the
> event.

Ah, yes, I think I've found it.

We can do actual scheduling during perf_try_init_event(), and if the
event we're adding is part of an already existing event group, the group
itself might schedule and we get nested scheduling state.

This goes boom because while the cpuc we validate on is fake, the
event->hw.constraint state is 'global' and gets trampled on.

The really safe solution would be something like the below.

---
 kernel/events/core.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1a3bf48..a4f93fb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7391,13 +7391,16 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 		ctx = perf_event_ctx_lock_nested(event->group_leader,
 						 SINGLE_DEPTH_NESTING);
 		BUG_ON(!ctx);
+		raw_spin_lock_irq(&ctx->lock);
 	}
 
 	event->pmu = pmu;
 	ret = pmu->event_init(event);
 
-	if (ctx)
+	if (ctx) {
+		raw_spin_unlock_irq(&ctx->lock);
 		perf_event_ctx_unlock(event->group_leader, ctx);
+	}
 
 	if (ret)
 		module_put(pmu->module);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-20 13:49           ` Peter Zijlstra
@ 2015-05-20 15:26             ` Peter Zijlstra
  2015-05-20 16:09               ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2015-05-20 15:26 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Stephane Eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Wed, May 20, 2015 at 03:49:22PM +0200, Peter Zijlstra wrote:
> ---
>  kernel/events/core.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1a3bf48..a4f93fb 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7391,13 +7391,16 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
>  		ctx = perf_event_ctx_lock_nested(event->group_leader,
>  						 SINGLE_DEPTH_NESTING);
>  		BUG_ON(!ctx);
> +		raw_spin_lock_irq(&ctx->lock);
>  	}
>  
>  	event->pmu = pmu;
>  	ret = pmu->event_init(event);
>  
> -	if (ctx)
> +	if (ctx) {
> +		raw_spin_unlock_irq(&ctx->lock);
>  		perf_event_ctx_unlock(event->group_leader, ctx);
> +	}
>  
>  	if (ret)
>  		module_put(pmu->module);

Except of course that ->event_init() likes to do an allocation :/

Needs to be fixed differently.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-05-20 15:26             ` Peter Zijlstra
@ 2015-05-20 16:09               ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-05-20 16:09 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Stephane Eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Wed, May 20, 2015 at 05:26:07PM +0200, Peter Zijlstra wrote:
> Except of course that ->event_init() likes to do an allocation :/
> 
> Needs to be fixed differently.

So this puts the lock in the x86 code, it seems to build and run.

But my brain is fried after staring at this pmu scheduling code all day,
so maybe its wrong again.

Stephane, can you have a look?

---
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 87848eb..344bb90 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1823,6 +1823,9 @@ static int validate_group(struct perf_event *event)
 	fake_cpuc = allocate_fake_cpuc();
 	if (IS_ERR(fake_cpuc))
 		return PTR_ERR(fake_cpuc);
+
+
+	raw_spin_lock_irq(&leader->ctx->lock);
 	/*
 	 * the event is not yet connected with its
 	 * siblings therefore we must first collect
@@ -1843,6 +1846,8 @@ static int validate_group(struct perf_event *event)
 	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
 
 out:
+	raw_spin_unlock_irq(&leader->ctx->lock);
+
 	free_fake_cpuc(fake_cpuc);
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
  2015-04-30 19:08 perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events Vince Weaver
  2015-05-01 12:59 ` Peter Zijlstra
@ 2015-05-24 19:14 ` Jiri Olsa
  1 sibling, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2015-05-24 19:14 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, Paul Mackerras

On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> 
> So the perf_fuzzer caught this after about a week of fuzzing on a Haswell 
> machine running a recent git kernel (pre 4.1-rc1 though).
> 
> We've seen this BUG before and various fixes were applied but apparently 
> it wasn't enough.
> 
> Sadly it doesn't seem to be reproducible.

hi,
did the trinity git tree change, I cant fetch it anymore...

[jolsa@krava trinity]$ git remote -v
origin  git://git.codemonkey.org.uk/trinity (fetch)
origin  git://git.codemonkey.org.uk/trinity (push)
[jolsa@krava trinity]$ git remote update
Fetching origin
fatal: Unable to look up git.codemonkey.org.uk (port 9418) (Name or service not known)
error: Could not fetch origin

thanks,
jirka

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-05-24 19:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-30 19:08 perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events Vince Weaver
2015-05-01 12:59 ` Peter Zijlstra
2015-05-04 19:32   ` Stephane Eranian
2015-05-07 12:43     ` Peter Zijlstra
2015-05-08  4:25       ` Vince Weaver
2015-05-18 17:40       ` Vince Weaver
2015-05-20 13:03         ` Peter Zijlstra
2015-05-20 13:15           ` Peter Zijlstra
2015-05-20 13:49           ` Peter Zijlstra
2015-05-20 15:26             ` Peter Zijlstra
2015-05-20 16:09               ` Peter Zijlstra
2015-05-24 19:14 ` Jiri Olsa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.