* [perf] more perf_fuzzer memory corruption
@ 2014-04-15 21:37 Vince Weaver
  2014-04-15 21:49 ` Thomas Gleixner
  2014-04-16 14:15 ` Peter Zijlstra
  0 siblings, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-15 21:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar


Still tracking memory corruption bugs found by the perf_fuzzer, I have 
about 10 different log splats that I think might all be related to the 
same underlying problem.

Anyway I managed to trigger this using the perf_fuzzer:

[  221.065278] Slab corruption (Not tainted): kmalloc-2048 start=ffff8800cd15e800, len=2048
[  221.074062] 040: 6b 6b 6b 6b 6b 6b 6b 6b 98 72 57 cd 00 88 ff ff  kkkkkkkk.rW.....
[  221.082321] Prev obj: start=ffff8800cd15e000, len=2048
[  221.087933] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  221.096224] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

And luckily I had ftrace running at the time.

The block was allocated by perf_event (in perf_event_alloc):

perf_fuzzer-2520  [001]   182.980563: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
perf_fuzzer-2520  [000]   183.628515: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
perf_fuzzer-2520  [000]   183.628521: kfree:                (perf_event_alloc+0x2f7) call_site=ffffffff81139c57 ptr=0xffff8800cd15e800
perf_fuzzer-2520  [000]   183.628844: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
...(thousands of times of kmalloc/kfree)

Is it worth wading through this mess to try to track down what happened?
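
One way to start wading through a trace like this is to replay only the kmalloc/kfree events for the corrupted pointer and flag transitions that look out of order. The following is a rough user-space sketch, not anything from the kernel tree: parse_line(), audit_lines() and struct audit are hypothetical names, and it assumes the text format shown above (an event name such as "kmalloc:" or "kfree:" followed somewhere by a "ptr=" field in hex).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum ev_kind { EV_OTHER, EV_ALLOC, EV_FREE };

/* Running tally for one pointer value (hypothetical helper type). */
struct audit {
	int allocs;
	int frees;
	int alloc_while_live;	/* kmalloc seen with no kfree since the previous kmalloc */
	int free_while_dead;	/* kfree seen with no kmalloc since the previous kfree */
	int live;		/* 1 if the last event seen for the pointer was a kmalloc */
};

/* Classify one trace line and extract its ptr= field (hex, optional 0x prefix). */
static enum ev_kind parse_line(const char *line, unsigned long long *ptr)
{
	const char *p = strstr(line, "ptr=");

	if (!p)
		return EV_OTHER;
	*ptr = strtoull(p + 4, NULL, 16);
	if (strstr(line, "kmalloc:"))
		return EV_ALLOC;
	if (strstr(line, "kfree:"))
		return EV_FREE;
	return EV_OTHER;
}

/* Replay the lines in order, tracking only events that touch 'target'. */
static struct audit audit_lines(const char *const *lines, size_t n,
				unsigned long long target)
{
	struct audit a = {0};
	size_t i;

	assert(lines != NULL);
	for (i = 0; i < n; i++) {
		unsigned long long ptr = 0;
		enum ev_kind k = parse_line(lines[i], &ptr);

		if (k == EV_OTHER || ptr != target)
			continue;
		if (k == EV_ALLOC) {
			a.allocs++;
			if (a.live)
				a.alloc_while_live++;
			a.live = 1;
		} else {
			a.frees++;
			if (!a.live)
				a.free_while_dead++;
			a.live = 0;
		}
	}
	return a;
}

/* The four events quoted above, with column padding trimmed. */
static const char *const sample_trace[] = {
	"perf_fuzzer-2520  [001]   182.980563: kmalloc: (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO",
	"perf_fuzzer-2520  [000]   183.628515: kmalloc: (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO",
	"perf_fuzzer-2520  [000]   183.628521: kfree: (perf_event_alloc+0x2f7) call_site=ffffffff81139c57 ptr=0xffff8800cd15e800",
	"perf_fuzzer-2520  [000]   183.628844: kmalloc: (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO",
};
```

Calling audit_lines(sample_trace, 4, 0xffff8800cd15e800ULL) counts three kmallocs and one kfree, and flags the second kmalloc as arriving while the pointer still looked live. That may only mean the matching kfree is not in the excerpt (for example a free from another context), so a flag is a hint about where to look next, not proof of a bug.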

Vince



* Re: [perf] more perf_fuzzer memory corruption
  2014-04-15 21:37 [perf] more perf_fuzzer memory corruption Vince Weaver
@ 2014-04-15 21:49 ` Thomas Gleixner
  2014-04-16  3:21   ` Vince Weaver
  2014-04-16 14:15 ` Peter Zijlstra
  1 sibling, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2014-04-15 21:49 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 15 Apr 2014, Vince Weaver wrote:
> 
> Still tracking memory corruption bugs found by the perf_fuzzer, I have 
> about 10 different log splats that I think might all be related to the 
> same underlying problem.
> 
> Anyway I managed to trigger this using the perf_fuzzer:
> 
> [  221.065278] Slab corruption (Not tainted): kmalloc-2048 start=ffff8800cd15e800, len=2048
> [  221.074062] 040: 6b 6b 6b 6b 6b 6b 6b 6b 98 72 57 cd 00 88 ff ff  kkkkkkkk.rW.....
> [  221.082321] Prev obj: start=ffff8800cd15e000, len=2048
> [  221.087933] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> [  221.096224] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> 
> And luckily I had ftrace running at the time.
> 
> The allocation of this block is by perf_event
> 
> perf_fuzzer-2520  [001]   182.980563: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> perf_fuzzer-2520  [000]   183.628515: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> perf_fuzzer-2520  [000]   183.628521: kfree:                (perf_event_alloc+0x2f7) call_site=ffffffff81139c57 ptr=0xffff8800cd15e800
> perf_fuzzer-2520  [000]   183.628844: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> ...(thousands of times of kmalloc/kfree)
> 
> Is it worth wading through this mess to try to track down what happened?

Definitely worth a try. Can you upload the trace file and provide the
URL, or send it off-list in private mail if you cannot provide a public URL?

Thanks,

	tglx


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-15 21:49 ` Thomas Gleixner
@ 2014-04-16  3:21   ` Vince Weaver
  2014-04-16  4:18     ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-16  3:21 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 15 Apr 2014, Thomas Gleixner wrote:

> On Tue, 15 Apr 2014, Vince Weaver wrote:
> > 
> > Still tracking memory corruption bugs found by the perf_fuzzer, I have 
> > about 10 different log splats that I think might all be related to the 
> > same underlying problem.
> > 
> > Anyway I managed to trigger this using the perf_fuzzer:
> > 
> > [  221.065278] Slab corruption (Not tainted): kmalloc-2048 start=ffff8800cd15e800, len=2048
> > [  221.074062] 040: 6b 6b 6b 6b 6b 6b 6b 6b 98 72 57 cd 00 88 ff ff  kkkkkkkk.rW.....
> > [  221.082321] Prev obj: start=ffff8800cd15e000, len=2048
> > [  221.087933] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> > [  221.096224] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> > 
> > And luckily I had ftrace running at the time.
> > 
> > The allocation of this block is by perf_event
> > 
> > perf_fuzzer-2520  [001]   182.980563: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> > perf_fuzzer-2520  [000]   183.628515: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> > perf_fuzzer-2520  [000]   183.628521: kfree:                (perf_event_alloc+0x2f7) call_site=ffffffff81139c57 ptr=0xffff8800cd15e800
> > perf_fuzzer-2520  [000]   183.628844: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> > ...(thousands of times of kmalloc/kfree)
> > 
> > Is it worth wading through this mess to try to track down what happened?
> 
> Definitely worth a try. Can you upload the trace file and provide the
> URL or send it offlist in private mail if you cannot provide a public URL.

I've poked around the trace a bit.

Possibly a struct perf_event is being used after it has been freed,
specifically the event->migrate_entry.prev value?  I could
be completely wrong about that.

One thing to know about these fuzzer runs: the ones that cause memory 
corruption involve forking (with events active).  I haven't seen the 
corruptions when forking is disabled.

The forking is very simple: only one child is ever active at a time, 
and the child itself does nothing but busy-wait until it is killed.

The trace shows the problem allocations happening before a fork and
the poison message after.  The traces I have don't include the children 
though so I don't have records of what happened there.

I'll send a private link to the file downloads as they're a little large 
and the local sysadmins would probably appreciate it if I limited access to 
them.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-16  3:21   ` Vince Weaver
@ 2014-04-16  4:18     ` Vince Weaver
  0 siblings, 0 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-16  4:18 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 15 Apr 2014, Vince Weaver wrote:

> Possibly it looks like a struct perf_event is being used after freed,
> specifically the event->migrate_entry->prev value?  I could
> be completely wrong about that.

and actually I'm mixing up hex and decimal.  It looks like the actual 
value being written to the freed area is at 0x48, which I think maps to
	event->hlist_entry.pprev

but really, if it's late enough that I'm mixing up hex and decimal, I should 
probably stop staring at trace dumps and get some sleep.
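
For reference, the offset arithmetic can be checked mechanically. The row label in the slab dump ("040:") is a hexadecimal offset, the first eight bytes of that row are still poison (6b), so the first overwritten byte sits at 0x40 + 8 = 0x48, which is 72 in decimal. Below is a minimal sketch of that decoding; parse_row() and first_overwrite() are made-up helper names, and reading the eight bytes as a little-endian pointer is an assumption that fits x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define POISON_FREE 0x6b	/* fill byte visible in the dump for freed memory */

/*
 * Parse one dump row such as "040: 6b 6b ... ff ff  kkkk....".
 * Stores the row's starting offset in *base and up to 16 data bytes in
 * bytes[]; returns the number of bytes parsed, or -1 on a malformed row.
 */
static int parse_row(const char *row, unsigned *base, unsigned char bytes[16])
{
	int used = 0, n = 0;

	if (sscanf(row, "%x:%n", base, &used) != 1 || used == 0)
		return -1;
	row += used;
	while (n < 16) {
		unsigned v;

		if (sscanf(row, "%2x%n", &v, &used) != 1)
			break;
		bytes[n++] = (unsigned char)v;
		row += used;
	}
	return n;
}

/*
 * Find the first non-poison byte in a row. On success returns 1, stores its
 * object offset in *offset and the (up to) 8 bytes starting there, read as a
 * little-endian integer, in *value. Returns 0 if the row is all poison.
 */
static int first_overwrite(const char *row, unsigned *offset, uint64_t *value)
{
	unsigned base = 0;
	unsigned char b[16];
	int n = parse_row(row, &base, b);
	int i, j;

	for (i = 0; i < n; i++) {
		if (b[i] == POISON_FREE)
			continue;
		*offset = base + (unsigned)i;
		*value = 0;
		for (j = 0; j < 8 && i + j < n; j++)
			*value |= (uint64_t)b[i + j] << (8 * j);
		return 1;
	}
	return 0;
}
```

Run on the "040:" row from the first report, this gives offset 0x48 and the value 0xffff8800cd577298, which has the shape of a kernel pointer. Which structure member lives at 0x48 still has to be looked up against the actual struct perf_event layout of the kernel that was running.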

Vince




* Re: [perf] more perf_fuzzer memory corruption
  2014-04-15 21:37 [perf] more perf_fuzzer memory corruption Vince Weaver
  2014-04-15 21:49 ` Thomas Gleixner
@ 2014-04-16 14:15 ` Peter Zijlstra
  2014-04-16 17:30   ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-16 14:15 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar

On Tue, Apr 15, 2014 at 05:37:01PM -0400, Vince Weaver wrote:
> 
> Still tracking memory corruption bugs found by the perf_fuzzer, I have 
> about 10 different log splats that I think might all be related to the 
> same underlying problem.
> 
> Anyway I managed to trigger this using the perf_fuzzer:
> 
> [  221.065278] Slab corruption (Not tainted): kmalloc-2048 start=ffff8800cd15e800, len=2048
> [  221.074062] 040: 6b 6b 6b 6b 6b 6b 6b 6b 98 72 57 cd 00 88 ff ff  kkkkkkkk.rW.....
> [  221.082321] Prev obj: start=ffff8800cd15e000, len=2048
> [  221.087933] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> [  221.096224] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> 
> And luckily I had ftrace running at the time.
> 
> The allocation of this block is by perf_event
> 
> perf_fuzzer-2520  [001]   182.980563: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> perf_fuzzer-2520  [000]   183.628515: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> perf_fuzzer-2520  [000]   183.628521: kfree:                (perf_event_alloc+0x2f7) call_site=ffffffff81139c57 ptr=0xffff8800cd15e800
> perf_fuzzer-2520  [000]   183.628844: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff811399b5 ptr=0xffff8800cd15e800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> ...(thousands of times of kmalloc/kfree)

Does the below make any difference? I've only run it through some light
testing to make sure it didn't insta-explode on running.

(perf stat make -j64 -s in fact)

The patch changes the exit path (identified by tglx as the most likely
fuckup source). If I read it right, the if (child_event->parent) condition
in __perf_event_exit_task() is complete bullshit.

We should always detach from groups, inherited event or not.

The not detaching of the group, in turn, can cause the
__perf_event_exit_task() loops in perf_event_exit_task_context() to not
actually do what the goto again comment says. Because we do not detach
from the group, group siblings will not be placed back on the list as
singleton events.

This then allows us to 'exit' while still having events linked. Then
when we close the event fd, we'll free the event, _while_still_linked_.

The patch deals with this by iterating the event_list instead of the
pinned/flexible group lists, making that retry superfluous.

Now I haven't gone through all details yet, so I might be talking crap.

I've pretty much fried my brain by now, so I'll go see if I can
reproduce some of this slab corruption.
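
To illustrate why freeing an object that is still linked produces exactly this kind of poison splat, here is a user-space toy model. It is not the perf code: struct fake_event, its 0x40 bytes of padding and all helper names are invented for the illustration, and the padding is chosen only so that on a typical 64-bit build the pprev field lands at 0x48, like the overwritten word in the dumps. Nothing is really freed; the "free" is a memset with the poison byte so the behaviour stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b	/* fill byte visible in the dump for freed memory */

/* Minimal hlist look-alike: each node points back at the pointer that points at it. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

/* Hypothetical object layout, NOT the real struct perf_event. */
struct fake_event {
	char pad[0x40];
	struct hnode node;
	char rest[0x40];
};

static void hadd_head(struct hnode *n, struct hhead *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlinking a node rewrites its successor's pprev field. */
static void hdel(struct hnode *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/* Return the offset of the first byte that is no longer poison, or -1. */
static long first_nonpoison(const void *obj, size_t len)
{
	const unsigned char *p = obj;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != POISON_FREE)
			return (long)i;
	return -1;
}

static long simulate_free_while_linked(void)
{
	static struct fake_event a, b;
	struct hhead head = { NULL };

	hadd_head(&b.node, &head);
	hadd_head(&a.node, &head);		/* list is now: head -> a -> b */

	memset(&b, POISON_FREE, sizeof(b));	/* "free" b while it is still linked */

	hdel(&a.node);				/* legitimate unlink of a scribbles on b */

	return first_nonpoison(&b, sizeof(b));
}
```

In this model simulate_free_while_linked() reports that the first damaged byte of the "freed" object lies inside node.pprev: a perfectly valid unlink of a neighbour wrote a pointer into poisoned memory. That matches the shape of the reports (eight bytes of 6b at 0x40, then a pointer), but whether the real corruption goes through this exact path is still the open question.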

---
 kernel/events/core.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f83a71a3e46d..c3c745c1d623 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7367,11 +7367,9 @@ __perf_event_exit_task(struct perf_event *child_event,
 			 struct perf_event_context *child_ctx,
 			 struct task_struct *child)
 {
-	if (child_event->parent) {
-		raw_spin_lock_irq(&child_ctx->lock);
-		perf_group_detach(child_event);
-		raw_spin_unlock_irq(&child_ctx->lock);
-	}
+	raw_spin_lock_irq(&child_ctx->lock);
+	perf_group_detach(child_event);
+	raw_spin_unlock_irq(&child_ctx->lock);
 
 	perf_remove_from_context(child_event);
 
@@ -7443,12 +7441,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 	mutex_lock(&child_ctx->mutex);
 
 again:
-	list_for_each_entry_safe(child_event, tmp, &child_ctx->pinned_groups,
-				 group_entry)
-		__perf_event_exit_task(child_event, child_ctx, child);
-
-	list_for_each_entry_safe(child_event, tmp, &child_ctx->flexible_groups,
-				 group_entry)
+	list_for_each_entry_rcu(child_event, &child_ctx->event_list, event_entry)
 		__perf_event_exit_task(child_event, child_ctx, child);
 
 	/*
@@ -7457,8 +7450,10 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 	 * will still point to the list head terminating the iteration.
 	 */
 	if (!list_empty(&child_ctx->pinned_groups) ||
-	    !list_empty(&child_ctx->flexible_groups))
+	    !list_empty(&child_ctx->flexible_groups)) {
+		WARN_ON_ONCE(1);
 		goto again;
+	}
 
 	mutex_unlock(&child_ctx->mutex);
 


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-16 14:15 ` Peter Zijlstra
@ 2014-04-16 17:30   ` Vince Weaver
  2014-04-16 17:43     ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-16 17:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, linux-kernel, Thomas Gleixner, Ingo Molnar

On Wed, 16 Apr 2014, Peter Zijlstra wrote:

> Does the below make any difference? I've only ran it through some light
> testing to make sure it didn't insta-explode on running.
> 
> (perf stat make -j64 -s in fact)

I'm running with your patch now and so far so good.

Unfortunately the problem isn't repeatable, but it usually shows up within 
an hour or so of fuzzing (although there's possibly a 2nd unrelated bug 
that also shows up sometimes).

If you want to try running the fuzzer on your machine too just do:
	git clone https://github.com/deater/perf_event_tests.git
	cd perf_event_tests/fuzzer
	make
and then try running the "./fast_repro98.sh" script, as that's the forking 
workload I've been using when tracking this issue.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-16 17:30   ` Vince Weaver
@ 2014-04-16 17:43     ` Vince Weaver
  2014-04-16 17:47       ` Peter Zijlstra
  2014-04-17  9:48       ` Ingo Molnar
  0 siblings, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-16 17:43 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Ingo Molnar

On Wed, 16 Apr 2014, Vince Weaver wrote:

> On Wed, 16 Apr 2014, Peter Zijlstra wrote:
> 
> > Does the below make any difference? I've only ran it through some light
> > testing to make sure it didn't insta-explode on running.
> > 
> > (perf stat make -j64 -s in fact)
> 
> I'm running with your patch now and so far so good.

spoke too soon, just got this with your patch applied
(I wasn't running ftrace so no trace with this one):

[ 1555.756490] Slab corruption (Not tainted): kmalloc-2048 start=ffff88011879a000, len=2048
[ 1555.765699] 040: 6b 6b 6b 6b 6b 6b 6b 6b 88 a7 97 ce 00 88 ff ff  kkkkkkkk........
[ 1555.774684] Next obj: start=ffff88011879a800, len=2048
[ 1555.780396] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[ 1555.789150] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[ 1556.048915] Slab corruption (Not tainted): kmalloc-2048 start=ffff88011879a000, len=2048
[ 1556.057655] 040: 6b 6b 6b 6b 6b 6b 6b 6b 40 30 04 18 01 88 ff ff  kkkkkkkk@0......
[ 1556.065946] Next obj: start=ffff88011879a800, len=2048
[ 1556.071544] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[ 1556.079770] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[ 1556.150121] general protection fault: 0000 [#1] SMP
[ 1556.155467] Dumping ftrace buffer:
[ 1556.159051]    (ftrace buffer empty)
[ 1556.162848] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp kvm i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic aesni_intel aes_x86_64 drm_kms_helper lrw snd_hda_intel snd_hda_controller snd_hda_codec drm snd_hwdep gf128mul tpm_tis mei_me snd_pcm glue_helper tpm evdev mei parport_pc snd_seq ablk_helper iTCO_wdt i2c_algo_bit psmouse iTCO_vendor_support parport snd_timer cryptd serio_raw pcspkr lpc_ich i2c_i801 mfd_core battery button processor video wmi i2c_core snd_seq_device snd soundcore sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common ahci libahci ehci_pci e1000e libata ptp ehci_hcd xhci_hcd crc32c_intel usbcore scsi_mod pps_core usb_common fan thermal thermal_sys
[ 1556.236213] CPU: 4 PID: 28 Comm: ksoftirqd/4 Not tainted 3.15.0-rc1+ #62     
[ 1556.243169] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 1556.251114] task: ffff8801188f8890 ti: ffff8801188fa000 task.ti: ffff8801188fa000
[ 1556.259065] RIP: 0010:[<ffffffff8113884d>]  [<ffffffff8113884d>] perf_tp_event+0x9d/0x210
[ 1556.267821] RSP: 0000:ffff8801188fba30  EFLAGS: 00010006                     
[ 1556.273479] RAX: ffff88011879a040 RBX: 6b6b6b6b6b6b6b2b RCX: 000000000000002c
[ 1556.281000] RDX: ffffe8ffffd01878 RSI: 0000000000000001 RDI: 0000000000000000
[ 1556.288543] RBP: ffff8801188fbb08 R08: ffff8801188fbb30 R09: ffffe8ffffd03098
[ 1556.296068] R10: 0000000000000001 R11: 0000000225c17d03 R12: ffff8800cebde4d0
[ 1556.303619] R13: 0000000000000001 R14: ffff8801188fbb30 R15: ffffe8ffffd01878
[ 1556.311197] FS:  0000000000000000(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
[ 1556.320659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1556.327681] CR2: 0000000000618b50 CR3: 0000000001c11000 CR4: 00000000001407e0
[ 1556.336092] DR0: 0000000000a9e000 DR1: 0000000000000000 DR2: 0000000000a9e000
[ 1556.344624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 1556.353012] Stack:
[ 1556.356024]  ffff8801188f8890 ffffffff81c48380 ffffffff0000002c ffffe8ffffd01878
[ 1556.364798]  ffff8801188fba88 0000000000000046 0000000000000000 0000000000000004
[ 1556.373689]  0000000000000000 ffff8801188fbb78 ffff88011eb10420 ffff8801188fbb68
[ 1556.382627] Call Trace:
[ 1556.386190]  [<ffffffff81093607>] perf_trace_sched_wakeup_template+0xe7/0x100
[ 1556.394778]  [<ffffffff810953f2>] ? ttwu_do_wakeup+0xb2/0xc0                 
[ 1556.401703]  [<ffffffff810953f2>] ttwu_do_wakeup+0xb2/0xc0                   
[ 1556.408468]  [<ffffffff810954ed>] ttwu_do_activate.constprop.95+0x5d/0x70    
[ 1556.416659]  [<ffffffff810982c0>] try_to_wake_up+0x200/0x300                 
[ 1556.423711]  [<ffffffff81098432>] default_wake_function+0x12/0x20            
[ 1556.431114]  [<ffffffff810a95f8>] __wake_up_common+0x58/0x90                 
[ 1556.438103]  [<ffffffff810c90c0>] ? ftrace_raw_output_rcu_utilization+0x50/0x50
[ 1556.446860]  [<ffffffff810a9643>] __wake_up_locked+0x13/0x20                 
[ 1556.453756]  [<ffffffff810a9e07>] complete+0x37/0x50                         
[ 1556.459995]  [<ffffffff810c90d2>] wakeme_after_rcu+0x12/0x20                 
[ 1556.466903]  [<ffffffff810cc6ad>] rcu_process_callbacks+0x29d/0x620          
[ 1556.474468]  [<ffffffff810cc646>] ? rcu_process_callbacks+0x236/0x620        
[ 1556.482232]  [<ffffffff81069995>] __do_softirq+0xf5/0x290                    
[ 1556.488837]  [<ffffffff81069b60>] run_ksoftirqd+0x30/0x50                    
[ 1556.495385]  [<ffffffff8108f6ff>] smpboot_thread_fn+0xff/0x1b0               
[ 1556.502441]  [<ffffffff8108f600>] ? SyS_setgroups+0x1a0/0x1a0                
[ 1556.509398]  [<ffffffff8108822d>] kthread+0xed/0x110                         
[ 1556.515486]  [<ffffffff81088140>] ? kthread_create_on_node+0x200/0x200       
[ 1556.523324]  [<ffffffff8165a4bc>] ret_from_fork+0x7c/0xb0                    
[ 1556.529858]  [<ffffffff81088140>] ? kthread_create_on_node+0x200/0x200       
[ 1556.537642] Code: 48 c7 45 c8 00 00 00 00 48 c7 45 90 00 00 00 00 48 c7 45 d0 00 00 00 00 75 11 eb 52 66 90 48 8b 5b 40 48 85 db 74 47 48 83 eb 40 <f6> 83 90 01 00 00 01 75 ea f6 83 e8 00 00 00 20 75 e1 48 8d b5                          
[ 1556.561008] RIP  [<ffffffff8113884d>] perf_tp_event+0x9d/0x210               
[ 1556.568036]  RSP <ffff8801188fba30>                                          
[ 1556.572833] general protection fault: 0000 [#2] SMP                          
[ 1556.578955] Dumping ftrace buffer:                                           
[ 1556.583342]    (ftrace buffer empty)                                         
[ 1556.587897] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp kvm i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic aesni_intel aes_x86_64 drm_kms_helper lrw snd_hda_intel snd_hda_controller snd_hda_codec drm snd_hwdep gf128mul tpm_tis mei_me snd_pcm glue_helper tpm evdev mei parport_pc snd_seq ablk_helper iTCO_wdt i2c_algo_bit psmouse iTCO_vendor_support parport snd_timer cryptd serio_raw pcspkr lpc_ich i2c_i801 mfd_core battery button processor video wmi i2c_core snd_seq_device snd soundcore sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common ahci libahci ehci_pci e1000e libata ptp ehci_hcd xhci_hcd crc32c_intel usbcore scsi_mod pps_core usb_common fan thermal thermal_sys                           
[ 1556.667183] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.15.0-rc1+ #62        
[ 1556.674820] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014                                                                             
[ 1556.683519] task: ffff880118f5e350 ti: ffff880118f60000 task.ti: ffff880118f60000                                                                            
[ 1556.692431] RIP: 0010:[<ffffffff8113884d>]  [<ffffffff8113884d>] perf_tp_event+0x9d/0x210                                                                    
[ 1556.702028] RSP: 0000:ffff88011eb03af8  EFLAGS: 00010006                     
[ 1556.708553] RAX: ffff88011879a040 RBX: 6b6b6b6b6b6b6b2b RCX: 000000000000002c
[ 1556.716851] RDX: ffffe8ffffd02078 RSI: 0000000000000001 RDI: 0000000000000000
[ 1556.725272] RBP: ffff88011eb03bd0 R08: ffff88011eb03bf8 R09: ffffe8ffffd03098
[ 1556.733740] R10: 000000000000000f R11: 000000000000b717 R12: ffff8800cfb16750
[ 1556.742209] R13: 0000000000000001 R14: ffff88011eb03bf8 R15: ffffe8ffffd02078
[ 1556.750617] FS:  0000000000000000(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000                                                                     
[ 1556.760111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                
[ 1556.767040] CR2: 0000000000618b50 CR3: 0000000001c11000 CR4: 00000000001407e0
[ 1556.775451] DR0: 0000000000a9e000 DR1: 0000000000000000 DR2: 0000000000a9e000
[ 1556.783797] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 1556.792254] Stack:                                                           
[ 1556.795279]  0000000000000046 ffffffff81138fab 000000000000002c ffffe8ffffd02078                                                                             
[ 1556.804056]  0000000000000046 0000000000000046 0000000000000000 0000000000000008                                                                             
[ 1556.812825]  0000000000000000 ffff88011eb03c40 ffff88011eb10420 ffff88011eb03c30                                                                             
[ 1556.821642] Call Trace:                                                      
[ 1556.825070]  <IRQ>                                                           
[ 1556.827143]  [<ffffffff81138fab>] ? __perf_sw_event+0x6b/0x230               
[ 1556.835329]  [<ffffffff81093607>] perf_trace_sched_wakeup_template+0xe7/0x100
[ 1556.843753]  [<ffffffff810953f2>] ? ttwu_do_wakeup+0xb2/0xc0                 
[ 1556.850681]  [<ffffffff810953f2>] ttwu_do_wakeup+0xb2/0xc0                   
[ 1556.857334]  [<ffffffff810954ed>] ttwu_do_activate.constprop.95+0x5d/0x70    
[ 1556.865438]  [<ffffffff810982c0>] try_to_wake_up+0x200/0x300                 
[ 1556.872334]  [<ffffffff81098432>] default_wake_function+0x12/0x20            
[ 1556.879685]  [<ffffffff810a9d28>] autoremove_wake_function+0x18/0x40         
[ 1556.887333]  [<ffffffff810a95f8>] __wake_up_common+0x58/0x90                 
[ 1556.894223]  [<ffffffff810a9869>] __wake_up+0x39/0x50                        
[ 1556.900578]  [<ffffffff810c0f92>] wake_up_klogd_work_func+0x42/0x70          
[ 1556.908173]  [<ffffffff8112ff9f>] __irq_work_run+0x6f/0x90                   
[ 1556.914815]  [<ffffffff81130028>] irq_work_run+0x18/0x30                     
[ 1556.921277]  [<ffffffff8107272b>] update_process_times+0x5b/0x70             
[ 1556.928572]  [<ffffffff810d8665>] tick_sched_handle.isra.20+0x25/0x60        
[ 1556.936233]  [<ffffffff810d8d41>] tick_sched_timer+0x41/0x60                 
[ 1556.943021]  [<ffffffff8108b596>] __run_hrtimer+0x86/0x1e0                   
[ 1556.949681]  [<ffffffff810d8d00>] ? tick_sched_do_timer+0x40/0x40            
[ 1556.956981]  [<ffffffff8108bd87>] hrtimer_interrupt+0xf7/0x240               
[ 1556.964068]  [<ffffffff81044637>] local_apic_timer_interrupt+0x37/0x60       
[ 1556.971800]  [<ffffffff8165c996>] smp_trace_apic_timer_interrupt+0x46/0xb9   
[ 1556.979903]  [<ffffffff8165b39d>] trace_apic_timer_interrupt+0x6d/0x80       
[ 1556.987606]  <EOI>                                                           
[ 1556.989710]  [<ffffffff8165103e>] ? _raw_spin_unlock_irq+0x2e/0x40           
[ 1556.998038]  [<ffffffff81651037>] ? _raw_spin_unlock_irq+0x27/0x40           
[ 1557.005313]  [<ffffffff81090c4d>] finish_task_switch+0x7d/0x120              
[ 1557.012301]  [<ffffffff81090c0f>] ? finish_task_switch+0x3f/0x120            
[ 1557.019455]  [<ffffffff8164c6b0>] __schedule+0x2c0/0x740                     
[ 1557.025852]  [<ffffffff8164d009>] schedule_preempt_disabled+0x29/0x70        
[ 1557.033380]  [<ffffffff810aa0e3>] cpu_startup_entry+0x133/0x3d0              
[ 1557.040257]  [<ffffffff81042a43>] start_secondary+0x193/0x200                
[ 1557.047061] Code: 48 c7 45 c8 00 00 00 00 48 c7 45 90 00 00 00 00 48 c7 45 d0 00 00 00 00 75 11 eb 52 66 90 48 8b 5b 40 48 85 db 74 47 48 83 eb 40 <f6> 83 90 01 00 00 01 75 ea f6 83 e8 00 00 00 20 75 e1 48 8d b5                          
[ 1557.070211] RIP  [<ffffffff8113884d>] perf_tp_event+0x9d/0x210               
[ 1557.077114]  RSP <ffff88011eb03af8>                                          
[ 1557.081529] ---[ end trace de66fd3e04dbf8d0 ]---                             
[ 1557.087043] Kernel panic - not syncing: Fatal exception in interrupt         
[ 1558.139376] Shutting down cpus with NMI                                      
[ 1558.144092] Dumping ftrace buffer:                                           
[ 1558.148310]    (ftrace buffer empty)                                         
[ 1558.152807] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)             
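
A note on reading the two oopses: both fault in perf_tp_event with RBX = 6b6b6b6b6b6b6b2b while RAX = ffff88011879a040, which is the corrupted object's start plus 0x40. One plausible reading, inferred only from the register dump and the Code: bytes (48 8b 5b 40 followed by 48 83 eb 40, which decode as a load from offset 0x40 and a subtraction of 0x40), is that the loop followed a next pointer stored in already poisoned memory and then stepped back to the enclosing structure. The sketch below just reproduces that arithmetic; the function names are made up, and the canonical-address check assumes 48-bit virtual addresses (4-level paging).

```c
#include <assert.h>
#include <stdint.h>

#define POISON_WORD 0x6b6b6b6b6b6b6b6bULL	/* eight bytes of the 0x6b free poison */

/* container_of-style step: address of the enclosing object from a member address. */
static uint64_t entry_from_node(uint64_t node_addr, uint64_t member_offset)
{
	assert(node_addr >= member_offset);
	return node_addr - member_offset;
}

/* With 48-bit virtual addresses, bits 63..47 must be all zeros or all ones. */
static int is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}
```

entry_from_node(POISON_WORD, 0x40) yields 0x6b6b6b6b6b6b6b2b, the RBX value in both dumps, and is_canonical48() rejects it, which would be consistent with a general protection fault rather than a page fault when it is dereferenced.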


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-16 17:43     ` Vince Weaver
@ 2014-04-16 17:47       ` Peter Zijlstra
  2014-04-17  9:48       ` Ingo Molnar
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-16 17:47 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar

On Wed, Apr 16, 2014 at 01:43:49PM -0400, Vince Weaver wrote:
> On Wed, 16 Apr 2014, Vince Weaver wrote:
> 
> > On Wed, 16 Apr 2014, Peter Zijlstra wrote:
> > 
> > > Does the below make any difference? I've only ran it through some light
> > > testing to make sure it didn't insta-explode on running.
> > > 
> > > (perf stat make -j64 -s in fact)
> > 
> > I'm running with your patch now and so far so good.
> 
> spoke too soon, just got this with your patch applied
> (I wasn't running ftrace so no trace with this one):
> 
> [ 1555.756490] Slab corruption (Not tainted): kmalloc-2048 start=ffff88011879a000, len=2048
> [ 1555.765699] 040: 6b 6b 6b 6b 6b 6b 6b 6b 88 a7 97 ce 00 88 ff ff  kkkkkkkk........
> [ 1555.774684] Next obj: start=ffff88011879a800, len=2048
> [ 1555.780396] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> [ 1555.789150] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

Awww, bugger. OK more staring tomorrow. Thanks for trying.


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-16 17:43     ` Vince Weaver
  2014-04-16 17:47       ` Peter Zijlstra
@ 2014-04-17  9:48       ` Ingo Molnar
  2014-04-17 11:45         ` Peter Zijlstra
  1 sibling, 1 reply; 81+ messages in thread
From: Ingo Molnar @ 2014-04-17  9:48 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Wed, 16 Apr 2014, Vince Weaver wrote:
> 
> > On Wed, 16 Apr 2014, Peter Zijlstra wrote:
> > 
> > > Does the below make any difference? I've only ran it through some light
> > > testing to make sure it didn't insta-explode on running.
> > > 
> > > (perf stat make -j64 -s in fact)
> > 
> > I'm running with your patch now and so far so good.
> 
> spoke too soon, just got this with your patch applied
> (I wasn't running ftrace so no trace with this one):

Is there some place where I can pick up the latest version of your fuzzer?

PeterZ has trouble reproducing the corruption locally - I'd like to 
run it too, maybe I have hardware that triggers it more readily.

Plus it would be nice to get your latest .config as well, that you are 
using to generate this crash. (You can send it in private mail as well 
if you wish.)

Thanks,

	Ingo


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-17  9:48       ` Ingo Molnar
@ 2014-04-17 11:45         ` Peter Zijlstra
  2014-04-17 14:22           ` Ingo Molnar
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-17 11:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Vince Weaver, linux-kernel, Thomas Gleixner

On Thu, Apr 17, 2014 at 11:48:15AM +0200, Ingo Molnar wrote:
> Is there some place where I can pick up the latest version of your fuzzer?
> 
> PeterZ has trouble reproducing the corruption locally - I'd like to 
> run it too, maybe I have hardware that triggers it more readily.

From a few emails up:

"If you want to try running the fuzzer on your machine too just do:
        git clone https://github.com/deater/perf_event_tests.git
        cd perf_event_tests/fuzzer
        make
and then try running the "./fast_repro98.sh" script, as that's the forking
workload I've been using when tracking this issue."



* Re: [perf] more perf_fuzzer memory corruption
  2014-04-17 11:45         ` Peter Zijlstra
@ 2014-04-17 14:22           ` Ingo Molnar
  2014-04-17 14:42             ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Ingo Molnar @ 2014-04-17 14:22 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, linux-kernel, Thomas Gleixner


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Apr 17, 2014 at 11:48:15AM +0200, Ingo Molnar wrote:
> > Is there some place where I can pick up the latest version of your fuzzer?
> > 
> > PeterZ has trouble reproducing the corruption locally - I'd like to 
> > run it too, maybe I have hardware that triggers it more readily.
> 
> From a few emails up:
> 
> "If you want to try running the fuzzer on your machine too just do:
>         git clone https://github.com/deater/perf_event_tests.git
>         cd fuzzer
>         make
> and then try running the "./fast_repro98.sh" script, as that's the forking
> workload I've been using when tracking this issue."

Cool, thanks!

	Ingo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-17 14:22           ` Ingo Molnar
@ 2014-04-17 14:42             ` Vince Weaver
  2014-04-17 14:54               ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-17 14:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner

On Thu, 17 Apr 2014, Ingo Molnar wrote:

> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Apr 17, 2014 at 11:48:15AM +0200, Ingo Molnar wrote:
> > > Is there some place where I can pick up the latestest of your fuzzer?
> > > 
> > > PeterZ has trouble reproducing the corruption locally - I'd like to 
> > > run it too, maybe I have hardware that triggers it more readily.
> > 
> > From a few emails up:
> > 
> > "If you want to try running the fuzzer on your machine too just do:
> >         git clone https://github.com/deater/perf_event_tests.git
> >         cd fuzzer
> >         make
> > and then try running the "./fast_repro98.sh" script, as that's the forking
> > workload I've been using when tracking this issue."

I have to admit the slab corruption message is a new development with 
3.15-rc1.

I've been trying to track a possibly-the-same (but harder to debug)
memory corruption issue for a while now.

I can trigger this type of error easily on both a core2 and haswell 
machine.

I have a few more test machines I can run, and just got some extra 
serial cables so maybe I can hook up a few more serial consoles and do 
some additional testing.

Vince

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-17 14:42             ` Vince Weaver
@ 2014-04-17 14:54               ` Peter Zijlstra
  2014-04-17 15:35                 ` Vince Weaver
  2014-04-18 14:45                 ` Vince Weaver
  0 siblings, 2 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-17 14:54 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner

On Thu, Apr 17, 2014 at 10:42:47AM -0400, Vince Weaver wrote:
> On Thu, 17 Apr 2014, Ingo Molnar wrote:
> 
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Thu, Apr 17, 2014 at 11:48:15AM +0200, Ingo Molnar wrote:
> > > > Is there some place where I can pick up the latestest of your fuzzer?
> > > > 
> > > > PeterZ has trouble reproducing the corruption locally - I'd like to 
> > > > run it too, maybe I have hardware that triggers it more readily.
> > > 
> > > From a few emails up:
> > > 
> > > "If you want to try running the fuzzer on your machine too just do:
> > >         git clone https://github.com/deater/perf_event_tests.git
> > >         cd fuzzer
> > >         make
> > > and then try running the "./fast_repro98.sh" script, as that's the forking
> > > workload I've been using when tracking this issue."
> 
> I have to admit the slab corruption message is a new development with 
> 3.15-rc1.

Meh.. my machine keeps locking up with 15-rc1 and your fuzzer. It looks
to get stuck at finish_task_switch() from a preemption while waiting for
a perf IPI.

Which is complete crack because we have preemption disabled over
issuing and waiting for the IPI :/

I tried reverting some of the IPI related patches, but no joy so far,
I'm about to go try a git-bisect on this.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-17 14:54               ` Peter Zijlstra
@ 2014-04-17 15:35                 ` Vince Weaver
  2014-04-18 14:45                 ` Vince Weaver
  1 sibling, 0 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-17 15:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner

On Thu, 17 Apr 2014, Peter Zijlstra wrote:
> 
> Meh.. my machine keeps locking up with 15-rc1 and your fuzzer. It looks
> to get stuck at finish_task_switch() from a preemption while waiting for
> a perf IPI.
> 
> Which is complete crack because we have preemption disabled over
> issuing and waiting for the IPI :/
> 
> I tried reverting some of the IPI related patches, but no joy so far,
> I'm about to go try a git-bisect on this.

Bisecting can be a pain: if you go too far back you start running into 
other bugs that have since been fixed thanks to the fuzzer.

Over the past year there have been at least 10 perf_fuzzer-related crash 
fixes that have gotten into the kernel.  It used to be I could crash 
things in seconds; it's now up to minutes or hours, but the bugs are that 
much harder to isolate :(

And I still am not fuzzing with signal-overflow enabled, which 
causes even more pain.

Vince

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-17 14:54               ` Peter Zijlstra
  2014-04-17 15:35                 ` Vince Weaver
@ 2014-04-18 14:45                 ` Vince Weaver
  2014-04-18 14:51                   ` Vince Weaver
  2014-04-18 15:23                   ` Peter Zijlstra
  1 sibling, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-18 14:45 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt


OK, since the slab corruption was happening to event->hlist_entry->pprev
I added a WARN() call to every modifier of pprev under 
include/linux/*list*.h to see what was stomping over freed memory.

This is what came up:

Apr 18 10:36:11 haswell kernel: [  998.316177] ------------[ cut here ]------------
Apr 18 10:36:11 haswell kernel: [  998.321188] WARNING: CPU: 3 PID: 20717 at include/linux/rculist.h:410 perf_trace_add+0xc1/0x100()
Apr 18 10:36:11 haswell kernel: [  998.330681] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel iTCO_wdt snd_hda_controller i915 snd_hda_codec evdev crct10dif_pclmul drm_kms_helper iTCO_vendor_support snd_hwdep snd_pcm drm crc32_pclmul snd_seq mei_me parport_pc parport lpc_ich mfd_core psmouse ghash_clmulni_intel snd_timer snd_seq_device mei snd aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore pcspkr serio_raw i2c_i801 processor video i2c_algo_bit i2c_core wmi button battery tpm_tis tpm sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci ehci_pci libahci xhci_hcd e1000e libata ehci_hcd ptp scsi_mod crc32c_intel usbcore pps_core usb_common fan thermal thermal_sys
Apr 18 10:36:11 haswell kernel: [  998.405736] CPU: 3 PID: 20717 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #63
Apr 18 10:36:11 haswell kernel: [  998.413162] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
Apr 18 10:36:11 haswell kernel: [  998.420987]  0000000000000009 ffff880117923c70 ffffffff8164f753 0000000000000000
Apr 18 10:36:11 haswell kernel: [  998.429016]  ffff880117923ca8 ffffffff810647cd ffffe8ffffcc30d8 ffff8800ce29f040
Apr 18 10:36:11 haswell kernel: [  998.437121]  ffffffff81c1ba40 ffff8800cd3d9040 000000da35b18e50 ffff880117923cb8
Apr 18 10:36:11 haswell kernel: [  998.445246] Call Trace:
Apr 18 10:36:11 haswell kernel: [  998.447910]  [<ffffffff8164f753>] dump_stack+0x45/0x56
Apr 18 10:36:11 haswell kernel: [  998.453451]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
Apr 18 10:36:11 haswell kernel: [  998.459871]  [<ffffffff810648aa>] warn_slowpath_null+0x1a/0x20
Apr 18 10:36:11 haswell kernel: [  998.466143]  [<ffffffff81125a01>] perf_trace_add+0xc1/0x100
Apr 18 10:36:11 haswell kernel: [  998.472160]  [<ffffffff81136640>] event_sched_in.isra.76+0x90/0x1e0
Apr 18 10:36:11 haswell kernel: [  998.478849]  [<ffffffff811367f9>] group_sched_in+0x69/0x1e0
Apr 18 10:36:11 haswell kernel: [  998.484812]  [<ffffffff81136e45>] __perf_event_enable+0x255/0x260
Apr 18 10:36:11 haswell kernel: [  998.491370]  [<ffffffff81132340>] remote_function+0x40/0x50
Apr 18 10:36:11 haswell kernel: [  998.497311]  [<ffffffff810de116>] generic_exec_single+0x126/0x170
Apr 18 10:36:11 haswell kernel: [  998.503764]  [<ffffffff81132300>] ? task_clock_event_add+0x40/0x40
Apr 18 10:36:11 haswell kernel: [  998.510432]  [<ffffffff810de1c7>] smp_call_function_single+0x67/0xa0
Apr 18 10:36:11 haswell kernel: [  998.517299]  [<ffffffff811312b4>] task_function_call+0x44/0x50
Apr 18 10:36:11 haswell kernel: [  998.523539]  [<ffffffff81136bf0>] ? perf_event_sched_in+0x90/0x90
Apr 18 10:36:11 haswell kernel: [  998.530085]  [<ffffffff81131350>] perf_event_enable+0x90/0xf0
Apr 18 10:36:11 haswell kernel: [  998.536308]  [<ffffffff811312c0>] ? task_function_call+0x50/0x50
Apr 18 10:36:11 haswell kernel: [  998.542761]  [<ffffffff8113142a>] perf_event_for_each_child+0x3a/0xa0
Apr 18 10:36:11 haswell kernel: [  998.551512]  [<ffffffff811379af>] perf_event_task_enable+0x4f/0x80
Apr 18 10:36:11 haswell kernel: [  998.560080]  [<ffffffff8107c015>] SyS_prctl+0x255/0x4b0
Apr 18 10:36:11 haswell kernel: [  998.567605]  [<ffffffff813c1406>] ? lockdep_sys_exit_thunk+0x35/0x67
Apr 18 10:36:11 haswell kernel: [  998.576333]  [<ffffffff816609ed>] system_call_fastpath+0x1a/0x1f
Apr 18 10:36:11 haswell kernel: [  998.584698] ---[ end trace b175966afd57a174 ]---
Apr 18 10:36:12 haswell kernel: [  998.910691] ------------[ cut here ]------------
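
For anyone following along, the kind of check described above can be
sketched in userspace. This is only an illustration and not the actual
instrumentation: the struct and function names (hnode, hhead, fake_free,
add_head_checked) are made up, and it assumes the WARN fires when a
pprev write lands in an object that still holds the 0x6b free-poison
pattern seen in the slab dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define POISON_FREE 0x6b	/* byte the slab debug code writes into freed objects */

/* Hypothetical stand-ins for the kernel's hlist_node / hlist_head. */
struct hnode { struct hnode *next; struct hnode **pprev; };
struct hhead { struct hnode *first; };

static int warn_count;	/* number of times the check fired */

/* Returns 1 if every byte of the node still holds the free-poison pattern. */
static int looks_freed(const struct hnode *n)
{
	const unsigned char *p = (const unsigned char *)n;
	size_t i;

	for (i = 0; i < sizeof(*n); i++)
		if (p[i] != POISON_FREE)
			return 0;
	return 1;
}

/* Simulates a free with poisoning, without unlinking the node first. */
static void fake_free(struct hnode *n)
{
	memset(n, POISON_FREE, sizeof(*n));
}

/* Add n at the head of h, warning if that modifies pprev of a freed node. */
static void add_head_checked(struct hnode *n, struct hhead *h)
{
	struct hnode *first;

	assert(n != NULL && h != NULL);
	first = h->first;

	if (first && looks_freed(first)) {
		warn_count++;
		fprintf(stderr, "WARNING: writing pprev of a freed node\n");
	}

	n->next = first;
	n->pprev = &h->first;
	if (first)
		first->pprev = &n->next;	/* this is the write that stomps freed memory */
	h->first = n;
}
```

Freeing a node that is still the list head and then adding another node
trips the check, and leaves one pointer-sized hole in the poison pattern
of the freed object, much like the pointer sitting in the middle of the
6b bytes in the original slab corruption report.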


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-18 14:45                 ` Vince Weaver
@ 2014-04-18 14:51                   ` Vince Weaver
  2014-04-18 15:23                   ` Peter Zijlstra
  1 sibling, 0 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-18 14:51 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Thomas Gleixner,
	Steven Rostedt

On Fri, 18 Apr 2014, Vince Weaver wrote:

> 
> OK, since the slab corruption was happening to event->hlist_entry->pprev
> I added a WARN() call to every modifier of pprev under 
> include/linux/*list*.h to see what was stomping over freed memory.
> 
> This is what came up:

Actually, there were 3 hits on the WARNING before the machine locked up.

Apr 18 10:36:11 haswell kernel: [  998.316177] ------------[ cut here ]------------
Apr 18 10:36:11 haswell kernel: [  998.321188] WARNING: CPU: 3 PID: 20717 at include/linux/rculist.h:410 perf_trace_add+0xc1/0x100()
Apr 18 10:36:11 haswell kernel: [  998.330681] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel iTCO_wdt snd_hda_controller i915 snd_hda_codec evdev crct10dif_pclmul drm_kms_helper iTCO_vendor_support snd_hwdep snd_pcm drm crc32_pclmul snd_seq mei_me parport_pc parport lpc_ich mfd_core psmouse ghash_clmulni_intel snd_timer snd_seq_device mei snd aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore pcspkr serio_raw i2c_i801 processor video i2c_algo_bit i2c_core wmi button battery tpm_tis tpm sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci ehci_pci libahci xhci_hcd e1000e libata ehci_hcd ptp scsi_mod crc32c_intel usbcore pps_core usb_common fan thermal thermal_sys
Apr 18 10:36:11 haswell kernel: [  998.405736] CPU: 3 PID: 20717 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #63
Apr 18 10:36:11 haswell kernel: [  998.413162] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
Apr 18 10:36:11 haswell kernel: [  998.420987]  0000000000000009 ffff880117923c70 ffffffff8164f753 0000000000000000
Apr 18 10:36:11 haswell kernel: [  998.429016]  ffff880117923ca8 ffffffff810647cd ffffe8ffffcc30d8 ffff8800ce29f040
Apr 18 10:36:11 haswell kernel: [  998.437121]  ffffffff81c1ba40 ffff8800cd3d9040 000000da35b18e50 ffff880117923cb8
Apr 18 10:36:11 haswell kernel: [  998.445246] Call Trace:
Apr 18 10:36:11 haswell kernel: [  998.447910]  [<ffffffff8164f753>] dump_stack+0x45/0x56
Apr 18 10:36:11 haswell kernel: [  998.453451]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
Apr 18 10:36:11 haswell kernel: [  998.459871]  [<ffffffff810648aa>] warn_slowpath_null+0x1a/0x20
Apr 18 10:36:11 haswell kernel: [  998.466143]  [<ffffffff81125a01>] perf_trace_add+0xc1/0x100
Apr 18 10:36:11 haswell kernel: [  998.472160]  [<ffffffff81136640>] event_sched_in.isra.76+0x90/0x1e0
Apr 18 10:36:11 haswell kernel: [  998.478849]  [<ffffffff811367f9>] group_sched_in+0x69/0x1e0
Apr 18 10:36:11 haswell kernel: [  998.484812]  [<ffffffff81136e45>] __perf_event_enable+0x255/0x260
Apr 18 10:36:11 haswell kernel: [  998.491370]  [<ffffffff81132340>] remote_function+0x40/0x50
Apr 18 10:36:11 haswell kernel: [  998.497311]  [<ffffffff810de116>] generic_exec_single+0x126/0x170
Apr 18 10:36:11 haswell kernel: [  998.503764]  [<ffffffff81132300>] ? task_clock_event_add+0x40/0x40
Apr 18 10:36:11 haswell kernel: [  998.510432]  [<ffffffff810de1c7>] smp_call_function_single+0x67/0xa0
Apr 18 10:36:11 haswell kernel: [  998.517299]  [<ffffffff811312b4>] task_function_call+0x44/0x50
Apr 18 10:36:11 haswell kernel: [  998.523539]  [<ffffffff81136bf0>] ? perf_event_sched_in+0x90/0x90
Apr 18 10:36:11 haswell kernel: [  998.530085]  [<ffffffff81131350>] perf_event_enable+0x90/0xf0
Apr 18 10:36:11 haswell kernel: [  998.536308]  [<ffffffff811312c0>] ? task_function_call+0x50/0x50
Apr 18 10:36:11 haswell kernel: [  998.542761]  [<ffffffff8113142a>] perf_event_for_each_child+0x3a/0xa0
Apr 18 10:36:11 haswell kernel: [  998.551512]  [<ffffffff811379af>] perf_event_task_enable+0x4f/0x80
Apr 18 10:36:11 haswell kernel: [  998.560080]  [<ffffffff8107c015>] SyS_prctl+0x255/0x4b0
Apr 18 10:36:11 haswell kernel: [  998.567605]  [<ffffffff813c1406>] ? lockdep_sys_exit_thunk+0x35/0x67
Apr 18 10:36:11 haswell kernel: [  998.576333]  [<ffffffff816609ed>] system_call_fastpath+0x1a/0x1f
Apr 18 10:36:11 haswell kernel: [  998.584698] ---[ end trace b175966afd57a174 ]---
Apr 18 10:36:12 haswell kernel: [  998.910691] ------------[ cut here ]------------
Apr 18 10:36:12 haswell kernel: [  998.917828] WARNING: CPU: 4 PID: 22741 at include/linux/rculist.h:410 perf_swevent_add+0x14c/0x190()
Apr 18 10:36:12 haswell kernel: [  998.930319] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel iTCO_wdt snd_hda_controller i915 snd_hda_codec evdev crct10dif_pclmul drm_kms_helper iTCO_vendor_support snd_hwdep snd_pcm drm crc32_pclmul snd_seq mei_me parport_pc parport lpc_ich mfd_core psmouse ghash_clmulni_intel snd_timer snd_seq_device mei snd aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore pcspkr serio_raw i2c_i801 processor video i2c_algo_bit i2c_core wmi button battery tpm_tis tpm sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci ehci_pci libahci xhci_hcd e1000e libata ehci_hcd ptp scsi_mod crc32c_intel usbcore pps_core usb_common fan thermal thermal_sys
Apr 18 10:36:12 haswell kernel: [  999.024710] CPU: 4 PID: 22741 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #63
Apr 18 10:36:12 haswell kernel: [  999.036194] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
Apr 18 10:36:12 haswell kernel: [  999.046972]  0000000000000009 ffff8800ce12dce0 ffffffff8164f753 0000000000000000
Apr 18 10:36:12 haswell kernel: [  999.057871]  ffff8800ce12dd18 ffffffff810647cd ffff8800c9ef6000 ffff8800ce29f040
Apr 18 10:36:12 haswell kernel: [  999.068652]  ffff8800ca4ff3c0 ffff8800c9ef6040 0000000000000000 ffff8800ce12dd28
Apr 18 10:36:12 haswell kernel: [  999.079416] Call Trace:
Apr 18 10:36:12 haswell kernel: [  999.084718]  [<ffffffff8164f753>] dump_stack+0x45/0x56
Apr 18 10:36:12 haswell kernel: [  999.092858]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
Apr 18 10:36:12 haswell kernel: [  999.101848]  [<ffffffff810648aa>] warn_slowpath_null+0x1a/0x20
Apr 18 10:36:12 haswell kernel: [  999.110692]  [<ffffffff811320dc>] perf_swevent_add+0x14c/0x190
Apr 18 10:36:12 haswell kernel: [  999.119529]  [<ffffffff81136640>] event_sched_in.isra.76+0x90/0x1e0
Apr 18 10:36:12 haswell kernel: [  999.128801]  [<ffffffff811367f9>] group_sched_in+0x69/0x1e0
Apr 18 10:36:12 haswell kernel: [  999.137202]  [<ffffffff81136ad7>] ctx_sched_in+0x167/0x1f0
Apr 18 10:36:12 haswell kernel: [  999.145553]  [<ffffffff81136b9a>] perf_event_sched_in+0x3a/0x90
Apr 18 10:36:12 haswell kernel: [  999.154439]  [<ffffffff811370db>] perf_event_context_sched_in+0x7b/0xc0
Apr 18 10:36:12 haswell kernel: [  999.164065]  [<ffffffff8113785d>] __perf_event_task_sched_in+0x1dd/0x1f0
Apr 18 10:36:12 haswell kernel: [  999.173769]  [<ffffffff81091398>] finish_task_switch+0xd8/0x120
Apr 18 10:36:12 haswell kernel: [  999.182555]  [<ffffffff81096a07>] schedule_tail+0x27/0xb0
Apr 18 10:36:12 haswell kernel: [  999.190843]  [<ffffffff816608cf>] ret_from_fork+0xf/0xb0
Apr 18 10:36:12 haswell kernel: [  999.199012] ---[ end trace b175966afd57a175 ]---
Apr 18 10:36:13 haswell kernel: [  999.778173] ------------[ cut here ]------------
Apr 18 10:36:13 haswell kernel: [  999.785775] WARNING: CPU: 4 PID: 22755 at include/linux/rculist.h:410 perf_swevent_add+0x14c/0x190()
Apr 18 10:36:13 haswell kernel: [  999.797857] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel iTCO_wdt snd_hda_controller i915 snd_hda_codec evdev crct10dif_pclmul drm_kms_helper iTCO_vendor_support snd_hwdep snd_pcm drm crc32_pclmul snd_seq mei_me parport_pc parport lpc_ich mfd_core psmouse ghash_clmulni_intel snd_timer snd_seq_device mei snd aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore pcspkr serio_raw i2c_i801 processor video i2c_algo_bit i2c_core wmi button battery tpm_tis tpm sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci ehci_pci libahci xhci_hcd e1000e libata ehci_hcd ptp scsi_mod crc32c_intel usbcore pps_core usb_common fan thermal thermal_sys
Apr 18 10:36:13 haswell kernel: [  999.884365] CPU: 4 PID: 22755 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #63
Apr 18 10:36:13 haswell kernel: [  999.894321] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
Apr 18 10:36:13 haswell kernel: [  999.903501]  0000000000000009 ffff88011eb03df8 ffffffff8164f753 0000000000000000
Apr 18 10:36:13 haswell kernel: [  999.912817]  ffff88011eb03e30 ffffffff810647cd ffff8800ca0bc800 ffff8800ce29f040
Apr 18 10:36:13 haswell kernel: [  999.922105]  ffff8800ca4ff3c0 ffff8800ca0bc840 0000000000004a30 ffff88011eb03e40
Apr 18 10:36:13 haswell kernel: [  999.931378] Call Trace:
Apr 18 10:36:13 haswell kernel: [  999.935215]  <IRQ>  [<ffffffff8164f753>] dump_stack+0x45/0x56
Apr 18 10:36:13 haswell kernel: [  999.942436]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
Apr 18 10:36:13 haswell kernel: [  999.949915]  [<ffffffff810648aa>] warn_slowpath_null+0x1a/0x20
Apr 18 10:36:13 haswell kernel: [  999.957098]  [<ffffffff811320dc>] perf_swevent_add+0x14c/0x190
Apr 18 10:36:13 haswell kernel: [  999.964273]  [<ffffffff81136640>] event_sched_in.isra.76+0x90/0x1e0
Apr 18 10:36:13 haswell kernel: [  999.971863]  [<ffffffff811367f9>] group_sched_in+0x69/0x1e0
Apr 18 10:36:13 haswell kernel: [  999.978608]  [<ffffffff81136e45>] __perf_event_enable+0x255/0x260
Apr 18 10:36:13 haswell kernel: [  999.985998]  [<ffffffff81132340>] remote_function+0x40/0x50
Apr 18 10:36:13 haswell kernel: [  999.992805]  [<ffffffff810de8fd>] generic_smp_call_function_single_interrupt+0x5d/0x100
Apr 18 10:36:13 haswell kernel: [ 1000.002312]  [<ffffffff810421dd>] smp_trace_call_function_single_interrupt+0x2d/0xb0
Apr 18 10:36:13 haswell kernel: [ 1000.011260]  [<ffffffff81661c9d>] trace_call_function_single_interrupt+0x6d/0x80
Apr 18 10:36:13 haswell kernel: [ 1000.020035]  <EOI>  [<ffffffff810b1c26>] ? lock_acquire+0xb6/0x120
Apr 18 10:36:13 haswell kernel: [ 1000.027489]  [<ffffffff81139b4b>] ? __perf_sw_event+0x6b/0x230
Apr 18 10:36:13 haswell kernel: [ 1000.034488]  [<ffffffff81139bcf>] __perf_sw_event+0xef/0x230
Apr 18 10:36:13 haswell kernel: [ 1000.041203]  [<ffffffff81139b4b>] ? __perf_sw_event+0x6b/0x230
Apr 18 10:36:13 haswell kernel: [ 1000.048190]  [<ffffffff810b139d>] ? __lock_acquire.isra.29+0x3bd/0xb90
Apr 18 10:36:13 haswell kernel: [ 1000.055836]  [<ffffffff8165b2ce>] __do_page_fault+0x21e/0x530
Apr 18 10:36:13 haswell kernel: [ 1000.062586]  [<ffffffff813cdd6f>] ? debug_check_no_obj_freed+0x19f/0x360
Apr 18 10:36:13 haswell kernel: [ 1000.070501]  [<ffffffff81149ca2>] ? free_pages_prepare+0x1b2/0x230
Apr 18 10:36:13 haswell kernel: [ 1000.077811]  [<ffffffff8165b647>] trace_do_page_fault+0x37/0xb0
Apr 18 10:36:13 haswell kernel: [ 1000.084762]  [<ffffffff81657cd8>] trace_page_fault+0x28/0x30
Apr 18 10:36:13 haswell kernel: [ 1000.091480]  [<ffffffff813c12a0>] ? __put_user_4+0x20/0x30
Apr 18 10:36:13 haswell kernel: [ 1000.098045]  [<ffffffff81096a3c>] ? schedule_tail+0x5c/0xb0
Apr 18 10:36:13 haswell kernel: [ 1000.104653]  [<ffffffff816608cf>] ret_from_fork+0xf/0xb0
Apr 18 10:36:13 haswell kernel: [ 1000.111017] ---[ end trace b175966afd57a176 ]---


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-18 14:45                 ` Vince Weaver
  2014-04-18 14:51                   ` Vince Weaver
@ 2014-04-18 15:23                   ` Peter Zijlstra
  2014-04-18 16:59                     ` Peter Zijlstra
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-18 15:23 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Fri, Apr 18, 2014 at 10:45:47AM -0400, Vince Weaver wrote:
> 
> OK, since the slab corruption was happening to event->hlist_entry->pprev
> I added a WARN() call to every modifier of pprev under 
> include/linux/*list*.h to see what was stomping over freed memory.
> 
> This is what came up:
> 
> Apr 18 10:36:11 haswell kernel: [  998.316177] ------------[ cut here ]------------
> Apr 18 10:36:11 haswell kernel: [  998.321188] WARNING: CPU: 3 PID: 20717 at include/linux/rculist.h:410 perf_trace_add+0xc1/0x100()
> Apr 18 10:36:11 haswell kernel: [  998.330681] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_hda_intel iTCO_wdt snd_hda_controller i915 snd_hda_codec evdev crct10dif_pclmul drm_kms_helper iTCO_vendor_support snd_hwdep snd_pcm drm crc32_pclmul snd_seq mei_me parport_pc parport lpc_ich mfd_core psmouse ghash_clmulni_intel snd_timer snd_seq_device mei snd aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore pcspkr serio_raw i2c_i801 processor video i2c_algo_bit i2c_core wmi button battery tpm_tis tpm sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci ehci_pci libahci xhci_hcd e1000e libata ehci_hcd ptp scsi_mod crc32c_intel usbcore pps_core usb_common fan thermal thermal_sys
> Apr 18 10:36:11 haswell kernel: [  998.405736] CPU: 3 PID: 20717 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #63
> Apr 18 10:36:11 haswell kernel: [  998.413162] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> Apr 18 10:36:11 haswell kernel: [  998.420987]  0000000000000009 ffff880117923c70 ffffffff8164f753 0000000000000000
> Apr 18 10:36:11 haswell kernel: [  998.429016]  ffff880117923ca8 ffffffff810647cd ffffe8ffffcc30d8 ffff8800ce29f040
> Apr 18 10:36:11 haswell kernel: [  998.437121]  ffffffff81c1ba40 ffff8800cd3d9040 000000da35b18e50 ffff880117923cb8
> Apr 18 10:36:11 haswell kernel: [  998.445246] Call Trace:
> Apr 18 10:36:11 haswell kernel: [  998.447910]  [<ffffffff8164f753>] dump_stack+0x45/0x56
> Apr 18 10:36:11 haswell kernel: [  998.453451]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
> Apr 18 10:36:11 haswell kernel: [  998.459871]  [<ffffffff810648aa>] warn_slowpath_null+0x1a/0x20
> Apr 18 10:36:11 haswell kernel: [  998.466143]  [<ffffffff81125a01>] perf_trace_add+0xc1/0x100
> Apr 18 10:36:11 haswell kernel: [  998.472160]  [<ffffffff81136640>] event_sched_in.isra.76+0x90/0x1e0
> Apr 18 10:36:11 haswell kernel: [  998.478849]  [<ffffffff811367f9>] group_sched_in+0x69/0x1e0
> Apr 18 10:36:11 haswell kernel: [  998.484812]  [<ffffffff81136e45>] __perf_event_enable+0x255/0x260
> Apr 18 10:36:11 haswell kernel: [  998.491370]  [<ffffffff81132340>] remote_function+0x40/0x50
> Apr 18 10:36:11 haswell kernel: [  998.497311]  [<ffffffff810de116>] generic_exec_single+0x126/0x170
> Apr 18 10:36:11 haswell kernel: [  998.503764]  [<ffffffff81132300>] ? task_clock_event_add+0x40/0x40
> Apr 18 10:36:11 haswell kernel: [  998.510432]  [<ffffffff810de1c7>] smp_call_function_single+0x67/0xa0
> Apr 18 10:36:11 haswell kernel: [  998.517299]  [<ffffffff811312b4>] task_function_call+0x44/0x50
> Apr 18 10:36:11 haswell kernel: [  998.523539]  [<ffffffff81136bf0>] ? perf_event_sched_in+0x90/0x90
> Apr 18 10:36:11 haswell kernel: [  998.530085]  [<ffffffff81131350>] perf_event_enable+0x90/0xf0
> Apr 18 10:36:11 haswell kernel: [  998.536308]  [<ffffffff811312c0>] ? task_function_call+0x50/0x50
> Apr 18 10:36:11 haswell kernel: [  998.542761]  [<ffffffff8113142a>] perf_event_for_each_child+0x3a/0xa0
> Apr 18 10:36:11 haswell kernel: [  998.551512]  [<ffffffff811379af>] perf_event_task_enable+0x4f/0x80
> Apr 18 10:36:11 haswell kernel: [  998.560080]  [<ffffffff8107c015>] SyS_prctl+0x255/0x4b0
> Apr 18 10:36:11 haswell kernel: [  998.567605]  [<ffffffff813c1406>] ? lockdep_sys_exit_thunk+0x35/0x67
> Apr 18 10:36:11 haswell kernel: [  998.576333]  [<ffffffff816609ed>] system_call_fastpath+0x1a/0x1f
> Apr 18 10:36:11 haswell kernel: [  998.584698] ---[ end trace b175966afd57a174 ]---
> Apr 18 10:36:12 haswell kernel: [  998.910691] ------------[ cut here ]------------

OK, that's a good clue. That looks like we're freeing events that still
are on the owner list, which would indicate we're freeing events that
have a refcount.

I added a WARN in free_event() to check the refcount; along with a
number of false positives (through the perf_event_open() fail path) I do
appear to be getting actual fails here.

At least I can 'reproduce' this. Earlier attempts, even based on your
.config only got me very mysterious lockups -- I suspect the corruption
happens on a slightly different spot or so and completely messes up the
machine.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-18 15:23                   ` Peter Zijlstra
@ 2014-04-18 16:59                     ` Peter Zijlstra
  2014-04-18 17:15                       ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-18 16:59 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Fri, Apr 18, 2014 at 05:23:14PM +0200, Peter Zijlstra wrote:
> OK, that's a good clue. That looks like we're freeing events that still
> are on the owner list, which would indicate we're freeing events that
> have a refcount.
> 
> I added a WARN in free_event() to check the refcount, along with a
> number of false positives (through the perf_event_open() fail path) I do
> appear to be getting actual fails here.
> 
> At least I can 'reproduce' this. Earlier attempts, even based on your
> .config only got me very mysterious lockups -- I suspect the corruption
> happens on a slightly different spot or so and completely messes up the
> machine.

The below should have only made the false positives go away, but my
machine has magically stopped going all funny on me. Could you give it a
go?

---
 kernel/events/core.c | 73 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 43 insertions(+), 30 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f83a71a3e46d..f9d7b859395e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3231,8 +3231,11 @@ static void __free_event(struct perf_event *event)
 
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
-static void free_event(struct perf_event *event)
+
+static void _free_event(struct perf_event *event)
 {
+	WARN_ON(atomic_long_read(&event->refcount));
+
 	irq_work_sync(&event->pending);
 
 	unaccount_event(event);
@@ -3259,45 +3262,28 @@ static void free_event(struct perf_event *event)
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 
-
 	__free_event(event);
 }
 
-int perf_event_release_kernel(struct perf_event *event)
-{
-	struct perf_event_context *ctx = event->ctx;
-
-	WARN_ON_ONCE(ctx->parent_ctx);
-	/*
-	 * There are two ways this annotation is useful:
-	 *
-	 *  1) there is a lock recursion from perf_event_exit_task
-	 *     see the comment there.
-	 *
-	 *  2) there is a lock-inversion with mmap_sem through
-	 *     perf_event_read_group(), which takes faults while
-	 *     holding ctx->mutex, however this is called after
-	 *     the last filedesc died, so there is no possibility
-	 *     to trigger the AB-BA case.
-	 */
-	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
-	raw_spin_lock_irq(&ctx->lock);
-	perf_group_detach(event);
-	raw_spin_unlock_irq(&ctx->lock);
-	perf_remove_from_context(event);
-	mutex_unlock(&ctx->mutex);
-
-	free_event(event);
+static void put_event(struct perf_event *event);
 
-	return 0;
+static void free_event(struct perf_event *event)
+{
+	if (unlikely(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)) {
+		WARN(1, "unexpected event refcount: %ld\n", 
+				atomic_long_read(&event->refcount));
+		put_event(event);
+		return;
+	}
+	_free_event(event);
 }
-EXPORT_SYMBOL_GPL(perf_event_release_kernel);
 
 /*
  * Called when the last reference to the file is gone.
  */
 static void put_event(struct perf_event *event)
 {
+	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *owner;
 
 	if (!atomic_long_dec_and_test(&event->refcount))
@@ -3336,9 +3322,36 @@ static void put_event(struct perf_event *event)
 		put_task_struct(owner);
 	}
 
-	perf_event_release_kernel(event);
+	WARN_ON_ONCE(ctx->parent_ctx);
+	/*
+	 * There are two ways this annotation is useful:
+	 *
+	 *  1) there is a lock recursion from perf_event_exit_task
+	 *     see the comment there.
+	 *
+	 *  2) there is a lock-inversion with mmap_sem through
+	 *     perf_event_read_group(), which takes faults while
+	 *     holding ctx->mutex, however this is called after
+	 *     the last filedesc died, so there is no possibility
+	 *     to trigger the AB-BA case.
+	 */
+	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
+	raw_spin_lock_irq(&ctx->lock);
+	perf_group_detach(event);
+	raw_spin_unlock_irq(&ctx->lock);
+	perf_remove_from_context(event);
+	mutex_unlock(&ctx->mutex);
+
+	_free_event(event);
 }
 
+int perf_event_release_kernel(struct perf_event *event)
+{
+	put_event(event);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_event_release_kernel);
+
 static int perf_release(struct inode *inode, struct file *file)
 {
 	put_event(file->private_data);

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-04-18 16:59                     ` Peter Zijlstra
@ 2014-04-18 17:15                       ` Peter Zijlstra
  2014-04-23 20:58                         ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-18 17:15 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Fri, Apr 18, 2014 at 06:59:58PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 18, 2014 at 05:23:14PM +0200, Peter Zijlstra wrote:
> > OK, that's a good clue. That looks like we're freeing events that still
> > are on the owner list, which would indicate we're freeing events that
> > have a refcount.
> > 
> > I added a WARN in free_event() to check the refcount, along with a
> > number of false positives (through the perf_event_open() fail path) I do
> > appear to be getting actual fails here.
> > 
> > At least I can 'reproduce' this. Earlier attempts, even based on your
> > .config only got me very mysterious lockups -- I suspect the corruption
> > happens on a slightly different spot or so and completely messes up the
> > machine.
> 
> The below should have only made the false positives go away, but my
> machine has magically stopped going all funny on me. Could you give it a
> go?
> 

Hmm the fuzzer task seems stuck in kernel space, can't kill it anymore.

So it's likely it just didn't get around to doing enough to wreck the
system or so.

/me goes stab it in the eye.


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-18 17:15                       ` Peter Zijlstra
@ 2014-04-23 20:58                         ` Vince Weaver
  2014-04-25  2:51                           ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-23 20:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Fri, 18 Apr 2014, Peter Zijlstra wrote:

> Hmm the fuzzer task seems stuck in kernel space, can't kill it anymore.
> 
> So it's likely it just didn't get around to doing enough to wreck the
> system or so.
> 
> /me goes stab it in the eye.

OK, I managed to get a trace while this bug was happening.

From my (non-expert) analysis, this is what happens.

[CPU0] 1422.741358  -- perf_event_open() opens event 17 (0x11)
		which kmalloc()'d event struct address 0xffff8800cf213000

[CPU1] 1422.814014  -- clone() is called, spawning process 31443 on CPU7
		event 17 is inherited across the clone

[CPU1] 1422.816957  -- in parent thread, event 17 is closed

[CPU1] 1422.820013  -- parent thread kills child process 31443,
			last known user of closed event 17
....

[CPU7] 1422.856881  -- grace period expires, kfree of 0xffff8800cf213000
			from CPU of child
....

[CPU1] 1423.154079  -- a prctl call to activate events calls
			perf_swevent_add() which calls
			 hlist_add_head_rcu() which finds the first
			element in the CPU1 swevent_htable hash list to 
			be our already freed (and poisoned)
			0xffff8800cf213000


In any case, when we close the event, are we somehow not removing
it from all of the swevent_htables (one per CPU)?

A link to the trace can be found here:

   web.eece.maine.edu/~vweaver/junk/interesting.trace.bz2

And the log splat here:

[ 1423.159052] WARNING: CPU: 1 PID: 30135 at include/linux/rculist.h:411 perf_swevent_add+0x16f/0x190()
[ 1423.168825] Modules linked in: fuse snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp kvm snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep crct10dif_pclmul i915 snd_pcm crc32_pclmul iTCO_wdt ghash_clmulni_intel aesni_intel snd_seq evdev iTCO_vendor_support drm_kms_helper snd_timer aes_x86_64 lrw gf128mul drm snd_seq_device glue_helper psmouse snd processor mei_me soundcore ablk_helper cryptd mei pcspkr video battery serio_raw i2c_i801 i2c_algo_bit lpc_ich mfd_core tpm_tis tpm parport_pc parport i2c_core wmi button sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci ehci_pci libahci e1000e xhci_hcd ehci_hcd libata ptp crc32c_intel scsi_mod usbcore usb_common pps_core fan thermal thermal_sys
[ 1423.242637] CPU: 1 PID: 30135 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #86
[ 1423.250125] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 1423.258049]  0000000000000009 ffff8800c30e5c78 ffffffff8164f7a3 0000000000000000
[ 1423.266087]  ffff8800c30e5cb0 ffffffff810647cd ffff880118383000 ffff8800cf213040
[ 1423.274159]  ffff8800b9e36788 ffff880118383040 00000145269017e9 ffff8800c30e5cc0
[ 1423.282173] Call Trace:
[ 1423.284791]  [<ffffffff8164f7a3>] dump_stack+0x45/0x56
[ 1423.290352]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
[ 1423.296775]  [<ffffffff810648aa>] warn_slowpath_null+0x1a/0x20
[ 1423.303064]  [<ffffffff8113211f>] perf_swevent_add+0x16f/0x190
[ 1423.309348]  [<ffffffff811367a0>] event_sched_in.isra.76+0x90/0x1e0
[ 1423.316084]  [<ffffffff81136959>] group_sched_in+0x69/0x1e0
[ 1423.322076]  [<ffffffff81136fa5>] __perf_event_enable+0x255/0x260
[ 1423.328580]  [<ffffffff81132360>] remote_function+0x40/0x50
[ 1423.334599]  [<ffffffff810de126>] generic_exec_single+0x126/0x170
[ 1423.341136]  [<ffffffff81132320>] ? task_clock_event_add+0x40/0x40
[ 1423.347809]  [<ffffffff810de1d7>] smp_call_function_single+0x67/0xa0
[ 1423.354642]  [<ffffffff811312d4>] task_function_call+0x44/0x50
[ 1423.360901]  [<ffffffff81136d50>] ? perf_event_sched_in+0x90/0x90
[ 1423.367441]  [<ffffffff81131370>] perf_event_enable+0x90/0xf0
[ 1423.373612]  [<ffffffff811312e0>] ? task_function_call+0x50/0x50
[ 1423.380089]  [<ffffffff8113144a>] perf_event_for_each_child+0x3a/0xa0
[ 1423.386949]  [<ffffffff81137b0f>] perf_event_task_enable+0x4f/0x80
[ 1423.393609]  [<ffffffff8107c015>] SyS_prctl+0x255/0x4b0
[ 1423.399208]  [<ffffffff81660c84>] tracesys+0xe1/0xe6
[ 1423.404539] ---[ end trace c9ab81bd2a5a1d1d ]---
[ 1423.506804] Slab corruption (Tainted: G        W    ): kmalloc-2048 start=ffff8800cf213000, len=2048
[ 1423.516610] 040: 6b 6b 6b 6b 6b 6b 6b 6b 88 67 e3 b9 00 88 ff ff  kkkkkkkk.g......
[ 1423.524775] Next obj: start=ffff8800cf213800, len=2048
[ 1423.530314] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[ 1423.538465] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk





* Re: [perf] more perf_fuzzer memory corruption
  2014-04-23 20:58                         ` Vince Weaver
@ 2014-04-25  2:51                           ` Vince Weaver
  2014-04-28 14:21                             ` Vince Weaver
  2014-04-28 17:48                             ` Thomas Gleixner
  0 siblings, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-25  2:51 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Thomas Gleixner,
	Steven Rostedt


I got the bug to trigger again, this time it finally managed to hit a 
debug_objects WARNING if that's any additional help.

The bug followed the same pattern, software event 
(PERF_TYPE_SOFTWARE / PERF_COUNT_SW_TASK_CLOCK) created, fork happens,
event closes in parent, child killed, rcu grace period expires and kfree
but event still active.

here's the kernel message followed by excerpts from the trace, I can 
provide full trace if anyone cares.

Vince

[ 2226.252441] ------------[ cut here ]------------
[ 2226.257503] WARNING: CPU: 4 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
[ 2226.266545] ODEBUG: free active (active state 0) object type: hrtimer hint: perf_swevent_hrtimer+0x0/0x140
[ 2226.276952] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic coretemp snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep kvm i915 crct10dif_pclmul crc32_pclmul iTCO_wdt snd_pcm drm_kms_helper ghash_clmulni_intel iTCO_vendor_support snd_seq snd_timer snd_seq_device aesni_intel snd lpc_ich drm evdev i2c_i801 aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse soundcore serio_raw pcspkr i2c_algo_bit parport_pc parport mei_me mei mfd_core i2c_core wmi button processor video battery tpm_tis tpm sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ehci_pci ahci e1000e xhci_hcd ehci_hcd libahci libata ptp crc32c_intel usbcore scsi_mod usb_common pps_core fan thermal thermal_sys
[ 2226.350769] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.15.0-rc1+ #87
[ 2226.357730] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 2226.365658]  0000000000000009 ffff88011eb03cd8 ffffffff8164f7b3 ffff88011eb03d20
[ 2226.373728]  ffff88011eb03d10 ffffffff810647cd ffff8800ce03c888 ffffffff81c433e0
[ 2226.381835]  ffffffff81a19730 ffff8800cf4e0000 ffff8800ce03c888 ffff88011eb03d70
[ 2226.389820] Call Trace:
[ 2226.392428]  <IRQ>  [<ffffffff8164f7b3>] dump_stack+0x45/0x56
[ 2226.398595]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
[ 2226.405059]  [<ffffffff8106483c>] warn_slowpath_fmt+0x4c/0x50
[ 2226.411240]  [<ffffffff813cc9e3>] debug_print_object+0x83/0xa0
[ 2226.417535]  [<ffffffff81139200>] ? __perf_event_overflow+0x270/0x270
[ 2226.424463]  [<ffffffff813cde73>] debug_check_no_obj_freed+0x263/0x360
[ 2226.431500]  [<ffffffff811316aa>] ? free_event_rcu+0x2a/0x30
[ 2226.437579]  [<ffffffff81196fd0>] kfree+0xb0/0x560
[ 2226.442740]  [<ffffffff810ccd46>] ? rcu_process_callbacks+0x236/0x620
[ 2226.449658]  [<ffffffff81131680>] ? pmu_dev_release+0x10/0x10
[ 2226.455811]  [<ffffffff811316aa>] free_event_rcu+0x2a/0x30
[ 2226.461727]  [<ffffffff810ccdad>] rcu_process_callbacks+0x29d/0x620
[ 2226.468440]  [<ffffffff810ccd46>] ? rcu_process_callbacks+0x236/0x620
[ 2226.475384]  [<ffffffff81069ab5>] __do_softirq+0xf5/0x290
[ 2226.481210]  [<ffffffff81069e9d>] irq_exit+0xad/0xc0
[ 2226.486540]  [<ffffffff81662e35>] smp_apic_timer_interrupt+0x45/0x60
[ 2226.493350]  [<ffffffff8166181d>] apic_timer_interrupt+0x6d/0x80
[ 2226.499798]  <EOI>  [<ffffffff810d958e>] ? tick_nohz_idle_exit+0x12e/0x1b0
[ 2226.507192]  [<ffffffff810aa7de>] cpu_startup_entry+0x12e/0x3d0
[ 2226.513542]  [<ffffffff81042a43>] start_secondary+0x193/0x200
[ 2226.519706] ---[ end trace ec55e71b02ef43b3 ]---


Event Created:
	<...>-13590 [000]  2225.706150: sys_enter:            NR 298 (699a70, 0, ffffffff, ffffffff, 8, 8)
	<...>-13590 [000]  2225.706160: kmalloc:              (perf_event_alloc+0x55) call_site=ffffffff8113a565 ptr=0xffff8800cfa02000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
	<...>-13590 [000]  2225.706180: bprint:               SYSC_perf_event_open: Opened: 1 1 0 (PERF_TYPE_SOFTWARE,PERF_COUNT_SW_TASK_CLOCK)
	<...>-13590 [000]  2225.706180: sys_exit:             NR 298 = 14 (0xe)

Fork:
	<...>-13590 [003]  2226.204981: sys_enter:            NR 56 (1200011, 0, 0, 7f6fab28b9d0, 0, 3516)

Close in parent:
	<...>-13590 [003]  2226.216548: sys_enter:            NR 3 (e, 11000, 11000, 22, 7f6fab0780b4, 7f6fab078120)
	<...>-14467 [004]  2226.216548: mm_page_free:         page=0xffffea0002d567e0 pfn=47540192 order=0
	<...>-13590 [003]  2226.216549: sys_exit:             NR 3 = 0

Kill of child:
	<...>-13590 [002]  2226.245087: sys_enter:            NR 62 (3884, 9, 7, 1, 7f6fab0780fc, 7f6fab078120)

Grace period expire/kfree:
	<idle>-0     [004]  2226.252428: kfree:               (free_event_rcu+0x2a) call_site=ffffffff811316aa ptr=0xffff8800cfa02000
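
The syscall sequence in the excerpts above (perf_event_open, clone, close in
the parent, kill of the child) can be sketched in userspace as follows. This
is only an illustration of the ordering, not a known reproducer for the bug;
the helper names are made up for this sketch, and the flags argument is left
at 0 rather than copied from the trace.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open(2); go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			    int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Counting-only task-clock event: sample_period stays 0, as in the report. */
static struct perf_event_attr task_clock_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_TASK_CLOCK;
	return attr;
}

/*
 * Open, fork, close in parent, kill child.  Returns 0 when the whole
 * sequence ran, -1 if perf_event_open() or fork() was refused.
 */
static int open_fork_close_kill(void)
{
	struct perf_event_attr attr = task_clock_attr();
	long fd = perf_event_open(&attr, 0, -1, -1, 0);
	pid_t child;

	if (fd < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close((int)fd);
		return -1;
	}
	if (child == 0) {
		for (;;)
			pause();	/* child keeps the inherited fd open */
	}

	close((int)fd);		/* parent's close; child still holds the file */
	kill(child, SIGKILL);	/* child exit drops the last file reference */
	waitpid(child, NULL, 0);
	return 0;
}
```

Calling open_fork_close_kill() in a loop only exercises the same path; whether
it ever hits the race depends on timing that this sketch does not control.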



* Re: [perf] more perf_fuzzer memory corruption
  2014-04-25  2:51                           ` Vince Weaver
@ 2014-04-28 14:21                             ` Vince Weaver
  2014-04-28 19:38                               ` Vince Weaver
  2014-04-29  8:52                               ` Peter Zijlstra
  2014-04-28 17:48                             ` Thomas Gleixner
  1 sibling, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-28 14:21 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Thomas Gleixner,
	Steven Rostedt

On Thu, 24 Apr 2014, Vince Weaver wrote:

> [ 2226.257503] WARNING: CPU: 4 PID: 0 at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
> [ 2226.266545] ODEBUG: free active (active state 0) object type: hrtimer hint: perf_swevent_hrtimer+0x0/0x140
> [ 2226.389820] Call Trace:
> [ 2226.392428]  <IRQ>  [<ffffffff8164f7b3>] dump_stack+0x45/0x56
> [ 2226.398595]  [<ffffffff810647cd>] warn_slowpath_common+0x7d/0xa0
> [ 2226.405059]  [<ffffffff8106483c>] warn_slowpath_fmt+0x4c/0x50
> [ 2226.411240]  [<ffffffff813cc9e3>] debug_print_object+0x83/0xa0
> [ 2226.417535]  [<ffffffff81139200>] ? __perf_event_overflow+0x270/0x270
> [ 2226.424463]  [<ffffffff813cde73>] debug_check_no_obj_freed+0x263/0x360
> [ 2226.431500]  [<ffffffff811316aa>] ? free_event_rcu+0x2a/0x30
> [ 2226.437579]  [<ffffffff81196fd0>] kfree+0xb0/0x560
> [ 2226.442740]  [<ffffffff810ccd46>] ? rcu_process_callbacks+0x236/0x620
> [ 2226.449658]  [<ffffffff81131680>] ? pmu_dev_release+0x10/0x10
> [ 2226.455811]  [<ffffffff811316aa>] free_event_rcu+0x2a/0x30
> [ 2226.461727]  [<ffffffff810ccdad>] rcu_process_callbacks+0x29d/0x620
> [ 2226.468440]  [<ffffffff810ccd46>] ? rcu_process_callbacks+0x236/0x620
> [ 2226.475384]  [<ffffffff81069ab5>] __do_softirq+0xf5/0x290
> [ 2226.481210]  [<ffffffff81069e9d>] irq_exit+0xad/0xc0
> [ 2226.486540]  [<ffffffff81662e35>] smp_apic_timer_interrupt+0x45/0x60
> [ 2226.493350]  [<ffffffff8166181d>] apic_timer_interrupt+0x6d/0x80
> [ 2226.499798]  <EOI>  [<ffffffff810d958e>] ? tick_nohz_idle_exit+0x12e/0x1b0
> [ 2226.507192]  [<ffffffff810aa7de>] cpu_startup_entry+0x12e/0x3d0
> [ 2226.513542]  [<ffffffff81042a43>] start_secondary+0x193/0x200
> [ 2226.519706] ---[ end trace ec55e71b02ef43b3 ]---

so it's looking more and more like this issue is with a
	PERF_COUNT_SW_TASK_CLOCK
event.

It's being deallocated in a different process than it was started (due to 
fork).

And it really looks like the problem is even though the event is free'd, 
there's still an active hrtimer associated with it somehow.

I can't seem to find *why* there's an associated hrtimer though, as the 
event as far as I can tell was created with sample_period=0 and the 
various
	perf_swevent_init_hrtimer()
calls seem to guard with is_sampling()
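
For readers following along, the guard described above amounts to something
like the toy model below. The toy_* names and the struct are stand-ins made up
for this sketch, not the kernel's definitions; the only point is that an event
with sample_period == 0 should never get as far as arming a timer, which is
why the ODEBUG warning is surprising.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of struct perf_event that matter. */
struct toy_event {
	uint64_t sample_period;	/* mirrors attr.sample_period */
	bool hrtimer_armed;
};

static bool toy_is_sampling(const struct toy_event *e)
{
	return e->sample_period != 0;
}

/* Arm the timer only for sampling events; counting events return early. */
static void toy_start_hrtimer(struct toy_event *e)
{
	if (!toy_is_sampling(e))
		return;
	e->hrtimer_armed = true;
}

/* Reports whether a timer ends up armed for the given sample period. */
static bool toy_timer_armed_for(uint64_t sample_period)
{
	struct toy_event e = { .sample_period = sample_period };

	toy_start_hrtimer(&e);
	return e.hrtimer_armed;
}
```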

This is made all the more confusing because the PERF_COUNT_SW_TASK_CLOCK 
events are handled by their own PMU even though it's faked up so they look 
like regular software events.  Is there a reason for that?

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-25  2:51                           ` Vince Weaver
  2014-04-28 14:21                             ` Vince Weaver
@ 2014-04-28 17:48                             ` Thomas Gleixner
  1 sibling, 0 replies; 81+ messages in thread
From: Thomas Gleixner @ 2014-04-28 17:48 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, 24 Apr 2014, Vince Weaver wrote:
 
> I got the bug to trigger again, this time it finally managed to hit a 
> debug_objects WARNING if that's any additional help.
> 
> The bug followed the same pattern, software event 
> (PERF_TYPE_SOFTWARE / PERF_COUNT_SW_TASK_CLOCK) created, fork happens,
> event closes in parent, child killed, rcu grace period expires and kfree
> but event still active.
> 
> here's the kernel message followed by excerpts from the trace, I can 
> provide full trace if anyone cares.

I care :)

Thanks,

	tglx



* Re: [perf] more perf_fuzzer memory corruption
  2014-04-28 14:21                             ` Vince Weaver
@ 2014-04-28 19:38                               ` Vince Weaver
  2014-04-29  9:46                                 ` Peter Zijlstra
  2014-04-29  8:52                               ` Peter Zijlstra
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-28 19:38 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Thomas Gleixner,
	Steven Rostedt


OK, this is my current theory as to what's going on.  I'd appreciate any 
comments.


We have an event, let's call it #16.

Event #16 is a SW event created and running in the parent on CPU0.

CPU0 (parent): calls fork()

CPU6 (child): SW Event #16 is still running on CPU0 but is visible
	on CPU6 because the fd passed through with fork

CPU0 (parent) close #16.  Event not deallocated because
        still visible in child

CPU0 (parent) kill child

CPU6 (child) shutting down.
   last user of event #16
   perf_release() called on event
   which eventually calls event_sched_out()
   which calls pmu->del which removes event from swevent_htable
   *but only on CPU6*

**** some sort of race happens with CPU0 (possibly with 
	event_sched_in() and event->state==PERF_EVENT_STATE_INACTIVE)
	That has event #16 in the cpu0 swevent_htable but not
	freed the next time ctx_sched_out() happens ****


CPU6 (idle) grace period expires, kfree happens

the CPU0 hlist still has in the list the now freed (and poisoned)
  event which causes problems, especially as new events added to
  the list over-write bytes starting at 0x48 with pprev values.
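
The overwrite at 0x48 can be illustrated with a userspace copy of the hlist
pointer updates. This is a sketch under assumptions: the toy_event layout
(list node at offset 0x40, 64-bit pointers) is chosen to match the splat, not
taken from struct perf_event, and the "free" is simulated by poisoning the
object in place so the demo stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace copies of the kernel's hlist layout, for illustration only. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Same pointer updates as hlist_add_head(), minus the RCU barriers. */
static void toy_hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;	/* write into the old first node */
	h->first = n;
	n->pprev = &h->first;
}

/* Hypothetical stand-in for struct perf_event: node at offset 0x40. */
struct toy_event {
	unsigned char pad[0x40];
	struct hlist_node hlist_entry;	/* next at 0x40, pprev at 0x48 */
	unsigned char tail[0x10];
};

#define POISON_FREE 0x6b	/* slab poison byte seen in the splats */

/*
 * Simulate "freed but still on the list": poison the stale event, add a new
 * event at the head of the same bucket, and return the offset of the first
 * byte of the stale event that is no longer poison (-1 if untouched).
 */
static long toy_first_corrupted_offset(void)
{
	static struct toy_event stale, fresh;
	struct hlist_head bucket = { .first = NULL };
	const unsigned char *p = (const unsigned char *)&stale;

	toy_hlist_add_head(&stale.hlist_entry, &bucket);
	memset(&stale, POISON_FREE, sizeof(stale));	/* poisoned "kfree" */
	toy_hlist_add_head(&fresh.hlist_entry, &bucket);

	for (size_t i = 0; i < sizeof(stale); i++)
		if (p[i] != POISON_FREE)
			return (long)i;
	return -1;
}
```

With this layout the only bytes that change in the stale object are the pprev
slot, which is the same shape as the "040:" line in the slab corruption dump:
eight bytes of 0x6b followed by a kernel pointer.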


Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-28 14:21                             ` Vince Weaver
  2014-04-28 19:38                               ` Vince Weaver
@ 2014-04-29  8:52                               ` Peter Zijlstra
  2014-04-29 18:11                                 ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-29  8:52 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Mon, Apr 28, 2014 at 10:21:34AM -0400, Vince Weaver wrote:
> so it's looking more and more like this issue is with a
> 	PERF_COUNT_SW_TASK_CLOCK
> event.

But they don't actually use the hlist thing..

> It's being deallocated in a different process than it was started (due to 
> fork).
> 
> And it really looks like the problem is even though the event is free'd, 
> there's still an active hrtimer associated with it somehow.

So this is a different problem from the hlist corruption?

> I can't seem to find *why* there's an associated hrtimer though, as the 
> event as far as I can tell was created with sample_period=0 and the 
> various
> 	perf_swevent_init_hrtimer()
> calls seem to guard with is_sampling()

That is indeed, decidedly odd. 

> This is made all the more confusing because the PERF_COUNT_SW_TASK_CLOCK 
> events are handled by their own PMU even though it's faked up so they look 
> like regular software events.  Is there a reason for that?

This was the easiest route when we introduced the multiple pmu thing or
so, it's been on the todo list for a cleanup ever since :-/



* Re: [perf] more perf_fuzzer memory corruption
  2014-04-28 19:38                               ` Vince Weaver
@ 2014-04-29  9:46                                 ` Peter Zijlstra
  2014-04-29 18:21                                   ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-29  9:46 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Mon, Apr 28, 2014 at 03:38:38PM -0400, Vince Weaver wrote:
> 
> OK, this is my current theory as to what's going on.  I'd appreciate any 
> comments.
> 
> 
> We have an event, let's call it #16.
> 
> Event #16 is a SW event created and running in the parent on CPU0.

A regular software one, right? Not a timer one.

> CPU0 (parent): calls fork()
> 
> CPU6 (child): SW Event #16 is still running on CPU0 but is visible
> 	on CPU6 because the fd passed through with fork
> 
> CPU0 (parent) close #16.  Event not deallocated because
>         still visible in child
> 
> CPU0 (parent) kill child

OK so far..

> CPU6 (child) shutting down.
>    last user of event #16
>    perf_release() called on event
>    which eventually calls event_sched_out()
>    which calls pmu->del which removes event from swevent_htable
>    *but only on CPU6*

So on fork() we'll also clone the counter; after which there's two. One
will run on each task.

Because of a context switch optimization they can actually flip around
(the below patch disables that).

So there's (possibly) two events being killed here:

 1) the event that is attached to the child process, it will be detached
 and freed. do_exit() -> perf_event_exit_task() ->
 perf_event_exit_task_context() -> __perf_event_exit_task()

 If this is the cloned event, it will put the original event and be
 freed here.

 If the child ran the cloned event; then:
 2) the closing of the fd, coupled with the put of 1) will drop the
 refcount of the original event to 0 and it too will be removed and
 freed.

IF however the original and cloned events were flipped at the time; the
child exit will detach the original event, but since the parent will
still have a cloned event attached, the clone will keep the event alive.

In this case no events will be freed until the parent too exits; at
which point the cloned event will get detached and freed. That will put
the last reference on the actual event, and that too will go.


Now, seeing that you actually see an event getting freed, we'll have to
assume the former situation, where the original event is on the parent
process and the cloned event is on the child process.

> **** some sort of race happens with CPU0 (possibly with 
> 	event_sched_in() and event->state==PERF_EVENT_STATE_INACTIVE)
> 	That has event #16 in the cpu0 swevent_htable but not
> 	freed the next time ctx_sched_out() happens ****

So on do_exit(), exit_files() happens first, it will drop the refcount
of the original event to 1.

After that, perf_event_exit_task() happens, it will (as per the
callchain above) end up in __perf_event_exit_task().

__perf_event_exit_task() will call perf_group_detach, however no groups
involved here afaik, so that's quickly done.

It will then call perf_remove_from_context() which will try to
deschedule (which is likely already done by
perf_event_exit_task_context() which de-schedules the entire context in
one go), and then remove the event from the context.

Since it is the cloned event; it will then call sync_child_event(),
which will push whatever counts it has gathered into the original
(parent) event, and detach itself from the parent.

This will have done put_event(parent_event), which will drop the
refcount of the original event to 0. put_event() will in turn call the
same things: perf_group_detach() -- no groups, done quickly.
perf_remove_from_context(), this will IPI from CPU6 to CPU0, and
deschedule the original event, calling ->del() on CPU0, and as per the
above continue doing list_del_event() detaching itself from the context.

After the IPI put_event() will continue with _free_event() and we'll
call ->destroy() and call_rcu and the event will be no more.

After all that, the child continues calling free_event() which will also
call ->destroy() (but on the child event) and do the same call_rcu()
also freeing this event.

Nothing left.
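
The reference counting in the walkthrough above can be modelled in a few
lines. This is a toy sketch, assuming the inherit case with the original event
on the parent and the clone on the child; the toy_* names and the single
integer refcount are simplifications made up for this illustration, not the
kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_event {
	int refcount;
	bool freed;
	struct toy_event *parent;	/* set on the inherited (cloned) event */
};

struct toy_result {
	int orig_refs_after_release;
	bool orig_freed;
	bool child_freed;
};

static void toy_put_event(struct toy_event *e)
{
	assert(e->refcount > 0 && !e->freed);
	if (--e->refcount == 0)
		e->freed = true;	/* stands in for _free_event() + call_rcu() */
}

/* Child exit tears down the clone: drop the ref on the parent, then free. */
static void toy_exit_child_event(struct toy_event *child)
{
	toy_put_event(child->parent);	/* sync_child_event() -> put_event(parent) */
	child->parent = NULL;
	toy_put_event(child);		/* free_event(child) */
}

static struct toy_result toy_walkthrough(void)
{
	struct toy_event orig = { .refcount = 1 };	/* held by the fd's file */
	struct toy_event child = { .refcount = 1, .parent = &orig };
	struct toy_result r;

	orig.refcount++;	/* fork with inherit: the clone pins the original */

	/*
	 * The parent's close() only drops a file reference; the child still
	 * shares the file, so perf_release() has not run yet.
	 */

	toy_put_event(&orig);	/* child exit_files(): perf_release() */
	r.orig_refs_after_release = orig.refcount;

	toy_exit_child_event(&child);	/* perf_event_exit_task() */
	r.orig_freed = orig.freed;
	r.child_freed = child.freed;
	return r;
}
```

In this model the original event sits at refcount 1 after exit_files() and
both events are gone after the child's exit path, which is the "nothing left"
outcome; the model says nothing about what is still linked on a per-CPU list
at that point.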

> CPU6 (idle) grace period expires, kfree happens
> 
> the CPU0 hlist still has in the list the now freed (and poisoned)
>   event which causes problems, especially as new events added to
>   the list over-write bytes starting at 0x48 with pprev values.

Right, so clearly something's gone funny.. above you speculate on a race
against event_sched_in(), but both paths serialize on event->ctx->lock.

__perf_remove_from_context() takes ctx->lock, as do the sched_in/out
paths.

quite the puzzle this one


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-29  8:52                               ` Peter Zijlstra
@ 2014-04-29 18:11                                 ` Vince Weaver
  2014-04-29 19:21                                   ` Steven Rostedt
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-29 18:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Tue, 29 Apr 2014, Peter Zijlstra wrote:

> On Mon, Apr 28, 2014 at 10:21:34AM -0400, Vince Weaver wrote:
> > so it's looking more and more like this issue is with a
> > 	PERF_COUNT_SW_TASK_CLOCK
> > event.
> 
> But they don't actually use the hlist thing..

yes.

This turns into another issue that I think is just use-after-free 
memory corruption exhibiting itself a different way.

I've documented at least 8 different types of error message that I think 
are all due to this issue.

> So this is a different problem from the hlist corruption?

Who knows.  That's why I'm trying to get this issue fixed so I can figure 
out which of the 10+ other bugs I'm tracking are the same or different.

> > This is made all the more confusing because the PERF_COUNT_SW_TASK_CLOCK 
> > events are handled by their own PMU even though it's faked up so they look 
> > like regular software events.  Is there a reason for that?
> 
> This was the easiest route when we introduced the multiple pmu thing or
> so, it's been on the todo list for a cleanup ever since :-/

It was very confusing and poorly documented, as are many of the perf_event 
files.  And yes, I know, I should do something about it rather than 
complain.

I've actually given up on source code inspection to figure out what's 
going on in kernel/events/core.c.  What I do now is write simple test
cases and do an ftrace function trace.  The results are often surprising.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-29  9:46                                 ` Peter Zijlstra
@ 2014-04-29 18:21                                   ` Vince Weaver
  2014-04-29 19:01                                     ` Peter Zijlstra
  2014-04-29 19:26                                     ` Steven Rostedt
  0 siblings, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-04-29 18:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Tue, 29 Apr 2014, Peter Zijlstra wrote:

> > Event #16 is a SW event created and running in the parent on CPU0.
> 
> A regular software one, right? Not a timer one.

Maybe.  From traces I have it looks like it's a regular one (i.e. calls 
 perf_swevent_add() ) but who knows at this point.

When I actually got a trace with perf_event_open() instrumented to print 
some attr values it looked like things were being caused by
PERF_COUNT_SW_TASK_CLOCK which makes no sense.

> > CPU6 (child) shutting down.
> >    last user of event #16
> >    perf_release() called on event
> >    which eventually calls event_sched_out()
> >    which calls pmu->del which removes event from swevent_htable
> >    *but only on CPU6*
> 
> So on fork() we'll also clone the counter; after which there's two. One
> will run on each task.

even if inherit isn't set?

> Because of a context switch optimization they can actually flip around
> (the below patch disables that).

ENOPATCH?

> quite the puzzle this one

yes.

I'm tediously working on trying to get a good trace of this happening.

I have a random seed that will trigger the bug in the fuzzer around 1 time 
in 10.

Unfortunately many of the times it crashes so hard/quickly there's no 
chance of getting the trace data (dump trace on oops never holds enough 
state, and often the fuzzing triggers its own random trace events that 
clutter those logs).

Also trace-cmd is a pain to use.  Any suggested events I should trace 
beyond the obvious?

Part of the problem is that despite what the documentation says it doesn't 
look like you can combine the "-P pid" and "-c" children option, which 
makes debugging a forking problem like this a lot harder to trace.

It's sort of possible to get around that with a really complicated -F ""
command line that does sudo back to me (don't want to fuzz as root) and 
such, but still awkward.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-29 18:21                                   ` Vince Weaver
@ 2014-04-29 19:01                                     ` Peter Zijlstra
  2014-04-29 20:59                                       ` Vince Weaver
  2014-04-29 19:26                                     ` Steven Rostedt
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-29 19:01 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Tue, Apr 29, 2014 at 02:21:56PM -0400, Vince Weaver wrote:
> On Tue, 29 Apr 2014, Peter Zijlstra wrote:
> 
> > > Event #16 is a SW event created and running in the parent on CPU0.
> > 
> > A regular software one, right? Not a timer one.
> 
> Maybe.  From traces I have it looks like it's a regular one (i.e. calls 
>  perf_swevent_add() ) but who knows at this point.
> 
> When I actually got a trace with perf_event_open() instrumented to print 
> some attr values it looked like things were being caused by
> PERF_COUNT_SW_TASK_CLOCK which makes no sense.
> 
> > > CPU6 (child) shutting down.
> > >    last user of event #16
> > >    perf_release() called on event
> > >    which eventually calls event_sched_out()
> > >    which calls pmu->del which removes event from swevent_htable
> > >    *but only on CPU6*
> > 
> > So on fork() we'll also clone the counter; after which there's two. One
> > will run on each task.
> 
> even if inherit isn't set?

Fair point, nope not in that case. If you can trigger this without ever
using .inherit=1 this would exclude a lot of funny code.

> > Because of a context switch optimization they can actually flip around
> > (the below patch disables that).
> 
> ENOPATCH?

urgh.. fail.


diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5129b1201050..0d6a58950a3b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2293,6 +2291,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	if (!cpuctx->task_ctx)
 		return;
 
+#if 0
 	rcu_read_lock();
 	next_ctx = next->perf_event_ctxp[ctxn];
 	if (!next_ctx)
@@ -2335,6 +2334,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	}
 unlock:
 	rcu_read_unlock();
+#endif
 
 	if (do_switch) {
 		raw_spin_lock(&ctx->lock);

> > quite the puzzle this one
> 
> yes.
> 
> I'm tediously working on trying to get a good trace of this happening.
> 
> I have a random seed that will trigger the bug in the fuzzer around 1 time 
> in 10.
> 
> Unfortunately many of the times it crashes so hard/quickly there's no 
> chance of getting the trace data (dump trace on oops never holds enough 
> state, and often the fuzzing triggers its own random trace events that 
> clutter those logs).
> 
> Also trace-cmd is a pain to use.  Any suggested events I should trace 
> beyond the obvious?

I've never used trace-cmd :/ What I do in the crashing hard case is try
and make dump_ftrace_on_oops work, although capturing a full trace
buffer over serial is exceedingly painful -- maxcpus= might work if you
have too many CPUs, I forgot.

Anyway, I can make the fuzzer do weird shit, but it doesn't look like
the thing you're seeing, but who knows.


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-29 18:11                                 ` Vince Weaver
@ 2014-04-29 19:21                                   ` Steven Rostedt
  0 siblings, 0 replies; 81+ messages in thread
From: Steven Rostedt @ 2014-04-29 19:21 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Thomas Gleixner

On Tue, 29 Apr 2014 14:11:09 -0400 (EDT)
Vince Weaver <vincent.weaver@maine.edu> wrote:
 
> I've actually given up on source code inspection to figure out what's 
> going on in kernel/events/core.c.  What I do now is write simple test
> cases and do an ftrace function trace.  The results are often surprising.

You might want to remove this from the kernel/events/Makefile:


ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_core.o = -pg
endif

It will let you function trace more of what is happening.

-- Steve


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-29 18:21                                   ` Vince Weaver
  2014-04-29 19:01                                     ` Peter Zijlstra
@ 2014-04-29 19:26                                     ` Steven Rostedt
  1 sibling, 0 replies; 81+ messages in thread
From: Steven Rostedt @ 2014-04-29 19:26 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Thomas Gleixner

On Tue, 29 Apr 2014 14:21:56 -0400 (EDT)
Vince Weaver <vincent.weaver@maine.edu> wrote:
 
> Also trace-cmd is a pain to use.  Any suggested events I should trace 
> beyond the obvious?
> 
> Part of the problem is that despite what the documentation says it doesn't 
> look like you can combine the "-P pid" and "-c" children option, which 
> makes debugging a forking problem like this a lot harder to trace.

Yeah, I need kernel assistance to fix some of that.

> 
> It's sort of possible to get around that with a really complicated -F ""
> command line that does sudo back to me (don't want to fuzz as root) and 
> such, but still awkward.

I'll try to write up a patch that lets you use -P with -c. But due to
the (crappy) implementation with ptrace, trace-cmd needs to be a parent
of task.

In the meantime, you could run this as root:

 trace-cmd record -p function -F -c su non-root-user fuzz


-- Steve


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-29 19:01                                     ` Peter Zijlstra
@ 2014-04-29 20:59                                       ` Vince Weaver
  2014-04-30 18:44                                         ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-29 20:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Tue, 29 Apr 2014, Peter Zijlstra wrote:

> Fair point, nope not in that case. If you can trigger this without ever
> using .inherit=1 this would exclude a lot of funny code.

I don't think inherit is being set, but I'm not actually sure.
I will have to add that to the trace_printk() and recompile/re-run



In the meantime I had a lucky crash and managed to catch a trace.

Unfortunately there's a lot of active events so it's not clear which is 
which.  I think this is going to need another round of trace generation :(

This trace can be found here:
	http://web.eece.maine.edu/~vweaver/junk/bug.out.bz2

A summary:

The troublesome memory address is allocated as part of a perf_event_open
	perf_fuzzer-4387  [001]  1802.628663: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff8800a3122800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO

The event opened successfully, fd=41, it looks like it is 
PERF_COUNT_SW_EMULATION_FAULTS with attr.period=0
	perf_fuzzer-4387  [001]  1802.628677: bprint:               SYSC_perf_event_open: Opened: 1 8 0
	perf_fuzzer-4387  [001]  1802.628677: sys_exit:             NR 298 = 41

The parent forks:
	perf_fuzzer-4387  [002]  1803.571239: sys_exit:             NR 56 = 5504

The event is closed in the parent:
	perf_fuzzer-4387  [002]  1803.582345: sys_enter:            NR 3 (29, 3000, 3000, 7f7524d760a4, 7f7524d76108, 7f7524d76120)
	perf_fuzzer-4387  [002]  1803.582345: sys_exit:             NR 3 = 0

The parent kills the child:
	perf_fuzzer-4387  [003]  1803.590145: sys_enter:            NR 62 (1580, 9, 7, 7f7524d760b8, 7f7524d760b8, 7f7524d76120)

Presumably one of the many perf_swevent_del() calls in the child is us.
	perf_fuzzer-5504  [004]  1803.590277: function:             perf_swevent_del

*** The parent somehow fails to call perf_swevent_del() on CPU3? ***

The grace period expires and the memory is freed:
	ksoftirqd/4-28    [004]  1803.609802: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff8800a3122800

An event is deleted from swevent_hlist, but ->pprev was our free'd address:
	perf_fuzzer-4387  [003]  1803.610555: function:             perf_swevent_del

Slab corruption:
	[ 1803.610555] ------------[ cut here ]------------
	[ 1803.615419] WARNING: CPU: 3 PID: 4387 at include/linux/list.h:620 perf_swevent_del+0x6e/0x90()
	[ 1803.948487] Slab corruption (Tainted: G        W    ): kmalloc-2048 start=ffff8800a3122800, len=2048
	[ 1803.958294] 040: 6b 6b 6b 6b 6b 6b 6b 6b 88 f7 92 17 01 88 ff ff  kkkkkkkk........


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-29 20:59                                       ` Vince Weaver
@ 2014-04-30 18:44                                         ` Peter Zijlstra
  2014-04-30 21:08                                           ` Vince Weaver
  2014-05-01 14:07                                           ` Vince Weaver
  0 siblings, 2 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-04-30 18:44 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt


Vince, could you add the below to whatever tracing muck you already
have?

After staring at your traces all day with Thomas, we have doubts about
the refcount integrity.



---
 kernel/events/core.c | 146 +++++++++++++++++++++++++++++----------------------
 1 file changed, 82 insertions(+), 64 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5129b1201050..dac01e099f13 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1664,6 +1664,9 @@ event_sched_in(struct perf_event *event,
 	u64 tstamp = perf_event_time(event);
 	int ret = 0;
 
+	WARN_ON(event->ctx != ctx);
+	lockdep_assert_held(&ctx->lock);
+
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
 
@@ -2029,14 +2032,9 @@ static int __perf_event_enable(void *info)
 	if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE)
 		goto unlock;
 
-	if (!group_can_go_on(event, cpuctx, 1)) {
-		err = -EEXIST;
-	} else {
-		if (event == leader)
-			err = group_sched_in(event, cpuctx, ctx);
-		else
-			err = event_sched_in(event, cpuctx, ctx);
-	}
+	err = -EEXIST;
+	if (group_can_go_on(event, cpuctx, 1))
+		err = group_sched_in(event, cpuctx, ctx);
 
 	if (err) {
 		/*
@@ -2293,6 +2291,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	if (!cpuctx->task_ctx)
 		return;
 
+#if 0
 	rcu_read_lock();
 	next_ctx = next->perf_event_ctxp[ctxn];
 	if (!next_ctx)
@@ -2335,6 +2334,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	}
 unlock:
 	rcu_read_unlock();
+#endif
 
 	if (do_switch) {
 		raw_spin_lock(&ctx->lock);
@@ -3233,10 +3233,18 @@ static void __free_event(struct perf_event *event)
 	if (event->pmu)
 		module_put(event->pmu->module);
 
+	WARN_ON(event->hlist_entry.pprev && event->hlist_entry.pprev != LIST_POISON2);
+
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
-static void free_event(struct perf_event *event)
+
+static void _free_event(struct perf_event *event)
 {
+	long refs = atomic_long_read(&event->refcount);
+
+	WARN(refs, "freeing event with %ld refs left; ptr=0x%p\n", refs, event);
+	trace_printk("freeing with %ld refs; ptr=0x%p\n", refs, event);
+
 	irq_work_sync(&event->pending);
 
 	unaccount_event(event);
@@ -3263,48 +3271,32 @@ static void free_event(struct perf_event *event)
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 
-
 	__free_event(event);
 }
 
-int perf_event_release_kernel(struct perf_event *event)
+static void free_event(struct perf_event *event)
 {
-	struct perf_event_context *ctx = event->ctx;
-
-	WARN_ON_ONCE(ctx->parent_ctx);
-	/*
-	 * There are two ways this annotation is useful:
-	 *
-	 *  1) there is a lock recursion from perf_event_exit_task
-	 *     see the comment there.
-	 *
-	 *  2) there is a lock-inversion with mmap_sem through
-	 *     perf_event_read_group(), which takes faults while
-	 *     holding ctx->mutex, however this is called after
-	 *     the last filedesc died, so there is no possibility
-	 *     to trigger the AB-BA case.
-	 */
-	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
-	raw_spin_lock_irq(&ctx->lock);
-	perf_group_detach(event);
-	raw_spin_unlock_irq(&ctx->lock);
-	perf_remove_from_context(event);
-	mutex_unlock(&ctx->mutex);
-
-	free_event(event);
-
-	return 0;
+	if (unlikely(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)) {
+		WARN(1, "unexpected event refcount: %ld; ptr=0x%p\n",
+				atomic_long_read(&event->refcount),
+				event);
+		return;
+	}
+	_free_event(event);
 }
-EXPORT_SYMBOL_GPL(perf_event_release_kernel);
 
 /*
  * Called when the last reference to the file is gone.
  */
-static void put_event(struct perf_event *event)
+static void put_event(struct perf_event *event, struct perf_event *other)
 {
+	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *owner;
 
-	if (!atomic_long_dec_and_test(&event->refcount))
+	long refs = atomic_long_sub_return(1, &event->refcount);
+	trace_printk("put ref: %ld; ptr=0x%p other=0x%p\n", refs, event, other);
+
+	if (refs)
 		return;
 
 	rcu_read_lock();
@@ -3316,14 +3308,8 @@ static void put_event(struct perf_event *event)
 	 * owner->perf_event_mutex.
 	 */
 	smp_read_barrier_depends();
-	if (owner) {
-		/*
-		 * Since delayed_put_task_struct() also drops the last
-		 * task reference we can safely take a new reference
-		 * while holding the rcu_read_lock().
-		 */
-		get_task_struct(owner);
-	}
+	if (owner && !atomic_inc_not_zero(&owner->usage))
+		owner = NULL;
 	rcu_read_unlock();
 
 	if (owner) {
@@ -3340,12 +3326,39 @@ static void put_event(struct perf_event *event)
 		put_task_struct(owner);
 	}
 
-	perf_event_release_kernel(event);
+	WARN_ON_ONCE(ctx->parent_ctx);
+	/*
+	 * There are two ways this annotation is useful:
+	 *
+	 *  1) there is a lock recursion from perf_event_exit_task
+	 *     see the comment there.
+	 *
+	 *  2) there is a lock-inversion with mmap_sem through
+	 *     perf_event_read_group(), which takes faults while
+	 *     holding ctx->mutex, however this is called after
+	 *     the last filedesc died, so there is no possibility
+	 *     to trigger the AB-BA case.
+	 */
+	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
+	raw_spin_lock_irq(&ctx->lock);
+	perf_group_detach(event);
+	raw_spin_unlock_irq(&ctx->lock);
+	perf_remove_from_context(event);
+	mutex_unlock(&ctx->mutex);
+
+	_free_event(event);
+}
+
+int perf_event_release_kernel(struct perf_event *event)
+{
+	put_event(event, NULL);
+	return 0;
 }
+EXPORT_SYMBOL_GPL(perf_event_release_kernel);
 
 static int perf_release(struct inode *inode, struct file *file)
 {
-	put_event(file->private_data);
+	put_event(file->private_data, NULL);
 	return 0;
 }
 
@@ -3969,6 +3982,11 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 			 */
 			continue;
 		}
+
+		trace_printk("inc ref: %ld; ptr=0x%p\n",
+				atomic_long_read(&event->refcount),
+				event);
+
 		rcu_read_unlock();
 
 		mutex_lock(&event->mmap_mutex);
@@ -3988,7 +4006,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 			ring_buffer_put(rb); /* can't be last, we still have one */
 		}
 		mutex_unlock(&event->mmap_mutex);
-		put_event(event);
+		put_event(event, NULL);
 
 		/*
 		 * Restart the iteration; either we're on the wrong list or
@@ -7374,7 +7392,7 @@ static void sync_child_event(struct perf_event *child_event,
 	 * Release the parent event, if this was the last
 	 * reference to it.
 	 */
-	put_event(parent_event);
+	put_event(parent_event, child_event);
 }
 
 static void
@@ -7382,11 +7400,9 @@ __perf_event_exit_task(struct perf_event *child_event,
 			 struct perf_event_context *child_ctx,
 			 struct task_struct *child)
 {
-	if (child_event->parent) {
-		raw_spin_lock_irq(&child_ctx->lock);
-		perf_group_detach(child_event);
-		raw_spin_unlock_irq(&child_ctx->lock);
-	}
+	raw_spin_lock_irq(&child_ctx->lock);
+	perf_group_detach(child_event);
+	raw_spin_unlock_irq(&child_ctx->lock);
 
 	perf_remove_from_context(child_event);
 
@@ -7458,12 +7474,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 	mutex_lock(&child_ctx->mutex);
 
 again:
-	list_for_each_entry_safe(child_event, tmp, &child_ctx->pinned_groups,
-				 group_entry)
-		__perf_event_exit_task(child_event, child_ctx, child);
-
-	list_for_each_entry_safe(child_event, tmp, &child_ctx->flexible_groups,
-				 group_entry)
+	list_for_each_entry_rcu(child_event, &child_ctx->event_list, event_entry)
 		__perf_event_exit_task(child_event, child_ctx, child);
 
 	/*
@@ -7472,8 +7483,10 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 	 * will still point to the list head terminating the iteration.
 	 */
 	if (!list_empty(&child_ctx->pinned_groups) ||
-	    !list_empty(&child_ctx->flexible_groups))
+	    !list_empty(&child_ctx->flexible_groups)) {
+		WARN_ON_ONCE(1);
 		goto again;
+	}
 
 	mutex_unlock(&child_ctx->mutex);
 
@@ -7519,7 +7532,7 @@ static void perf_free_event(struct perf_event *event,
 	list_del_init(&event->child_list);
 	mutex_unlock(&parent->child_mutex);
 
-	put_event(parent);
+	put_event(parent, event);
 
 	perf_group_detach(event);
 	list_del_event(event, ctx);
@@ -7605,6 +7618,11 @@ inherit_event(struct perf_event *parent_event,
 		return NULL;
 	}
 
+	trace_printk("inherit inc ref: %ld; ptr=0x%p other=0x%p\n",
+			atomic_long_read(&parent_event->refcount),
+			parent_event,
+			child_event);
+
 	get_ctx(child_ctx);
 
 	/*


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-30 18:44                                         ` Peter Zijlstra
@ 2014-04-30 21:08                                           ` Vince Weaver
  2014-04-30 22:51                                             ` Thomas Gleixner
  2014-05-01 14:07                                           ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-04-30 21:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Wed, 30 Apr 2014, Peter Zijlstra wrote:

> 
> Vince, could you add the below to whatever tracing muck you already
> have?
> 
> After staring at your traces all day with Thomas, we have doubts about
> the refcount integrity.

I've been staring at traces all day too.  Will try your patch tomorrow.

From my staring, what looks like is happening in the trace is:

task_sched_in in parent adds our freed (but alive in child) event:
	perf_fuzzer-2517  [001]   215.228165: bprint:               perf_swevent_add: VMW add_rcu: 0xffff880036cbb000
This adds the event to the swevent_hlist


the child is in the process of exiting, eventually frees the event
     perf_fuzzer-3634  [006]   215.228250: function:             perf_release
     perf_fuzzer-3634  [006]   215.228250: function:             perf_event_release_kernel
     perf_fuzzer-3634  [006]   215.228251: function:                perf_group_detach
     perf_fuzzer-3634  [006]   215.228251: function:                   perf_event__header_size
     perf_fuzzer-3634  [006]   215.228251: function:                perf_remove_from_context

	Which then does
        list_del_event()
        event->state=PERF_EVENT_STATE_OFF;

Soon after the parent does task_sched_out
	which gets to event_sched_out()
	which hits 
		if (event->state != PERF_EVENT_STATE_ACTIVE)
			return;
	So it never hits the
		event->pmu->del(event, 0);

	We need to get the value off the hlist.

This analysis is probably wrong though because if it's as simple as that 
above then I'm not sure why it isn't easier to hit the bug.
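
For anyone following along, the sequence described above can be written down as
a toy model. This is only a sketch in user-space Python, not kernel code: the
set stands in for the swevent_hlist, and the helper names
(remove_from_context_elsewhere, run_scenario) are made up for illustration. It
assumes the event state really is forced to OFF between the parent's sched-in
and sched-out, which is exactly the part I'm not sure about.

```python
# Toy model of the suspected sequence; names and structure are illustrative only.
ACTIVE, INACTIVE, OFF = "ACTIVE", "INACTIVE", "OFF"


class Event:
    def __init__(self, name):
        self.name = name
        self.state = INACTIVE


def event_sched_in(event, hlist):
    event.state = ACTIVE
    hlist.add(event)              # stands in for pmu->add()


def event_sched_out(event, hlist):
    if event.state != ACTIVE:     # the early return quoted above
        return
    event.state = INACTIVE
    hlist.discard(event)          # stands in for pmu->del()


def remove_from_context_elsewhere(event):
    # Hypothetical helper: another task's release path marks the event OFF
    # without taking it off the hash list.
    event.state = OFF


def run_scenario():
    """Return True if the event is still on the hash list at the end."""
    hlist = set()
    ev = Event("sw")
    event_sched_in(ev, hlist)            # parent task_sched_in
    remove_from_context_elsewhere(ev)    # child exit path
    event_sched_out(ev, hlist)           # parent task_sched_out, returns early
    return ev in hlist
```

If run_scenario() returns True, the model has left a stale entry behind, which
is the shape of the problem: the memory can then be freed while something still
points at it.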

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-04-30 21:08                                           ` Vince Weaver
@ 2014-04-30 22:51                                             ` Thomas Gleixner
  2014-05-01 10:26                                               ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2014-04-30 22:51 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Wed, 30 Apr 2014, Vince Weaver wrote:

> On Wed, 30 Apr 2014, Peter Zijlstra wrote:
> 
> > 
> > Vince, could you add the below to whatever tracing muck you already
> > have?
> > 
> > After staring at your traces all day with Thomas, we have doubts about
> > the refcount integrity.
> 
> I've been staring at traces all day too.  Will try your patch tomorrow.
> 
> > From my staring, what looks like is happening in the trace is:
> 
> task_sched_in in parent adds our freed (but alive in child) event:

And that's the issue which puzzles us. Let's look at what we expect:

event create:	
       parent_evt->refcount = 1
       parent_fd = fd(file)
       file->refcnt = 1

mmap:
       file->refcnt = 2

clone:
       child_fd = dup(parent_fd)
         file->refcnt = 3

       child_evt = dup(parent_evt)
       	 parent_evt->refcount = 2

unmap:
       file->refcnt = 2

close(parent_fd)
       file->refcnt = 1

kill child

child exits:

   exit_files()
     close(child_fd)
       file->refcnt = 0
          perf_release()
       	    parent_evt->refcount = 1

   perf_event_exit_task()
     kill child_evt
       parent_evt->refcount = 0
       	---> destroy parent
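
The expected accounting above can be replayed mechanically. What follows is a
minimal sketch in Python, not kernel code; the Counted class and the
clone_takes_event_ref switch are made up for illustration. The switch only
controls whether the clone step takes a reference on parent_evt, so the same
steps can be run with and without that reference.

```python
class Counted:
    """A bare reference counter standing in for an event or a struct file."""

    def __init__(self, name):
        self.name = name
        self.refs = 1
        self.destroyed = False

    def get(self):
        self.refs += 1

    def put(self):
        """Drop one reference; return True once the object is destroyed."""
        self.refs -= 1
        if self.refs == 0:
            self.destroyed = True
        return self.destroyed


def expected_sequence(clone_takes_event_ref=True):
    log = []
    evt = Counted("parent_evt")      # event create: refcount = 1
    file = Counted("file")           # fd installed: file refcnt = 1
    file.get()                       # mmap: 2
    file.get()                       # clone, child_fd = dup(parent_fd): 3
    if clone_takes_event_ref:
        evt.get()                    # clone, child_evt pins parent_evt: 2
    file.put()                       # unmap: 2
    file.put()                       # close(parent_fd): 1
    if file.put():                   # exit_files() in the child: 0
        evt.put()                    # perf_release() drops the event ref
    log.append(("after exit_files", evt.refs, evt.destroyed))
    if clone_takes_event_ref:
        evt.put()                    # perf_event_exit_task() kills child_evt
    log.append(("after perf_event_exit_task", evt.refs, evt.destroyed))
    return log
```

With the clone reference the parent event survives exit_files() and only goes
away in perf_event_exit_task(); without it the event is already gone when
perf_release() returns.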

Now the trace shows a different story:

     perf_fuzzer-4387  [001]  1802.628659: sys_enter:            NR 298 (69bb58, 0, ffffffff, 12, 0, 0)
     perf_fuzzer-4387  [001]  1802.628659: function:             SyS_perf_event_open
     perf_fuzzer-4387  [001]  1802.628660: function:             perf_event_alloc
     perf_fuzzer-4387  [001]  1802.628663: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff8800a3122800 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
     perf_fuzzer-4387  [001]  1802.628663: function:                perf_init_event
     perf_fuzzer-4387  [001]  1802.628664: function:                   perf_tp_event_init
     perf_fuzzer-4387  [001]  1802.628664: function:                   perf_swevent_init
     perf_fuzzer-4387  [001]  1802.628667: function:             perf_lock_task_context
     perf_fuzzer-4387  [001]  1802.628669: kmem_cache_alloc:     (__d_alloc+0x25) call_site=ffffffff811cfa75 ptr=0xffff8800b37ddeb8 bytes_req=288 bytes_alloc=312 gfp_flags=GFP_KERNEL
     perf_fuzzer-4387  [001]  1802.628671: kmem_cache_alloc:     (get_empty_filp+0x5c) call_site=ffffffff811b7b7c ptr=0xffff880117502ac0 bytes_req=448 bytes_alloc=448 gfp_flags=GFP_KERNEL|GFP_ZERO
     perf_fuzzer-4387  [001]  1802.628672: function:             perf_install_in_context
     perf_fuzzer-4387  [001]  1802.628672: function:             __perf_install_in_context
     perf_fuzzer-4387  [001]  1802.628672: function:                perf_pmu_disable
     perf_fuzzer-4387  [001]  1802.628673: function:                   perf_pmu_nop_void
     perf_fuzzer-4387  [001]  1802.628673: function:             perf_event__header_size
     perf_fuzzer-4387  [001]  1802.628673: function:             perf_event__header_size
     perf_fuzzer-4387  [001]  1802.628673: function:             perf_event_sched_in
     perf_fuzzer-4387  [001]  1802.628674: function:             perf_pmu_enable
     perf_fuzzer-4387  [001]  1802.628674: function:                perf_pmu_nop_void
     perf_fuzzer-4387  [001]  1802.628675: function:             perf_unpin_context
     perf_fuzzer-4387  [001]  1802.628676: function:             perf_event__header_size
     perf_fuzzer-4387  [001]  1802.628676: function:             perf_event__id_header_size.isra.19
     perf_fuzzer-4387  [001]  1802.628677: bprint:               SYSC_perf_event_open: Opened: 1 8 0
     perf_fuzzer-4387  [001]  1802.628677: sys_exit:             NR 298 = 41

parent_evt->refcount = 1
file->refcount = 1

Now mmap:

     perf_fuzzer-4387  [001]  1802.628678: sys_enter:            NR 9 (0, 3000, 3, 1, 29, 0)
     perf_fuzzer-4387  [001]  1802.628680: kmem_cache_alloc:     (mmap_region+0x348) call_site=ffffffff811771c8 ptr=0xffff8800b54255d0 bytes_req=184 bytes_alloc=208 gfp_flags=GFP_KERNEL|GFP_ZERO
     perf_fuzzer-4387  [001]  1802.628680: function:             perf_mmap
     perf_fuzzer-4387  [001]  1802.628682: kmalloc:              (rb_alloc+0x38) call_site=ffffffff8113c858 ptr=0xffff88003766bb00 bytes_req=208 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_ZERO
     perf_fuzzer-4387  [001]  1802.628682: function:             perf_mmap_alloc_page
     perf_fuzzer-4387  [001]  1802.628683: mm_page_alloc:        page=0xffffea00027abf60 pfn=41598816 order=0 migratetype=0 gfp_flags=GFP_KERNEL|GFP_ZERO
     perf_fuzzer-4387  [001]  1802.628683: function:             perf_mmap_alloc_page
     perf_fuzzer-4387  [001]  1802.628684: mm_page_alloc:        page=0xffffea00027abf28 pfn=41598760 order=0 migratetype=0 gfp_flags=GFP_KERNEL|GFP_ZERO
     perf_fuzzer-4387  [001]  1802.628684: function:             perf_mmap_alloc_page
     perf_fuzzer-4387  [001]  1802.628685: mm_page_alloc:        page=0xffffea00027abef0 pfn=41598704 order=0 migratetype=0 gfp_flags=GFP_KERNEL|GFP_ZERO
     perf_fuzzer-4387  [001]  1802.628686: function:             perf_event_update_userpage
     perf_fuzzer-4387  [001]  1802.628692: function:             perf_event_mmap
     perf_fuzzer-4387  [001]  1802.628698: kmalloc:              (perf_event_mmap+0x90) call_site=ffffffff81139d40 ptr=0xffff880083d0d000 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL
     perf_fuzzer-4387  [001]  1802.628699: function:                perf_event_aux
     perf_fuzzer-4387  [001]  1802.628699: function:                   perf_event_aux_ctx
     perf_fuzzer-4387  [001]  1802.628699: function:                   perf_event_aux_ctx
     perf_fuzzer-4387  [001]  1802.628699: function:                   perf_event_aux_ctx
     perf_fuzzer-4387  [001]  1802.628699: function:                      perf_event_mmap_output
     perf_fuzzer-4387  [001]  1802.628700: function:                   perf_event_aux_ctx
     perf_fuzzer-4387  [001]  1802.628700: function:                   perf_event_aux_ctx
     perf_fuzzer-4387  [001]  1802.628700: function:                   perf_event_aux_ctx
     perf_fuzzer-4387  [001]  1802.628701: kfree:                (perf_event_mmap+0x175) call_site=ffffffff81139e25 ptr=0xffff880083d0d000
     perf_fuzzer-4387  [001]  1802.628702: sys_exit:             NR 9 = 140141108113408 (0x7f7524f7b000)

file->refcount = 2

And here the clone:

     perf_fuzzer-4387  [002]  1803.570829: sys_enter:            NR 56 (1200011, 0, 0, 7f7524f899d0, 0, 1123)

     perf_fuzzer-4387  [002]  1803.570855: function:             perf_event_init_task

Somewhere in this maze we copy the parent event so in theory:

parent_evt->refcount = 2

Now we copy the file descriptors:

     perf_fuzzer-4387  [002]  1803.571065: kmem_cache_alloc:     (dup_fd+0x33) call_site=ffffffff811d4a23 ptr=0xffff8800ce29e100 bytes_req=704 bytes_alloc=704 gfp_flags=GFP_KERNEL

file->refcount = 3

     perf_fuzzer-4387  [002]  1803.571239: sys_exit:             NR 56 = 5504

Fork is done and we unmap:

     perf_fuzzer-4387  [002]  1803.582330: sys_enter:            NR 11 (7f7524f7b000, 3000, 3000, 7f7524d760a4, 7f7524d76108, 7f7524d76120)
     perf_fuzzer-4387  [002]  1803.582335: function:             perf_mmap_close
     perf_fuzzer-4387  [002]  1803.582339: kmem_cache_free:      (remove_vma+0x63) call_site=ffffffff811742f3 ptr=0xffff8800b54255d0
     perf_fuzzer-4387  [002]  1803.582344: sys_exit:             NR 11 = 0

file->refcount = 2

Now we close the file descriptor

     perf_fuzzer-4387  [002]  1803.582345: sys_enter:            NR 3 (29, 3000, 3000, 7f7524d760a4, 7f7524d76108, 7f7524d76120)
     perf_fuzzer-4387  [002]  1803.582345: sys_exit:             NR 3 = 0

file->refcount = 1

Now the child gets killed

     perf_fuzzer-4387  [003]  1803.590145: sys_enter:            NR 62 (1580, 9, 7, 7f7524d760b8, 7f7524d760b8, 7f7524d76120)
     perf_fuzzer-4387  [003]  1803.590148: kmem_cache_alloc:     (__sigqueue_alloc+0x9c) call_site=ffffffff81073c0c ptr=0xffff8800897825e0 bytes_req=160 bytes_alloc=184 gfp_flags=GFP_ATOMIC|GFP_NOTRACK
     perf_fuzzer-4387  [003]  1803.590149: sys_exit:             NR 62 = 0

And the child exits and calls exit_files() and that decrements
file->refcount to 0, so perf_release is called:

     perf_fuzzer-5504  [004]  1803.590311: function:             perf_release

After that the dentry which is associated to the parent event is
released:

     perf_fuzzer-5504  [004]  1803.590353: kmem_cache_free:      (__d_free+0x42) call_site=ffffffff811cc1f2 ptr=0xffff8800b37ddeb8

That's fine, but the issue is, that perf_release() also calls:

     perf_fuzzer-5504  [004]  1803.590312: function:             perf_event_release_kernel

And that means that the parent_evt->refcount was not 2 as one would
expect. It was 1, otherwise put_event() which is called from
perf_release() would have returned right away due to:

	if (!atomic_long_dec_and_test(&event->refcount))
		return;

So after exit_files() is done perf_event_exit_task() is invoked

     perf_fuzzer-5504  [004]  1803.590365: function:             perf_event_exit_task

That cleans up the child events. So the child still has a reference to
the parent event which was destroyed before. It's still not kfree'd as
that happens via RCU way later

     ksoftirqd/4-28    [004]  1803.609802: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff8800a3122800

But of course the child cleanup will poke in the by now invalid parent
and leave some references to that at some other place, which will
later on be detected as slab corruption or some other kind of
wreckage.

Thanks,

	tglx





* Re: [perf] more perf_fuzzer memory corruption
  2014-04-30 22:51                                             ` Thomas Gleixner
@ 2014-05-01 10:26                                               ` Peter Zijlstra
  2014-05-01 11:50                                                 ` Peter Zijlstra
  2014-05-01 13:22                                                 ` Vince Weaver
  0 siblings, 2 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-01 10:26 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Vince Weaver, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, May 01, 2014 at 12:51:33AM +0200, Thomas Gleixner wrote:
> And that's the issue which puzzles us. Let's look at what we expect:
> 
> Now the trace shows a different story:
> 
>      perf_fuzzer-4387  [001]  1802.628659: sys_enter:            NR 298 (69bb58, 0, ffffffff, 12, 0, 0)

That's a per-cpu event (.pid = -1, .cpu = 12), they don't get inherited,
so the only thing keeping it alive is the fd the child got. So
exit_files() killing this thing makes perfect sense.

Onwards to find another funny.




* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 10:26                                               ` Peter Zijlstra
@ 2014-05-01 11:50                                                 ` Peter Zijlstra
  2014-05-01 12:35                                                   ` Thomas Gleixner
  2014-05-01 13:22                                                 ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-01 11:50 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Vince Weaver, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, May 01, 2014 at 12:26:02PM +0200, Peter Zijlstra wrote:
> On Thu, May 01, 2014 at 12:51:33AM +0200, Thomas Gleixner wrote:
> > And that's the issue which puzzles us. Let's look at what we expect:
> > 
> > Now the trace shows a different story:
> > 
> >      perf_fuzzer-4387  [001]  1802.628659: sys_enter:            NR 298 (69bb58, 0, ffffffff, 12, 0, 0)
> 
> That's a per-cpu event (.pid = -1, .cpu = 12), they don't get inherited,
> so the only thing keeping it alive is the fd the child got. So
> exit_files() killing this thing makes perfect sense.
> 
> Onwards to find another funny.

awk '/alloc/ { 
	for(i=1; i<=NF; i++) { 
		if ($i ~ /^ptr=/) { 
			ptr=gensub("^ptr=","","g",$i); 
			if (ptr ~ /nil/) break; 
			seen[ptr]=1; 
			m = ++memory[ptr]; 
			if (m != 1) { 
				printf "alloc: %d ptr=%s\n", m, ptr; 
				memory[ptr] = 0; 
			} 
			break; 
		}
	}
}
/free/ { 
	for(i=1; i<=NF; i++) { 
		if ($i ~ /^ptr=/) { 
			ptr=gensub("^ptr=","","g",$i); 
			if (ptr ~ /nil/) 
				break; 
			m = --memory[ptr]; 
			if (m != 0) { 
				memory[ptr] = 0; 
				s = seen[ptr]; 
				seen[ptr] = 1; 
				if (!s) 
					break; 
				printf "free: %d ptr=%s\n", m, ptr; 
			} 
			break; 
		}
	}
}' bug.out | less

Gives fun things like:

alloc: 2 ptr=0xffff880118fda000
free: -1 ptr=0xffff880118fda000
alloc: 2 ptr=0xffff880118fda000
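
For those who would rather not read awk, roughly the same bookkeeping can be
sketched in Python. This is a simplified re-expression, not a line-for-line
port: it classifies each line by the event name after the timestamp rather than
by matching the whole line, and it reports "double alloc" / "double free"
instead of the raw counters. The function name check_balance and the assumed
line layout ("<timestamp>: <event>: ... ptr=<value>") are mine.

```python
import re

# Assumed layout: "... 1773.427175: kmalloc: (...) call_site=... ptr=0x... ..."
LINE = re.compile(r"\d+\.\d+:\s+(\w+):.*?\bptr=(\S+)")


def check_balance(lines):
    """Return a list of (kind, ptr) for allocs/frees that do not pair up."""
    live = set()      # pointers currently allocated according to the trace
    seen = set()      # pointers we have seen allocated at least once
    reports = []
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue
        event, ptr = m.groups()
        if "nil" in ptr:
            continue
        if "alloc" in event:
            if ptr in live:
                reports.append(("double alloc", ptr))
            live.add(ptr)
            seen.add(ptr)
        elif "free" in event:
            if ptr in live:
                live.remove(ptr)
            elif ptr in seen:
                # Frees of pointers allocated before tracing began are ignored.
                reports.append(("double free", ptr))
    return reports
```

Usage would be something like check_balance(open("bug.out")), printing whatever
comes back.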


And if we then look at

grep ptr=0xffff880118fda000 bug.out | less

We find lovely bits such as:

     perf_fuzzer-4387  [001]  1773.427175: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff880118fda000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
     ksoftirqd/6-38    [006]  1773.457770: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff880118fda000
          <idle>-0     [007]  1774.020378: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff880118fda000
     perf_fuzzer-4387  [000]  1774.096354: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff880118fda000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO


That's almost half a second between the double free, Vince, is your TSC
solid?

# grep sched_clock_stable /proc/sched_debug 
sched_clock_stable()                    : 1

Should tell, if that's a 0 reading the trace becomes a whole lot more
'interesting'.




* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 11:50                                                 ` Peter Zijlstra
@ 2014-05-01 12:35                                                   ` Thomas Gleixner
  2014-05-01 13:12                                                     ` Peter Zijlstra
  2014-05-01 13:29                                                     ` Thomas Gleixner
  0 siblings, 2 replies; 81+ messages in thread
From: Thomas Gleixner @ 2014-05-01 12:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, 1 May 2014, Peter Zijlstra wrote:
> On Thu, May 01, 2014 at 12:26:02PM +0200, Peter Zijlstra wrote:
> > On Thu, May 01, 2014 at 12:51:33AM +0200, Thomas Gleixner wrote:
> > > And that's the issue which puzzles us. Let's look at what we expect:
> > > 
> > > Now the trace shows a different story:
> > > 
> > >      perf_fuzzer-4387  [001]  1802.628659: sys_enter:            NR 298 (69bb58, 0, ffffffff, 12, 0, 0)
> > 
> > That's a per-cpu event (.pid = -1, .cpu = 12), they don't get inherited,
> > so the only thing keeping it alive is the fd the child got. So
> > exit_files() killing this thing makes perfect sense.

Duh, right. Should have noticed :(

> 
> grep ptr=0xffff880118fda000 bug.out | less
> 
> We find lovely bits such as:
> 
>      perf_fuzzer-4387  [001]  1773.427175: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff880118fda000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
>      ksoftirqd/6-38    [006]  1773.457770: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff880118fda000
>           <idle>-0     [007]  1774.020378: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff880118fda000
>      perf_fuzzer-4387  [000]  1774.096354: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff880118fda000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> 
> 
> That's almost half a second between the double free, Vince, is your TSC
> solid?

grep DROPPED bug.out

Now align that with the double malloc/free sites and you have an explanation ...
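
The alignment can be done mechanically. A rough sketch in Python, under two
assumptions of mine: any line containing "DROPPED" marks lost events, and a
marker anywhere between two events on the same pointer is enough to distrust
the pair (it ignores which CPU lost the events). The function name
suspicious_events is made up.

```python
import re

# Assumed layout: "... 1773.427175: kfree: (...) call_site=... ptr=0x..."
LINE = re.compile(r"\d+\.\d+:\s+(\w+):.*?\bptr=(\S+)")


def suspicious_events(lines):
    """Return (ptr, kind, drops_in_between) for repeated allocs or frees."""
    drops = 0
    state = {}        # ptr -> (last event was an alloc, drop count at that time)
    out = []
    for line in lines:
        if "DROPPED" in line:
            drops += 1
            continue
        m = LINE.search(line)
        if m is None:
            continue
        event, ptr = m.groups()
        is_alloc = "alloc" in event
        if not is_alloc and "free" not in event:
            continue
        if ptr in state:
            was_alloc, drops_then = state[ptr]
            if was_alloc == is_alloc:
                # Two allocs or two frees in a row for the same pointer.
                kind = "alloc" if is_alloc else "free"
                out.append((ptr, kind, drops > drops_then))
        state[ptr] = (is_alloc, drops)
    return out
```

Entries with the last field True are the ones where the trace itself has a hole
between the two events, so they say nothing about the kernel.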





* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 12:35                                                   ` Thomas Gleixner
@ 2014-05-01 13:12                                                     ` Peter Zijlstra
  2014-05-01 13:29                                                     ` Thomas Gleixner
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-01 13:12 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Vince Weaver, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, May 01, 2014 at 02:35:02PM +0200, Thomas Gleixner wrote:
> > grep ptr=0xffff880118fda000 bug.out | less
> > 
> > We find lovely bits such as:
> > 
> >      perf_fuzzer-4387  [001]  1773.427175: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff880118fda000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> >      ksoftirqd/6-38    [006]  1773.457770: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff880118fda000
> >           <idle>-0     [007]  1774.020378: kfree:                (free_event_rcu+0x2f) call_site=ffffffff8113177f ptr=0xffff880118fda000
> >      perf_fuzzer-4387  [000]  1774.096354: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff8113a8fa ptr=0xffff880118fda000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO
> > 
> > 
> > That's almost half a second between the double free, Vince, is your TSC
> > solid?
> 
> grep DROPPED bug.out
> 
> Now align that with the double malloc/free sites and you have an explanation ...

Argh!


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 10:26                                               ` Peter Zijlstra
  2014-05-01 11:50                                                 ` Peter Zijlstra
@ 2014-05-01 13:22                                                 ` Vince Weaver
  1 sibling, 0 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-01 13:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Vince Weaver, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, 1 May 2014, Peter Zijlstra wrote:

> On Thu, May 01, 2014 at 12:51:33AM +0200, Thomas Gleixner wrote:
> > And that's the issue which puzzles us. Let's look at what we expect:
> > 
> > Now the trace shows a different story:
> > 
> >      perf_fuzzer-4387  [001]  1802.628659: sys_enter:            NR 298 (69bb58, 0, ffffffff, 12, 0, 0)
> 
> That's a per-cpu event (.pid = -1, .cpu = 12), they don't get inherited,
> so the only thing keeping it alive is the fd the child got. So
> exit_files() killing this thing makes perfect sense.

wait, are you sure?  Isn't that pid=0, cpu=-1, group_fd=12?

my machine only has 8 cpus...

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 12:35                                                   ` Thomas Gleixner
  2014-05-01 13:12                                                     ` Peter Zijlstra
@ 2014-05-01 13:29                                                     ` Thomas Gleixner
  1 sibling, 0 replies; 81+ messages in thread
From: Thomas Gleixner @ 2014-05-01 13:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, 1 May 2014, Thomas Gleixner wrote:

> On Thu, 1 May 2014, Peter Zijlstra wrote:
> > On Thu, May 01, 2014 at 12:26:02PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 01, 2014 at 12:51:33AM +0200, Thomas Gleixner wrote:
> > > > And that's the issue which puzzles us. Let's look at what we expect:
> > > > 
> > > > Now the trace shows a different story:
> > > > 
> > > >      perf_fuzzer-4387  [001]  1802.628659: sys_enter:            NR 298 (69bb58, 0, ffffffff, 12, 0, 0)
> > > 
> > > That's a per-cpu event (.pid = -1, .cpu = 12), they don't get inherited,
> > > so the only thing keeping it alive is the fd the child got. So
> > > exit_files() killing this thing makes perfect sense.
> 
> Duh, right. Should have noticed :(

And having a second look:

SYSCALL_DEFINE5(perf_event_open,
		struct perf_event_attr __user *, attr_uptr,
		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)

sys_enter:            NR 298 (69bb58, 0, ffffffff, 12, 0, 0)

attr_uptr = 0x69bb58
pid 	  = 0
cpu 	  = -1
group_fd  = 12
flags 	  = 0
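
The same decoding, written out as a small C sketch. The struct and function names are hypothetical; the only facts used are the SYSCALL_DEFINE5() prototype above and that raw_syscalls prints each argument as an unsigned hex value, so the int arguments have to be narrowed and sign-extended by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container; field names follow the prototype quoted above. */
struct peo_args {
	uint64_t attr_uptr;
	int pid;
	int cpu;
	int group_fd;
	unsigned long flags;
};

/*
 * raw[] holds the first five values printed by the sys_enter event, in
 * order.  The int-sized arguments are truncated to 32 bits and then
 * reinterpreted as signed, which is how ffffffff becomes -1 (on the
 * usual two's-complement ABIs).
 */
struct peo_args decode_perf_event_open(const uint64_t raw[5])
{
	struct peo_args a;

	assert(raw != NULL);

	a.attr_uptr = raw[0];
	a.pid       = (int)(int32_t)(uint32_t)raw[1];
	a.cpu       = (int)(int32_t)(uint32_t)raw[2];
	a.group_fd  = (int)(int32_t)(uint32_t)raw[3];
	a.flags     = (unsigned long)raw[4];
	return a;
}
```

Feeding it (69bb58, 0, ffffffff, 12, 0) gives pid = 0, cpu = -1, group_fd = 12: a per-task event attached to group leader fd 12, not a per-cpu event.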



* Re: [perf] more perf_fuzzer memory corruption
  2014-04-30 18:44                                         ` Peter Zijlstra
  2014-04-30 21:08                                           ` Vince Weaver
@ 2014-05-01 14:07                                           ` Vince Weaver
  2014-05-01 14:27                                             ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-01 14:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Wed, 30 Apr 2014, Peter Zijlstra wrote:

> Vince, could you add the below to whatever tracing muck you already
> have?

OK, running with your patch, I get this message a few times.  No crashing 
or memory corruption messages, but as I've said before that only happens 
maybe 10% of the time, let me run a few more times.

line 1666 is
	        WARN_ON(event->ctx != ctx);

[  583.051054] ------------[ cut here ]------------
[  583.056011] WARNING: CPU: 1 PID: 2479 at kernel/events/core.c:1666 event_sched_in.isra.77+0x209/0x230()
[  583.066099] Modules linked in: fuse snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul drm_kms_helper snd_hda_codec crc32_pclmul ghash_clmulni_intel aesni_intel snd_hwdep snd_pcm drm snd_seq aes_x86_64 lrw evdev parport_pc iTCO_wdt gf128mul snd_timer tpm_tis snd_seq_device glue_helper i2c_i801 snd iTCO_vendor_support ablk_helper cryptd parport psmouse pcspkr serio_raw button processor video battery i2c_algo_bit i2c_core tpm wmi lpc_ich mfd_core mei_me mei soundcore sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci libahci ehci_pci xhci_hcd ehci_hcd libata e1000e scsi_mod ptp usbcore crc32c_intel pps_core usb_common fan thermal thermal_sys
[  583.139793] CPU: 1 PID: 2479 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #92
[  583.147150] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  583.155136]  0000000000000009 ffff8800cdb85bf0 ffffffff81649790 0000000000000000
[  583.163109]  ffff8800cdb85c28 ffffffff810646ad ffff8800cf085000 ffff8800cd762c00
[  583.171171]  ffff88011ea58398 ffff88011ea5839c 0000007c37c9d592 ffff8800cdb85c38
[  583.179182] Call Trace:
[  583.181834]  [<ffffffff81649790>] dump_stack+0x45/0x56
[  583.187355]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[  583.193802]  [<ffffffff8106478a>] warn_slowpath_null+0x1a/0x20
[  583.200054]  [<ffffffff81135d69>] event_sched_in.isra.77+0x209/0x230
[  583.206830]  [<ffffffff81135e42>] group_sched_in+0xb2/0x1e0
[  583.212824]  [<ffffffff81136094>] ctx_sched_in+0x124/0x1f0
[  583.218711]  [<ffffffff811361c0>] perf_event_sched_in+0x60/0x90
[  583.225063]  [<ffffffff81136a1f>] __perf_install_in_context+0x11f/0x1a0
[  583.232181]  [<ffffffff811317a0>] remote_function+0x40/0x50
[  583.238127]  [<ffffffff810dda26>] generic_exec_single+0x126/0x170
[  583.244681]  [<ffffffff81131760>] ? task_clock_event_add+0x40/0x40
[  583.251300]  [<ffffffff810ddad7>] smp_call_function_single+0x67/0xa0
[  583.258103]  [<ffffffff81130784>] task_function_call+0x44/0x50
[  583.264366]  [<ffffffff81136900>] ? perf_cpu_hrtimer_handler+0x200/0x200
[  583.271547]  [<ffffffff81131ba6>] perf_install_in_context+0x86/0x100
[  583.278342]  [<ffffffff8113a488>] SYSC_perf_event_open+0x968/0xb00
[  583.284990]  [<ffffffff8113aa09>] SyS_perf_event_open+0x9/0x10
[  583.291249]  [<ffffffff8165a46d>] system_call_fastpath+0x1a/0x1f
[  583.297658] ---[ end trace 41ec7a21bb260454 ]---
[  623.740479] ------------[ cut here ]------------



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 14:07                                           ` Vince Weaver
@ 2014-05-01 14:27                                             ` Vince Weaver
  2014-05-01 15:09                                               ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-01 14:27 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Thomas Gleixner,
	Steven Rostedt

On Thu, 1 May 2014, Vince Weaver wrote:

> On Wed, 30 Apr 2014, Peter Zijlstra wrote:
> 
> > Vince, could you add the below to whatever tracing muck you already
> > have?

and this might be what you're looking for.  This is with a different 
random seed than the one I've used for other traces, your patch changes 
the syscall behavior enough that the one I was using before wasn't going 
down the same path.

This WARNING is
WARN_ON(event->hlist_entry.pprev && event->hlist_entry.pprev != LIST_POISON2);

[ 1554.910867] ------------[ cut here ]------------
[ 1554.919535] WARNING: CPU: 5 PID: 16431 at kernel/events/core.c:3232 __free_event+0x86/0x90()
[ 1554.931534] Modules linked in: fuse snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul drm_kms_helper snd_hda_codec crc32_pclmul ghash_clmulni_intel aesni_intel snd_hwdep snd_pcm drm snd_seq aes_x86_64 lrw evdev parport_pc iTCO_wdt gf128mul snd_timer tpm_tis snd_seq_device glue_helper i2c_i801 snd iTCO_vendor_support ablk_helper cryptd parport psmouse pcspkr serio_raw button processor video battery i2c_algo_bit i2c_core tpm wmi lpc_ich mfd_core mei_me mei soundcore sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci libahci ehci_pci xhci_hcd ehci_hcd libata e1000e scsi_mod ptp usbcore crc32c_intel pps_core usb_common fan thermal thermal_sys
[ 1555.010748] CPU: 5 PID: 16431 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #92
[ 1555.020142] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 1555.028924]  0000000000000009 ffff880117147b78 ffffffff81649790 0000000000000000
[ 1555.037771]  ffff880117147bb0 ffffffff810646ad ffff880116daf800 0000000000000000
[ 1555.046611]  ffff880036e45a10 ffff8800cde07588 ffff880116dafaa0 ffff880117147bc0
[ 1555.055490] Call Trace:
[ 1555.058963]  [<ffffffff81649790>] dump_stack+0x45/0x56
[ 1555.065357]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[ 1555.072646]  [<ffffffff8106478a>] warn_slowpath_null+0x1a/0x20
[ 1555.079740]  [<ffffffff81132e56>] __free_event+0x86/0x90
[ 1555.086298]  [<ffffffff811339c9>] _free_event+0xc9/0x200
[ 1555.092859]  [<ffffffff81133c78>] put_event+0x178/0x1f0
[ 1555.099293]  [<ffffffff81133b68>] ? put_event+0x68/0x1f0
[ 1555.105853]  [<ffffffff81133d12>] perf_release+0x12/0x20
[ 1555.112422]  [<ffffffff811b64ec>] __fput+0xdc/0x1e0
[ 1555.118576]  [<ffffffff811b663e>] ____fput+0xe/0x10
[ 1555.124630]  [<ffffffff81085137>] task_work_run+0xa7/0xe0
[ 1555.131307]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
[ 1555.137606]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
[ 1555.145356]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
[ 1555.151970]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
[ 1555.159542]  [<ffffffff81012438>] do_signal+0x48/0x990
[ 1555.165869]  [<ffffffff81655592>] ? do_page_fault+0x22/0x30
[ 1555.172645]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
[ 1555.179484]  [<ffffffff81651abc>] retint_signal+0x48/0x8c
[ 1555.185991] ---[ end trace 41ec7a21bb260463 ]---
[ 1556.112904] Slab corruption (Tainted: G        W    ): kmalloc-2048 start=ffff880116daf800, len=2048
[ 1556.126221] 040: 6b 6b 6b 6b 6b 6b 6b 6b a8 75 36 ca 00 88 ff ff  kkkkkkkk.u6.....
[ 1556.137243] Prev obj: start=ffff880116daf000, len=2048
[ 1556.145566] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[ 1556.156292] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

I can try slapping ftrace on and getting a trace if you want.

I've been using
trace-cmd record -e kmem -e raw_syscalls -p function -l '*perf*' -n 'perf_event_task_tick'

which is a compromise between log size and info, but as you've seen it 
loses useful info, especially all the things in core.c that don't
include *perf*.

Vince



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 14:27                                             ` Vince Weaver
@ 2014-05-01 15:09                                               ` Peter Zijlstra
  2014-05-01 15:50                                                 ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-01 15:09 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Thu, May 01, 2014 at 10:27:45AM -0400, Vince Weaver wrote:
> On Thu, 1 May 2014, Vince Weaver wrote:
> 
> > On Wed, 30 Apr 2014, Peter Zijlstra wrote:
> > 
> > > Vince, could you add the below to whatever tracing muck you already
> > > have?
> 
> and this might be what you're looking for.  This is with a different 
> random seed than the one I've used for other traces, your patch changes 
> the syscall behavior enough that the one I was using before wasn't going 
> down the same path.
> 
> This WARNING is
> WARN_ON(event->hlist_entry.pprev && event->hlist_entry.pprev != LIST_POISON2);
> 
> [ 1554.910867] ------------[ cut here ]------------
> [ 1554.919535] WARNING: CPU: 5 PID: 16431 at kernel/events/core.c:3232 __free_event+0x86/0x90()
> [ 1554.931534] Modules linked in: fuse snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul drm_kms_helper snd_hda_codec crc32_pclmul ghash_clmulni_intel aesni_intel snd_hwdep snd_pcm drm snd_seq aes_x86_64 lrw evdev parport_pc iTCO_wdt gf128mul snd_timer tpm_tis snd_seq_device glue_helper i2c_i801 snd iTCO_vendor_support ablk_helper cryptd parport psmouse pcspkr serio_raw button processor video battery i2c_algo_bit i2c_core tpm wmi lpc_ich mfd_core mei_me mei soundcore sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ahci libahci ehci_pci xhci_hcd ehci_hcd libata e1000e scsi_mod ptp usbcore crc32c_intel pps_core usb_common fan thermal thermal_sys
> [ 1555.010748] CPU: 5 PID: 16431 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #92
> [ 1555.020142] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [ 1555.028924]  0000000000000009 ffff880117147b78 ffffffff81649790 0000000000000000
> [ 1555.037771]  ffff880117147bb0 ffffffff810646ad ffff880116daf800 0000000000000000
> [ 1555.046611]  ffff880036e45a10 ffff8800cde07588 ffff880116dafaa0 ffff880117147bc0
> [ 1555.055490] Call Trace:
> [ 1555.058963]  [<ffffffff81649790>] dump_stack+0x45/0x56
> [ 1555.065357]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
> [ 1555.072646]  [<ffffffff8106478a>] warn_slowpath_null+0x1a/0x20
> [ 1555.079740]  [<ffffffff81132e56>] __free_event+0x86/0x90
> [ 1555.086298]  [<ffffffff811339c9>] _free_event+0xc9/0x200
> [ 1555.092859]  [<ffffffff81133c78>] put_event+0x178/0x1f0
> [ 1555.099293]  [<ffffffff81133b68>] ? put_event+0x68/0x1f0
> [ 1555.105853]  [<ffffffff81133d12>] perf_release+0x12/0x20
> [ 1555.112422]  [<ffffffff811b64ec>] __fput+0xdc/0x1e0
> [ 1555.118576]  [<ffffffff811b663e>] ____fput+0xe/0x10
> [ 1555.124630]  [<ffffffff81085137>] task_work_run+0xa7/0xe0
> [ 1555.131307]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
> [ 1555.137606]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
> [ 1555.145356]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
> [ 1555.151970]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
> [ 1555.159542]  [<ffffffff81012438>] do_signal+0x48/0x990
> [ 1555.165869]  [<ffffffff81655592>] ? do_page_fault+0x22/0x30
> [ 1555.172645]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
> [ 1555.179484]  [<ffffffff81651abc>] retint_signal+0x48/0x8c
> [ 1555.185991] ---[ end trace 41ec7a21bb260463 ]---
> [ 1556.112904] Slab corruption (Tainted: G        W    ): kmalloc-2048 start=ffff880116daf800, len=2048
> [ 1556.126221] 040: 6b 6b 6b 6b 6b 6b 6b 6b a8 75 36 ca 00 88 ff ff  kkkkkkkk.u6.....
> [ 1556.137243] Prev obj: start=ffff880116daf000, len=2048
> [ 1556.145566] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> [ 1556.156292] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> 
> I can try slapping ftrace on and getting a trace if you want.
> 
> I've been using
> trace-cmd record -e kmem -e raw_syscalls -p function -l '*perf*' -n 'perf_event_task_tick'
> 
> which is a compromise between log size and info, but as you've seen it 
> loses useful info, especially all the things in core.c that don't
> include *perf*.
> 

/proc/sys/kernel/traceoff_on_warning

Is also useful, it disables tracing the moment a warn/bug hits.
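
If the test harness is already a C program, flipping the knob from there could look like the sketch below. `write_knob` is a made-up helper name; the only real interface involved is the procfs path above, and writing to it needs root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write 'value' to a procfs/sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int write_knob(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)	/* buffered write errors show up here */
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling write_knob("/proc/sys/kernel/traceoff_on_warning", "1") before starting the fuzzer should leave the ring buffer frozen at the first WARN, so the interesting events are not overwritten.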

But yes please!


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 15:09                                               ` Peter Zijlstra
@ 2014-05-01 15:50                                                 ` Vince Weaver
  2014-05-01 16:31                                                   ` Thomas Gleixner
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-01 15:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel, Thomas Gleixner, Steven Rostedt

On Thu, 1 May 2014, Peter Zijlstra wrote:
> 
> But yes please!

OK, sorry for the delay, had forgotten to re-enable -pg for perf in the 
makefile when I applied your patch so had to re-build the kernel.

The trace is here:
	www.eece.maine.edu/~vweaver/junk/pzbug.out.bz2

No analysis so hopefully it's good, I've got an event to go to in a few 
minutes.

The messages:

[  634.846367] ------------[ cut here ]------------
[  634.851477] WARNING: CPU: 6 PID: 2915 at kernel/events/core.c:3232 __free_event+0x93/0xa0()
[  634.860583] Modules linked in: fuse snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul snd_hda_codec_realtek snd_hda_codec_generic i915 drm_kms_helper crc32_pclmul snd_hda_intel snd_hda_controller ghash_clmulni_intel aesni_intel snd_hda_codec aes_x86_64 drm snd_hwdep evdev psmouse iTCO_wdt iTCO_vendor_support lrw gf128mul glue_helper snd_pcm pcspkr serio_raw i2c_i801 ablk_helper tpm_tis parport_pc parport cryptd snd_seq snd_timer snd_seq_device video battery button snd i2c_algo_bit i2c_core tpm mei_me mei lpc_ich mfd_core processor wmi soundcore sg sd_mod sr_mod crc_t10dif crct10dif_common cdrom ehci_pci ahci xhci_hcd e1000e libahci ehci_hcd libata usbcore ptp crc32c_intel pps_core scsi_mod usb_common thermal fan thermal_sys
[  634.935276] CPU: 6 PID: 2915 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #94
[  634.942754] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  634.950728]  0000000000000009 ffff8801174b7b78 ffffffff81649bf0 0000000000000000
[  634.958795]  ffff8801174b7bb0 ffffffff810646ad ffff8800cef05000 0000000000000000
[  634.966855]  ffff8800cd47be10 ffff880036c7b388 ffff8800cef052a0 ffff8801174b7bc0
[  634.974858] Call Trace:
[  634.977482]  [<ffffffff81649bf0>] dump_stack+0x45/0x56
[  634.983007]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[  634.989447]  [<ffffffff8106478a>] warn_slowpath_null+0x1a/0x20
[  634.995674]  [<ffffffff811330e3>] __free_event+0x93/0xa0
[  635.001429]  [<ffffffff81133cce>] _free_event+0xce/0x210
[  635.007180]  [<ffffffff81133f90>] put_event+0x180/0x1f0
[  635.012796]  [<ffffffff81133e80>] ? put_event+0x70/0x1f0
[  635.018487]  [<ffffffff81134037>] perf_release+0x17/0x20
[  635.024228]  [<ffffffff811b694c>] __fput+0xdc/0x1e0
[  635.029457]  [<ffffffff811b6a9e>] ____fput+0xe/0x10
[  635.034719]  [<ffffffff81085137>] task_work_run+0xa7/0xe0
[  635.040524]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
[  635.045978]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
[  635.052880]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
[  635.058669]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
[  635.065384]  [<ffffffff81012438>] do_signal+0x48/0x990
[  635.070911]  [<ffffffff8101f516>] ? ftrace_raw_event_sys_exit+0x56/0x80
[  635.077994]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
[  635.084059]  [<ffffffff81651f3c>] retint_signal+0x48/0x8c
[  635.089912] ---[ end trace bf0bdbfdb698177c ]---
[  635.995839] Slab corruption (Tainted: G        W    ): kmalloc-2048 start=ffff8800cef05000, len=2048
[  636.006669] 040: 6b 6b 6b 6b 6b 6b 6b 6b a8 75 90 17 01 88 ff ff  kkkkkkkk.u......
[  636.017593] Next obj: start=ffff8800cef05800, len=2048
[  636.024510] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  636.035462] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 15:50                                                 ` Vince Weaver
@ 2014-05-01 16:31                                                   ` Thomas Gleixner
  2014-05-01 17:18                                                     ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2014-05-01 16:31 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, 1 May 2014, Vince Weaver wrote:

> On Thu, 1 May 2014, Peter Zijlstra wrote:
> > 
> > But yes please!
> 
> OK, sorry for the delay, had forgotten to re-enable -pg for perf in the 
> makefile when I applied your patch so had to re-build the kernel.
> 
> The trace is here:
> 	www.eece.maine.edu/~vweaver/junk/pzbug.out.bz2
> 
> No analysis so hopefully it's good, I've got an event to go to in a few 
> minutes.
> 
> The messages:
> 
> [  634.846367] ------------[ cut here ]------------
> [  634.851477] WARNING: CPU: 6 PID: 2915 at kernel/events/core.c:3232 __free_event+0x93/0xa0()

So we are on the right track:

     perf_fuzzer-2915  [006]   634.846280: bprint:               _free_event: freeing with 0 refs; ptr=0x0xffff8800cef05000

> [  634.935276] CPU: 6 PID: 2915 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #94
> [  634.942754] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [  634.950728]  0000000000000009 ffff8801174b7b78 ffffffff81649bf0 0000000000000000
> [  634.958795]  ffff8801174b7bb0 ffffffff810646ad ffff8800cef05000 0000000000000000

----------------------------------------------------^^^^^^^^^^^^^^^^

> [  634.966855]  ffff8800cd47be10 ffff880036c7b388 ffff8800cef052a0 ffff8801174b7bc0

And the corrupted element:

> [  635.995839] Slab corruption (Tainted: G        W    ): kmalloc-2048 start=ffff8800cef05000, len=2048

The same address.

Heading out now and postponing the chase for tomorrow morning.

Thanks,

	tglx


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 16:31                                                   ` Thomas Gleixner
@ 2014-05-01 17:18                                                     ` Vince Weaver
  2014-05-01 18:49                                                       ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-01 17:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vince Weaver, Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Thu, 1 May 2014, Thomas Gleixner wrote:

> Heading out now and postponing the chase for tomorrow morning.

Some decoding of the trace.

One thing that's possibly unrelated, but on both this and the previous
bug the main thread was doing a "perf_poll" while the bug is triggered.
I guess in theory that could mess up the ref count, although in both
cases the poll() call started after the event was closed so I wouldn't
think it'd be able to poll on a closed fd.


Allocated:
	<...>-2508  [002]   634.721037: kmalloc:              (perf_event_alloc+0x5a) call_site=ffffffff81139b4a ptr=0xffff8800cef05000 bytes_req=1272 bytes_alloc=2048 gfp_flags=GFP_KERNEL|GFP_ZERO

This time it's PERF_COUNT_SW_PAGE_FAULTS_MAJ but with inherit enabled:
	<...>-2508  [002]   634.721193: bprint:               SYSC_perf_event_open: Opened: 1 6 0 1
	<...>-2508  [002]   634.721193: sys_exit:             NR 298 = 3


__perf_event_task_sched_in runs.
The event is added to the CPU1 swevent hlist for the last time (and never
deleted, hence the bug):
	<...>-2508  [001]   634.838474: function:             __perf_event_task_sched_in
	<...>-2508  [001]   634.838490: bprint:               perf_swevent_add: VMW add_rcu: 0xffff8800cef05000

Event fd closed in parent:
	<...>-2508  [001]   634.839922: sys_enter:            NR 3 (3, 7ffff523dc8c, 0, 22, 7f420cf6210c, 7f420cf62120)
	<...>-2508  [001]   634.839924: sys_exit:             NR 3 = 0

Kill the child:
	<...>-2508  [001]   634.845994: sys_enter:            NR 62 (b63, 9, 7, 1, 7f420cf620f8, 7f420cf62120)

Event released:
	perf_fuzzer-2915  [006]   634.846247: function:             perf_release
	perf_fuzzer-2915  [006]   634.846248: bprint:               put_event: put ref: 0; ptr=0x0xffff8800cef05000 other=0x(nil)
	perf_fuzzer-2915  [006]   634.846252: function:             perf_group_detach
	perf_fuzzer-2915  [006]   634.846253: function:                perf_event__header_size
	perf_fuzzer-2915  [006]   634.846253: function:             perf_remove_from_context


__perf_event_task_sched_out runs on CPU1.
Had the event not been closed, it would have been deleted from the swevent_hlist around here:
	<...>-2508  [001]   634.846265: function:             __perf_event_task_sched_out
	<...>-2508  [001]   634.846270: bprint:               perf_swevent_del: VMW del_rcu: 0xffff8800cf1b8800

Event freed:
	perf_fuzzer-2915  [006]   634.846280: bprint:               _free_event: freeing with 0 refs; ptr=0x0xffff8800cef05000
	perf_fuzzer-2915  [006]   634.846282: function:             sw_perf_event_destroy


Grace period expires:
	<idle>-0     [006]   635.138983: kfree:                (free_event_rcu+0x2f) call_site=ffffffff81130c3f ptr=0xffff8800cef05000
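
For reference, an event of the kind decoded above (software PMU, PERF_COUNT_SW_PAGE_FAULTS_MAJ, inherit enabled) could be set up roughly as below. This is a sketch, not what the fuzzer actually does: the helper names are made up, and the pid/cpu/group_fd values the fuzzer passed are not visible in this excerpt, so they are left to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill in a software major-page-fault event that child tasks inherit. */
void fill_sw_majfault_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size    = sizeof(*attr);
	attr->type    = PERF_TYPE_SOFTWARE;            /* numeric value 1 */
	attr->config  = PERF_COUNT_SW_PAGE_FAULTS_MAJ; /* numeric value 6 */
	attr->inherit = 1;
}

/* glibc provides no wrapper for this syscall, so go through syscall(2). */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			 int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

With pid = 0, cpu = -1 and group_fd = -1 this would count major faults for the calling task on any CPU; the first two numbers in the "Opened:" line (1 and 6) match the type and config set here.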



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 17:18                                                     ` Vince Weaver
@ 2014-05-01 18:49                                                       ` Vince Weaver
  2014-05-01 21:32                                                         ` Vince Weaver
                                                                           ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-01 18:49 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Steven Rostedt


OK, humor me a bit here.

I'm looking at the buggy trace and comparing against a "good" trace where 
the bug doesn't happen.

It is a race condition of sorts, because it's just a 10us or so 
interleaving of calls that causes the bug to happen or not.

In the good trace:

	[parent] __perf_event_task_sched_out (and hence perf_swevent_del)
	[child]  perf_release

In the buggy trace:

	[child] perf_release
	[parent] __perf_event_task_sched_out (perf_swevent_del never happens)


perf_swevent_del calls
	hlist_del_rcu(&event->hlist_entry)
to remove the event from the swevent hlist.
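
To make the two orderings concrete, here is a userspace toy model. Everything in it is an assumption made for illustration: the toy_* names are invented, the poison value is arbitrary, and the "attached" flag is only a guess at why the delete gets skipped once the release has started; the real cause is what this thread is trying to find. The one part taken from the kernel is the shape of hlist_del_rcu() (unlink, leave ->next alone, poison ->pprev) and the WARN_ON() condition quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for LIST_POISON2; only "distinctive and non-NULL" matters here. */
#define TOY_POISON2 ((struct toy_hnode **)0x200)

struct toy_hnode {
	struct toy_hnode *next;
	struct toy_hnode **pprev;
};

struct toy_event {
	struct toy_hnode entry;	/* plays the role of event->hlist_entry */
	int attached;		/* still reachable from its context (assumption) */
};

static void toy_add_head(struct toy_hnode *n, struct toy_hnode **head)
{
	n->next = *head;
	n->pprev = head;
	if (*head)
		(*head)->pprev = &n->next;
	*head = n;
}

/* Same shape as hlist_del_rcu(): unlink, keep ->next, poison ->pprev. */
static void toy_del_rcu(struct toy_hnode *n)
{
	assert(n->pprev && n->pprev != TOY_POISON2);
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->pprev = TOY_POISON2;
}

/* The condition from the WARN_ON() in __free_event(). */
static int toy_still_hashed(const struct toy_event *ev)
{
	return ev->entry.pprev && ev->entry.pprev != TOY_POISON2;
}

/* Task switched out: only events still attached get their del. */
static void toy_sched_out(struct toy_event *ev)
{
	if (ev->attached && toy_still_hashed(ev))
		toy_del_rcu(&ev->entry);
}

/* First half of the release: the event leaves its context. */
static void toy_release_detach(struct toy_event *ev)
{
	ev->attached = 0;
}

/* Returns 1 if the event would be freed while still on the hash list. */
int toy_run(int release_first)
{
	struct toy_hnode *head = NULL;
	struct toy_event ev = { { NULL, NULL }, 1 };

	toy_add_head(&ev.entry, &head);	/* task scheduled in, event hashed */
	if (release_first) {
		toy_release_detach(&ev);
		toy_sched_out(&ev);	/* skips the detached event */
	} else {
		toy_sched_out(&ev);
		toy_release_detach(&ev);
	}
	return toy_still_hashed(&ev);	/* checked at free time */
}
```

In this model toy_run(0) corresponds to the good trace and toy_run(1) to the buggy one, where the entry is still linked into the per-cpu hash list when the event memory is freed.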

Now in theory perf_release() calls sw_perf_event_destroy() which you
would think would also call the above.  Instead it does
	 swevent_hlist_put_cpu(event, cpu);
which does all kinds of weird hash stuff that I don't follow.

Should the above two be equivalent?  Is it reference counting in there 
with if (!--swhash->hlist_refcount) causing the issue?

Anyway I'm tired of staring at traces for the moment.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 18:49                                                       ` Vince Weaver
@ 2014-05-01 21:32                                                         ` Vince Weaver
  2014-05-02 11:15                                                         ` Peter Zijlstra
  2014-05-02 15:42                                                         ` Peter Zijlstra
  2 siblings, 0 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-01 21:32 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Steven Rostedt


OK, with the following patch I've been running the problem test case for 
an hour without triggering the bug.

I'm sure this is the wrong fix (maybe patching over the problem instead of 
fixing the root cause), but it works for me.

It looks like this whole mess got introduced with 76e1d9047 in Linux 
2.6.35 when the swevent code was converted to use a hashed list.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f83a71a..970d711 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5775,6 +5800,11 @@ static void sw_perf_event_destroy(struct perf_event *event)
 
 	WARN_ON(event->parent);
 
+	perf_pmu_disable(event->pmu);
+	if ((event->hlist_entry.pprev) && (event->hlist_entry.pprev!=LIST_POISON2))
+		 event->pmu->del(event, 0);
+	perf_pmu_enable(event->pmu);
+
 	static_key_slow_dec(&perf_swevent_enabled[event_id]);
 	swevent_hlist_put(event);
 }


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 18:49                                                       ` Vince Weaver
  2014-05-01 21:32                                                         ` Vince Weaver
@ 2014-05-02 11:15                                                         ` Peter Zijlstra
  2014-05-02 15:42                                                         ` Peter Zijlstra
  2 siblings, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-02 11:15 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt


On Thu, May 01, 2014 at 02:49:01PM -0400, Vince Weaver wrote:
> 
> OK, humor me a bit here.
> 
> I'm looking at the buggy trace and comparing against a "good" trace where 
> the bug doesn't happen.
> 
> It is a race condition of sorts, because it's just a 10us or so 
> interleaving of calls that causes the bug to happen or not.
> 
> In the good trace:
> 
> 	[parent] __perf_event_task_sched_out (and hence perf_swevent_del)
> 	[child]  perf_release
> 
> In the buggy trace:
> 
> 	[child] perf_release
> 	[parent] __perf_event_task_sched_out (perf_swevent_del never happens)
> 
> 
> perf_swevent_del calls
> 	hlist_del_rcu(&event->hlist_entry)
> to remove the event from the swevent hlist.
> 
> Now in theory perf_release() calls sw_perf_event_destroy() which you
> would think would also call the above.  Instead it does
> 	 swevent_hlist_put_cpu(event, cpu);
> which does all kinds of weird hash stuff that I don't follow.
> 
> Should the above two be equivalent?  Is it reference counting in there 
> with if (!--swhash->hlist_refcount) causing the issue?

perf_release()
  put_event()
    perf_remove_from_context()
      __perf_remove_from_context()
        event_sched_out()
          ->del()

is the path that would call ->del() and hlist_del_rcu().

Now perf_remove_from_context() only calls __perf_remove_from_context()
when the task is active somewhere, otherwise it simply calls
list_del_event().

Both perf_remove_from_context() and perf_event_context_sched_out() (as
called from __perf_event_task_sched_out) hold ctx->lock, so they should
be serialized against each other.

Clearly I'm missing something though, will go stare at the trace now.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-01 18:49                                                       ` Vince Weaver
  2014-05-01 21:32                                                         ` Vince Weaver
  2014-05-02 11:15                                                         ` Peter Zijlstra
@ 2014-05-02 15:42                                                         ` Peter Zijlstra
  2014-05-02 16:22                                                           ` Vince Weaver
  2014-05-02 17:06                                                           ` [perf] more perf_fuzzer memory corruption Vince Weaver
  2 siblings, 2 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-02 15:42 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 6331 bytes --]

On Thu, May 01, 2014 at 02:49:01PM -0400, Vince Weaver wrote:
> It is a race condition of sorts, because it's just a 10us or so 
> interleaving of calls that causes the bug to happen or not.
> 
> In the good trace:
> 
> 	[parent] __perf_event_task_sched_out (and hence perf_swevent_del)
> 	[child]  perf_release
> 
> In the buggy trace:
> 
> 	[child] perf_release
> 	[parent] __perf_event_task_sched_out (perf_swevent_del never happens)
> 

Can you give this a spin?

---
Subject: perf: Fix race in removing an event
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri May 2 16:56:01 CEST 2014

When removing a (sibling) event we do:

	raw_spin_lock_irq(&ctx->lock);
	perf_group_detach(event);
	raw_spin_unlock_irq(&ctx->lock);

	<hole>

	perf_remove_from_context(event);
		raw_spin_lock_irq(&ctx->lock);
		...
		raw_spin_unlock_irq(&ctx->lock);

Now, assuming the event is a sibling, it will be 'unreachable' for
things like ctx_sched_out() because that iterates the
groups->siblings, and we just unhooked the sibling.

So, if during <hole> we get ctx_sched_out(), it will miss the event
and not call event_sched_out() on it, leaving it programmed on the
PMU.

The subsequent perf_remove_from_context() call will find the ctx is
inactive and only call list_del_event() to remove the event from all
other lists.

Hereafter we can proceed to free the event; while still programmed!

Close this hole by moving perf_group_detach() inside the same
ctx->lock region(s) perf_remove_from_context() has.

The condition on inherited events only in __perf_event_exit_task() is
likely complete crap because non-inherited events are part of groups
too and we're tearing down just the same. But leave that for another
patch.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/events/core.c |   41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1444,6 +1444,11 @@ group_sched_out(struct perf_event *group
 		cpuctx->exclusive = 0;
 }
 
+struct remove_event {
+	struct perf_event *event;
+	bool detach_group;
+};
+
 /*
  * Cross CPU call to remove a performance event
  *
@@ -1452,12 +1457,15 @@ group_sched_out(struct perf_event *group
  */
 static int __perf_remove_from_context(void *info)
 {
-	struct perf_event *event = info;
+	struct remove_event *re = info;
+	struct perf_event *event = re->event;
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
 	raw_spin_lock(&ctx->lock);
 	event_sched_out(event, cpuctx, ctx);
+	if (re->detach_group)
+		perf_group_detach(event);
 	list_del_event(event, ctx);
 	if (!ctx->nr_events && cpuctx->task_ctx == ctx) {
 		ctx->is_active = 0;
@@ -1482,10 +1490,14 @@ static int __perf_remove_from_context(vo
  * When called from perf_event_exit_task, it's OK because the
  * context has been detached from its task.
  */
-static void perf_remove_from_context(struct perf_event *event)
+static void perf_remove_from_context(struct perf_event *event, bool detach_group)
 {
 	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *task = ctx->task;
+	struct remove_event re = {
+		.event = event,
+		.detach_group = detach_group,
+	};
 
 	lockdep_assert_held(&ctx->mutex);
 
@@ -1494,12 +1506,12 @@ static void perf_remove_from_context(str
 		 * Per cpu events are removed via an smp call and
 		 * the removal is always successful.
 		 */
-		cpu_function_call(event->cpu, __perf_remove_from_context, event);
+		cpu_function_call(event->cpu, __perf_remove_from_context, &re);
 		return;
 	}
 
 retry:
-	if (!task_function_call(task, __perf_remove_from_context, event))
+	if (!task_function_call(task, __perf_remove_from_context, &re))
 		return;
 
 	raw_spin_lock_irq(&ctx->lock);
@@ -1516,6 +1528,8 @@ static void perf_remove_from_context(str
 	 * Since the task isn't running, its safe to remove the event, us
 	 * holding the ctx->lock ensures the task won't get scheduled in.
 	 */
+	if (detach_group)
+		perf_group_detach(event);
 	list_del_event(event, ctx);
 	raw_spin_unlock_irq(&ctx->lock);
 }
@@ -3285,10 +3299,7 @@ int perf_event_release_kernel(struct per
 	 *     to trigger the AB-BA case.
 	 */
 	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
-	raw_spin_lock_irq(&ctx->lock);
-	perf_group_detach(event);
-	raw_spin_unlock_irq(&ctx->lock);
-	perf_remove_from_context(event);
+	perf_remove_from_context(event, true);
 	mutex_unlock(&ctx->mutex);
 
 	free_event(event);
@@ -7180,7 +7191,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		struct perf_event_context *gctx = group_leader->ctx;
 
 		mutex_lock(&gctx->mutex);
-		perf_remove_from_context(group_leader);
+		perf_remove_from_context(group_leader, false);
 
 		/*
 		 * Removing from the context ends up with disabled
@@ -7190,7 +7201,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		perf_event__state_init(group_leader);
 		list_for_each_entry(sibling, &group_leader->sibling_list,
 				    group_entry) {
-			perf_remove_from_context(sibling);
+			perf_remove_from_context(sibling, false);
 			perf_event__state_init(sibling);
 			put_ctx(gctx);
 		}
@@ -7320,7 +7331,7 @@ void perf_pmu_migrate_context(struct pmu
 	mutex_lock(&src_ctx->mutex);
 	list_for_each_entry_safe(event, tmp, &src_ctx->event_list,
 				 event_entry) {
-		perf_remove_from_context(event);
+		perf_remove_from_context(event, false);
 		unaccount_event_cpu(event, src_cpu);
 		put_ctx(src_ctx);
 		list_add(&event->migrate_entry, &events);
@@ -7382,13 +7393,7 @@ __perf_event_exit_task(struct perf_event
 			 struct perf_event_context *child_ctx,
 			 struct task_struct *child)
 {
-	if (child_event->parent) {
-		raw_spin_lock_irq(&child_ctx->lock);
-		perf_group_detach(child_event);
-		raw_spin_unlock_irq(&child_ctx->lock);
-	}
-
-	perf_remove_from_context(child_event);
+	perf_remove_from_context(child_event, !!child_event->parent);
 
 	/*
 	 * It can happen that the parent exits first, and has events

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 15:42                                                         ` Peter Zijlstra
@ 2014-05-02 16:22                                                           ` Vince Weaver
  2014-05-02 16:22                                                             ` Peter Zijlstra
  2014-05-02 17:06                                                           ` [perf] more perf_fuzzer memory corruption Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-02 16:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt


I'll try the patch next.

Meanwhile, can polling on a closed event cause problems with the reference 
count?

In my various failure traces there's always been a poll() active at the 
time of crash, and I added some trace_printk()s and it looks like poll is 
at least attempting to poll on the closed file descriptor that's causing 
us problems.

Vince

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 16:22                                                           ` Vince Weaver
@ 2014-05-02 16:22                                                             ` Peter Zijlstra
  2014-05-02 16:43                                                               ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-02 16:22 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 543 bytes --]

On Fri, May 02, 2014 at 12:22:30PM -0400, Vince Weaver wrote:
> 
> I'll try the patch next.
> 
> Meanwhile, can polling on a closed event cause problems with the reference 
> count?
> 
> In my various failure traces there's always been a poll() active at the 
> time of crash, and I added some trace_printk()s and it looks like poll is 
> at least attempting to poll on the closed file descriptor that's causing 
> us problems.

In principle the vfs file refcounting should be responsible for that.
But I'll go over it in a bit.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 16:22                                                             ` Peter Zijlstra
@ 2014-05-02 16:43                                                               ` Vince Weaver
  2014-05-02 17:27                                                                 ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-02 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, 2 May 2014, Peter Zijlstra wrote:

> In principle the vfs file refcounting should be responsible for that.
> But I'll go over it in a bit.

The poll code is ancient and the C-parser in my head really can't handle 
it very well.

Anyway for completeness this is the kind of thing I'm seeing.
The poll() manpage isn't very clear about what is supposed to happen if 
you poll() on a closed file descriptor.


FD#3 closed
	perf_fuzzer-2293  [003]   286.500137: sys_enter:            NR 3 (3, 7fff841b9eac, 0, 22, 7ff17078110c, 7ff170781120)

Child killed:
	perf_fuzzer-2293  [003]   286.505587: sys_exit:             NR 62 = 0

Poll started, seems to have freed fd #3 as an argument:
	perf_fuzzer-2293  [003]   286.505703: sys_enter:            NR 7 (7fff841b9b00, 55, 3, 40e3e3, 7ff1707810dc, 7ff170781120)

(child is still closing out at this point)


Event freed:
	<...>-2701  [004]   286.505904: bprint:               _free_event: freeing with 0 refs; ptr=0x0xffff8800ce88e000

fd#3 is still being polled despite the event being completely gone now:
	perf_fuzzer-2293  [003]   286.508846: bprint:               do_sys_poll: VMW: poll 3
	perf_fuzzer-2293  [003]   286.508847: function:             perf_poll
	perf_fuzzer-2293  [003]   286.508848: bprint:               do_sys_poll: VMW: poll 3
	perf_fuzzer-2293  [003]   286.508849: function:             perf_poll
	perf_fuzzer-2293  [003]   286.508850: bprint:               do_sys_poll: VMW: poll 3
	perf_fuzzer-2293  [003]   286.508846: bprint:               do_sys_poll: VMW: poll 3
	perf_fuzzer-2293  [003]   286.508847: function:             perf_poll
	perf_fuzzer-2293  [003]   286.508848: bprint:               do_sys_poll: VMW: poll 3
	perf_fuzzer-2293  [003]   286.508849: function:             perf_poll
	perf_fuzzer-2293  [003]   286.508850: bprint:               do_sys_poll: VMW: poll 3
	perf_fuzzer-2293  [003]   286.508850: function:             perf_poll
	perf_fuzzer-2293  [003]   286.508851: bprint:               do_sys_poll: VMW: poll 12
	perf_fuzzer-2293  [003]   286.508850: function:             perf_poll
	perf_fuzzer-2293  [003]   286.508851: bprint:               do_sys_poll: VMW: poll 12

Finally done polling:
	perf_fuzzer-2293  [003]   286.509002: sys_exit:             NR 7 = 0


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 17:06                                                           ` [perf] more perf_fuzzer memory corruption Vince Weaver
@ 2014-05-02 17:04                                                             ` Peter Zijlstra
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-02 17:04 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 399 bytes --]

On Fri, May 02, 2014 at 01:06:52PM -0400, Vince Weaver wrote:
> On Fri, 2 May 2014, Peter Zijlstra wrote:
> > 
> > Can you give this a spin?
> > 
> > ---
> > Subject: perf: Fix race in removing an event
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Fri May 2 16:56:01 CEST 2014
> 
> Nope, still shows the bug pretty quickly:

Curses.. back to staring at traces I suppose :/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 15:42                                                         ` Peter Zijlstra
  2014-05-02 16:22                                                           ` Vince Weaver
@ 2014-05-02 17:06                                                           ` Vince Weaver
  2014-05-02 17:04                                                             ` Peter Zijlstra
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-02 17:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, 2 May 2014, Peter Zijlstra wrote:
> 
> Can you give this a spin?
> 
> ---
> Subject: perf: Fix race in removing an event
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri May 2 16:56:01 CEST 2014

Nope, still shows the bug pretty quickly:

[  210.411542] ------------[ cut here ]------------
[  210.416638] WARNING: CPU: 7 PID: 2646 at kernel/events/core.c:3232 __free_event+0x93/0xa0()
[  210.425777] Modules linked in: fuse snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp kvm snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i915 crct10dif_pclmul crc32_pclmul snd_seq snd_timer ghash_clmulni_intel snd_seq_device aesni_intel aes_x86_64 snd lrw gf128mul ppdev glue_helper soundcore evdev drm_kms_helper iTCO_wdt drm i2c_algo_bit iTCO_vendor_support mei_me mei psmouse ablk_helper cryptd parport_pc i2c_i801 i2c_core tpm_tis battery lpc_ich mfd_core wmi parport serio_raw tpm pcspkr button processor video sd_mod sr_mod crc_t10dif cdrom crct10dif_common ahci libahci ehci_pci e1000e libata ptp xhci_hcd crc32c_intel ehci_hcd usbcore scsi_mod pps_core usb_common fan thermal thermal_sys
[  210.500288] CPU: 7 PID: 2646 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #101
[  210.507740] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  210.515703]  0000000000000009 ffff8800cdf03b78 ffffffff81649b70 0000000000000000
[  210.523768]  ffff8800cdf03bb0 ffffffff810646ad ffff880036ee9000 0000000000000000
[  210.531931]  ffff8800cdb00a10 ffff8800cdb23248 ffff880036ee92a0 ffff8800cdf03bc0
[  210.539985] Call Trace:
[  210.542647]  [<ffffffff81649b70>] dump_stack+0x45/0x56
[  210.548199]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[  210.554644]  [<ffffffff8106478a>] warn_slowpath_null+0x1a/0x20
[  210.560916]  [<ffffffff81133083>] __free_event+0x93/0xa0
[  210.566679]  [<ffffffff81133c6e>] _free_event+0xce/0x210
[  210.572386]  [<ffffffff81133f30>] put_event+0x180/0x1f0
[  210.578004]  [<ffffffff81133e20>] ? put_event+0x70/0x1f0
[  210.583730]  [<ffffffff81133fd7>] perf_release+0x17/0x20
[  210.589443]  [<ffffffff811b68cc>] __fput+0xdc/0x1e0
[  210.594698]  [<ffffffff811b6a1e>] ____fput+0xe/0x10
[  210.599948]  [<ffffffff81085137>] task_work_run+0xa7/0xe0
[  210.605786]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
[  210.611233]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
[  210.618186]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
[  210.623995]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
[  210.630731]  [<ffffffff81012438>] do_signal+0x48/0x990
[  210.636246]  [<ffffffff81655992>] ? do_page_fault+0x22/0x30
[  210.642268]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
[  210.648350]  [<ffffffff81651ebc>] retint_signal+0x48/0x8c
[  210.654207] ---[ end trace 55a1ac7c9a3f451c ]---
[  212.567179] Slab corruption (Tainted: G        W    ): kmalloc-2048 start=ffff880036ee9000, len=2048
[  212.576985] 040: 6b 6b 6b 6b 6b 6b 6b 6b a8 15 94 cf 00 88 ff ff  kkkkkkkk........


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 16:43                                                               ` Vince Weaver
@ 2014-05-02 17:27                                                                 ` Peter Zijlstra
  2014-05-02 17:46                                                                   ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-02 17:27 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 779 bytes --]

On Fri, May 02, 2014 at 12:43:17PM -0400, Vince Weaver wrote:
> On Fri, 2 May 2014, Peter Zijlstra wrote:
> 
> > In principle the vfs file refcounting should be responsible for that.
> > But I'll go over it in a bit.
> 
> The poll code is ancient and the C-parser in my head really can't handle 
> it very well.

Yeah, that code isn't my favourite part either..

> Anyway for completeness this is the kind of thing I'm seeing.
> The poll() manpage isn't very clear about what is supposed to happen if 
> you poll() on a closed file descriptor.

              POLLNVAL
                     Invalid request: fd not open (output only).

Seems applicable..


Also, could you send your entire diff this way so we're more or less
staring at the same code again?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 17:27                                                                 ` Peter Zijlstra
@ 2014-05-02 17:46                                                                   ` Vince Weaver
  2014-05-02 19:12                                                                     ` Thomas Gleixner
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-02 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, 2 May 2014, Peter Zijlstra wrote:

> On Fri, May 02, 2014 at 12:43:17PM -0400, Vince Weaver wrote:
> > On Fri, 2 May 2014, Peter Zijlstra wrote:
> > 
> > > In principle the vfs file refcounting should be responsible for that.
> > > But I'll go over it in a bit.
> > 
> > The poll code is ancient and the C-parser in my head really can't handle 
> > it very well.
> 
> Yeah, that code isn't my favourite part either..
> 
> > Anyway for completeness this is the kind of thing I'm seeing.
> > The poll() manpage isn't very clear about what is supposed to happen if 
> > you poll() on a closed file descriptor.
> 
>               POLLNVAL
>                      Invalid request: fd not open (output only).
> 
> Seems applicable..

You're right and it seems to return that properly, so having poll be 
active might just be a weird coincidence.

> Also, could you send your entire diff this way so we're more or less
> staring at the same code again?

that last test I ran was just 3.15-rc1 plus the last patch you sent,
plus a patch to allow -pg on the event.c file, plus an unrelated one that 
works around the current make-kpkg debian breakage.

Vince

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 17:46                                                                   ` Vince Weaver
@ 2014-05-02 19:12                                                                     ` Thomas Gleixner
  2014-05-02 20:15                                                                       ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2014-05-02 19:12 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, 2 May 2014, Vince Weaver wrote:
> On Fri, 2 May 2014, Peter Zijlstra wrote:
> 
> > On Fri, May 02, 2014 at 12:43:17PM -0400, Vince Weaver wrote:
> > > On Fri, 2 May 2014, Peter Zijlstra wrote:
> > > 
> > > > In principle the vfs file refcounting should be responsible for that.
> > > > But I'll go over it in a bit.
> > > 
> > > The poll code is ancient and the C-parser in my head really can't handle 
> > > it very well.
> > 
> > Yeah, that code isn't my favourite part either..
> > 
> > > Anyway for completeness this is the kind of thing I'm seeing.
> > > The poll() manpage isn't very clear about what is supposed to happen if 
> > > you poll() on a closed file descriptor.
> > 
> >               POLLNVAL
> >                      Invalid request: fd not open (output only).
> > 
> > Seems applicable..
> 
> You're right and it seems to return that properly, so having poll be 
> active might just be a weird coincidence.
> 
> > Also, could you send your entire diff this way so we're more or less
> > staring at the same code again?
> 
> that last test I ran was just 3.15-rc1 plus the last patch you sent,
> plus a patch to allow -pg on the event.c file, plus an unrelated one that 
> works around the current make-kpkg debian breakage.

Hmm, and where does the WARN_ON in _free_event() come from? That's not in
Peter's last patch.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 19:12                                                                     ` Thomas Gleixner
@ 2014-05-02 20:15                                                                       ` Vince Weaver
  2014-05-02 20:45                                                                         ` Thomas Gleixner
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-02 20:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vince Weaver, Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, 2 May 2014, Thomas Gleixner wrote:

> Hmm, and where does the WARN_ON in _free_event() come from? That's not in
> Peter's last patch.

ahh, you're right :(  My fault.  I gave the new patch and the previous 
patch similar names and applied the wrong one.

OK the proper patch has been running the quick reproducer for a bit 
without triggering the issue, I'll let it run a bit more and then upgrade 
to full fuzzing.

Vince

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 20:15                                                                       ` Vince Weaver
@ 2014-05-02 20:45                                                                         ` Thomas Gleixner
  2014-05-03  2:32                                                                           ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2014-05-02 20:45 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, 2 May 2014, Vince Weaver wrote:

> On Fri, 2 May 2014, Thomas Gleixner wrote:
> 
> > Hmm, and where does the WARN_ON in _free_event() come from? That's not in
> > Peter's last patch.
> 
> ahh, you're right :(  My fault.  I gave the new patch and the previous 
> patch similar names and applied the wrong one.
> 
> OK the proper patch has been running the quick reproducer for a bit 
> without triggering the issue, I'll let it run a bit more and then upgrade 
> to full fuzzing.

If you do that, please add the patch below.

Thanks,

	tglx


Index: linux-2.6/kernel/events/core.c
===================================================================
--- linux-2.6.orig/kernel/events/core.c
+++ linux-2.6/kernel/events/core.c
@@ -7378,7 +7378,7 @@ __perf_event_exit_task(struct perf_event
 			 struct perf_event_context *child_ctx,
 			 struct task_struct *child)
 {
-	perf_remove_from_context(child_event, !!child_event->parent);
+	perf_remove_from_context(child_event, true);
 
 	/*
 	 * It can happen that the parent exits first, and has events

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-02 20:45                                                                         ` Thomas Gleixner
@ 2014-05-03  2:32                                                                           ` Vince Weaver
  2014-05-03  3:02                                                                             ` Vince Weaver
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-03  2:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vince Weaver, Peter Zijlstra, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, 2 May 2014, Thomas Gleixner wrote:

> > OK the proper patch has been running the quick reproducer for a bit 
> > without triggering the issue, I'll let it run a bit more and then upgrade 
> > to full fuzzing.
> 
> If you do that, please add the patch below.

I've been fuzzing without your additional patch for 6 hours and all looks
(almost) good.  I can add in your patch and let it fuzz overnight.

I say almost because the following gets triggered, but I think it's an 
unrelated issue.

Vince

[17190.202941] ------------[ cut here ]------------
[17190.207906] WARNING: CPU: 2 PID: 4743 at arch/x86/kernel/cpu/perf_event_intel.c:1373 intel_pmu_handle_irq+0x2a4/0x3c0()
[17190.219460] perfevents: irq loop stuck!
[17190.223579] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hwdep ghash_clmulni_intel snd_pcm aesni_intel aes_x86_64 lrw snd_seq snd_timer snd_seq_device gf128mul snd i915 glue_helper evdev soundcore drm_kms_helper mei_me iTCO_wdt iTCO_vendor_support lpc_ich battery drm ppdev psmouse serio_raw ablk_helper cryptd wmi parport_pc mei parport tpm_tis i2c_algo_bit button processor video tpm i2c_i801 i2c_core mfd_core pcspkr sd_mod sr_mod crc_t10dif cdrom crct10dif_common ehci_pci ahci xhci_hcd ehci_hcd libahci e1000e libata ptp crc32c_intel usbcore scsi_mod pps_core usb_common thermal fan thermal_sys
[17190.298419] CPU: 2 PID: 4743 Comm: perf_fuzzer Not tainted 3.15.0-rc1+ #102
[17190.305926] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[17190.313906]  0000000000000009 ffff88011ea86cb0 ffffffff81649c80 ffff88011ea86cf8
[17190.322034]  ffff88011ea86ce8 ffffffff810646ad 0000000000000064 ffff88011ea8cbe0
[17190.330134]  ffff8800cf7a7800 0000000000000040 ffff88011ea8cde0 ffff88011ea86d48
[17190.338122] Call Trace:
[17190.340775]  <NMI>  [<ffffffff81649c80>] dump_stack+0x45/0x56
[17190.347023]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[17190.353472]  [<ffffffff8106471c>] warn_slowpath_fmt+0x4c/0x50
[17190.359677]  [<ffffffff8102ef94>] intel_pmu_handle_irq+0x2a4/0x3c0
[17190.366315]  [<ffffffff8105034d>] ? native_write_msr_safe+0xd/0x10
[17190.372954]  [<ffffffff8165378b>] perf_event_nmi_handler+0x2b/0x50
[17190.379629]  [<ffffffff81652f58>] nmi_handle.isra.5+0xa8/0x150
[17190.385879]  [<ffffffff81652eb5>] ? nmi_handle.isra.5+0x5/0x150
[17190.392287]  [<ffffffff816530d8>] do_nmi+0xd8/0x340
[17190.397572]  [<ffffffff81652581>] end_repeat_nmi+0x1e/0x2e
[17190.403472]  [<ffffffff8105034a>] ? native_write_msr_safe+0xa/0x10
[17190.410098]  [<ffffffff8105034a>] ? native_write_msr_safe+0xa/0x10
[17190.416765]  [<ffffffff8105034a>] ? native_write_msr_safe+0xa/0x10
[17190.423386]  <<EOE>>  [<ffffffff8102eb7d>] intel_pmu_enable_event+0x21d/0x240
[17190.431048]  [<ffffffff81027baa>] x86_pmu_start+0x7a/0x100
[17190.436992]  [<ffffffff810283a5>] x86_pmu_enable+0x295/0x310
[17190.443104]  [<ffffffff8113528f>] perf_pmu_enable+0x2f/0x40
[17190.449087]  [<ffffffff811369a8>] perf_event_context_sched_in+0x88/0xd0
[17190.456165]  [<ffffffff8113713d>] __perf_event_task_sched_in+0x1dd/0x1f0
[17190.463412]  [<ffffffff81090ca8>] finish_task_switch+0xd8/0x120
[17190.469750]  [<ffffffff8164ca90>] __schedule+0x2c0/0x740
[17190.475443]  [<ffffffff8164cf39>] schedule+0x29/0x70
[17190.480772]  [<ffffffff8164c74c>] schedule_hrtimeout_range_clock+0x13c/0x180
[17190.488331]  [<ffffffff8108b1c0>] ? hrtimer_get_res+0x50/0x50
[17190.494491]  [<ffffffff8164c6c9>] ? schedule_hrtimeout_range_clock+0xb9/0x180
[17190.502135]  [<ffffffff8164c7a3>] schedule_hrtimeout_range+0x13/0x20
[17190.508983]  [<ffffffff811c94c9>] poll_schedule_timeout+0x49/0x70
[17190.515535]  [<ffffffff811cab22>] do_sys_poll+0x422/0x540
[17190.521354]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.528737]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.536129]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.543552]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.550915]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.558290]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.565698]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.573075]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.580488]  [<ffffffff811c9650>] ? poll_select_copy_remaining+0x130/0x130
[17190.589071]  [<ffffffff811cad15>] SyS_poll+0x65/0x100
[17190.595690]  [<ffffffff8165a96d>] system_call_fastpath+0x1a/0x1f
[17190.603315] ---[ end trace d44f7960e96a18da ]---
[17190.609412] 
[17190.612182] CPU#2: ctrl:       0000000000000000
[17190.618136] CPU#2: status:     0000000000000000
[17190.624190] CPU#2: overflow:   0000000000000000
[17190.630144] CPU#2: fixed:      00000000000000ba
[17190.636123] CPU#2: pebs:       0000000000000000
[17190.642042] CPU#2: active:     0000000300000001
[17190.648000] CPU#2:   gen-PMC0 ctrl:  00000000004000c4
[17190.654531] CPU#2:   gen-PMC0 count: 0000000000000001
[17190.661059] CPU#2:   gen-PMC0 left:  0000ffffffffffff
[17190.667576] CPU#2:   gen-PMC1 ctrl:  0000000000120280
[17190.674101] CPU#2:   gen-PMC1 count: 0000000000005439
[17190.680623] CPU#2:   gen-PMC1 left:  0000ffffffffaf43
[17190.687127] CPU#2:   gen-PMC2 ctrl:  0000000000114f2e
[17190.693589] CPU#2:   gen-PMC2 count: 0000000000000001
[17190.700039] CPU#2:   gen-PMC2 left:  0000ffffffffffff
[17190.706455] CPU#2:   gen-PMC3 ctrl:  00000000001300c0
[17190.712846] CPU#2:   gen-PMC3 count: 0000000000000001
[17190.719135] CPU#2:   gen-PMC3 left:  0000ffffffffffff
[17190.725357] CPU#2: fixed-PMC0 count: 0000fffffffffffe
[17190.731529] CPU#2: fixed-PMC1 count: 0000ffff192febe2
[17190.737687] CPU#2: fixed-PMC2 count: 0000000000000001
[17190.743840] perf_event_intel: clearing PMU state on CPU#2
...
[21886.270130] perf_event_intel: clearing PMU state on CPU#2


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-03  2:32                                                                           ` Vince Weaver
@ 2014-05-03  3:02                                                                             ` Vince Weaver
  2014-05-03  7:33                                                                               ` Peter Zijlstra
  2014-05-05  9:31                                                                               ` Peter Zijlstra
  0 siblings, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-03  3:02 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Steven Rostedt

On Fri, 2 May 2014, Vince Weaver wrote:

> I've been fuzzing without your additional patch for 6 hours and all looks
> (almost) good.  I can add in your patch and let it fuzz overnight.

and I applied the additional patch, installed the kernel, hit reboot, and 
the following happened (this was caused by rebooting while fuzzing
was ongoing) :(

I'm remote from the system too so the poor machine is going to be sitting 
there oopsing away until Monday.

The system is going down for reboot NOW!
INIT: Switching to runlevel: 6
INIT: Sending p
[24444.795403] ------------[ cut here ]------------
[24444.802143] WARNING: CPU: 1 PID: 23062 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[24444.812613] list_del corruption. prev->next should be ffff8800c9028010, but was 6b6b6b6b6b6b6b6b
[24444.908976] CPU: 1 PID: 23062 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #102
[24444.919934] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[24444.930206]  0000000000000009 ffff8800c8991ad8 ffffffff81649c80 ffff8800c8991b20
[24444.940667]  ffff8800c8991b10 ffffffff810646ad ffff8800c9028000 ffff8801181bf000
[24444.951155]  ffff8800c9028010 ffff8800c9028000 0000000000000001 ffff8800c8991b70
[24444.961677] Call Trace:
[24444.967131]  [<ffffffff81649c80>] dump_stack+0x45/0x56
[24444.975333]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[24444.984433]  [<ffffffff8106471c>] warn_slowpath_fmt+0x4c/0x50
[24444.993286]  [<ffffffff813c9fc1>] __list_del_entry+0xa1/0xd0
[24445.002082]  [<ffffffff81131ec4>] list_del_event+0xe4/0xf0
[24445.010729]  [<ffffffff811326c0>] perf_remove_from_context+0xb0/0x120
[24445.020315]  [<ffffffff81133d8f>] perf_event_release_kernel+0x3f/0x80
[24445.029918]  [<ffffffff81133ea3>] put_event+0xd3/0x100
[24445.038205]  [<ffffffff81133e00>] ? put_event+0x30/0x100
[24445.046638]  [<ffffffff81133ee5>] perf_release+0x15/0x20
[24445.055082]  [<ffffffff811b69dc>] __fput+0xdc/0x1e0
[24445.063108]  [<ffffffff811b6b2e>] ____fput+0xe/0x10
[24445.071114]  [<ffffffff81085154>] task_work_run+0xc4/0xe0
[24445.079679]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
[24445.087922]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
[24445.097582]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
[24445.106200]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
[24445.115733]  [<ffffffff81012438>] do_signal+0x48/0x990
[24445.124132]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
[24445.133520]  [<ffffffff81651417>] ? _raw_spin_unlock_irq+0x27/0x40
[24445.143007]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
[24445.152398]  [<ffffffff81090c0f>] ? finish_task_switch+0x3f/0x120
[24445.161807]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
[24445.170791]  [<ffffffff81651fbc>] retint_signal+0x48/0x8c
[24445.179516] ---[ end trace d44f7960e96a18db ]---
[24445.627788] ------------[ cut here ]------------
[24445.635804] WARNING: CPU: 2 PID: 23062 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[24445.646825] list_del corruption. prev->next should be ffff8800ce89a810, but was 6b6b6b6b6b6b6b6b
[info] Will now restart.
[24454.007929] general protection fault: 0000 [#1] SMP 
[24454.016867] Dumping ftrace buffer:
[24454.023308]    (ftrace buffer empty)
[24454.117735] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W     3.15.0-rc1+ #102
[24454.127563] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[24454.137169] task: ffffffff81c184c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[24454.146980] RIP: 0010:[<ffffffff811356d2>]  [<ffffffff811356d2>] __perf_remove_from_context+0x22/0xd0
[24454.158482] RSP: 0018:ffff88011ea03f18  EFLAGS: 00010087
[24454.165746] RAX: 01b8550000441f0f RBX: ffffffff81019590 RCX: 000000000000000a
[24454.174917] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880118f0b800
[24454.184018] RBP: ffff88011ea03f40 R08: 0000000000000000 R09: 0000000000000001
[24454.193001] R10: 0000000000000000 R11: 0000000225c17d03 R12: ffff88011ea18310
[24454.202013] R13: ffff88011ea18310 R14: 0000000000000005 R15: ffff880118f0b800
[24454.211207] FS:  0000000000000000(0000) GS:ffff88011ea00000(0000) knlGS:0000000000000000
[24454.221242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24454.228734] CR2: 00007f449fcb0d20 CR3: 0000000001c11000 CR4: 00000000001407f0
[24454.237680] DR0: 0000000000000000 DR1: 0000000002106000 DR2: 0000000000000000
[24454.246599] DR3: 0000000002106000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[24454.255588] Stack:
[24454.258976]  ffff880118f0b800 ffff88011ea18310 ffff88011ea18408 0000000000000005
[24454.268286]  ffffffff81c99ab0 ffff88011ea03f78 ffffffff81135818 ffffffff811357b5
[24454.277423]  ffff880118f0b800 ffff880036de1c98 0000000000000000 0000163b0f931a27
[24454.286642] Call Trace:
[24454.290378]  <IRQ> 
[24454.292457]  [<ffffffff81135818>] __perf_event_exit_context+0x98/0xf0
[24454.301858]  [<ffffffff811357b5>] ? __perf_event_exit_context+0x35/0xf0
[24454.310098]  [<ffffffff810de20d>] generic_smp_call_function_single_interrupt+0x5d/0x100
[24454.319785]  [<ffffffff81042197>] smp_call_function_single_interrupt+0x27/0x40
[24454.328627]  [<ffffffff8165bb9d>] call_function_single_interrupt+0x6d/0x80
[24454.337078]  <EOI> 
[24454.339156]  [<ffffffff814e1b52>] ? cpuidle_enter_state+0x52/0xc0
[24454.347970]  [<ffffffff814e1b48>] ? cpuidle_enter_state+0x48/0xc0
[24454.355492]  [<ffffffff814e1bf7>] cpuidle_enter+0x17/0x20
[24454.362218]  [<ffffffff810aa270>] cpu_startup_entry+0x2c0/0x3d0
[24454.369450]  [<ffffffff81639ba6>] rest_init+0xb6/0xc0
[24454.375753]  [<ffffffff81639af5>] ? rest_init+0x5/0xc0
[24454.382110]  [<ffffffff81d05f75>] start_kernel+0x43d/0x448
[24454.388873]  [<ffffffff81d05941>] ? repair_env_string+0x5c/0x5c
[24454.396094]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
[24454.403605]  [<ffffffff81d055ee>] x86_64_start_reservations+0x2a/0x2c
[24454.411393]  [<ffffffff81d05733>] x86_64_start_kernel+0x143/0x152
[24454.418825] Code: 70 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 55 41 54 53 4c 8b 2f 49 8b 9d d8 01 00 00 48 8b 03 <4c> 8b 60 38 65 4c 03 24 25 e8 de 00 00 4c 8d 73 10 4c 89 f7 e8 
[24454.441219] RIP  [<ffffffff811356d2>] __perf_remove_from_context+0x22/0xd0
[24454.449564]  RSP <ffff88011ea03f18>
[24454.454447] bad: scheduling from the idle thread!
...
etc, forever


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-03  3:02                                                                             ` Vince Weaver
@ 2014-05-03  7:33                                                                               ` Peter Zijlstra
  2014-05-05  9:31                                                                               ` Peter Zijlstra
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-03  7:33 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, May 02, 2014 at 11:02:25PM -0400, Vince Weaver wrote:
> On Fri, 2 May 2014, Vince Weaver wrote:
> 
> > I've been fuzzing without your additional patch for 6 hours and all looks
> > (almost) good.  I can add in your patch and let it fuzz overnight.
> 
> and I applied the additional patch, installed the kernel, hit reboot, and 
> the following happened (this was caused by rebooting while fuzzing
> was ongoing) :(
> 
> I'm remote from the system too so the poor machine is going to be sitting 
> there oopsing away until Monday.

Yeah, my system did the same when I tried to shut it down last night. I
have remote power though (or else I would've been forced to like walk
downstairs) :-)

I suppose that means I'll have something to do come Monday morning.


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-03  3:02                                                                             ` Vince Weaver
  2014-05-03  7:33                                                                               ` Peter Zijlstra
@ 2014-05-05  9:31                                                                               ` Peter Zijlstra
  2014-05-05 16:00                                                                                 ` Vince Weaver
  2014-05-08 10:40                                                                                 ` [tip:perf/core] perf: Fix race in removing an event tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-05  9:31 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Fri, May 02, 2014 at 11:02:25PM -0400, Vince Weaver wrote:
> On Fri, 2 May 2014, Vince Weaver wrote:
> 
> > I've been fuzzing without your additional patch for 6 hours and all looks
> > (almost) good.  I can add in your patch and let it fuzz overnight.
> 
> and I applied the additional patch, installed the kernel, hit reboot, and 
> the following happened (this was caused by rebooting while fuzzing
> was ongoing) :(

Does this one work better? Making sure all __perf_remove_from_context()
callers pass the right structure seems to improve things no end. My
machine is now happy to reboot again.

---
Subject: perf: Fix race in removing an event
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri May 2 16:56:01 CEST 2014

When removing a (sibling) event we do:

	raw_spin_lock_irq(&ctx->lock);
	perf_group_detach(event);
	raw_spin_unlock_irq(&ctx->lock);

	<hole>

	perf_remove_from_context(event);
		raw_spin_lock_irq(&ctx->lock);
		...
		raw_spin_unlock_irq(&ctx->lock);

Now, assuming the event is a sibling, it will be 'unreachable' for
things like ctx_sched_out() because that iterates the
groups->siblings, and we just unhooked the sibling.

So, if during <hole> we get ctx_sched_out(), it will miss the event
and not call event_sched_out() on it, leaving it programmed on the
PMU.

The subsequent perf_remove_from_context() call will find the ctx is
inactive and only call list_del_event() to remove the event from all
other lists.

Hereafter we can proceed to free the event; while still programmed!

Close this hole by moving perf_group_detach() inside the same
ctx->lock region(s) perf_remove_from_context() has.

The condition on inherited events only in __perf_event_exit_task() is
likely complete crap because non-inherited events are part of groups
too and we're tearing down just the same. But leave that for another
patch.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/events/core.c |   47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1444,6 +1444,11 @@ group_sched_out(struct perf_event *group
 		cpuctx->exclusive = 0;
 }
 
+struct remove_event {
+	struct perf_event *event;
+	bool detach_group;
+};
+
 /*
  * Cross CPU call to remove a performance event
  *
@@ -1452,12 +1457,15 @@ group_sched_out(struct perf_event *group
  */
 static int __perf_remove_from_context(void *info)
 {
-	struct perf_event *event = info;
+	struct remove_event *re = info;
+	struct perf_event *event = re->event;
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
 	raw_spin_lock(&ctx->lock);
 	event_sched_out(event, cpuctx, ctx);
+	if (re->detach_group)
+		perf_group_detach(event);
 	list_del_event(event, ctx);
 	if (!ctx->nr_events && cpuctx->task_ctx == ctx) {
 		ctx->is_active = 0;
@@ -1482,10 +1490,14 @@ static int __perf_remove_from_context(vo
  * When called from perf_event_exit_task, it's OK because the
  * context has been detached from its task.
  */
-static void perf_remove_from_context(struct perf_event *event)
+static void perf_remove_from_context(struct perf_event *event, bool detach_group)
 {
 	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *task = ctx->task;
+	struct remove_event re = {
+		.event = event,
+		.detach_group = detach_group,
+	};
 
 	lockdep_assert_held(&ctx->mutex);
 
@@ -1494,12 +1506,12 @@ static void perf_remove_from_context(str
 		 * Per cpu events are removed via an smp call and
 		 * the removal is always successful.
 		 */
-		cpu_function_call(event->cpu, __perf_remove_from_context, event);
+		cpu_function_call(event->cpu, __perf_remove_from_context, &re);
 		return;
 	}
 
 retry:
-	if (!task_function_call(task, __perf_remove_from_context, event))
+	if (!task_function_call(task, __perf_remove_from_context, &re))
 		return;
 
 	raw_spin_lock_irq(&ctx->lock);
@@ -1516,6 +1528,8 @@ static void perf_remove_from_context(str
 	 * Since the task isn't running, its safe to remove the event, us
 	 * holding the ctx->lock ensures the task won't get scheduled in.
 	 */
+	if (detach_group)
+		perf_group_detach(event);
 	list_del_event(event, ctx);
 	raw_spin_unlock_irq(&ctx->lock);
 }
@@ -3285,10 +3299,7 @@ int perf_event_release_kernel(struct per
 	 *     to trigger the AB-BA case.
 	 */
 	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
-	raw_spin_lock_irq(&ctx->lock);
-	perf_group_detach(event);
-	raw_spin_unlock_irq(&ctx->lock);
-	perf_remove_from_context(event);
+	perf_remove_from_context(event, true);
 	mutex_unlock(&ctx->mutex);
 
 	free_event(event);
@@ -7180,7 +7191,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		struct perf_event_context *gctx = group_leader->ctx;
 
 		mutex_lock(&gctx->mutex);
-		perf_remove_from_context(group_leader);
+		perf_remove_from_context(group_leader, false);
 
 		/*
 		 * Removing from the context ends up with disabled
@@ -7190,7 +7201,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		perf_event__state_init(group_leader);
 		list_for_each_entry(sibling, &group_leader->sibling_list,
 				    group_entry) {
-			perf_remove_from_context(sibling);
+			perf_remove_from_context(sibling, false);
 			perf_event__state_init(sibling);
 			put_ctx(gctx);
 		}
@@ -7320,7 +7331,7 @@ void perf_pmu_migrate_context(struct pmu
 	mutex_lock(&src_ctx->mutex);
 	list_for_each_entry_safe(event, tmp, &src_ctx->event_list,
 				 event_entry) {
-		perf_remove_from_context(event);
+		perf_remove_from_context(event, false);
 		unaccount_event_cpu(event, src_cpu);
 		put_ctx(src_ctx);
 		list_add(&event->migrate_entry, &events);
@@ -7382,13 +7393,7 @@ __perf_event_exit_task(struct perf_event
 			 struct perf_event_context *child_ctx,
 			 struct task_struct *child)
 {
-	if (child_event->parent) {
-		raw_spin_lock_irq(&child_ctx->lock);
-		perf_group_detach(child_event);
-		raw_spin_unlock_irq(&child_ctx->lock);
-	}
-
-	perf_remove_from_context(child_event);
+	perf_remove_from_context(child_event, !!child_event->parent);
 
 	/*
 	 * It can happen that the parent exits first, and has events
@@ -7872,14 +7877,14 @@ static void perf_pmu_rotate_stop(struct
 
 static void __perf_event_exit_context(void *__info)
 {
+	struct remove_event re = { .detach_group = false };
 	struct perf_event_context *ctx = __info;
-	struct perf_event *event;
 
 	perf_pmu_rotate_stop(ctx->pmu);
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(event, &ctx->event_list, event_entry)
-		__perf_remove_from_context(event);
+	list_for_each_entry_rcu(re.event, &ctx->event_list, event_entry)
+		__perf_remove_from_context(&re);
 	rcu_read_unlock();
 }
 

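The window described in the commit message above can be replayed deterministically in user space. What follows is a toy model only, not kernel code: every toy_* name is hypothetical, locking is left out, and the interleaving is hard-coded rather than raced. It assumes that scheduling out a context walks only the group leader and its current siblings, and that removal from an inactive context only unlinks the event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SIBLINGS 4

/* Hypothetical stand-ins for perf_event / perf_event_context. */
struct toy_event {
	bool on_pmu;                 /* still programmed on the "PMU" */
	bool in_ctx;                 /* still linked into the context */
	struct toy_event *leader;    /* NULL for a leader or a detached event */
	struct toy_event *siblings[TOY_MAX_SIBLINGS];
	int nr_siblings;
};

struct toy_ctx {
	bool is_active;
	struct toy_event *group_leader;  /* the model has a single group */
};

static void toy_group_attach(struct toy_event *leader, struct toy_event *ev)
{
	assert(leader->nr_siblings < TOY_MAX_SIBLINGS);
	leader->siblings[leader->nr_siblings++] = ev;
	ev->leader = leader;
}

/* Models perf_group_detach(): unhook ev from its leader's sibling list. */
static void toy_group_detach(struct toy_event *ev)
{
	struct toy_event *leader = ev->leader;

	if (!leader)
		return;
	for (int i = 0; i < leader->nr_siblings; i++) {
		if (leader->siblings[i] == ev) {
			leader->siblings[i] = leader->siblings[--leader->nr_siblings];
			break;
		}
	}
	ev->leader = NULL;
}

/* Models ctx_sched_out(): reaches events only via groups->siblings. */
static void toy_ctx_sched_out(struct toy_ctx *ctx)
{
	struct toy_event *leader = ctx->group_leader;

	leader->on_pmu = false;
	for (int i = 0; i < leader->nr_siblings; i++)
		leader->siblings[i]->on_pmu = false;
	ctx->is_active = false;
}

/* Models removal: schedule out only if the context is still active. */
static void toy_remove(struct toy_ctx *ctx, struct toy_event *ev,
		       bool detach_group)
{
	if (ctx->is_active)
		ev->on_pmu = false;      /* event_sched_out() */
	if (detach_group)
		toy_group_detach(ev);
	ev->in_ctx = false;              /* list_del_event() */
}

static void toy_setup(struct toy_ctx *ctx, struct toy_event *leader,
		      struct toy_event *sibling)
{
	*leader = (struct toy_event){ .on_pmu = true, .in_ctx = true };
	*sibling = (struct toy_event){ .on_pmu = true, .in_ctx = true };
	toy_group_attach(leader, sibling);
	*ctx = (struct toy_ctx){ .is_active = true, .group_leader = leader };
}

/* Old ordering: detach, <hole>, remove. Returns true if the sibling is
 * still marked as programmed after it has been removed. */
bool toy_old_sequence(void)
{
	struct toy_ctx ctx;
	struct toy_event leader, sibling;

	toy_setup(&ctx, &leader, &sibling);
	toy_group_detach(&sibling);
	toy_ctx_sched_out(&ctx);         /* lands in <hole>, misses sibling */
	toy_remove(&ctx, &sibling, false);
	return sibling.on_pmu;
}

/* New ordering: detach happens inside removal, so a sched-out can only
 * run entirely before or entirely after it. */
bool toy_new_sequence(bool sched_out_first)
{
	struct toy_ctx ctx;
	struct toy_event leader, sibling;

	toy_setup(&ctx, &leader, &sibling);
	if (sched_out_first)
		toy_ctx_sched_out(&ctx);
	toy_remove(&ctx, &sibling, true);
	if (!sched_out_first)
		toy_ctx_sched_out(&ctx);
	return sibling.on_pmu;
}
```

In this model toy_old_sequence() returns true: the sibling is gone from every list yet still marked as programmed, which is the state the commit message warns about just before free_event(). toy_new_sequence() returns false for either ordering of the sched-out.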

* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05  9:31                                                                               ` Peter Zijlstra
@ 2014-05-05 16:00                                                                                 ` Vince Weaver
  2014-05-05 17:10                                                                                   ` Vince Weaver
  2014-05-05 17:29                                                                                   ` [perf] more perf_fuzzer memory corruption Ingo Molnar
  2014-05-08 10:40                                                                                 ` [tip:perf/core] perf: Fix race in removing an event tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-05 16:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Mon, 5 May 2014, Peter Zijlstra wrote:

> Does this one work better? Making sure all __perf_remove_from_context()
> callers pass the right structure seems to improve things no end. My
> machine is now happy to reboot again.

Yes, I've been fuzzing this for a few hours on both my haswell and core2 
test systems and it's doing great, also survived a reboot cycle.

Tested-by: Vince Weaver <vincent.weaver@maine.edu>

(Although often things like to crash the instant my tested-by e-mails 
clear the lkml list.)

I also want to say thanks for all the work everyone has done in getting 
this analyzed and fixed.

Vince




* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 16:00                                                                                 ` Vince Weaver
@ 2014-05-05 17:10                                                                                   ` Vince Weaver
  2014-05-05 17:14                                                                                     ` Peter Zijlstra
  2014-05-05 17:29                                                                                   ` [perf] more perf_fuzzer memory corruption Ingo Molnar
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-05 17:10 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Steven Rostedt

On Mon, 5 May 2014, Vince Weaver wrote:

> (Although often things like to crash the instant my tested-by e-mails 
> clear the lkml list.)

This did turn up on the core2 machine.  I had been seeing this problem 
earlier but was hoping it was part of the memory corruption issue:

[ 4918.921921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 4918.925692] IP: [<ffffffff81539fa6>] mutex_lock+0x19/0x37
[ 4918.925692] PGD c5e62067 PUD cae00067 PMD 0 
[ 4918.925692] Oops: 0002 [#1] SMP 
[ 4918.925692] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative f71882fg mcs7830 usbnet ohci_pci evdev ohci_hcd acpi_cpufreq coretemp psmouse serio_raw pcspkr video wmi i2c_nforce2 button processor thermal_sys sg ehci_pci ehci_hcd sd_mod usbcore usb_common
[ 4918.925692] CPU: 0 PID: 9777 Comm: perf_fuzzer Not tainted 3.15.0-rc4+ #42
[ 4918.925692] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[ 4918.925692] task: ffff8800c628e800 ti: ffff8800c6610000 task.ti: ffff8800c6610000
[ 4918.925692] RIP: 0010:[<ffffffff81539fa6>]  [<ffffffff81539fa6>] mutex_lock+0x19/0x37
[ 4918.925692] RSP: 0018:ffff8800c6611d30  EFLAGS: 00010286
[ 4918.925692] RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffff8800c9b89080
[ 4918.925692] RDX: 0000000080000000 RSI: 0000000000000286 RDI: 0000000000000010
[ 4918.925692] RBP: ffff8800c6611d50 R08: 0000000000000001 R09: ffff8800c5b41f00
[ 4918.925692] R10: ffff8800cb217400 R11: ffff8800caf03400 R12: ffff8800c628e800
[ 4918.925692] R13: ffff8800c628e800 R14: 0000000000000001 R15: 0000000000000038
[ 4918.925692] FS:  00007f8700f46700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 4918.925692] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4918.925692] CR2: 0000000000000010 CR3: 00000000c974d000 CR4: 00000000000407f0
[ 4918.925692] DR0: 0000000000a80000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4918.925692] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 4918.925692] Stack:
[ 4918.925692]  ffff880119c62010 ffff8800c628e800 ffff8800c628e800 0000000000000000
[ 4918.925692]  ffff8800c6611dc0 ffffffff810d10b7 ffff8800c5b41f48 0000000000000010
[ 4918.925692]  ffff880119c62010 0000000000000286 0000000000000286 0000000119c62010
[ 4918.925692] Call Trace:
[ 4918.925692]  [<ffffffff810d10b7>] perf_event_init_context+0x7c/0x1c8
[ 4918.925692]  [<ffffffff810d126c>] perf_event_init_task+0x69/0x6d
[ 4918.925692]  [<ffffffff8103e1ff>] copy_process+0x5cc/0x163b
[ 4918.925692]  [<ffffffff8112a044>] ? __d_free+0x53/0x58
[ 4918.925692]  [<ffffffff8112a6ab>] ? dentry_kill+0x1b8/0x1d5
[ 4918.925692]  [<ffffffff81130d9d>] ? mntput+0x2a/0x2c
[ 4918.925692]  [<ffffffff811193af>] ? __fput+0x17e/0x18d
[ 4918.925692]  [<ffffffff8103f536>] do_fork+0x74/0x1dc
[ 4918.925692]  [<ffffffff8111943a>] ? ____fput+0xe/0x10
[ 4918.925692]  [<ffffffff81058960>] ? task_work_run+0x8d/0xa0
[ 4918.925692]  [<ffffffff8103f6b4>] SyS_clone+0x16/0x18
[ 4918.925692]  [<ffffffff81542c89>] stub_clone+0x69/0x90
[ 4918.925692]  [<ffffffff815429e6>] ? system_call_fastpath+0x1a/0x1f
[ 4918.925692] Code: 48 8b 04 25 00 b9 00 00 b2 01 48 89 47 18 89 d0 c9 c3 55 48 89 e5 53 48 83 ec 18 66 66 66 66 90 48 89 fb e8 d7 f6 ff ff 48 89 df <f0> ff 0f 79 05 e8 7a 05 00 00 65 48 8b 04 25 00 b9 00 00 48 89 
[ 4918.925692] RIP  [<ffffffff81539fa6>] mutex_lock+0x19/0x37
[ 4918.925692]  RSP <ffff8800c6611d30>
[ 4918.925692] CR2: 0000000000000010
[ 4919.771050] ---[ end trace 9a98d4ca40642975 ]---



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 17:10                                                                                   ` Vince Weaver
@ 2014-05-05 17:14                                                                                     ` Peter Zijlstra
  2014-05-05 18:47                                                                                       ` Vince Weaver
  2014-05-08 10:40                                                                                       ` [tip:perf/core] perf: Fix perf_event_init_context() tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-05 17:14 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Mon, May 05, 2014 at 01:10:55PM -0400, Vince Weaver wrote:
> On Mon, 5 May 2014, Vince Weaver wrote:
> 
> > (Although often things like to crash the instant my tested-by e-mails 
> > clear the lkml list.)
> 
> This did turn up on the core2 machine.  I had been seeing this problem 
> earlier but was hoping it was part of the memory corruption issue:
> 
> [ 4918.921921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [ 4918.925692] IP: [<ffffffff81539fa6>] mutex_lock+0x19/0x37

> [ 4918.925692] Call Trace:
> [ 4918.925692]  [<ffffffff810d10b7>] perf_event_init_context+0x7c/0x1c8
> [ 4918.925692]  [<ffffffff810d126c>] perf_event_init_task+0x69/0x6d
> [ 4918.925692]  [<ffffffff8103e1ff>] copy_process+0x5cc/0x163b
> [ 4918.925692]  [<ffffffff8103f536>] do_fork+0x74/0x1dc
> [ 4918.925692]  [<ffffffff8103f6b4>] SyS_clone+0x16/0x18
> [ 4918.925692]  [<ffffffff81542c89>] stub_clone+0x69/0x90


Cute.. does the below cure?


---
Subject: perf: Fix perf_event_init_context()
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon May  5 19:12:20 CEST 2014

perf_pin_task_context() can return NULL but perf_event_init_context()
assumes it will not, correct this.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/events/core.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7745,6 +7745,8 @@ int perf_event_init_context(struct task_
 	 * swapped under us.
 	 */
 	parent_ctx = perf_pin_task_context(parent, ctxn);
+	if (!parent_ctx)
+		return 0;
 
 	/*
 	 * No need to check if parent_ctx != NULL here; since we saw

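The oops this patch addresses faults at address 0x10 inside mutex_lock(), which is what one would expect if &parent_ctx->mutex were computed from a NULL parent_ctx. The sketch below illustrates that arithmetic and the shape of the added check. The struct layout is an assumption chosen so the mutex lands at offset 0x10; it is not the real struct perf_event_context, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the kernel's struct perf_event_context. */
struct toy_mutex {
	long count;
};

struct toy_ctx {
	void *pmu;               /* 8 bytes on a 64-bit build */
	int type;                /* 4 bytes */
	int lock;                /* 4 bytes */
	struct toy_mutex mutex;  /* offset 0x10 on a 64-bit build */
};

/* Address that would be handed to mutex_lock() for a context located at
 * ctx_addr, computed without dereferencing anything. */
uintptr_t toy_mutex_addr(uintptr_t ctx_addr)
{
	return ctx_addr + offsetof(struct toy_ctx, mutex);
}

/* Shape of the patched path: bail out before touching the mutex when
 * pinning the parent context yielded NULL. Returns 0 when there is
 * nothing to inherit, 1 when inheritance would proceed. */
int toy_init_context(const struct toy_ctx *parent_ctx)
{
	if (!parent_ctx)
		return 0;
	/* mutex_lock(&parent_ctx->mutex) and the inherit loop would go here */
	return 1;
}
```

With this assumed layout on a 64-bit build, toy_mutex_addr(0) evaluates to 0x10, matching the CR2 and RDI values in the report, and toy_init_context(NULL) returns 0 without touching the mutex.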


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 16:00                                                                                 ` Vince Weaver
  2014-05-05 17:10                                                                                   ` Vince Weaver
@ 2014-05-05 17:29                                                                                   ` Ingo Molnar
  2014-05-06  4:51                                                                                     ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Ingo Molnar @ 2014-05-05 17:29 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Steven Rostedt,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Mon, 5 May 2014, Peter Zijlstra wrote:
> 
> > Does this one work better? Making sure all __perf_remove_from_context()
> > callers pass the right structure seems to improve things no end. My
> > machine is now happy to reboot again.
> 
> Yes, I've been fuzzing this for a few hours on both my haswell and core2 
> test systems and it's doing great, also survived a reboot cycle.
> 
> Tested-by: Vince Weaver <vincent.weaver@maine.edu>

I wish there was a stronger tag that would credit your efforts! ...

Kudos!

> (Although often things like to crash the instant my tested-by 
> e-mails clear the lkml list.)
> 
> I also want to say thanks for all the work everyone has done in 
> getting this analyzed and fixed.

I'm also thinking about waiting a bit before applying anything even 
borderline intrusive to the perf core, to make sure there's enough 
fuzz time to declare stable state (at least as far into the ABI as the 
fuzzing is able to reach). Future bisection efforts could use that 
kind of known-stable release.

Thanks,

	Ingo


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 17:14                                                                                     ` Peter Zijlstra
@ 2014-05-05 18:47                                                                                       ` Vince Weaver
  2014-05-05 19:36                                                                                         ` Peter Zijlstra
  2014-05-06  1:06                                                                                         ` Vince Weaver
  2014-05-08 10:40                                                                                       ` [tip:perf/core] perf: Fix perf_event_init_context() tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-05 18:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Mon, 5 May 2014, Peter Zijlstra wrote:

> Cute.. does the below cure?
> 
> 
> ---
> Subject: perf: Fix perf_event_init_context()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon May  5 19:12:20 CEST 2014
> 
> perf_pin_task_context() can return NULL but perf_event_init_context()
> assumes it will not, correct this.

It makes the oops go away, but it does make the fuzzer become unkillable 
while using 100% CPU.

It looks like it is stuck repeating this forever:
	perf_fuzzer-5256  [000]   275.943049: kmalloc:              (T.1262+0xe) call_site=ffffffff810d022f ptr=0xffff8800cb028400 bytes_req=216 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_ZERO
	perf_fuzzer-5256  [000]   275.943057: function:             perf_lock_task_context
	perf_fuzzer-5256  [000]   275.943057: function:             alloc_perf_context
and memory is slowly leaking away.


Meanwhile the haswell and AMD machines have been fuzzing away without 
issue; I don't know why the core2 machine is always the troublemaker.

Vince



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 18:47                                                                                       ` Vince Weaver
@ 2014-05-05 19:36                                                                                         ` Peter Zijlstra
  2014-05-05 19:51                                                                                           ` Vince Weaver
  2014-05-06  1:06                                                                                         ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-05 19:36 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Mon, May 05, 2014 at 02:47:32PM -0400, Vince Weaver wrote:
> On Mon, 5 May 2014, Peter Zijlstra wrote:
> 
> > Cute.. does the below cure?
> > 
> > 
> > ---
> > Subject: perf: Fix perf_event_init_context()
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Mon May  5 19:12:20 CEST 2014
> > 
> > perf_pin_task_context() can return NULL but perf_event_init_context()
> > assumes it will not, correct this.
> 
> It makes the oops go away, but it does make the fuzzer become unkillable 
> while using 100% CPU.

Ooh, I know that one. It's what my WSM-EP favours. I'll try and have a
look.

> It looks like it is stuck repeating this forever:
> 	perf_fuzzer-5256  [000]   275.943049: kmalloc:              (T.1262+0xe) call_site=ffffffff810d022f ptr=0xffff8800cb028400 bytes_req=216 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_ZERO
> 	perf_fuzzer-5256  [000]   275.943057: function:             perf_lock_task_context
> 	perf_fuzzer-5256  [000]   275.943057: function:             alloc_perf_context
> and memory is slowly leaking away.

Oh, usually when my WSM gets funny like this and I enable the tracer it
just stops being a computer and starts being a brick.

Might be a nice clue though.


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 19:36                                                                                         ` Peter Zijlstra
@ 2014-05-05 19:51                                                                                           ` Vince Weaver
  0 siblings, 0 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-05 19:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt

On Mon, 5 May 2014, Peter Zijlstra wrote:

> > It looks like it is stuck repeating this forever:
> > 	perf_fuzzer-5256  [000]   275.943049: kmalloc:              (T.1262+0xe) call_site=ffffffff810d022f ptr=0xffff8800cb028400 bytes_req=216 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_ZERO
> > 	perf_fuzzer-5256  [000]   275.943057: function:             perf_lock_task_context
> > 	perf_fuzzer-5256  [000]   275.943057: function:             alloc_perf_context
> > and memory is slowly leaking away.
> 
> Oh, usually when my WSM gets funny like this and I enable the tracer it
> just stops being a computer and starts being a brick.

I'd give you more details from the tracer but that's all lost in the 
middle of a dreaded
	CPU:1 [87702 EVENTS DROPPED]
incident.

Though I can try again to see if I can get a better trace.

Also possibly related: the following warning was generated at least one 
of the times things got wedged (according to my notes, this same warning 
has also been triggered previously by Dave Jones with trinity, but I 
don't know if it's the same problem or not):

[  804.036004] WARNING: CPU: 1 PID: 26033 at kernel/events/core.c:2398 task_ctx_sched_out+0x49/0x6e()
[  804.036004] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative f71882fg mcs7830 usbnet evdev video wmi acpi_cpufreq ohci_pci pcspkr psmouse button processor thermal_sys coretemp serio_raw ohci_hcd i2c_nforce2 sg ehci_pci ehci_hcd sd_mod usbcore usb_common
[  804.036004] CPU: 1 PID: 26033 Comm: perf_fuzzer Tainted: G      D       3.15.0-rc4+ #44
[  804.036004] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[  804.036004]  000000000000095e ffff880039ac3c08 ffffffff81539016 000000000000095e
[  804.036004]  0000000000000000 ffff880039ac3c48 ffffffff8103fcb9 0000000000000000
[  804.036004]  ffffffff810ca939 ffff88011fc95b08 ffff8800cbb6a010 0000000000000000
[  804.036004] Call Trace:
[  804.036004]  [<ffffffff81539016>] dump_stack+0x49/0x5b
[  804.036004]  [<ffffffff8103fcb9>] warn_slowpath_common+0x81/0x9b
[  804.036004]  [<ffffffff810ca939>] ? task_ctx_sched_out+0x49/0x6e
[  804.036004]  [<ffffffff8103fced>] warn_slowpath_null+0x1a/0x1c
[  804.036004]  [<ffffffff810ca939>] task_ctx_sched_out+0x49/0x6e
[  804.036004]  [<ffffffff810d0111>] perf_event_exit_task+0xee/0x1fe
[  804.036004]  [<ffffffff8105e91c>] ? switch_task_namespaces+0x1d/0x51
[  804.036004]  [<ffffffff81041bb6>] do_exit+0x400/0x942
[  804.036004]  [<ffffffff810f3d50>] ? do_read_fault+0x169/0x263
[  804.036004]  [<ffffffff810d3b51>] ? unlock_page+0x27/0x2c
[  804.036004]  [<ffffffff81042170>] do_group_exit+0x78/0xa0
[  804.036004]  [<ffffffff8104f378>] get_signal_to_deliver+0x46d/0x48a
[  804.036004]  [<ffffffff810bc037>] ? function_trace_call+0xc8/0xf2
[  804.036004]  [<ffffffff81002526>] do_signal+0x46/0x5e8
[  804.036004]  [<ffffffff810b8279>] ? __trace_buffer_unlock_commit+0x3c/0x44
[  804.036004]  [<ffffffff8100c176>] ? ftrace_raw_event_sys_exit+0x4a/0x72
[  804.036004]  [<ffffffff81002af4>] do_notify_resume+0x2c/0x64
[  804.036004]  [<ffffffff8153c235>] retint_signal+0x3d/0x78
[  804.036004] ---[ end trace 6cadd738b5f7eb96 ]---




* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 18:47                                                                                       ` Vince Weaver
  2014-05-05 19:36                                                                                         ` Peter Zijlstra
@ 2014-05-06  1:06                                                                                         ` Vince Weaver
  2014-05-06 16:57                                                                                           ` Vince Weaver
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-06  1:06 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Steven Rostedt

On Mon, 5 May 2014, Vince Weaver wrote:

> Meanwhile the haswell and AMD machines have been fuzzing away without 
> issue, I don't know why the core2 machine is always the trouble maker.

The haswell has been fuzzing for 12 hours with only an NMI 
dazed/confused message.

The AMD A10 machine, however, has a wedged fuzzer process.  I think it is 
a different problem from the core2 one.  I don't have a serial console on 
that machine yet, so I guess it's time to set one up.

I do think though that the memory corruption bug has been fixed by the 
patch.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-05 17:29                                                                                   ` [perf] more perf_fuzzer memory corruption Ingo Molnar
@ 2014-05-06  4:51                                                                                     ` Vince Weaver
  2014-05-06 17:06                                                                                       ` Vince Weaver
  2014-05-07 19:11                                                                                       ` Ingo Molnar
  0 siblings, 2 replies; 81+ messages in thread
From: Vince Weaver @ 2014-05-06  4:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vince Weaver, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	Steven Rostedt, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Mon, 5 May 2014, Ingo Molnar wrote:
> 
> I'm also thinking about waiting a bit before applying anything even 
> borderline intrusive to the perf core, to make sure there's enough 
> fuzz time to declare stable state (at least as far into the ABI as the 
> fuzzing is able to reach). Future bisection efforts could use that 
> kind of known-stable release.

That does sound like a good idea.  It is nice finally getting to the state 
where you can fuzz for hours (rather than minutes) without hitting a bug.

Of course that might change.  Development of the fuzzer has more or less
stalled for the past 6 months as I was spending all of my time 
reporting/chasing bugs rather than enhancing the fuzzer.  

In any case if we can get the recent patches applied in time for 3.15 I 
think it will turn out to be a nice release perf-event-stability wise.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-06  1:06                                                                                         ` Vince Weaver
@ 2014-05-06 16:57                                                                                           ` Vince Weaver
  2014-05-07 16:45                                                                                             ` Peter Zijlstra
  0 siblings, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-06 16:57 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Steven Rostedt

On Mon, 5 May 2014, Vince Weaver wrote:

> On Mon, 5 May 2014, Vince Weaver wrote:
> 
> > Meanwhile the haswell and AMD machines have been fuzzing away without 
> > issue, I don't know why the core2 machine is always the trouble maker.
> 
> The haswell has been fuzzing for 12 hours with only an NMI 
> dazed/confused message.

So the Haswell seemed to still be going strong after 24 hours, but then I 
killed the fuzzer with control-C and got this:

^C
[87536.479011] ------------[ cut here ]------------
[87536.484553] WARNING: CPU: 1 PID: 11978 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[87536.493994] list_del corruption. prev->next should be ffff8800ce684810, but was 6b6b6b6b6b6b6b6b
[87536.503915] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic i915 aesni_intel snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device tpm_tis ppdev snd aes_x86_64 parport_pc tpm evdev mei_me drm_kms_helper iTCO_wdt drm soundcore lrw gf128mul glue_helper iTCO_vendor_support wmi ablk_helper i2c_algo_bit button battery processor mei psmouse parport pcspkr serio_raw cryptd i2c_i801 video i2c_core lpc_ich mfd_core sd_mod sr_mod crc_t10dif cdrom crct10dif_common ahci libahci ehci_pci e1000e libata xhci_hcd ehci_hcd ptp crc32c_intel usbcore scsi_mod pps_core usb_common fan thermal thermal_sys
[87536.581372] CPU: 1 PID: 11978 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #104
[87536.590762] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[87536.599435]  0000000000000009 ffff880117b57ad8 ffffffff81649ca0 ffff880117b57b20
[87536.608228]  ffff880117b57b10 ffffffff810646ad ffff8800ce684800 ffff880036a64000
[87536.616970]  ffff8800ce684810 ffff8800ce684800 0000000000000001 ffff880117b57b70
[87536.625688] Call Trace:
[87536.629039]  [<ffffffff81649ca0>] dump_stack+0x45/0x56
[87536.635247]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[87536.642374]  [<ffffffff8106471c>] warn_slowpath_fmt+0x4c/0x50
[87536.649211]  [<ffffffff813c9fe1>] __list_del_entry+0xa1/0xd0
[87536.655953]  [<ffffffff81131ec4>] list_del_event+0xe4/0xf0
[87536.662477]  [<ffffffff811326c0>] perf_remove_from_context+0xb0/0x120
[87536.670005]  [<ffffffff81133d8f>] perf_event_release_kernel+0x3f/0x80
[87536.677530]  [<ffffffff81133ea3>] put_event+0xd3/0x100
[87536.683702]  [<ffffffff81133e00>] ? put_event+0x30/0x100
[87536.690047]  [<ffffffff81133ee5>] perf_release+0x15/0x20
[87536.696292]  [<ffffffff811b69fc>] __fput+0xdc/0x1e0
[87536.702191]  [<ffffffff811b6b4e>] ____fput+0xe/0x10
[87536.708038]  [<ffffffff81085154>] task_work_run+0xc4/0xe0
[87536.714503]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
[87536.720546]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
[87536.728117]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
[87536.734480]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
[87536.741814]  [<ffffffff81012438>] do_signal+0x48/0x990
[87536.747877]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
[87536.754994]  [<ffffffff81651437>] ? _raw_spin_unlock_irq+0x27/0x40
[87536.762243]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
[87536.769465]  [<ffffffff81090c0f>] ? finish_task_switch+0x3f/0x120
[87536.776622]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
[87536.783323]  [<ffffffff81651fbc>] retint_signal+0x48/0x8c
[87536.789726] ---[ end trace 2b5a3d32e8d767a7 ]---
[87537.231116] ------------[ cut here ]------------
[87537.238622] WARNING: CPU: 1 PID: 5694 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[87537.250022] list_del corruption. prev->next should be ffff88003697d010, but was 6b6b6b6b6b6b6b6b
[87537.262134] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic i915 aesni_intel snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device tpm_tis ppdev snd aes_x86_64 parport_pc tpm evdev mei_me drm_kms_helper iTCO_wdt drm soundcore lrw gf128mul glue_helper iTCO_vendor_support wmi ablk_helper i2c_algo_bit button battery processor mei psmouse parport pcspkr serio_raw cryptd i2c_i801 video i2c_core lpc_ich mfd_core sd_mod sr_mod crc_t10dif cdrom crct10dif_common ahci libahci ehci_pci e1000e libata xhci_hcd ehci_hcd ptp crc32c_intel usbcore scsi_mod pps_core usb_common fan thermal thermal_sys
[87537.350071] CPU: 1 PID: 5694 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #104
[87537.362120] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[87537.373623]  0000000000000009 ffff88011780bad8 ffffffff81649ca0 ffff88011780bb20
[87537.385332]  ffff88011780bb10 ffffffff810646ad ffff88003697d000 ffff8800cf2bd000
[87537.397049]  ffff88003697d010 ffff88003697d000 0000000000000001 ffff88011780bb70
[87537.408852] Call Trace:
[87537.415220]  [<ffffffff81649ca0>] dump_stack+0x45/0x56
[87537.424550]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[87537.434876]  [<ffffffff8106471c>] warn_slowpath_fmt+0x4c/0x50
[87537.444923]  [<ffffffff813c9fe1>] __list_del_entry+0xa1/0xd0
[87537.454848]  [<ffffffff81131ec4>] list_del_event+0xe4/0xf0
[87537.464638]  [<ffffffff811326c0>] perf_remove_from_context+0xb0/0x120
[87537.475461]  [<ffffffff81133d8f>] perf_event_release_kernel+0x3f/0x80
[87537.486265]  [<ffffffff81133ea3>] put_event+0xd3/0x100
[87537.495776]  [<ffffffff81133e00>] ? put_event+0x30/0x100
[87537.505405]  [<ffffffff81133ee5>] perf_release+0x15/0x20
[87537.515112]  [<ffffffff811b69fc>] __fput+0xdc/0x1e0
[87537.524426]  [<ffffffff811b6b4e>] ____fput+0xe/0x10
[87537.533695]  [<ffffffff81085154>] task_work_run+0xc4/0xe0
[87537.543618]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
[87537.553129]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
[87537.564205]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
[87537.574142]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
[87537.585042]  [<ffffffff81012438>] do_signal+0x48/0x990
[87537.594644]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
[87537.605102]  [<ffffffff81651437>] ? _raw_spin_unlock_irq+0x27/0x40
[87537.615521]  [<ffffffff81090c0f>] ? finish_task_switch+0x3f/0x120
[87537.625663]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
[87537.635166]  [<ffffffff81651fbc>] retint_signal+0x48/0x8c
[87537.644367] ---[ end trace 2b5a3d32e8d767a8 ]---
[87541.568852] ------------[ cut here ]------------
[87541.577199] WARNING: CPU: 1 PID: 5694 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[87541.589363] list_del corruption. prev->next should be ffff8800cdfed810, but was 6b6b6b6b6b6b6b6b
[87541.602267] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic i915 aesni_intel snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device tpm_tis ppdev snd aes_x86_64 parport_pc tpm evdev mei_me drm_kms_helper iTCO_wdt drm soundcore lrw gf128mul glue_helper iTCO_vendor_support wmi ablk_helper i2c_algo_bit button battery processor mei psmouse parport pcspkr serio_raw cryptd i2c_i801 video i2c_core lpc_ich mfd_core sd_mod sr_mod crc_t10dif cdrom crct10dif_common ahci libahci ehci_pci e1000e libata xhci_hcd ehci_hcd ptp crc32c_intel usbcore scsi_mod pps_core usb_common fan thermal thermal_sys
[87541.692626] CPU: 1 PID: 5694 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #104
[87541.705282] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[87541.717333]  0000000000000009 ffff88011780baf8 ffffffff81649ca0 ffff88011780bb40
[87541.729445]  ffff88011780bb30 ffffffff810646ad ffff8800cdfed800 ffff8800cf2bd000
[87541.741712]  ffff8800cdfed810 ffff8800cdfed800 0000000000000001 ffff88011780bb90
[87541.753980] Call Trace:
[87541.760749]  [<ffffffff81649ca0>] dump_stack+0x45/0x56
[87541.770475]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
[87541.781089]  [<ffffffff8106471c>] warn_slowpath_fmt+0x4c/0x50
[87541.791465]  [<ffffffff813c9fe1>] __list_del_entry+0xa1/0xd0
[87541.801773]  [<ffffffff81131ec4>] list_del_event+0xe4/0xf0
[87541.811853]  [<ffffffff811326c0>] perf_remove_from_context+0xb0/0x120
[87541.822965]  [<ffffffff81133d8f>] perf_event_release_kernel+0x3f/0x80
[87541.834111]  [<ffffffff81133ea3>] put_event+0xd3/0x100
[87541.843821]  [<ffffffff81133e00>] ? put_event+0x30/0x100
[87541.853752]  [<ffffffff81134e91>] __perf_event_exit_task.isra.79+0xd1/0x120
[87541.865386]  [<ffffffff8113b226>] perf_event_exit_task+0x206/0x260
[87541.876240]  [<ffffffff81066d69>] do_exit+0x2d9/0xa50
[87541.885869]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
[87541.896974]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
[87541.907010]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
[87541.917964]  [<ffffffff81012438>] do_signal+0x48/0x990
[87541.927735]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
[87541.938519]  [<ffffffff81651437>] ? _raw_spin_unlock_irq+0x27/0x40
[87541.949341]  [<ffffffff81090c0f>] ? finish_task_switch+0x3f/0x120
[87541.960078]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
[87541.970193]  [<ffffffff81651fbc>] retint_signal+0x48/0x8c
[87541.979819] ---[ end trace 2b5a3d32e8d767a9 ]---
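
A note on reading these splats: 6b6b6b6b6b6b6b6b is the slab free-poison
byte 0x6b (POISON_FREE in include/linux/poison.h) repeated eight times,
which suggests prev->next was read from an object that had already been
freed. A minimal user-space sketch of that check in C follows;
looks_like_free_poison() is a hypothetical helper name used only for
illustration, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as POISON_FREE in the kernel's include/linux/poison.h. */
#define POISON_FREE 0x6b

/*
 * Hypothetical helper (not a kernel API): returns true when every byte
 * of a 64-bit value equals the slab free-poison byte.
 */
static bool looks_like_free_poison(uint64_t value)
{
	for (int i = 0; i < 8; i++) {
		if (((value >> (8 * i)) & 0xff) != POISON_FREE)
			return false;
	}
	return true;
}
```

Feeding it the "but was" value from the list_del warnings reports poison,
while the expected pointer (e.g. ffff8800ce684810) does not.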



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-06  4:51                                                                                     ` Vince Weaver
@ 2014-05-06 17:06                                                                                       ` Vince Weaver
  2014-05-07 19:12                                                                                         ` Ingo Molnar
  2014-05-07 19:11                                                                                       ` Ingo Molnar
  1 sibling, 1 reply; 81+ messages in thread
From: Vince Weaver @ 2014-05-06 17:06 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	Steven Rostedt, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Tue, 6 May 2014, Vince Weaver wrote:

> In any case if we can get the recent patches applied in time for 3.15 I 
> think it will turn out to be a nice release perf-event-stability wise.

I should also mention I have a list of open perf_fuzzer issues here:
	http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/bugs_found.html

It's not organized well but it's how I keep track of the various issues 
after I find and report them.

Vince


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-06 16:57                                                                                           ` Vince Weaver
@ 2014-05-07 16:45                                                                                             ` Peter Zijlstra
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Zijlstra @ 2014-05-07 16:45 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, Steven Rostedt


On Tue, May 06, 2014 at 12:57:08PM -0400, Vince Weaver wrote:
> On Mon, 5 May 2014, Vince Weaver wrote:
> 
> > On Mon, 5 May 2014, Vince Weaver wrote:
> > 
> > > Meanwhile the haswell and AMD machines have been fuzzing away without 
> > > issue, I don't know why the core2 machine is always the trouble maker.
> > 
> > The haswell has been fuzzing for 12 hours with only an NMI 
> > dazed/confused message.
> 
> So the Haswell seemed to still be going strong after 24 hours, but then I 
> killed the fuzzer with control-C and got this:
> 
> ^C
> [87536.479011] ------------[ cut here ]------------
> [87536.484553] WARNING: CPU: 1 PID: 11978 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> [87536.493994] list_del corruption. prev->next should be ffff8800ce684810, but was 6b6b6b6b6b6b6b6b
> [87536.503915] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp coretemp kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic i915 aesni_intel snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device tpm_tis ppdev snd aes_x86_64 parport_pc tpm evdev mei_me drm_kms_helper iTCO_wdt drm soundcore lrw gf128mul glue_helper iTCO_vendor_support wmi ablk_helper i2c_algo_bit button battery processor mei psmouse parport pcspkr serio_raw cryptd i2c_i801 video i2c_core lpc_ich mfd_core sd_mod sr_mod crc_t10dif cdrom crct10dif_common ahci libahci ehci_pci e1000e libata xhci_hcd ehci_hcd ptp crc32c_intel usbcore scsi_mod pps_core usb_common fan thermal thermal_sys
> [87536.581372] CPU: 1 PID: 11978 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc1+ #104
> [87536.590762] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [87536.599435]  0000000000000009 ffff880117b57ad8 ffffffff81649ca0 ffff880117b57b20
> [87536.608228]  ffff880117b57b10 ffffffff810646ad ffff8800ce684800 ffff880036a64000
> [87536.616970]  ffff8800ce684810 ffff8800ce684800 0000000000000001 ffff880117b57b70
> [87536.625688] Call Trace:
> [87536.629039]  [<ffffffff81649ca0>] dump_stack+0x45/0x56
> [87536.635247]  [<ffffffff810646ad>] warn_slowpath_common+0x7d/0xa0
> [87536.642374]  [<ffffffff8106471c>] warn_slowpath_fmt+0x4c/0x50
> [87536.649211]  [<ffffffff813c9fe1>] __list_del_entry+0xa1/0xd0
> [87536.655953]  [<ffffffff81131ec4>] list_del_event+0xe4/0xf0
> [87536.662477]  [<ffffffff811326c0>] perf_remove_from_context+0xb0/0x120
> [87536.670005]  [<ffffffff81133d8f>] perf_event_release_kernel+0x3f/0x80
> [87536.677530]  [<ffffffff81133ea3>] put_event+0xd3/0x100
> [87536.683702]  [<ffffffff81133e00>] ? put_event+0x30/0x100
> [87536.690047]  [<ffffffff81133ee5>] perf_release+0x15/0x20
> [87536.696292]  [<ffffffff811b69fc>] __fput+0xdc/0x1e0
> [87536.702191]  [<ffffffff811b6b4e>] ____fput+0xe/0x10
> [87536.708038]  [<ffffffff81085154>] task_work_run+0xc4/0xe0
> [87536.714503]  [<ffffffff81066d5c>] do_exit+0x2cc/0xa50
> [87536.720546]  [<ffffffff81076949>] ? get_signal_to_deliver+0x249/0x650
> [87536.728117]  [<ffffffff8106756c>] do_group_exit+0x4c/0xc0
> [87536.734480]  [<ffffffff81076991>] get_signal_to_deliver+0x291/0x650
> [87536.741814]  [<ffffffff81012438>] do_signal+0x48/0x990
> [87536.747877]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
> [87536.754994]  [<ffffffff81651437>] ? _raw_spin_unlock_irq+0x27/0x40
> [87536.762243]  [<ffffffff81090c4d>] ? finish_task_switch+0x7d/0x120
> [87536.769465]  [<ffffffff81090c0f>] ? finish_task_switch+0x3f/0x120
> [87536.776622]  [<ffffffff81012df0>] do_notify_resume+0x70/0xa0
> [87536.783323]  [<ffffffff81651fbc>] retint_signal+0x48/0x8c
> [87536.789726] ---[ end trace 2b5a3d32e8d767a7 ]---
> [87537.231116] ------------[ cut here ]------------

Of course it did :/ This thing can't ever _just_ work..

My WSM is playing silly buggers and prefers the endless loop (which you
saw on Core2 iirc) when I press ^C.

I'll see if I can make it do something useful.. No immediate ideas
though.



* Re: [perf] more perf_fuzzer memory corruption
  2014-05-06  4:51                                                                                     ` Vince Weaver
  2014-05-06 17:06                                                                                       ` Vince Weaver
@ 2014-05-07 19:11                                                                                       ` Ingo Molnar
  1 sibling, 0 replies; 81+ messages in thread
From: Ingo Molnar @ 2014-05-07 19:11 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Steven Rostedt,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Mon, 5 May 2014, Ingo Molnar wrote:
> > 
> > I'm also thinking about waiting a bit before applying anything even 
> > borderline intrusive to the perf core, to make sure there's enough 
> > fuzz time to declare stable state (at least as far into the ABI as the 
> > fuzzing is able to reach). Future bisection efforts could use that 
> > kind of known-stable release.
> 
> That does sound like a good idea.  It is nice finally getting to the 
> state where you can fuzz for hours (rather than minutes) without 
> hitting a bug.
> 
> Of course that might change.  Development of the fuzzer has more or 
> less stalled for the past 6 months as I was spending all of my time 
> reporting/chasing bugs rather than enhancing the fuzzer.
> 
> In any case if we can get the recent patches applied in time for 
> 3.15 I think it will turn out to be a nice release 
> perf-event-stability wise.

Yeah, I'm trying to get them all into v3.15 - and if they prove out 
fine they might be backportable further as well.

Thanks,

	Ingo


* Re: [perf] more perf_fuzzer memory corruption
  2014-05-06 17:06                                                                                       ` Vince Weaver
@ 2014-05-07 19:12                                                                                         ` Ingo Molnar
  0 siblings, 0 replies; 81+ messages in thread
From: Ingo Molnar @ 2014-05-07 19:12 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Steven Rostedt,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Tue, 6 May 2014, Vince Weaver wrote:
> 
> > In any case if we can get the recent patches applied in time for 3.15 I 
> > think it will turn out to be a nice release perf-event-stability wise.
> 
> I should also mention I have a list of open perf_fuzzer issues here:
> 	http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/bugs_found.html
> 
> It's not organized well but it's how I keep track of the various issues 
> after I find and report them.

That looks useful, thanks!

	Ingo


* [tip:perf/core] perf: Fix race in removing an event
  2014-05-05  9:31                                                                               ` Peter Zijlstra
  2014-05-05 16:00                                                                                 ` Vince Weaver
@ 2014-05-08 10:40                                                                                 ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 81+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-05-08 10:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, acme, vincent.weaver, tglx

Commit-ID:  46ce0fe97a6be7532ce6126bb26ce89fed81528c
Gitweb:     http://git.kernel.org/tip/46ce0fe97a6be7532ce6126bb26ce89fed81528c
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 2 May 2014 16:56:01 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 7 May 2014 11:33:14 +0200

perf: Fix race in removing an event

When removing a (sibling) event we do:

	raw_spin_lock_irq(&ctx->lock);
	perf_group_detach(event);
	raw_spin_unlock_irq(&ctx->lock);

	<hole>

	perf_remove_from_context(event);
		raw_spin_lock_irq(&ctx->lock);
		...
		raw_spin_unlock_irq(&ctx->lock);

Now, assuming the event is a sibling, it will be 'unreachable' for
things like ctx_sched_out() because that iterates the
groups->siblings, and we just unhooked the sibling.

So, if during <hole> we get ctx_sched_out(), it will miss the event
and not call event_sched_out() on it, leaving it programmed on the
PMU.

The subsequent perf_remove_from_context() call will find the ctx is
inactive and only call list_del_event() to remove the event from all
other lists.

Hereafter we can proceed to free the event; while still programmed!

Close this hole by moving perf_group_detach() inside the same
ctx->lock region(s) perf_remove_from_context() has.

The condition on inherited events only in __perf_event_exit_task() is
likely complete crap because non-inherited events are part of groups
too and we're tearing down just the same. But leave that for another
patch.

Most-likely-Fixes: e03a9a55b4e ("perf: Change close() semantics for group events")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f83a71a..ea899e2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1443,6 +1443,11 @@ group_sched_out(struct perf_event *group_event,
 		cpuctx->exclusive = 0;
 }
 
+struct remove_event {
+	struct perf_event *event;
+	bool detach_group;
+};
+
 /*
  * Cross CPU call to remove a performance event
  *
@@ -1451,12 +1456,15 @@ group_sched_out(struct perf_event *group_event,
  */
 static int __perf_remove_from_context(void *info)
 {
-	struct perf_event *event = info;
+	struct remove_event *re = info;
+	struct perf_event *event = re->event;
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
 	raw_spin_lock(&ctx->lock);
 	event_sched_out(event, cpuctx, ctx);
+	if (re->detach_group)
+		perf_group_detach(event);
 	list_del_event(event, ctx);
 	if (!ctx->nr_events && cpuctx->task_ctx == ctx) {
 		ctx->is_active = 0;
@@ -1481,10 +1489,14 @@ static int __perf_remove_from_context(void *info)
  * When called from perf_event_exit_task, it's OK because the
  * context has been detached from its task.
  */
-static void perf_remove_from_context(struct perf_event *event)
+static void perf_remove_from_context(struct perf_event *event, bool detach_group)
 {
 	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *task = ctx->task;
+	struct remove_event re = {
+		.event = event,
+		.detach_group = detach_group,
+	};
 
 	lockdep_assert_held(&ctx->mutex);
 
@@ -1493,12 +1505,12 @@ static void perf_remove_from_context(struct perf_event *event)
 		 * Per cpu events are removed via an smp call and
 		 * the removal is always successful.
 		 */
-		cpu_function_call(event->cpu, __perf_remove_from_context, event);
+		cpu_function_call(event->cpu, __perf_remove_from_context, &re);
 		return;
 	}
 
 retry:
-	if (!task_function_call(task, __perf_remove_from_context, event))
+	if (!task_function_call(task, __perf_remove_from_context, &re))
 		return;
 
 	raw_spin_lock_irq(&ctx->lock);
@@ -1515,6 +1527,8 @@ retry:
 	 * Since the task isn't running, its safe to remove the event, us
 	 * holding the ctx->lock ensures the task won't get scheduled in.
 	 */
+	if (detach_group)
+		perf_group_detach(event);
 	list_del_event(event, ctx);
 	raw_spin_unlock_irq(&ctx->lock);
 }
@@ -3281,10 +3295,7 @@ int perf_event_release_kernel(struct perf_event *event)
 	 *     to trigger the AB-BA case.
 	 */
 	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
-	raw_spin_lock_irq(&ctx->lock);
-	perf_group_detach(event);
-	raw_spin_unlock_irq(&ctx->lock);
-	perf_remove_from_context(event);
+	perf_remove_from_context(event, true);
 	mutex_unlock(&ctx->mutex);
 
 	free_event(event);
@@ -7165,7 +7176,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		struct perf_event_context *gctx = group_leader->ctx;
 
 		mutex_lock(&gctx->mutex);
-		perf_remove_from_context(group_leader);
+		perf_remove_from_context(group_leader, false);
 
 		/*
 		 * Removing from the context ends up with disabled
@@ -7175,7 +7186,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		perf_event__state_init(group_leader);
 		list_for_each_entry(sibling, &group_leader->sibling_list,
 				    group_entry) {
-			perf_remove_from_context(sibling);
+			perf_remove_from_context(sibling, false);
 			perf_event__state_init(sibling);
 			put_ctx(gctx);
 		}
@@ -7305,7 +7316,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
 	mutex_lock(&src_ctx->mutex);
 	list_for_each_entry_safe(event, tmp, &src_ctx->event_list,
 				 event_entry) {
-		perf_remove_from_context(event);
+		perf_remove_from_context(event, false);
 		unaccount_event_cpu(event, src_cpu);
 		put_ctx(src_ctx);
 		list_add(&event->migrate_entry, &events);
@@ -7367,13 +7378,7 @@ __perf_event_exit_task(struct perf_event *child_event,
 			 struct perf_event_context *child_ctx,
 			 struct task_struct *child)
 {
-	if (child_event->parent) {
-		raw_spin_lock_irq(&child_ctx->lock);
-		perf_group_detach(child_event);
-		raw_spin_unlock_irq(&child_ctx->lock);
-	}
-
-	perf_remove_from_context(child_event);
+	perf_remove_from_context(child_event, !!child_event->parent);
 
 	/*
 	 * It can happen that the parent exits first, and has events
@@ -7857,14 +7862,14 @@ static void perf_pmu_rotate_stop(struct pmu *pmu)
 
 static void __perf_event_exit_context(void *__info)
 {
+	struct remove_event re = { .detach_group = false };
 	struct perf_event_context *ctx = __info;
-	struct perf_event *event;
 
 	perf_pmu_rotate_stop(ctx->pmu);
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(event, &ctx->event_list, event_entry)
-		__perf_remove_from_context(event);
+	list_for_each_entry_rcu(re.event, &ctx->event_list, event_entry)
+		__perf_remove_from_context(&re);
 	rcu_read_unlock();
 }
 

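The hunks above bundle the event pointer and a new detach_group flag into a struct remove_event, so that both reach a callback that takes a single void * argument. Below is a minimal user-space sketch of that pattern. Only the struct name and its two fields come from the patch; struct event, its flags, and the two function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_event: just two state flags. */
struct event {
	bool in_group;
	bool in_context;
};

/* Field names as they appear in the patch; the layout is an assumption. */
struct remove_event {
	struct event *event;
	bool detach_group;
};

/* Callback with a single void * argument, as cross-CPU helpers expect. */
static int remove_from_context_cb(void *info)
{
	struct remove_event *re = info;

	/* Detach and remove in one step, instead of two separately locked ones. */
	if (re->detach_group)
		re->event->in_group = false;
	re->event->in_context = false;
	return 0;
}

/* Wrapper that packs both arguments before handing them to the callback. */
static void remove_from_context(struct event *event, bool detach_group)
{
	struct remove_event re = {
		.event = event,
		.detach_group = detach_group,
	};

	remove_from_context_cb(&re);
}
```

Calling remove_from_context(&ev, true) clears both flags in one call, while passing false leaves group membership untouched. This mirrors how __perf_event_exit_task() now passes !!child_event->parent rather than calling perf_group_detach() on its own.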

* [tip:perf/core] perf: Fix perf_event_init_context()
  2014-05-05 17:14                                                                                     ` Peter Zijlstra
  2014-05-05 18:47                                                                                       ` Vince Weaver
@ 2014-05-08 10:40                                                                                       ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 81+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-05-08 10:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, vincent.weaver, peterz, acme, tglx

Commit-ID:  ffb4ef21ac4308c2e738e6f83b6741bbc9b4fa3b
Gitweb:     http://git.kernel.org/tip/ffb4ef21ac4308c2e738e6f83b6741bbc9b4fa3b
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 5 May 2014 19:12:20 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 7 May 2014 11:33:15 +0200

perf: Fix perf_event_init_context()

perf_pin_task_context() can return NULL but perf_event_init_context()
assumes it will not, correct this.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/20140505171428.GU26782@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ea899e2..7123284 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7729,6 +7729,8 @@ int perf_event_init_context(struct task_struct *child, int ctxn)
 	 * swapped under us.
 	 */
 	parent_ctx = perf_pin_task_context(parent, ctxn);
+	if (!parent_ctx)
+		return 0;
 
 	/*
 	 * No need to check if parent_ctx != NULL here; since we saw

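The fix above is an early return when the pinning helper yields no context. Below is a minimal user-space sketch of the same guard. The types and function names are hypothetical stand-ins for illustration, not the kernel's definitions; the only thing taken from the patch is the shape of the check.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's real structures. */
struct ctx {
	int nr_events;
};

struct task {
	struct ctx *ctx;	/* may be NULL when the task has no context */
};

/* Stand-in for a pin helper that can legitimately return NULL. */
static struct ctx *pin_task_context(struct task *t)
{
	return t->ctx;
}

/* Returns 0 when there is nothing to inherit, else the parent's event count. */
static int init_context(struct task *parent)
{
	struct ctx *parent_ctx = pin_task_context(parent);

	if (!parent_ctx)
		return 0;	/* nothing to inherit; not treated as an error */

	return parent_ctx->nr_events;
}
```

Without the guard, the final return would dereference a NULL pointer whenever the parent has no context. Returning 0 treats that case as "nothing to do", which is what the patch does.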

Thread overview: 81+ messages
2014-04-15 21:37 [perf] more perf_fuzzer memory corruption Vince Weaver
2014-04-15 21:49 ` Thomas Gleixner
2014-04-16  3:21   ` Vince Weaver
2014-04-16  4:18     ` Vince Weaver
2014-04-16 14:15 ` Peter Zijlstra
2014-04-16 17:30   ` Vince Weaver
2014-04-16 17:43     ` Vince Weaver
2014-04-16 17:47       ` Peter Zijlstra
2014-04-17  9:48       ` Ingo Molnar
2014-04-17 11:45         ` Peter Zijlstra
2014-04-17 14:22           ` Ingo Molnar
2014-04-17 14:42             ` Vince Weaver
2014-04-17 14:54               ` Peter Zijlstra
2014-04-17 15:35                 ` Vince Weaver
2014-04-18 14:45                 ` Vince Weaver
2014-04-18 14:51                   ` Vince Weaver
2014-04-18 15:23                   ` Peter Zijlstra
2014-04-18 16:59                     ` Peter Zijlstra
2014-04-18 17:15                       ` Peter Zijlstra
2014-04-23 20:58                         ` Vince Weaver
2014-04-25  2:51                           ` Vince Weaver
2014-04-28 14:21                             ` Vince Weaver
2014-04-28 19:38                               ` Vince Weaver
2014-04-29  9:46                                 ` Peter Zijlstra
2014-04-29 18:21                                   ` Vince Weaver
2014-04-29 19:01                                     ` Peter Zijlstra
2014-04-29 20:59                                       ` Vince Weaver
2014-04-30 18:44                                         ` Peter Zijlstra
2014-04-30 21:08                                           ` Vince Weaver
2014-04-30 22:51                                             ` Thomas Gleixner
2014-05-01 10:26                                               ` Peter Zijlstra
2014-05-01 11:50                                                 ` Peter Zijlstra
2014-05-01 12:35                                                   ` Thomas Gleixner
2014-05-01 13:12                                                     ` Peter Zijlstra
2014-05-01 13:29                                                     ` Thomas Gleixner
2014-05-01 13:22                                                 ` Vince Weaver
2014-05-01 14:07                                           ` Vince Weaver
2014-05-01 14:27                                             ` Vince Weaver
2014-05-01 15:09                                               ` Peter Zijlstra
2014-05-01 15:50                                                 ` Vince Weaver
2014-05-01 16:31                                                   ` Thomas Gleixner
2014-05-01 17:18                                                     ` Vince Weaver
2014-05-01 18:49                                                       ` Vince Weaver
2014-05-01 21:32                                                         ` Vince Weaver
2014-05-02 11:15                                                         ` Peter Zijlstra
2014-05-02 15:42                                                         ` Peter Zijlstra
2014-05-02 16:22                                                           ` Vince Weaver
2014-05-02 16:22                                                             ` Peter Zijlstra
2014-05-02 16:43                                                               ` Vince Weaver
2014-05-02 17:27                                                                 ` Peter Zijlstra
2014-05-02 17:46                                                                   ` Vince Weaver
2014-05-02 19:12                                                                     ` Thomas Gleixner
2014-05-02 20:15                                                                       ` Vince Weaver
2014-05-02 20:45                                                                         ` Thomas Gleixner
2014-05-03  2:32                                                                           ` Vince Weaver
2014-05-03  3:02                                                                             ` Vince Weaver
2014-05-03  7:33                                                                               ` Peter Zijlstra
2014-05-05  9:31                                                                               ` Peter Zijlstra
2014-05-05 16:00                                                                                 ` Vince Weaver
2014-05-05 17:10                                                                                   ` Vince Weaver
2014-05-05 17:14                                                                                     ` Peter Zijlstra
2014-05-05 18:47                                                                                       ` Vince Weaver
2014-05-05 19:36                                                                                         ` Peter Zijlstra
2014-05-05 19:51                                                                                           ` Vince Weaver
2014-05-06  1:06                                                                                         ` Vince Weaver
2014-05-06 16:57                                                                                           ` Vince Weaver
2014-05-07 16:45                                                                                             ` Peter Zijlstra
2014-05-08 10:40                                                                                       ` [tip:perf/core] perf: Fix perf_event_init_context() tip-bot for Peter Zijlstra
2014-05-05 17:29                                                                                   ` [perf] more perf_fuzzer memory corruption Ingo Molnar
2014-05-06  4:51                                                                                     ` Vince Weaver
2014-05-06 17:06                                                                                       ` Vince Weaver
2014-05-07 19:12                                                                                         ` Ingo Molnar
2014-05-07 19:11                                                                                       ` Ingo Molnar
2014-05-08 10:40                                                                                 ` [tip:perf/core] perf: Fix race in removing an event tip-bot for Peter Zijlstra
2014-05-02 17:06                                                           ` [perf] more perf_fuzzer memory corruption Vince Weaver
2014-05-02 17:04                                                             ` Peter Zijlstra
2014-04-29 19:26                                     ` Steven Rostedt
2014-04-29  8:52                               ` Peter Zijlstra
2014-04-29 18:11                                 ` Vince Weaver
2014-04-29 19:21                                   ` Steven Rostedt
2014-04-28 17:48                             ` Thomas Gleixner
