linux-kernel.vger.kernel.org archive mirror
* perf_fuzzer crash on pentium 4
@ 2014-05-06 15:42 Vince Weaver
  2014-05-06 15:46 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Vince Weaver @ 2014-05-06 15:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Cyrill Gorcunov


So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.

It crashes more or less instantly (sorry for the line wrapping, 
just got the serial console hooked up and don't have minicom configured 
right yet).

this is 3.15-rc4 with the anti-memory corruption patch applied.

[   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
[   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
[   67.876146] PGD 3cea7067 PUD 3cea8067 PMD 0 
[   67.876146] Oops: 0000 [#1] SMP 
[   67.876146] Modules linked in: loop snd_hda_codec_analog snd_hda_codec_genern
[   67.876146] CPU: 0 PID: 2192 Comm: perf_fuzzer Tainted: G        W     3.15.1
[   67.876146] Hardware name: LENOVO 88088NU/LENOVO, BIOS 2JKT37AUS 07/12/2007
[   67.876146] task: ffff88003c0610d0 ti: ffff88003c062000 task.ti: ffff88003c00
[   67.876146] RIP: 0010:[<ffffffff81013df2>]  [<ffffffff81013df2>] p4_pmu_sche1
[   67.876146] RSP: 0000:ffff88003f403d60  EFLAGS: 00010046
[   67.876146] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000003a2
[   67.876146] RDX: ffff88003c0610d0 RSI: 0000000000000003 RDI: 0000000000000000
[   67.876146] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[   67.876146] R10: 00007f156ab399d0 R11: 0000000000000246 R12: 0000000000000000
[   67.876146] R13: 0000000000000002 R14: ffff88003f403de8 R15: ffff88003b766000
[   67.876146] FS:  00007f156ab39700(0000) GS:ffff88003f400000(0000) knlGS:00000
[   67.876146] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   67.876146] CR2: 0000000000000004 CR3: 000000003c598000 CR4: 00000000000007f0
[   67.876146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   67.876146] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[   67.876146] Stack:
[   67.876146]  0000000000000002 0000000000000000 ffff88003f40bb50 0000000100003
[   67.876146]  0000000000000003 3000020c0403c200 0000000000000001 0000000000004
[   67.876146]  0000000000000000 ffff88003f40bb50 ffff88003f403de8 0000000000003
[   67.876146] Call Trace:
[   67.876146]  <IRQ> 
[   67.876146]  [<ffffffff810104c7>] ? x86_pmu_commit_txn+0x45/0x8b
[   67.876146]  [<ffffffff8104d6c6>] ? search_exception_tables+0x1d/0x2d
[   67.876146]  [<ffffffff8102cc65>] ? fixup_exception+0x10/0x53
[   67.876146]  [<ffffffff813e65dd>] ? do_general_protection+0x30/0x12d
[   67.876146]  [<ffffffff813e6082>] ? general_protection+0x22/0x30
[   67.876146]  [<ffffffff810ba5ef>] ? event_sched_in+0x129/0x136
[   67.876146]  [<ffffffff810ba68a>] ? group_sched_in+0x8e/0x138
[   67.876146]  [<ffffffff810bb1af>] ? __perf_event_enable+0xea/0x128
[   67.876146]  [<ffffffff810b76c0>] ? remote_function+0x13/0x3b
[   67.876146]  [<ffffffff81084fb7>] ? generic_smp_call_function_single_interrua
[   67.876146]  [<ffffffff810227db>] ? smp_call_function_single_interrupt+0xf/0c
[   67.876146]  [<ffffffff813ebbba>] ? call_function_single_interrupt+0x6a/0x70
[   67.876146]  <EOI> 
[   67.876146] Code: 08 49 8b 97 28 01 00 00 48 89 d5 48 c1 ed 39 83 e5 3f 83 f 
[   67.876146] RIP  [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
[   67.876146]  RSP <ffff88003f403d60>
[   67.876146] CR2: 0000000000000004
[   67.876146] ---[ end trace a88368266e292dfa ]---
[   67.876146] Kernel panic - not syncing: Fatal exception in interrupt
[   67.876146] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0x)
[   67.876146] drm_kms_helper: panic occurred, switching back to text console
[   67.876146] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 15:42 perf_fuzzer crash on pentium 4 Vince Weaver
@ 2014-05-06 15:46 ` Peter Zijlstra
  2014-05-06 15:49   ` Cyrill Gorcunov
  2014-05-06 16:11   ` Vince Weaver
  2014-05-06 20:23 ` Cyrill Gorcunov
  2014-05-28 13:56 ` Pavel Machek
  2 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2014-05-06 15:46 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Ingo Molnar, Cyrill Gorcunov

On Tue, May 06, 2014 at 11:42:58AM -0400, Vince Weaver wrote:
> 
> So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> 
> It crashes more or less instantly (sorry for the line wrapping, 
> just got the serial console hooked up and don't have minicom configured 
> right yet).
> 
> this is 3.15-rc4 with the anti-memory corruption patch applied.

*sigh*.. the one x86 PMU driver I really don't know. I might actually
have some P4 class hardware, I'll try and get it wired up this week
somewhere.

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 15:46 ` Peter Zijlstra
@ 2014-05-06 15:49   ` Cyrill Gorcunov
  2014-05-06 16:05     ` Vince Weaver
  2014-05-06 16:11   ` Vince Weaver
  1 sibling, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-06 15:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, linux-kernel, Ingo Molnar

On Tue, May 06, 2014 at 05:46:37PM +0200, Peter Zijlstra wrote:
> On Tue, May 06, 2014 at 11:42:58AM -0400, Vince Weaver wrote:
> > 
> > So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> > 
> > It crashes more or less instantly (sorry for the line wrapping, 
> > just got the serial console hooked up and don't have minicom configured 
> > right yet).
> > 
> > this is 3.15-rc4 with the anti-memory corruption patch applied.
> 
> *sigh*.. the one x86 PMU driver I really don't know. I might actually
> have some P4 class hardware, I'll try and get it wired up this week
> somewhere.

I'll take a look, thanks.

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 15:49   ` Cyrill Gorcunov
@ 2014-05-06 16:05     ` Vince Weaver
  2014-05-06 16:06       ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-06 16:05 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Peter Zijlstra, Vince Weaver, linux-kernel, Ingo Molnar

On Tue, 6 May 2014, Cyrill Gorcunov wrote:

> On Tue, May 06, 2014 at 05:46:37PM +0200, Peter Zijlstra wrote:
> > On Tue, May 06, 2014 at 11:42:58AM -0400, Vince Weaver wrote:
> > > 
> > > So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> > > 
> > > It crashes more or less instantly (sorry for the line wrapping, 
> > > just got the serial console hooked up and don't have minicom configured 
> > > right yet).
> > > 
> > > this is 3.15-rc4 with the anti-memory corruption patch applied.
> > 
> > *sigh*.. the one x86 PMU driver I really don't know. I might actually
> > have some P4 class hardware, I'll try and get it wired up this week
> > somewhere.
> 
> I'll take a look, thanks.

This happens quickly enough I can probably generate a small test case, but 
I won't have time to do that until maybe tomorrow.

Vince

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 16:05     ` Vince Weaver
@ 2014-05-06 16:06       ` Cyrill Gorcunov
  0 siblings, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-06 16:06 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar

On Tue, May 06, 2014 at 12:05:41PM -0400, Vince Weaver wrote:
> On Tue, 6 May 2014, Cyrill Gorcunov wrote:
> 
> > On Tue, May 06, 2014 at 05:46:37PM +0200, Peter Zijlstra wrote:
> > > On Tue, May 06, 2014 at 11:42:58AM -0400, Vince Weaver wrote:
> > > > 
> > > > So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> > > > 
> > > > It crashes more or less instantly (sorry for the line wrapping, 
> > > > just got the serial console hooked up and don't have minicom configured 
> > > > right yet).
> > > > 
> > > > this is 3.15-rc4 with the anti-memory corruption patch applied.
> > > 
> > > *sigh*.. the one x86 PMU driver I really don't know. I might actually
> > > have some P4 class hardware, I'll try and get it wired up this week
> > > somewhere.
> > 
> > I'll take a look, thanks.
> 
> This happens quickly enough I can probably generate a small test case, but 
> I won't have time to do that until maybe tomorrow.

I'm quite busy until night as well, so no rush.

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 15:46 ` Peter Zijlstra
  2014-05-06 15:49   ` Cyrill Gorcunov
@ 2014-05-06 16:11   ` Vince Weaver
  2014-05-06 16:16     ` Cyrill Gorcunov
  1 sibling, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-06 16:11 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Vince Weaver, linux-kernel, Ingo Molnar, Cyrill Gorcunov

On Tue, 6 May 2014, Peter Zijlstra wrote:

> On Tue, May 06, 2014 at 11:42:58AM -0400, Vince Weaver wrote:
> > 
> > So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> > 
> > It crashes more or less instantly (sorry for the line wrapping, 
> > just got the serial console hooked up and don't have minicom configured 
> > right yet).
> > 
> > this is 3.15-rc4 with the anti-memory corruption patch applied.
> 
> *sigh*.. the one x86 PMU driver I really don't know. I might actually
> have some P4 class hardware, I'll try and get it wired up this week
> somewhere.

I have a Pentium II in working condition I can fire up too.  All my other 
older x86 hardware (K6-2, 486) pre-dates performance counters.

Just be glad that when I moved I had to leave behind the itanium, PA-RISC, 
Power-G3, Alpha, SPARC niagara, pentium-pro and K7-athlon machines.  
Though I guess other than the latter two those wouldn't be your problems.

If I had infinite time I'd try to get the SGI Octane, Ultrasparc, and 
avr32 boards up and going again.

Vince

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 16:11   ` Vince Weaver
@ 2014-05-06 16:16     ` Cyrill Gorcunov
  2014-05-06 17:56       ` Vince Weaver
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-06 16:16 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar

On Tue, May 06, 2014 at 12:11:49PM -0400, Vince Weaver wrote:
> 
> If I had infinite time I'd try to get the SGI Octane, Ultrasparc, and 
> avr32 boards up and going again.

Btw, Vince, perf_fuzzer -- it's http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/?

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 16:16     ` Cyrill Gorcunov
@ 2014-05-06 17:56       ` Vince Weaver
  0 siblings, 0 replies; 35+ messages in thread
From: Vince Weaver @ 2014-05-06 17:56 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vince Weaver, Peter Zijlstra, linux-kernel, Ingo Molnar

On Tue, 6 May 2014, Cyrill Gorcunov wrote:

> On Tue, May 06, 2014 at 12:11:49PM -0400, Vince Weaver wrote:
> > 
> > If I had infinite time I'd try to get the SGI Octane, Ultrasparc, and 
> > avr32 boards up and going again.
> 
> Btw, Vince, perf_fuzzer -- it's http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/?

Yes.

Quick start instructions:

	git clone git://github.com/deater/perf_event_tests
	cd perf_event_tests/fuzzer
	make
	./fast_repro98.sh

and I find the bug triggers pretty quickly after that.

Vince

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 15:42 perf_fuzzer crash on pentium 4 Vince Weaver
  2014-05-06 15:46 ` Peter Zijlstra
@ 2014-05-06 20:23 ` Cyrill Gorcunov
  2014-05-06 21:30   ` Vince Weaver
  2014-05-08  2:00   ` Don Zickus
  2014-05-28 13:56 ` Pavel Machek
  2 siblings, 2 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-06 20:23 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, May 06, 2014 at 11:42:58AM -0400, Vince Weaver wrote:
> 
> So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> 
> It crashes more or less instantly (sorry for the line wrapping, 
> just got the serial console hooked up and don't have minicom configured 
> right yet).
> 
> this is 3.15-rc4 with the anti-memory corruption patch applied.
> 
> [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331

This looks like

p4_pmu_schedule_events:
		...
		bind = p4_config_get_bind(hwc->config);
			returned bind = NULL;
		escr_idx = p4_get_escr_idx(bind->escr_msr[thread]); NULL deref

If I'm right (btw, is it possible to use the addr2line helper?) then hwc->config
is corrupted and p4_config_get_bind returned NULL simply because no matching event
was found. And I don't understand how it could happen, because the configuration
is validated when it is obtained from user space as a raw event, before it gets
written into hwc->config. Weird...
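
A minimal userspace sketch of the failure mode described above, under stated assumptions. It assumes struct p4_event_bind starts with an unsigned int opcode followed by unsigned int escr_msr[2], as in arch/x86/kernel/cpu/perf_event_p4.c of this era. The bind table, the MSR numbers (placeholders), the "top byte is the event index" decoding, and every sketch_* name are hypothetical stand-ins rather than kernel code. It shows why a NULL bind with thread 0 faults at address 0x4 (CR2 in the oops), and where a NULL check would stop the dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the leading fields of the kernel's struct p4_event_bind;
 * the trailing fields (escr_emask, shared, cntr) are omitted. */
struct p4_event_bind_sketch {
	unsigned int opcode;      /* event code and ESCR selector */
	unsigned int escr_msr[2]; /* ESCR MSR, one per hyper-thread */
};

/* Hypothetical two-entry bind table with placeholder MSR numbers. */
static const struct p4_event_bind_sketch sketch_bind_map[] = {
	{ .opcode = 0x01, .escr_msr = { 0x3a0, 0x3a1 } },
	{ .opcode = 0x02, .escr_msr = { 0x3a2, 0x3a3 } },
};

/* Hypothetical decoding: treat the top byte of config as the event index. */
static unsigned int sketch_event_index(uint64_t config)
{
	return (unsigned int)(config >> 56);
}

/* Bounds-checked lookup: an out-of-range index (e.g. from a corrupted or
 * poisoned config) yields NULL instead of a table entry. */
static const struct p4_event_bind_sketch *sketch_get_bind(uint64_t config)
{
	unsigned int idx = sketch_event_index(config);

	if (idx < sizeof(sketch_bind_map) / sizeof(sketch_bind_map[0]))
		return &sketch_bind_map[idx];
	return NULL;
}

/* Address that bind->escr_msr[thread] would read if bind were NULL,
 * computed arithmetically without dereferencing anything. */
static size_t sketch_null_fault_addr(int thread)
{
	return offsetof(struct p4_event_bind_sketch, escr_msr) +
	       (size_t)thread * sizeof(unsigned int);
}

/* Caller with the NULL check the crashing path lacked: returns the ESCR MSR
 * for this thread, or -1 when no bind exists for the config. */
static long sketch_escr_for(uint64_t config, int thread)
{
	const struct p4_event_bind_sketch *bind = sketch_get_bind(config);

	if (!bind)
		return -1;
	return (long)bind->escr_msr[thread];
}
```

For example, sketch_escr_for(0x6b6b6b6b6b6b6b6b, 0) returns -1 instead of faulting, and sketch_null_fault_addr(0) is 4 under the assumed layout.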

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 20:23 ` Cyrill Gorcunov
@ 2014-05-06 21:30   ` Vince Weaver
  2014-05-06 21:46     ` Cyrill Gorcunov
  2014-05-08  2:00   ` Don Zickus
  1 sibling, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-06 21:30 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, 7 May 2014, Cyrill Gorcunov wrote:

> > [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> > [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
> 
> This looks like
> 
> p4_pmu_schedule_events:
> 		...
> 		bind = p4_config_get_bind(hwc->config);
> 			returned bind = NULL;
> 		escr_idx = p4_get_escr_idx(bind->escr_msr[thread]); NULL deref
> 
> If I'm right (btw, is it possible to use the addr2line helper?) 

Yes, the address maps to

	escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);

> then hwc->config
> is corrupted and p4_config_get_bind returned NULL simply because no matching event
> was found. And I don't understand how it could happen, because the configuration
> is validated when it is obtained from user space as a raw event, before it gets
> written into hwc->config. Weird...

I'll try to get some sort of trace out of it to see what event is being 
tried.

Vince

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 21:30   ` Vince Weaver
@ 2014-05-06 21:46     ` Cyrill Gorcunov
  2014-05-07 16:46       ` Vince Weaver
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-06 21:46 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, May 06, 2014 at 05:30:19PM -0400, Vince Weaver wrote:
> On Wed, 7 May 2014, Cyrill Gorcunov wrote:
> 
> > > [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> > > [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
> > 
> > This looks like
> > 
> > p4_pmu_schedule_events:
> > 		...
> > 		bind = p4_config_get_bind(hwc->config);
> > 			returned bind = NULL;
> > 		escr_idx = p4_get_escr_idx(bind->escr_msr[thread]); NULL deref
> > 
> > If I'm right (btw, is it possible to use the addr2line helper?) 
> 
> Yes, the address maps to
> 
> 	escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);

Great, now we know the cause of the issue; all that's left is to figure out
why ;)

> > then hwc->config
> > is corrupted and p4_config_get_bind returned NULL simply because no matching event
> > was found. And I don't understand how it could happen, because the configuration
> > is validated when it is obtained from user space as a raw event, before it gets
> > written into hwc->config. Weird...
> 
> I'll try to get some sort of trace out of it to see what event is being 
> tried.

Yeah, this would help a lot.

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 21:46     ` Cyrill Gorcunov
@ 2014-05-07 16:46       ` Vince Weaver
  2014-05-07 16:49         ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-07 16:46 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, 7 May 2014, Cyrill Gorcunov wrote:

> On Tue, May 06, 2014 at 05:30:19PM -0400, Vince Weaver wrote:
> > On Wed, 7 May 2014, Cyrill Gorcunov wrote:
> > 
> > > > [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> > > > [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
> > > 
> > > This looks like
> > > 
> > > p4_pmu_schedule_events:
> > > 		...
> > > 		bind = p4_config_get_bind(hwc->config);
> > > 			returned bind = NULL;
> > > 		escr_idx = p4_get_escr_idx(bind->escr_msr[thread]); NULL deref
> > > 
> > > If I'm right (btw, is it possible to use the addr2line helper?) 
> > 
> > Yes, the address maps to
> > 
> > 	escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);
> 
> Great, now we know the cause of the issue; all that's left is to figure out
> why ;)
> 
> > > then hwc->config
> > > is corrupted and p4_config_get_bind returned NULL simply because no matching event
> > > was found. And I don't understand how it could happen, because the configuration
> > > is validated when it is obtained from user space as a raw event, before it gets
> > > written into hwc->config. Weird...
> > 
> > I'll try to get some sort of trace out of it to see what event is being 
> > tried.
> 
> Yeah, this would help a lot.

Sorry for the delay; I like to compile kernels locally, and it seems to take a
really long time to build an ftrace-enabled kernel on a Pentium 4.

Anyway I threw some printks in, and this is what I get:

[  447.572626] VMW: bind=NULL config=6b6b6b6b6b6b6b6b

I have slab poisoning turned on.  Use after free?
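
For reference, 0x6b is the byte the slab allocator's debug poisoning writes over freed objects (POISON_FREE in include/linux/poison.h). It is distinct from the LIST_POISON1/LIST_POISON2 values that list_del() stores in a removed entry's pointers. A config that reads back as eight 0x6b bytes therefore suggests the event was read after being freed. The helper below is a hypothetical sketch of that check, not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match POISON_FREE in include/linux/poison.h. */
#define SKETCH_POISON_FREE 0x6bu

/* True when all eight bytes of v hold the free-poison byte, i.e. the value
 * was most likely read out of a slab object after it was freed. */
static bool looks_like_free_poison(uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		if (((v >> (8 * i)) & 0xffu) != SKETCH_POISON_FREE)
			return false;
	}
	return true;
}
```

Applied to the printk output above, looks_like_free_poison(0x6b6b6b6b6b6b6b6b) is true.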

Vince

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 16:46       ` Vince Weaver
@ 2014-05-07 16:49         ` Cyrill Gorcunov
  2014-05-07 16:58           ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-07 16:49 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 07, 2014 at 12:46:24PM -0400, Vince Weaver wrote:
> 
> Sorry for the delay; I like to compile kernels locally, and it seems to take a
> really long time to build an ftrace-enabled kernel on a Pentium 4.
> 
> Anyway I threw some printks in, and this is what I get:
> 
> [  447.572626] VMW: bind=NULL config=6b6b6b6b6b6b6b6b
> 
> I have slab poisoning turned on.  Use after free?

Looks like it. It's list poison IIRC, so I think it comes from the upper level,
i.e. from the generic perf code.

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 16:49         ` Cyrill Gorcunov
@ 2014-05-07 16:58           ` Cyrill Gorcunov
  2014-05-07 17:07             ` Vince Weaver
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-07 16:58 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 07, 2014 at 08:49:02PM +0400, Cyrill Gorcunov wrote:
> On Wed, May 07, 2014 at 12:46:24PM -0400, Vince Weaver wrote:
> > 
> > Sorry for the delay; I like to compile kernels locally, and it seems to take a
> > really long time to build an ftrace-enabled kernel on a Pentium 4.
> > 
> > Anyway I threw some printks in, and this is what I get:
> > 
> > [  447.572626] VMW: bind=NULL config=6b6b6b6b6b6b6b6b
> > 
> > I have slab poisoning turned on.  Use after free?
> 
> Looks like it. It's list poison IIRC, so I think it comes from the upper level,
> i.e. from the generic perf code.

Vince, I'm trying to figure out where it might come from, but no
ideas yet.

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 16:58           ` Cyrill Gorcunov
@ 2014-05-07 17:07             ` Vince Weaver
  2014-05-07 18:24               ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-07 17:07 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, 7 May 2014, Cyrill Gorcunov wrote:

> On Wed, May 07, 2014 at 08:49:02PM +0400, Cyrill Gorcunov wrote:
> > On Wed, May 07, 2014 at 12:46:24PM -0400, Vince Weaver wrote:
> > > 
> > > Sorry for the delay; I like to compile kernels locally, and it seems to take a
> > > really long time to build an ftrace-enabled kernel on a Pentium 4.
> > > 
> > > Anyway I threw some printks in, and this is what I get:
> > > 
> > > [  447.572626] VMW: bind=NULL config=6b6b6b6b6b6b6b6b
> > > 
> > > I have slab poisoning turned on.  Use after free?
> > 
> > Looks like it. It's list poison IIRC, so I think it comes from the upper level,
> > i.e. from the generic perf code.
> 
> Vince, I'm trying to figure out where it might come from, but no
> ideas yet.

I just got this, also looks like poison (see RBX). 

This could be related to the ongoing memory corruption bug found in 
another thread and not p4-related at all.

I thought I was running with PeterZ's latest patch that was supposed to 
avoid the corruption.  Hmmm.  Let me reboot and try a few more things.

[  427.981605] general protection fault: 0000 [#1] SMP 
[  427.985574] Modules linked in: loop microcode snd_hda_codec_analog snd_hda_codec_generic i915 snd_hda_intel snd_hda_controller iTCO_wdt snd_hda_codec iTCO_vendor_support ppdev drm_kms_helper snd_hwdep evdev snd_pcm drm snd_timer snd i2c_algo_bit i2c_i801 psmouse pcspkr soundcore serio_raw i2c_core lpc_ich mfd_core video tpm_tis tpm parport_pc parport button acpi_cpufreq processor thermal_sys sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic tg3 ptp pps_core ata_piix libata uhci_hcd ehci_pci scsi_mod ehci_hcd libphy floppy usbcore usb_common
[  427.985574] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W     3.15.0-rc4+ #2
[  427.985574] Hardware name: LENOVO 88088NU/LENOVO, BIOS 2JKT37AUS 07/12/2007
[  427.985574] task: ffffffff81814430 ti: ffffffff81800000 task.ti: ffffffff81800000
[  427.985574] RIP: 0010:[<ffffffff810d31f7>]  [<ffffffff810d31f7>] __perf_sw_event+0xc6/0x122
[  427.985574] RSP: 0018:ffffffff81801d38  EFLAGS: 00010006
[  427.985574] RAX: ffff88003a17f6d0 RBX: 6b6b6b6b6b6b6b2b RCX: ffff88003f40ee54
[  427.985574] RDX: 9e37fffffffc0001 RSI: 0000000000000003 RDI: 0000000100000000
[  427.985574] RBP: ffffffff81801df0 R08: ffffffff81a23ec0 R09: 0000000000000003
[  427.985574] R10: 0000000000000000 R11: 0000000000000020 R12: ffffffff81801e00
[  427.985574] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000001
[  427.985574] FS:  0000000000000000(0000) GS:ffff88003f400000(0000) knlGS:0000000000000000
[  427.985574] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  427.985574] CR2: 0000000000618af8 CR3: 0000000039879000 CR4: 00000000000007f0
[  427.985574] DR0: 00000000020b9000 DR1: 00000000020b9000 DR2: 00000000020b9000
[  427.985574] DR3: 0000000000000800 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[  427.985574] Stack:
[  427.985574]  0000000000012e00 ffffffff81801e28 0000000000000046 000000000000015d
[  427.985574]  0000000000000000 ffffffff81801da8 ffffffff81801d78 ffffffff81008780
[  427.985574]  0000000000000000 0000000000000000 ffff88003f40ce00 0000000000000000
[  427.985574] Call Trace:
[  427.985574]  [<ffffffff81008780>] ? read_tsc+0x9/0x19
[  427.985574]  [<ffffffff8105ef16>] perf_event_task_sched_out+0x59/0x67
[  427.985574]  [<ffffffff8105eefe>] ? perf_event_task_sched_out+0x41/0x67
[  427.985574]  [<ffffffff81432be3>] __schedule+0x237/0x4cd
[  427.985574]  [<ffffffff81432eec>] schedule+0x73/0x75
[  427.985574]  [<ffffffff81433140>] schedule_preempt_disabled+0xe/0x10
[  427.985574]  [<ffffffff8106d20c>] cpu_startup_entry+0x1db/0x1e7
[  427.985574]  [<ffffffff814254e3>] rest_init+0x77/0x79
[  427.985574]  [<ffffffff818e6d1d>] start_kernel+0x3ba/0x3c5
[  427.985574]  [<ffffffff818e6771>] ? repair_env_string+0x58/0x58
[  427.985574]  [<ffffffff818e6489>] x86_64_start_reservations+0x2a/0x2c
[  427.985574]  [<ffffffff818e657c>] x86_64_start_kernel+0xf1/0xf4
[  427.985574] Code: 0a 44 89 ef e8 b0 fd ff ff eb 6a 44 89 f6 bf 01 00 00 00 e8 7e 94 ff ff 48 8d 04 c3 48 8b 18 48 85 db 75 19 31 db 48 85 db 74 d6 <83> bb c0 00 00 00 01 74 0f 48 8b 5b 40 48 85 db 74 c4 48 83 eb 
[  427.985574] RIP  [<ffffffff810d31f7>] __perf_sw_event+0xc6/0x122
[  427.985574]  RSP <ffffffff81801d38>
[  427.985574] ---[ end trace b545a4ca53c4641d ]---
[  427.985574] Kernel panic - not syncing: Attempted to kill the idle task!
[  427.985574] Shutting down cpus with NMI
[  427.985574] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[  427.985574] drm_kms_helper: panic occurred, switching back to text console
[  427.985574] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 17:07             ` Vince Weaver
@ 2014-05-07 18:24               ` Cyrill Gorcunov
  2014-05-07 21:17                 ` Vince Weaver
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-07 18:24 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 07, 2014 at 01:07:40PM -0400, Vince Weaver wrote:
> > 
> > Vince, I'm trying to figure out where it might come from, but no
> > ideas yet.
> 
> I just got this, also looks like poison (see RBX).

Indeed, except for the trailing 2b.
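
One plausible reading of the trailing 2b, offered as an inference rather than anything confirmed in this thread. The Code: line of the second oops decodes to a list walk: it loads a pointer, jumps to a "sub $imm8,%rbx" whose immediate is cut off, checks "cmpl $0x1,0xc0(%rbx)" (the faulting instruction), and loads the next pointer with "mov 0x40(%rbx),%rbx". If the list node sits at offset 0x40 (assumed from that mov), a container_of-style subtraction of 0x40 from a poisoned pointer gives exactly the RBX value. The names below are hypothetical, and the address check assumes 48-bit virtual addresses (4-level paging).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a freed pointer slot filled with the slab free-poison byte 0x6b. */
#define SKETCH_POISON_WORD 0x6b6b6b6b6b6b6b6bULL

/* Assumption (inferred): the list node sits 0x40 bytes into the object,
 * matching "mov 0x40(%rbx),%rbx"; the sub immediate is truncated in the dump. */
#define SKETCH_NODE_OFFSET 0x40ULL

/* container_of-style step: list-node pointer -> enclosing object pointer. */
static uint64_t sketch_container_of(uint64_t node_ptr, uint64_t offset)
{
	return node_ptr - offset;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal; accessing a
 * non-canonical address raises #GP rather than a page fault. */
static bool sketch_is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

Under these assumptions the faulting cmpl would touch 0x6b6b6b6b6b6b6b2b + 0xc0 = 0x6b6b6b6b6b6b6beb. That address is non-canonical, which fits the oops being reported as a general protection fault rather than a page fault.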

> This could be related to the ongoing memory corruption bug found in 
> another thread and not p4-related at all.
> 
> I thought I was running with PeterZ's latest patch that was supposed to 
> avoid the corruption.  Hmmm.  Let me reboot and try a few more things.

Thanks! Please ping me if you find something new.

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 18:24               ` Cyrill Gorcunov
@ 2014-05-07 21:17                 ` Vince Weaver
  2014-05-07 21:51                   ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-07 21:17 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, 7 May 2014, Cyrill Gorcunov wrote:

> > I thought I was running with PeterZ's latest patch that was supposed to 
> > avoid the corruption.  Hmmm.  Let me reboot and try a few more things.
> 
> Thanks! Please ping me if you find something new.

It turns out to be my fault, I was running with an incomplete version of 
PeterZ's patch.  We need to get the fix into the kernel as apparently I 
fail at manually applying patches across multiple machines.

So to summarize, with PeterZ's fix the various memory corruption crashes 
in the p4 code no longer happen.

When fuzzing on the p4 *other* things do happen.
	* at least two warnings pop up almost instantly
	* eventually the machine will crash in an endless NMI storm
	* also I managed to get the machine wedged with an unkillable
	  process, sort of like the known problem PeterZ has.

The NMI issue is probably the only one that is p4 related, and I do get 
the NMI warnings on other machines too, it's just the p4 is the only one 
where it brings down the machine.

Vince

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 21:17                 ` Vince Weaver
@ 2014-05-07 21:51                   ` Cyrill Gorcunov
  2014-05-07 21:54                     ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-07 21:51 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 07, 2014 at 05:17:22PM -0400, Vince Weaver wrote:
> On Wed, 7 May 2014, Cyrill Gorcunov wrote:
> 
> > > I thought I was running with PeterZ's latest patch that was supposed to 
> > > avoid the corruption.  Hmmm.  Let me reboot and try a few more things.
> > 
> > Thanks! Please ping me if you find something new.
> 
> It turns out to be my fault, I was running with an incomplete version of 
> PeterZ's patch.  We need to get the fix into the kernel as apparently I 
> fail at manually applying patches across multiple machines.
> 
> So to summarize, with PeterZ's fix the various memory corruption crashes 
> in the p4 code no longer happen.

Good. Thanks a lot, Vince!

> When fuzzing on the p4 *other* things do happen.
> 	* at least two warnings pop up almost instantly
> 	* eventually the machine will crash in an endless NMI storm
> 	* also I managed to get the machine wedged with an unkillable
> 	  process, sort of like the known problem PeterZ has.
> 
> The NMI issue is probably the only one that is p4 related, and I do get 
> the NMI warnings on other machines too, it's just the p4 is the only one 
> where it brings down the machine.

Vince, could you please provide more details on that? Is it possible
to somehow log which events were used by perf?

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 21:51                   ` Cyrill Gorcunov
@ 2014-05-07 21:54                     ` Cyrill Gorcunov
  2014-05-08  5:14                       ` Vince Weaver
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-07 21:54 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Don Zickus

On Thu, May 08, 2014 at 01:51:44AM +0400, Cyrill Gorcunov wrote:
> 
> > When fuzzing on the p4 *other* things do happen.
> > 	* at least two warnings pop up almost instantly
> > 	* eventually the machine will crash in an endless NMI storm
> > 	* also I managed to get the machine wedged with an unkillable
> > 	  process, sort of like the known problem PeterZ has.
> > 
> > The NMI issue is probably the only one that is p4 related, and I do get 
> > the NMI warnings on other machines too, it's just the p4 is the only one 
> > where it brings down the machine.
> 
> Vince, could you please provide more details on that? Is it possible
> to somehow log which events were used by perf?

There was a bug in the p4 PMU that Don (CC'ed) fixed not that long ago, but I fear
not all corner cases are covered yet.

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 20:23 ` Cyrill Gorcunov
  2014-05-06 21:30   ` Vince Weaver
@ 2014-05-08  2:00   ` Don Zickus
  2014-05-08  5:38     ` Cyrill Gorcunov
  2014-05-08  7:37     ` Cyrill Gorcunov
  1 sibling, 2 replies; 35+ messages in thread
From: Don Zickus @ 2014-05-08  2:00 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 07, 2014 at 12:23:08AM +0400, Cyrill Gorcunov wrote:
> On Tue, May 06, 2014 at 11:42:58AM -0400, Vince Weaver wrote:
> > 
> > So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> > 
> > It crashes more or less instantly (sorry for the line wrapping, 
> > just got the serial console hooked up and don't have minicom configured 
> > right yet).
> > 
> > this is 3.15-rc4 with the anti-memory corruption patch applied.
> > 
> > [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> > [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
> 
> This looks like
> 
> p4_pmu_schedule_events:
> 		...
> 		bind = p4_config_get_bind(hwc->config);
> 			returned bind = NULL;
> 		escr_idx = p4_get_escr_idx(bind->escr_msr[thread]); NULL deref
> 
> If I'm right (btw, is it possible to use the addr2line helper?), then hwc->config
> is corrupted and p4_config_get_bind returned NULL simply because no matching event
> was found. And I don't understand how that could happen, because the configuration
> is validated once it is obtained from user space as a raw event, before it gets
> written into hwc->config. Weird...

I think my commit 13beacee817d27a40ffc6f065ea0042685611dd5 explains this
corruption.  Though I have to admit I haven't looked through the problem
very closely yet.

IOW my lazy fix in that commit doesn't cover fuzzers and the real problem
in p4_pmu_schedule_events. :-)

Cheers,
Don

* Re: perf_fuzzer crash on pentium 4
  2014-05-07 21:54                     ` Cyrill Gorcunov
@ 2014-05-08  5:14                       ` Vince Weaver
  2014-05-08  5:40                         ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-08  5:14 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar, Don Zickus

On Thu, 8 May 2014, Cyrill Gorcunov wrote:

> > > The NMI issue is probably the only one that is p4 related, and I do get 
> > > the NMI warnings on other machines too, it's just the p4 is the only one 
> > > where it brings down the machine.
> > 
> > Vince, could you please provide more details on that? Is it possible
> > to somehow log which events were used by perf?
> 
> There was a bug in the p4 PMU that Don (CC'ed) fixed not that long ago, but I fear
> not all corner cases are covered yet.

I hit the NMI warnings somewhat often on Intel hardware (Haswell, Core2) 
but it usually doesn't make the system unusable like it does on p4.

I can try to get a trace, although I'm not sure it will be useful.  I 
spent a lot of time getting a reproducible test case for the same warnings 
on core2, and it was unclear what the problem was; it was never fixed.

The messages look like this:

[ 2944.203423] Uhhuh. NMI received for unknown reason 31 on CPU 0.
[ 2944.208006] Do you have a strange power saving mode enabled?
[ 2944.208006] Dazed and confused, but trying to continue
[ 2944.208006] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 2944.208006] Do you have a strange power saving mode enabled?
[ 2944.208006] Dazed and confused, but trying to continue
[ 2944.208006] Uhhuh. NMI received for unknown reason 31 on CPU 0.
[ 2944.208006] Do you have a strange power saving mode enabled?
[ 2944.208006] Dazed and confused, but trying to continue

repeating forever, system is unusable.

Vince
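
The "reason" in these messages is printed as a hex byte (assuming the %02x format used by the x86 NMI code), so the storm alternates between reason bytes 0x31 and 0x21. A minimal sketch of the platform check that byte goes through, assuming the bit values from arch/x86/include/asm/mach_traps.h (SERR = 0x80, IOCHK = 0x40): neither logged value has either bit set, so when no registered handler (the PMU one included) claims the NMI, the kernel can only report it as unknown. platform_nmi_source() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Bits of the legacy NMI status byte (I/O port 0x61) that the x86 NMI
 * code tests; values assumed from arch/x86/include/asm/mach_traps.h.
 */
#define NMI_REASON_SERR		0x80	/* system error (SERR) */
#define NMI_REASON_IOCHK	0x40	/* I/O channel check */
#define NMI_REASON_MASK		(NMI_REASON_SERR | NMI_REASON_IOCHK)

/*
 * True if the reason byte names a platform NMI source. If it does not,
 * and no registered handler (such as the PMU's) claimed the NMI, the
 * kernel typically ends up printing the "unknown reason" message.
 */
static bool platform_nmi_source(unsigned char reason)
{
	return (reason & NMI_REASON_MASK) != 0;
}
```

With this check, platform_nmi_source(0x31) and platform_nmi_source(0x21) are both false; the remaining low bits are unrelated port 0x61 status and do not identify the source.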

* Re: perf_fuzzer crash on pentium 4
  2014-05-08  2:00   ` Don Zickus
@ 2014-05-08  5:38     ` Cyrill Gorcunov
  2014-05-08  7:37     ` Cyrill Gorcunov
  1 sibling, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-08  5:38 UTC (permalink / raw)
  To: Don Zickus; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 07, 2014 at 10:00:50PM -0400, Don Zickus wrote:
> > 
> > If I'm right (btw, is it possible to use the addr2line helper?), then hwc->config
> > is corrupted and p4_config_get_bind returned NULL simply because no matching event
> > was found. And I don't understand how that could happen, because the configuration
> > is validated once it is obtained from user space as a raw event, before it gets
> > written into hwc->config. Weird...
> 
> I think my commit 13beacee817d27a40ffc6f065ea0042685611dd5 explains this
> corruption.  Though I have to admit I haven't looked through the problem
> very closely yet.

Nope ;) Without the fix above we could (and did) simply misconfigure a
counter, but we never accessed anything out of array bounds. Anyway, Vince
confirmed that the fix from PeterZ healed the problem.

> IOW my lazy fix in that commit doesn't cover fuzzers and the real problem
> in p4_pmu_schedule_events. :-)
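
Cyrill's reading of the oops (bind == NULL, then bind->escr_msr[thread]) can be sanity-checked against the faulting address 00000004. The following is a userspace model, not kernel code: the field layout is an assumption based on struct p4_event_bind and should be checked against perf_event_p4.h. With a 4-byte opcode field first, escr_msr[0] sits at offset 4, which is consistent with a NULL bind dereferenced for thread 0 on CPU 0. escr_msr_for() is a hypothetical guarded lookup showing how a missing binding could be refused instead of dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of a P4 event-binding table entry. The field order and
 * the 3-entry counter rows are assumptions based on struct p4_event_bind;
 * check perf_event_p4.h before relying on them.
 */
struct p4_event_bind_model {
	unsigned int opcode;		/* event code and ESCR selector */
	unsigned int escr_msr[2];	/* ESCR MSR, one per HT thread */
	unsigned int escr_emask;	/* valid ESCR EventMask bits */
	unsigned int shared;		/* event shared across threads */
	char cntr[2][3];		/* counter indices per thread */
};

/*
 * Address touched by bind->escr_msr[thread] when bind == NULL, computed
 * with offsetof() instead of performing the faulting access.
 */
static size_t null_bind_fault_addr(unsigned int thread)
{
	assert(thread < 2);
	return offsetof(struct p4_event_bind_model, escr_msr) +
	       thread * sizeof(unsigned int);
}

/* Hypothetical guarded lookup: reject an unbound config instead of faulting. */
static int escr_msr_for(const struct p4_event_bind_model *bind,
			unsigned int thread, unsigned int *msr)
{
	if (!bind || thread > 1)
		return -1;	/* no binding found for this config */
	*msr = bind->escr_msr[thread];
	return 0;
}
```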

* Re: perf_fuzzer crash on pentium 4
  2014-05-08  5:14                       ` Vince Weaver
@ 2014-05-08  5:40                         ` Cyrill Gorcunov
  0 siblings, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-08  5:40 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Don Zickus

On Thu, May 08, 2014 at 01:14:56AM -0400, Vince Weaver wrote:
> > 
> > There was a bug in the p4 PMU that Don (CC'ed) fixed not that long ago, but I fear
> > not all corner cases are covered yet.
> 
> I hit the NMI warnings somewhat often on Intel hardware (Haswell, Core2) 
> but it usually doesn't make the system unusable like it does on p4.
> 
> I can try to get a trace, although I'm not sure it will be useful.  I 
> spent a lot of time getting a reproducible test case for the same warnings 
> on core2, and it was unclear what the problem was; it was never fixed.
> 
> The messages look like this:
> 
> [ 2944.203423] Uhhuh. NMI received for unknown reason 31 on CPU 0.
> [ 2944.208006] Do you have a strange power saving mode enabled?
> [ 2944.208006] Dazed and confused, but trying to continue
> [ 2944.208006] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> [ 2944.208006] Do you have a strange power saving mode enabled?
> [ 2944.208006] Dazed and confused, but trying to continue
> [ 2944.208006] Uhhuh. NMI received for unknown reason 31 on CPU 0.
> [ 2944.208006] Do you have a strange power saving mode enabled?
> [ 2944.208006] Dazed and confused, but trying to continue
> 
> repeating forever, system is unusable.

Vince, is it possible to get a trace of exactly which events perf_fuzzer
pushed into the kernel? Maybe it would shed some light.

* Re: perf_fuzzer crash on pentium 4
  2014-05-08  2:00   ` Don Zickus
  2014-05-08  5:38     ` Cyrill Gorcunov
@ 2014-05-08  7:37     ` Cyrill Gorcunov
  2014-05-08  7:49       ` Cyrill Gorcunov
  1 sibling, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-08  7:37 UTC (permalink / raw)
  To: Don Zickus, Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 07, 2014 at 10:00:50PM -0400, Don Zickus wrote:
> 
> I think my commit 13beacee817d27a40ffc6f065ea0042685611dd5 explains this
> corruption.  Though I have to admit I haven't looked through the problem
> very closely yet.
> 
> IOW my lazy fix in that commit doesn't cover fuzzers and the real problem
> in p4_pmu_schedule_events. :-)

Don, Vince, could you please give the patch a run? I've only compile-tested
it, obviously, since I have no real p4 hw. The patch itself is a bit ugly,
but it should shed some light on whether we still have problems in event
scheduling.
---
 arch/x86/kernel/cpu/perf_event_p4.c |   61 +++++++++++++++---------------------
 1 file changed, 27 insertions(+), 34 deletions(-)

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1063,23 +1063,23 @@ static int p4_pmu_handle_irq(struct pt_r
  * swap thread specific fields according to a thread
  * we are going to run on
  */
-static void p4_pmu_swap_config_ts(struct hw_perf_event *hwc, int cpu)
+static u64 p4_pmu_swap_config_ts(u64 config, int cpu)
 {
 	u32 escr, cccr;
 
 	/*
 	 * we either lucky and continue on same cpu or no HT support
 	 */
-	if (!p4_should_swap_ts(hwc->config, cpu))
-		return;
+	if (!p4_should_swap_ts(config, cpu))
+		return config;
 
 	/*
 	 * the event is migrated from an another logical
 	 * cpu, so we need to swap thread specific flags
 	 */
 
-	escr = p4_config_unpack_escr(hwc->config);
-	cccr = p4_config_unpack_cccr(hwc->config);
+	escr = p4_config_unpack_escr(config);
+	cccr = p4_config_unpack_cccr(config);
 
 	if (p4_ht_thread(cpu)) {
 		cccr &= ~P4_CCCR_OVF_PMI_T0;
@@ -1092,9 +1092,9 @@ static void p4_pmu_swap_config_ts(struct
 			escr &= ~P4_ESCR_T0_USR;
 			escr |= P4_ESCR_T1_USR;
 		}
-		hwc->config  = p4_config_pack_escr(escr);
-		hwc->config |= p4_config_pack_cccr(cccr);
-		hwc->config |= P4_CONFIG_HT;
+		config  = p4_config_pack_escr(escr);
+		config |= p4_config_pack_cccr(cccr);
+		config |= P4_CONFIG_HT;
 	} else {
 		cccr &= ~P4_CCCR_OVF_PMI_T1;
 		cccr |= P4_CCCR_OVF_PMI_T0;
@@ -1106,10 +1106,12 @@ static void p4_pmu_swap_config_ts(struct
 			escr &= ~P4_ESCR_T1_USR;
 			escr |= P4_ESCR_T0_USR;
 		}
-		hwc->config  = p4_config_pack_escr(escr);
-		hwc->config |= p4_config_pack_cccr(cccr);
-		hwc->config &= ~P4_CONFIG_HT;
+		config  = p4_config_pack_escr(escr);
+		config |= p4_config_pack_cccr(cccr);
+		config &= ~P4_CONFIG_HT;
 	}
+
+	return config;
 }
 
 /*
@@ -1208,6 +1210,7 @@ static int p4_pmu_schedule_events(struct
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	unsigned long escr_mask[BITS_TO_LONGS(P4_ESCR_MSR_TABLE_SIZE)];
 	int cpu = smp_processor_id();
+	u64 config[X86_PMC_IDX_MAX];
 	struct hw_perf_event *hwc;
 	struct p4_event_bind *bind;
 	unsigned int i, thread, num;
@@ -1233,12 +1236,13 @@ again:
 		if (pass > 2)
 			goto done;
 
-		bind = p4_config_get_bind(hwc->config);
+		config[i] = hwc->config;
+		bind = p4_config_get_bind(config[i]);
 		escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);
 		if (unlikely(escr_idx == -1))
 			goto done;
 
-		if (hwc->idx != -1 && !p4_should_swap_ts(hwc->config, cpu)) {
+		if (hwc->idx != -1 && !p4_should_swap_ts(config[i], cpu)) {
 			cntr_idx = hwc->idx;
 			if (assign)
 				assign[i] = hwc->idx;
@@ -1250,32 +1254,15 @@ again:
 			/*
 			 * Check whether an event alias is still available.
 			 */
-			config_alias = p4_get_alias_event(hwc->config);
+			config_alias = p4_get_alias_event(config[i]);
 			if (!config_alias)
 				goto done;
-			hwc->config = config_alias;
+			config[i] = config_alias;
 			pass++;
 			goto again;
 		}
-		/*
-		 * Perf does test runs to see if a whole group can be assigned
-		 * together succesfully.  There can be multiple rounds of this.
-		 * Unfortunately, p4_pmu_swap_config_ts touches the hwc->config
-		 * bits, such that the next round of group assignments will
-		 * cause the above p4_should_swap_ts to pass instead of fail.
-		 * This leads to counters exclusive to thread0 being used by
-		 * thread1.
-		 *
-		 * Solve this with a cheap hack, reset the idx back to -1 to
-		 * force a new lookup (p4_next_cntr) to get the right counter
-		 * for the right thread.
-		 *
-		 * This probably doesn't comply with the general spirit of how
-		 * perf wants to work, but P4 is special. :-(
-		 */
-		if (p4_should_swap_ts(hwc->config, cpu))
-			hwc->idx = -1;
-		p4_pmu_swap_config_ts(hwc, cpu);
+
+		config[i] = p4_pmu_swap_config_ts(config[i], cpu);
 		if (assign)
 			assign[i] = cntr_idx;
 reserve:
@@ -1284,6 +1271,12 @@ reserve:
 	}
 
 done:
+	if (num == 0) {
+		for (i = 0; i < n; i++, num--) {
+			hwc = &cpuc->event_list[i]->hw;
+			hwc->config = config[i];
+		}
+	}
 	return num ? -EINVAL : 0;
 }
 

* Re: perf_fuzzer crash on pentium 4
  2014-05-08  7:37     ` Cyrill Gorcunov
@ 2014-05-08  7:49       ` Cyrill Gorcunov
  2014-05-08  8:02         ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-08  7:49 UTC (permalink / raw)
  To: Don Zickus, Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Thu, May 08, 2014 at 11:37:56AM +0400, Cyrill Gorcunov wrote:
> On Wed, May 07, 2014 at 10:00:50PM -0400, Don Zickus wrote:
> > 
> > I think my commit 13beacee817d27a40ffc6f065ea0042685611dd5 explains this
> > corruption.  Though I have to admit I haven't looked through the problem
> > very closely yet.
> > 
> > IOW my lazy fix in that commit doesn't cover fuzzers and the real problem
> > in p4_pmu_schedule_events. :-)
> 
> Don, Vince, could you please give the patch a run? I've only compile-tested
> it, obviously, since I have no real p4 hw. The patch itself is a bit ugly,
> but it should shed some light on whether we still have problems in event
> scheduling.

Drop it, won't work.
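
Cyrill does not say why the first version fails, but two problems are visible in its diff. First, config[i] = hwc->config sits after the again: label, so a retry with an alias event starts over from the unaliased config. Second, the commit loop under done: also runs num--, so on success the unsigned num wraps and the function returns -EINVAL anyway. The updated version moves the initialisation above again: and keys the commit on i == n. A minimal userspace model of the second problem, with scheduling reduced to "the first `placed` events fit" and -EINVAL reduced to -1 (both simplifications are assumptions of the sketch):

```c
#include <assert.h>

/* Exit path of the first patch version; num is unsigned, as in the kernel. */
static int v1_exit_path(unsigned int n, unsigned int placed)
{
	unsigned int i, num;

	assert(placed <= n);
	for (i = 0, num = n; i < n; i++, num--) {
		if (i == placed)
			goto done;	/* event i could not be placed */
	}
done:
	if (num == 0) {
		/* the commit loop also decrements num, so it wraps below zero */
		for (i = 0; i < n; i++, num--)
			;	/* hwc->config = config[i] elided */
	}
	return num ? -1 : 0;	/* -1 stands in for -EINVAL */
}

/* Exit path of the updated version: success is "the loop ran to n". */
static int v2_exit_path(unsigned int n, unsigned int placed)
{
	unsigned int i;

	assert(placed <= n);
	for (i = 0; i < n; i++) {
		if (i == placed)
			goto done;
	}
done:
	if (i == n)
		return 0;	/* commit config[] to the events here */
	return -1;
}
```

In this model v1_exit_path(3, 3) reports failure even though all three events were placed, while v2_exit_path(3, 3) succeeds; both correctly fail when an event does not fit.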

* Re: perf_fuzzer crash on pentium 4
  2014-05-08  7:49       ` Cyrill Gorcunov
@ 2014-05-08  8:02         ` Cyrill Gorcunov
  2014-05-09 16:19           ` Vince Weaver
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-08  8:02 UTC (permalink / raw)
  To: Don Zickus, Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Thu, May 08, 2014 at 11:49:30AM +0400, Cyrill Gorcunov wrote:
> > Don, Vince, could you please give the patch a run? I've only compile-tested
> > it, obviously, since I have no real p4 hw. The patch itself is a bit ugly,
> > but it should shed some light on whether we still have problems in event
> > scheduling.
> 
> Drop it, won't work.

Updated.
---
 arch/x86/kernel/cpu/perf_event_p4.c |   67 ++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 37 deletions(-)

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1063,23 +1063,23 @@ static int p4_pmu_handle_irq(struct pt_r
  * swap thread specific fields according to a thread
  * we are going to run on
  */
-static void p4_pmu_swap_config_ts(struct hw_perf_event *hwc, int cpu)
+static u64 p4_pmu_swap_config_ts(u64 config, int cpu)
 {
 	u32 escr, cccr;
 
 	/*
 	 * we either lucky and continue on same cpu or no HT support
 	 */
-	if (!p4_should_swap_ts(hwc->config, cpu))
-		return;
+	if (!p4_should_swap_ts(config, cpu))
+		return config;
 
 	/*
 	 * the event is migrated from an another logical
 	 * cpu, so we need to swap thread specific flags
 	 */
 
-	escr = p4_config_unpack_escr(hwc->config);
-	cccr = p4_config_unpack_cccr(hwc->config);
+	escr = p4_config_unpack_escr(config);
+	cccr = p4_config_unpack_cccr(config);
 
 	if (p4_ht_thread(cpu)) {
 		cccr &= ~P4_CCCR_OVF_PMI_T0;
@@ -1092,9 +1092,9 @@ static void p4_pmu_swap_config_ts(struct
 			escr &= ~P4_ESCR_T0_USR;
 			escr |= P4_ESCR_T1_USR;
 		}
-		hwc->config  = p4_config_pack_escr(escr);
-		hwc->config |= p4_config_pack_cccr(cccr);
-		hwc->config |= P4_CONFIG_HT;
+		config  = p4_config_pack_escr(escr);
+		config |= p4_config_pack_cccr(cccr);
+		config |= P4_CONFIG_HT;
 	} else {
 		cccr &= ~P4_CCCR_OVF_PMI_T1;
 		cccr |= P4_CCCR_OVF_PMI_T0;
@@ -1106,10 +1106,12 @@ static void p4_pmu_swap_config_ts(struct
 			escr &= ~P4_ESCR_T1_USR;
 			escr |= P4_ESCR_T0_USR;
 		}
-		hwc->config  = p4_config_pack_escr(escr);
-		hwc->config |= p4_config_pack_cccr(cccr);
-		hwc->config &= ~P4_CONFIG_HT;
+		config  = p4_config_pack_escr(escr);
+		config |= p4_config_pack_cccr(cccr);
+		config &= ~P4_CONFIG_HT;
 	}
+
+	return config;
 }
 
 /*
@@ -1208,9 +1210,10 @@ static int p4_pmu_schedule_events(struct
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	unsigned long escr_mask[BITS_TO_LONGS(P4_ESCR_MSR_TABLE_SIZE)];
 	int cpu = smp_processor_id();
+	u64 config[X86_PMC_IDX_MAX];
 	struct hw_perf_event *hwc;
 	struct p4_event_bind *bind;
-	unsigned int i, thread, num;
+	unsigned int i, thread;
 	int cntr_idx, escr_idx;
 	u64 config_alias;
 	int pass;
@@ -1218,12 +1221,13 @@ static int p4_pmu_schedule_events(struct
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
 	bitmap_zero(escr_mask, P4_ESCR_MSR_TABLE_SIZE);
 
-	for (i = 0, num = n; i < n; i++, num--) {
+	for (i = 0; i < n; i++) {
 
 		hwc = &cpuc->event_list[i]->hw;
 		thread = p4_ht_thread(cpu);
 		pass = 0;
 
+		config[i] = hwc->config;
 again:
 		/*
 		 * It's possible to hit a circular lock
@@ -1233,12 +1237,12 @@ again:
 		if (pass > 2)
 			goto done;
 
-		bind = p4_config_get_bind(hwc->config);
+		bind = p4_config_get_bind(config[i]);
 		escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);
 		if (unlikely(escr_idx == -1))
 			goto done;
 
-		if (hwc->idx != -1 && !p4_should_swap_ts(hwc->config, cpu)) {
+		if (hwc->idx != -1 && !p4_should_swap_ts(config[i], cpu)) {
 			cntr_idx = hwc->idx;
 			if (assign)
 				assign[i] = hwc->idx;
@@ -1250,32 +1254,15 @@ again:
 			/*
 			 * Check whether an event alias is still available.
 			 */
-			config_alias = p4_get_alias_event(hwc->config);
+			config_alias = p4_get_alias_event(config[i]);
 			if (!config_alias)
 				goto done;
-			hwc->config = config_alias;
+			config[i] = config_alias;
 			pass++;
 			goto again;
 		}
-		/*
-		 * Perf does test runs to see if a whole group can be assigned
-		 * together succesfully.  There can be multiple rounds of this.
-		 * Unfortunately, p4_pmu_swap_config_ts touches the hwc->config
-		 * bits, such that the next round of group assignments will
-		 * cause the above p4_should_swap_ts to pass instead of fail.
-		 * This leads to counters exclusive to thread0 being used by
-		 * thread1.
-		 *
-		 * Solve this with a cheap hack, reset the idx back to -1 to
-		 * force a new lookup (p4_next_cntr) to get the right counter
-		 * for the right thread.
-		 *
-		 * This probably doesn't comply with the general spirit of how
-		 * perf wants to work, but P4 is special. :-(
-		 */
-		if (p4_should_swap_ts(hwc->config, cpu))
-			hwc->idx = -1;
-		p4_pmu_swap_config_ts(hwc, cpu);
+
+		config[i] = p4_pmu_swap_config_ts(config[i], cpu);
 		if (assign)
 			assign[i] = cntr_idx;
 reserve:
@@ -1284,7 +1271,13 @@ reserve:
 	}
 
 done:
-	return num ? -EINVAL : 0;
+	if (i == n) {
+		for (i = 0; i < n; i++)
+			cpuc->event_list[i]->hw.config = config[i];
+		return 0;
+	}
+
+	return -EINVAL;
 }
 
 PMU_FORMAT_ATTR(cccr, "config:0-31" );
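
Instead of the removed "cheap hack" (resetting hwc->idx), the updated patch computes every trial into a local config[] array and writes hwc->config only after all n events were placed. A trial that fails partway therefore leaves every hwc->config as it was, rather than leaving some events swapped or aliased and others not. A minimal userspace sketch of that compute-then-commit pattern; struct event_model, try_place() and its bit flip are hypothetical stand-ins for hw_perf_event, the alias lookup and p4_pmu_swap_config_ts(), not kernel APIs:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EVENTS 8	/* stands in for X86_PMC_IDX_MAX */

struct event_model {
	uint64_t config;	/* stands in for hwc->config */
};

/*
 * Hypothetical per-event step: a config of 0 cannot be placed,
 * anything else gets a made-up "thread" bit flipped.
 */
static int try_place(uint64_t in, uint64_t *out)
{
	if (in == 0)
		return -1;
	*out = in ^ 0x1;
	return 0;
}

/*
 * Trial scheduling on a scratch copy: events are only written once every
 * one of them was placed, so a failed trial leaves them untouched.
 */
static int schedule_all(struct event_model *ev, unsigned int n)
{
	uint64_t scratch[MAX_EVENTS];
	unsigned int i;

	assert(n <= MAX_EVENTS);
	for (i = 0; i < n; i++) {
		if (try_place(ev[i].config, &scratch[i]))
			break;
	}
	if (i != n)
		return -1;	/* -EINVAL; ev[] unchanged */

	for (i = 0; i < n; i++)
		ev[i].config = scratch[i];	/* commit */
	return 0;
}
```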

* Re: perf_fuzzer crash on pentium 4
  2014-05-08  8:02         ` Cyrill Gorcunov
@ 2014-05-09 16:19           ` Vince Weaver
  2014-05-09 16:30             ` Cyrill Gorcunov
  2014-05-14 20:39             ` Cyrill Gorcunov
  0 siblings, 2 replies; 35+ messages in thread
From: Vince Weaver @ 2014-05-09 16:19 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Don Zickus, Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Thu, 8 May 2014, Cyrill Gorcunov wrote:
> 
> Updated.
> ---
>  arch/x86/kernel/cpu/perf_event_p4.c |   67 ++++++++++++++++--------------------
>  1 file changed, 30 insertions(+), 37 deletions(-)

I tried this patch, and even though it seemed to fix one of the NMI storms
I was experiencing, I've managed to trigger one again using a different
random seed.

I've been trying to track down a trace of what is triggering things, but 
this is very difficult as the full log isn't making it to the serial 
console, even when I fsync() stdout.

Maybe related, but the following messages tend to happen a lot while 
fuzzing, and always happen before the fuzzing that eventually locks up:

The warnings are for
	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
and
	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);	


[ 3086.168915] hrtimer: interrupt took 4251 ns

[ 3097.845779] NOHZ: local_softirq_pending 100
[ 3098.222766] NOHZ: local_softirq_pending 100
[ 3098.239817] ------------[ cut here ]------------
[ 3098.240006] WARNING: CPU: 0 PID: 1877 at arch/x86/kernel/cpu/perf_event.c:1082 x86_pmu_start+0x4b/0xf8()
[ 3098.240006] Modules linked in: loop iTCO_wdt iTCO_vendor_support lpc_ich ppdev evdev microcode pcspkr psmouse i915 serio_raw parport_pc tpm_tis drm_kms_helper i2c_i801 tpm mfd_core parport drm acpi_cpufreq processor video button thermal_sys snd_hda_codec_analog snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i2c_algo_bit snd_timer i2c_core snd soundcore sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common tg3 ehci_pci uhci_hcd ehci_hcd ptp ata_generic ata_piix floppy libata scsi_mod pps_core libphy usbcore usb_common
[ 3098.240006] CPU: 0 PID: 1877 Comm: perf_fuzzer Not tainted 3.15.0-rc4+ #4
[ 3098.240006] Hardware name: LENOVO 88088NU/LENOVO, BIOS 2JKT37AUS 07/12/2007
[ 3098.240006]  0000000000000000 ffff88003af6bb40 ffffffff81430fbe 0000000000000000
[ 3098.240006]  ffff88003af6bb78 ffffffff8103c77d ffffffff81012912 ffff88003cb52c00
[ 3098.240006]  ffff88003f40bb50 000000000000000c ffff88003f40bb58 ffff88003af6bb88
[ 3098.240006] Call Trace:
[ 3098.240006]  [<ffffffff81430fbe>] dump_stack+0x45/0x56
[ 3098.240006]  [<ffffffff8103c77d>] warn_slowpath_common+0x7f/0x98
[ 3098.240006]  [<ffffffff81012912>] ? x86_pmu_start+0x4b/0xf8
[ 3098.240006]  [<ffffffff8103c844>] warn_slowpath_null+0x1a/0x1c
[ 3098.240006]  [<ffffffff81012912>] x86_pmu_start+0x4b/0xf8
[ 3098.240006]  [<ffffffff81012ee2>] x86_pmu_enable+0x154/0x233
[ 3098.240006]  [<ffffffff810cfc66>] perf_pmu_enable+0x27/0x29
[ 3098.240006]  [<ffffffff810117e7>] x86_pmu_commit_txn+0x7b/0x98
[ 3098.240006]  [<ffffffff8108a9e5>] ? clockevents_program_event+0x9d/0xb9
[ 3098.240006]  [<ffffffff810591ad>] ? __hrtimer_start_range_ns+0x267/0x299
[ 3098.240006]  [<ffffffff81015067>] ? p4_pmu_enable_event+0x111/0x11c
[ 3098.240006]  [<ffffffff810150b0>] ? p4_pmu_enable_all+0x3e/0x48
[ 3098.240006]  [<ffffffff810d0cfe>] ? event_sched_in+0x138/0x148
[ 3098.240006]  [<ffffffff810d0da6>] group_sched_in+0x98/0x141
[ 3098.240006]  [<ffffffff81064b4b>] ? sched_clock_cpu+0x91/0xa2
[ 3098.240006]  [<ffffffff810d1960>] __perf_event_enable+0xf6/0x136
[ 3098.240006]  [<ffffffff810cd934>] remote_function+0x1c/0x45
[ 3098.240006]  [<ffffffff810905c6>] generic_exec_single+0x3e/0x10f
[ 3098.240006]  [<ffffffff810cd918>] ? cpu_clock_event_add+0x1b/0x1b
[ 3098.240006]  [<ffffffff810cd918>] ? cpu_clock_event_add+0x1b/0x1b
[ 3098.240006]  [<ffffffff81090715>] smp_call_function_single+0x7e/0x86
[ 3098.240006]  [<ffffffff810cc9cd>] task_function_call+0x49/0x53
[ 3098.240006]  [<ffffffff810d186a>] ? __perf_install_in_context+0xf2/0xf2
[ 3098.240006]  [<ffffffff810cf080>] perf_event_enable+0x8a/0xbf
[ 3098.240006]  [<ffffffff810ceff6>] ? __perf_event_mark_enabled+0x5f/0x5f
[ 3098.240006]  [<ffffffff810cca34>] perf_event_for_each_child+0x5d/0x98
[ 3098.240006]  [<ffffffff810d048b>] perf_event_task_enable+0x56/0x7c
[ 3098.240006]  [<ffffffff8104dea9>] SyS_prctl+0x16e/0x391
[ 3098.240006]  [<ffffffff811228fd>] ? SyS_write+0x63/0x79
[ 3098.240006]  [<ffffffff8143acd6>] system_call_fastpath+0x1a/0x1f
[ 3098.240006] ---[ end trace de5690b3396a1c26 ]---

[ 3113.245401] NOHZ: local_softirq_pending 100
[ 3117.838763] ------------[ cut here ]------------
[ 3117.840006] WARNING: CPU: 1 PID: 1877 at arch/x86/kernel/cpu/perf_event.c:1164 x86_pmu_stop+0x6d/0xa0()
[ 3117.840006] Modules linked in: loop iTCO_wdt iTCO_vendor_support lpc_ich ppdev evdev microcode pcspkr psmouse i915 serio_raw parport_pc tpm_tis drm_kms_helper i2c_i801 tpm mfd_core parport drm acpi_cpufreq processor video button thermal_sys snd_hda_codec_analog snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i2c_algo_bit snd_timer i2c_core snd soundcore sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common tg3 ehci_pci uhci_hcd ehci_hcd ptp ata_generic ata_piix floppy libata scsi_mod pps_core libphy usbcore usb_common
[ 3117.840006] CPU: 1 PID: 1877 Comm: perf_fuzzer Tainted: G        W     3.15.0-rc4+ #4
[ 3117.840006] Hardware name: LENOVO 88088NU/LENOVO, BIOS 2JKT37AUS 07/12/2007
[ 3117.840006]  0000000000000000 ffff88003af6b7a0 ffffffff81430fbe 0000000000000000
[ 3117.840006]  ffff88003af6b7d8 ffffffff8103c77d ffffffff81012034 ffff880037038400
[ 3117.840006]  ffff88003f50bb50 0000000000000004 000002d043484e06 ffff88003af6b7e8
[ 3117.840006] Call Trace:
[ 3117.840006]  [<ffffffff81430fbe>] dump_stack+0x45/0x56
[ 3117.840006]  [<ffffffff8103c77d>] warn_slowpath_common+0x7f/0x98
[ 3117.840006]  [<ffffffff81012034>] ? x86_pmu_stop+0x6d/0xa0
[ 3117.840006]  [<ffffffff8103c844>] warn_slowpath_null+0x1a/0x1c
[ 3117.840006]  [<ffffffff81012034>] x86_pmu_stop+0x6d/0xa0
[ 3117.840006]  [<ffffffff810120a1>] x86_pmu_del+0x3a/0xe5
[ 3117.840006]  [<ffffffff810cfd01>] event_sched_out+0x99/0x102
[ 3117.840006]  [<ffffffff810cfd95>] group_sched_out+0x2b/0x7b
[ 3117.840006]  [<ffffffff810d003b>] ctx_sched_out+0x9c/0xf1
[ 3117.840006]  [<ffffffff810d126b>] __perf_event_task_sched_out+0x171/0x306
[ 3117.840006]  [<ffffffff8105eef0>] perf_event_task_sched_out+0x33/0x67
[ 3117.840006]  [<ffffffff81066d7d>] ? set_next_entity+0x3e/0x65
[ 3117.840006]  [<ffffffff8106890b>] ? pick_next_task_fair+0x142/0x336
[ 3117.840006]  [<ffffffff810674bf>] ? dequeue_task_fair+0x155/0x162
[ 3117.840006]  [<ffffffff8105e710>] ? pick_next_task+0x33/0x6b
[ 3117.840006]  [<ffffffff81432c03>] __schedule+0x237/0x4cd
[ 3117.840006]  [<ffffffff81432f0c>] schedule+0x73/0x75
[ 3117.840006]  [<ffffffff81432973>] schedule_hrtimeout_range_clock+0xb6/0xe6
[ 3117.840006]  [<ffffffff81058972>] ? hrtimer_get_res+0x42/0x42
[ 3117.840006]  [<ffffffff810591f3>] ? hrtimer_start_range_ns+0x14/0x16
[ 3117.840006]  [<ffffffff814329b6>] schedule_hrtimeout_range+0x13/0x15
[ 3117.840006]  [<ffffffff81130be8>] poll_schedule_timeout+0x41/0x61
[ 3117.840006]  [<ffffffff81131df0>] do_sys_poll+0x391/0x429
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81130d7c>] ? poll_select_copy_remaining+0xf9/0xf9
[ 3117.840006]  [<ffffffff81131f28>] SyS_poll+0x50/0xb7
[ 3117.840006]  [<ffffffff8143aed3>] tracesys+0xe1/0xe6
[ 3117.840006] ---[ end trace de5690b3396a1c27 ]---

[ 3123.751913] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 3123.752446] Do you have a strange power saving mode enabled?
[ 3123.752446] Dazed and confused, but trying to continue
[ 3123.752446] Uhhuh. NMI received for unknown reason 31 on CPU 0.
[ 3123.752446] Do you have a strange power saving mode enabled?
[ 3123.752446] Dazed and confused, but trying to continue
....repeat forever
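
Both warnings check the PERF_HES_STOPPED flag from opposite sides: x86_pmu_start() expects the event to be flagged stopped, and x86_pmu_stop() expects an event still marked active in cpuc->active_mask not to be flagged stopped (assuming, as in mainline, that the second check sits inside the branch taken when the event's active bit was set). In the traces above the first fires on the enable path (SyS_prctl -> perf_event_task_enable -> ... -> x86_pmu_enable -> x86_pmu_start) and the second on sched-out (x86_pmu_del -> x86_pmu_stop). A heavily simplified userspace model of that invariant; struct hw_model and the warnings counter are hypothetical, and only the flag value is taken from include/linux/perf_event.h:

```c
#include <assert.h>
#include <stdbool.h>

#define PERF_HES_STOPPED	0x01	/* value from include/linux/perf_event.h */

struct hw_model {
	int state;	/* PERF_HES_* flags, like hwc->state */
	bool active;	/* stands in for the bit in cpuc->active_mask */
};

static int warnings;	/* counts would-be WARN_ON_ONCE() hits */

/* Reduced x86_pmu_start(): only a stopped event may be started. */
static void model_start(struct hw_model *hw)
{
	if (!(hw->state & PERF_HES_STOPPED)) {
		warnings++;	/* WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)) */
		return;
	}
	hw->state = 0;
	hw->active = true;
}

/* Reduced x86_pmu_stop(): an active event must not already be stopped. */
static void model_stop(struct hw_model *hw)
{
	if (hw->active) {
		hw->active = false;
		if (hw->state & PERF_HES_STOPPED)
			warnings++;	/* WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED) */
		hw->state |= PERF_HES_STOPPED;
	}
}
```

In the model, a second start without an intervening stop trips the first check, and an event that is still active but already flagged stopped trips the second. It says nothing about which P4 path produces either state.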


* Re: perf_fuzzer crash on pentium 4
  2014-05-09 16:19           ` Vince Weaver
@ 2014-05-09 16:30             ` Cyrill Gorcunov
  2014-05-14 20:39             ` Cyrill Gorcunov
  1 sibling, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-09 16:30 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Don Zickus, linux-kernel, Peter Zijlstra, Ingo Molnar

On Fri, May 09, 2014 at 12:19:49PM -0400, Vince Weaver wrote:
> On Thu, 8 May 2014, Cyrill Gorcunov wrote:
> > 
> > Updated.
> > ---
> >  arch/x86/kernel/cpu/perf_event_p4.c |   67 ++++++++++++++++--------------------
> >  1 file changed, 30 insertions(+), 37 deletions(-)
> 
> I tried this patch, and even though it seemed to fix one of the NMI storms
> I was experiencing, I've managed to trigger one again using a different
> random seed.

Thanks a lot for the help with testing, Vince! I think the patch is still
needed, so I will prepare a proper changelog explaining what has been fixed
in the patch.

> I've been trying to track down a trace of what is triggering things, but 
> this is very difficult as the full log isn't making it to the serial 
> console, even when I fsync() stdout.

I see. Need to think; maybe something will come to mind. Debugging this is
really a hard job :/ I've been promised access to a real p4 machine next
week, so maybe I'll find something.

> Maybe related, but the following messages tend to happen a lot while 
> fuzzing, and always happen before the fuzzing that eventually locks up:
> 
> The warnings are for
> 	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> and
> 	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);	

Thanks for the info, Vince! Look, every time you start perf_fuzzer (note,
I haven't read its code yet, that's why I'm asking), does it log
which events are passed to the kernel from userspace? Btw, do you run the
kernel with the nmi-watchdog turned on?

* Re: perf_fuzzer crash on pentium 4
  2014-05-09 16:19           ` Vince Weaver
  2014-05-09 16:30             ` Cyrill Gorcunov
@ 2014-05-14 20:39             ` Cyrill Gorcunov
  2014-05-15  5:31               ` Vince Weaver
  1 sibling, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-14 20:39 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Don Zickus, linux-kernel, Peter Zijlstra, Ingo Molnar

On Fri, May 09, 2014 at 12:19:49PM -0400, Vince Weaver wrote:
> On Thu, 8 May 2014, Cyrill Gorcunov wrote:
> > 
> > Updated.
> > ---
> >  arch/x86/kernel/cpu/perf_event_p4.c |   67 ++++++++++++++++--------------------
> >  1 file changed, 30 insertions(+), 37 deletions(-)
> 
> I tried this patch, and even though it seemed to fix one of the NMI storms
> I was experiencing, I've managed to trigger one again using a different
> random seed.
> 
> I've been trying to track down a trace of what is triggering things, but 
> this is very difficult as the full log isn't making it to the serial 
> console, even when I fsync() stdout.
> 
> Maybe related, but the following messages tend to happen a lot while 
> fuzzing, and always happen before the fuzzing that eventually locks up:
> 
> The warnings are for
> 	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> and
> 	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);

So I'm experiencing the same problem on the latest -tip with my patches applied.

[  635.184382] perf interrupt took too long (2522 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[  638.674769] perf interrupt took too long (5009 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[ 1126.156992] ------------[ cut here ]------------
[ 1126.157010] WARNING: CPU: 0 PID: 6166 at arch/x86/kernel/cpu/perf_event.c:1083 x86_pmu_start+0x50/0xe5()
[ 1126.157014] Modules linked in:
[ 1126.157022] CPU: 0 PID: 6166 Comm: perf_fuzzer Not tainted 3.15.0-rc5-gfddecae-dirty #2
[ 1126.157024] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./P5GD1 PRO, BIOS 1012.001 10/13/2005
[ 1126.157024]  00000000 00000000 f49add90 c15684ed 00000000 f49adda8 c10385cc c10112a9
[ 1126.157024]  f5d5e7f0 f3bafc00 0000000c f49addb8 c10385f7 00000009 00000000 f49addd0
[ 1126.157024]  c10112a9 00000002 f5d5e7f4 f3bafc00 f5d5e7f0 f49addf8 c10118ea 00000001
[ 1126.157024] Call Trace:
[ 1126.157024]  [<c15684ed>] dump_stack+0x49/0x73
[ 1126.157024]  [<c10385cc>] warn_slowpath_common+0x66/0x7d
[ 1126.157024]  [<c10112a9>] ? x86_pmu_start+0x50/0xe5
[ 1126.157024]  [<c10385f7>] warn_slowpath_null+0x14/0x18
[ 1126.157024]  [<c10112a9>] x86_pmu_start+0x50/0xe5
[ 1126.157024]  [<c10118ea>] x86_pmu_enable+0x221/0x260
[ 1126.157024]  [<c10c6e8f>] perf_pmu_enable+0x1f/0x23
[ 1126.157024]  [<c10c87a0>] perf_cpu_hrtimer_handler+0xe9/0x131
[ 1126.157024]  [<c10c86b7>] ? __perf_install_in_context+0xc7/0xc7
[ 1126.157024]  [<c1053304>] __run_hrtimer+0xa6/0x149
[ 1126.157024]  [<c1053b57>] hrtimer_interrupt+0xe6/0x1e5
[ 1126.157024]  [<c12b28c0>] ? __this_cpu_preempt_check+0xf/0x11
[ 1126.157024]  [<c1026a7f>] local_apic_timer_interrupt+0x45/0x4a
[ 1126.157024]  [<c1026f7c>] smp_trace_apic_timer_interrupt+0x48/0xa2
[ 1126.157024]  [<c156f006>] trace_apic_timer_interrupt+0x32/0x38
[ 1126.157024]  [<c106007b>] ? sched_slice.isra.40+0x7e/0x91
[ 1126.157024]  [<c108bec1>] ? generic_exec_single+0x4f/0xea
[ 1126.157024]  [<c10c4179>] ? perf_cgroup_exit+0x17/0x17
[ 1126.157024]  [<c10c4179>] ? perf_cgroup_exit+0x17/0x17
[ 1126.157024]  [<c108c011>] smp_call_function_single+0x66/0x9a
[ 1126.157024]  [<c10c3730>] cpu_function_call+0x29/0x2e
[ 1126.157024]  [<c10c7107>] ? group_sched_out+0x66/0x66
[ 1126.157024]  [<c10c5aae>] perf_event_disable+0x2d/0x7b
[ 1126.157024]  [<c10c5a81>] ? list_del_event+0xa8/0xa8
[ 1126.157024]  [<c10c3a2a>] perf_event_for_each_child+0x4c/0x7b
[ 1126.157024]  [<c10c768f>] perf_event_task_disable+0x3a/0x67
[ 1126.157024]  [<c1048bb9>] SyS_prctl+0x14a/0x345
[ 1126.157024]  [<c106e363>] ? trace_hardirqs_on_caller+0x177/0x1d2
[ 1126.157024]  [<c156e644>] sysenter_do_call+0x12/0x32
[ 1126.157024] ---[ end trace 1c8a0d8dcf7e5bde ]---

Continuing to investigate...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf_fuzzer crash on pentium 4
  2014-05-14 20:39             ` Cyrill Gorcunov
@ 2014-05-15  5:31               ` Vince Weaver
  2014-05-15 22:09                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Vince Weaver @ 2014-05-15  5:31 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vince Weaver, Don Zickus, linux-kernel, Peter Zijlstra, Ingo Molnar

On Thu, 15 May 2014, Cyrill Gorcunov wrote:

> So I'm experiencing the same problem on latest -tip + my patches applied.

glad it's not just me.

I find the problem to be reproducible and so in theory it might be 
possible to generate a small reproducing test case.  
I meant to do that already but I got distracted 
by other things.

I think I managed to get a trace that I recorded and replayed to recreate
the issue, but narrowing things down from a 2000+ syscall trace can be
tedious.

Let me know if you have any other issues with the perf_fuzzer.

Vince

* Re: perf_fuzzer crash on pentium 4
  2014-05-15  5:31               ` Vince Weaver
@ 2014-05-15 22:09                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-15 22:09 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Don Zickus, linux-kernel, Peter Zijlstra, Ingo Molnar

On Thu, May 15, 2014 at 01:31:16AM -0400, Vince Weaver wrote:
> On Thu, 15 May 2014, Cyrill Gorcunov wrote:
> 
> > So I'm experiencing the same problem on latest -tip + my patches applied.
> 
> glad it's not just me.
> 
> I find the problem to be reproducible and so in theory it might be 
> possible to generate a small reproducing test case.  
> I meant to do that already but I got distracted by other things.

Please ping me once you manage to create such case.

> 
> I think I managed to get a trace that I recorded and replayed to recreate
> the issue, but narrowing things down from a 2000+ syscall trace can be
> tedious.

I imagine :/

> 
> Let me know if you have any other issues with the perf_fuzzer.

Sure. Because my test machine is connected remotely, when a hang
happens the only notification I get is an unresponsive ssh
connection, sigh...

* Re: perf_fuzzer crash on pentium 4
  2014-05-06 15:42 perf_fuzzer crash on pentium 4 Vince Weaver
  2014-05-06 15:46 ` Peter Zijlstra
  2014-05-06 20:23 ` Cyrill Gorcunov
@ 2014-05-28 13:56 ` Pavel Machek
  2014-05-28 14:06   ` Cyrill Gorcunov
  2 siblings, 1 reply; 35+ messages in thread
From: Pavel Machek @ 2014-05-28 13:56 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Cyrill Gorcunov

Hi!

> So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> 
> It crashes more or less instantly (sorry for the line wrapping, 
> just got the serial console hooked up and don't have minicom configured 
> right yet).
> 
> this is 3.15-rc4 with the anti-memory corruption patch applied.
> 
> [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331

Pentium 4s were not 64-bit, were they?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: perf_fuzzer crash on pentium 4
  2014-05-28 13:56 ` Pavel Machek
@ 2014-05-28 14:06   ` Cyrill Gorcunov
  2014-05-28 15:20     ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-28 14:06 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar

On Wed, May 28, 2014 at 03:56:17PM +0200, Pavel Machek wrote:
> Hi!
> 
> > So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> > 
> > It crashes more or less instantly (sorry for the line wrapping, 
> > just got the serial console hooked up and don't have minicom configured 
> > right yet).
> > 
> > this is 3.15-rc4 with the anti-memory corruption patch applied.
> > 
> > [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> > [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
> 
> Pentium 4s were not 64-bit, were they?

Not all, but there were 64-bit variants (Xeons, IIRC).

* Re: perf_fuzzer crash on pentium 4
  2014-05-28 14:06   ` Cyrill Gorcunov
@ 2014-05-28 15:20     ` Peter Zijlstra
  2014-05-28 15:43       ` Cyrill Gorcunov
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2014-05-28 15:20 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Pavel Machek, Vince Weaver, linux-kernel, Ingo Molnar

On Wed, May 28, 2014 at 06:06:32PM +0400, Cyrill Gorcunov wrote:
> On Wed, May 28, 2014 at 03:56:17PM +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > So just to be difficult I fired up the perf_fuzzer on a Pentium 4 machine.
> > > 
> > > It crashes more or less instantly (sorry for the line wrapping, 
> > > just got the serial console hooked up and don't have minicom configured 
> > > right yet).
> > > 
> > > this is 3.15-rc4 with the anti-memory corruption patch applied.
> > > 
> > > [   67.872274] BUG: unable to handle kernel NULL pointer dereference at 00000004
> > > [   67.876146] IP: [<ffffffff81013df2>] p4_pmu_schedule_events+0xa5/0x331
> > 
> > Pentium 4s were not 64-bit, were they?
> 
> Not all, but there were 64-bit variants (Xeons, IIRC).

http://en.wikipedia.org/wiki/Pentium_d

A consumer NetBurst chip with x86_64.

* Re: perf_fuzzer crash on pentium 4
  2014-05-28 15:20     ` Peter Zijlstra
@ 2014-05-28 15:43       ` Cyrill Gorcunov
  0 siblings, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-05-28 15:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Pavel Machek, Vince Weaver, linux-kernel, Ingo Molnar

On Wed, May 28, 2014 at 05:20:11PM +0200, Peter Zijlstra wrote:
> > > 
> > > Pentium 4s were not 64-bit, were they?
> > 
> > Not all, but there were 64-bit variants (Xeons, IIRC).
> 
> http://en.wikipedia.org/wiki/Pentium_d
> 
> A consumer NetBurst chip with x86_64.

Yeah, thanks.

end of thread, other threads:[~2014-05-28 15:43 UTC | newest]

Thread overview: 35+ messages
2014-05-06 15:42 perf_fuzzer crash on pentium 4 Vince Weaver
2014-05-06 15:46 ` Peter Zijlstra
2014-05-06 15:49   ` Cyrill Gorcunov
2014-05-06 16:05     ` Vince Weaver
2014-05-06 16:06       ` Cyrill Gorcunov
2014-05-06 16:11   ` Vince Weaver
2014-05-06 16:16     ` Cyrill Gorcunov
2014-05-06 17:56       ` Vince Weaver
2014-05-06 20:23 ` Cyrill Gorcunov
2014-05-06 21:30   ` Vince Weaver
2014-05-06 21:46     ` Cyrill Gorcunov
2014-05-07 16:46       ` Vince Weaver
2014-05-07 16:49         ` Cyrill Gorcunov
2014-05-07 16:58           ` Cyrill Gorcunov
2014-05-07 17:07             ` Vince Weaver
2014-05-07 18:24               ` Cyrill Gorcunov
2014-05-07 21:17                 ` Vince Weaver
2014-05-07 21:51                   ` Cyrill Gorcunov
2014-05-07 21:54                     ` Cyrill Gorcunov
2014-05-08  5:14                       ` Vince Weaver
2014-05-08  5:40                         ` Cyrill Gorcunov
2014-05-08  2:00   ` Don Zickus
2014-05-08  5:38     ` Cyrill Gorcunov
2014-05-08  7:37     ` Cyrill Gorcunov
2014-05-08  7:49       ` Cyrill Gorcunov
2014-05-08  8:02         ` Cyrill Gorcunov
2014-05-09 16:19           ` Vince Weaver
2014-05-09 16:30             ` Cyrill Gorcunov
2014-05-14 20:39             ` Cyrill Gorcunov
2014-05-15  5:31               ` Vince Weaver
2014-05-15 22:09                 ` Cyrill Gorcunov
2014-05-28 13:56 ` Pavel Machek
2014-05-28 14:06   ` Cyrill Gorcunov
2014-05-28 15:20     ` Peter Zijlstra
2014-05-28 15:43       ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).