linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf: fuzzer, lots of warnings
@ 2015-07-02 15:42 Vince Weaver
  2015-07-02 16:47 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2015-07-02 15:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian


So this does seem to have gotten worse with 4.2-rc0.  Four warnings are 
triggered more or less immediately after starting fuzzing.


(#1)

WARNING: CPU: 0 PID: 32434 at arch/x86/kernel/cpu/perf_event.c:1219 x86_pmu_start+0xaa/0x100()

	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))

[  224.158535]  [<ffffffff8102a1ea>] x86_pmu_start+0xaa/0x100
[  224.164401]  [<ffffffff81159f39>] perf_ioctl+0x429/0x470
[  224.170101]  [<ffffffff811f199a>] do_vfs_ioctl+0x2ba/0x4a0
[  224.176012]  [<ffffffff811fef14>] ? mntput+0x24/0x40
[  224.181343]  [<ffffffff811df3ea>] ? __fput+0x17a/0x1e0
[  224.186858]  [<ffffffff811f1bf9>] SyS_ioctl+0x79/0x90
[  224.192275]  [<ffffffff816a64b2>] entry_SYSCALL_64_fastpath+0x16/0x7a

(#2)

[  584.352324] WARNING: CPU: 2 PID: 18924 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
	(this one was described in a previous e-mail)


(#3)

[ 1210.501445] WARNING: CPU: 6 PID: 0 at arch/x86/kernel/cpu/perf_event.c:1307 x86_pmu_stop+0xb8/0xd0()

	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);

[ 1210.653328]  [<ffffffff810291b8>] x86_pmu_stop+0xb8/0xd0
[ 1210.659466]  [<ffffffff81029216>] x86_pmu_del+0x46/0x140
[ 1210.665607]  [<ffffffff81156244>] event_sched_out.isra.80+0x124/0x320
[ 1210.672924]  [<ffffffff81156553>] ? __perf_event_disable+0x53/0x130
[ 1210.680097]  [<ffffffff81156488>] group_sched_out+0x48/0xc0
[ 1210.686530]  [<ffffffff8101d2a8>] ? native_sched_clock+0x28/0x90
[ 1210.693445]  [<ffffffff811565fe>] __perf_event_disable+0xfe/0x130
[ 1210.700408]  [<ffffffff811532c0>] ? cpu_clock_event_start+0x40/0x40
[ 1210.707550]  [<ffffffff81153302>] remote_function+0x42/0x50
[ 1210.713942]  [<ffffffff810f0dab>] flush_smp_call_function_queue+0x7b/0x170
[ 1210.721762]  [<ffffffff810f1793>] generic_smp_call_function_single_interrupt+0x13/0x60
[ 1210.730679]  [<ffffffff81049228>] smp_trace_call_function_single_interrupt+0x38/0xb0
[ 1210.739467]  [<ffffffff816a78db>] trace_call_function_single_interrupt+0x6b/0x70
[ 1210.747820]  <EOI>  [<ffffffff81543f24>] ? cpuidle_enter_state+0xf4/0x220
[ 1210.755614]  [<ffffffff81543f00>] ? cpuidle_enter_state+0xd0/0x220
[ 1210.762708]  [<ffffffff81544087>] cpuidle_enter+0x17/0x20
[ 1210.768966]  [<ffffffff810b05db>] call_cpuidle+0x3b/0x70
[ 1210.775135]  [<ffffffff81544063>] ? cpuidle_select+0x13/0x20
[ 1210.781637]  [<ffffffff810b0855>] cpu_startup_entry+0x245/0x310
[ 1210.788411]  [<ffffffff81049a40>] start_secondary+0x120/0x130

(#4)

[ 1638.702578] WARNING: CPU: 2 PID: 25543 at arch/x86/kernel/cpu/perf_event_intel_lbr.c:349 intel_pmu_lbr_disable+0xbc/0xd0()

	WARN_ON_ONCE(cpuc->lbr_users < 0);

[ 1638.857921]  [<ffffffff8102e56c>] intel_pmu_lbr_disable+0xbc/0xd0
[ 1638.864874]  [<ffffffff81031812>] intel_pmu_disable_event+0xc2/0x130
[ 1638.872125]  [<ffffffff81030a49>] ? __intel_pmu_disable_all+0x49/0x70
[ 1638.879455]  [<ffffffff81029175>] x86_pmu_stop+0x75/0xd0
[ 1638.885560]  [<ffffffff81159f1a>] perf_ioctl+0x40a/0x470
[ 1638.891650]  [<ffffffff811f199a>] do_vfs_ioctl+0x2ba/0x4a0
[ 1638.897945]  [<ffffffff811dd9ac>] ? vfs_read+0x11c/0x130
[ 1638.904069]  [<ffffffff811f27bd>] ? poll_select_set_timeout+0x4d/0x70
[ 1638.911417]  [<ffffffff811f1bf9>] SyS_ioctl+0x79/0x90
[ 1638.917295]  [<ffffffff816a64b2>] entry_SYSCALL_64_fastpath+0x16/0x7a


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: fuzzer, lots of warnings
  2015-07-02 15:42 perf: fuzzer, lots of warnings Vince Weaver
@ 2015-07-02 16:47 ` Peter Zijlstra
  2015-07-02 17:59   ` Vince Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2015-07-02 16:47 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Jul 02, 2015 at 11:42:43AM -0400, Vince Weaver wrote:
> (#4)
> 
> [ 1638.702578] WARNING: CPU: 2 PID: 25543 at arch/x86/kernel/cpu/perf_event_intel_lbr.c:349 intel_pmu_lbr_disable+0xbc/0xd0()
> 
> 	WARN_ON_ONCE(cpuc->lbr_users < 0);
> 
> [ 1638.857921]  [<ffffffff8102e56c>] intel_pmu_lbr_disable+0xbc/0xd0
> [ 1638.864874]  [<ffffffff81031812>] intel_pmu_disable_event+0xc2/0x130
> [ 1638.872125]  [<ffffffff81030a49>] ? __intel_pmu_disable_all+0x49/0x70
> [ 1638.879455]  [<ffffffff81029175>] x86_pmu_stop+0x75/0xd0
> [ 1638.885560]  [<ffffffff81159f1a>] perf_ioctl+0x40a/0x470
> [ 1638.891650]  [<ffffffff811f199a>] do_vfs_ioctl+0x2ba/0x4a0
> [ 1638.897945]  [<ffffffff811dd9ac>] ? vfs_read+0x11c/0x130
> [ 1638.904069]  [<ffffffff811f27bd>] ? poll_select_set_timeout+0x4d/0x70
> [ 1638.911417]  [<ffffffff811f1bf9>] SyS_ioctl+0x79/0x90
> [ 1638.917295]  [<ffffffff816a64b2>] entry_SYSCALL_64_fastpath+0x16/0x7a

This might maybe help:

http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.net

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: fuzzer, lots of warnings
  2015-07-02 16:47 ` Peter Zijlstra
@ 2015-07-02 17:59   ` Vince Weaver
  2015-07-02 18:30     ` Ingo Molnar
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Vince Weaver @ 2015-07-02 17:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, 2 Jul 2015, Peter Zijlstra wrote:

> On Thu, Jul 02, 2015 at 11:42:43AM -0400, Vince Weaver wrote:
> > (#4)
> > 
> > [ 1638.702578] WARNING: CPU: 2 PID: 25543 at arch/x86/kernel/cpu/perf_event_intel_lbr.c:349 intel_pmu_lbr_disable+0xbc/0xd0()
> > 
> > 	WARN_ON_ONCE(cpuc->lbr_users < 0);
> > 
> > [ 1638.857921]  [<ffffffff8102e56c>] intel_pmu_lbr_disable+0xbc/0xd0
> > [ 1638.864874]  [<ffffffff81031812>] intel_pmu_disable_event+0xc2/0x130
> > [ 1638.872125]  [<ffffffff81030a49>] ? __intel_pmu_disable_all+0x49/0x70
> > [ 1638.879455]  [<ffffffff81029175>] x86_pmu_stop+0x75/0xd0
> > [ 1638.885560]  [<ffffffff81159f1a>] perf_ioctl+0x40a/0x470
> > [ 1638.891650]  [<ffffffff811f199a>] do_vfs_ioctl+0x2ba/0x4a0
> > [ 1638.897945]  [<ffffffff811dd9ac>] ? vfs_read+0x11c/0x130
> > [ 1638.904069]  [<ffffffff811f27bd>] ? poll_select_set_timeout+0x4d/0x70
> > [ 1638.911417]  [<ffffffff811f1bf9>] SyS_ioctl+0x79/0x90
> > [ 1638.917295]  [<ffffffff816a64b2>] entry_SYSCALL_64_fastpath+0x16/0x7a
> 
> This might maybe help:
> 
> http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.net

ah, OK, I missed that one somehow.  It would also explain the 

[ 3971.156004] Uhhuh. NMI received for unknown reason 21 on CPU 2.
[ 3971.162812] Do you have a strange power saving mode enabled?
[ 3971.169363] Dazed and confused, but trying to continue

messages I am getting.


I'm also still getting a lot of 
	perfevents: irq loop stuck!
messages, I thought the workaround for that had gone in for 4.2 but I 
guess not.

Vince

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: fuzzer, lots of warnings
  2015-07-02 17:59   ` Vince Weaver
@ 2015-07-02 18:30     ` Ingo Molnar
  2015-07-03  4:48       ` Vince Weaver
  2015-07-02 19:48     ` Stephane Eranian
  2015-07-03  4:46     ` Vince Weaver
  2 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2015-07-02 18:30 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> I'm also still getting a lot of 
> 	perfevents: irq loop stuck!
> messages, I thought the workaround for that had gone in for 4.2 but I 
> guess not.

Hm, so I was waiting for your feedback regarding the precise period cutoff to use, 
and I guess that's where the patch got lost.

Does the value of 2 below work for you?

Also I bet we'd need the workaround on a lot more CPU models as well, I sometimes 
see that warning on an early Nehalem prototype, model 26 (Nehalem-EP).

So my guess is that everything Nehalem and later is affected, i.e. NHM, WSM, SNB, 
IVB and HSW:

        case 30: /* 45nm Nehalem    */
        case 26: /* 45nm Nehalem-EP */
        case 46: /* 45nm Nehalem-EX */
        case 37: /* 32nm Westmere    */
        case 44: /* 32nm Westmere-EP */
        case 47: /* 32nm Westmere-EX */
        case 42: /* 32nm SandyBridge         */
        case 45: /* 32nm SandyBridge-E/EN/EP */
        case 58: /* 22nm IvyBridge       */
        case 62: /* 22nm IvyBridge-EP/EX */
        case 60: /* 22nm Haswell Core */
        case 63: /* 22nm Haswell Server */
        case 69: /* 22nm Haswell ULT */
        case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */

Has anyone ever seen that warning on Broadwell and later Intel CPUs?

Thanks,

	Ingo

Signed-off-by: Ingo Molnar <mingo@kernel.org>

---
 arch/x86/kernel/cpu/perf_event_intel.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 960e85de13fb..26b13ea8299c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2479,6 +2479,15 @@ hsw_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 
 	return c;
 }
+/*
+ * Really short periods might create infinite PMC NMI loops on Haswell,
+ * so don't allow a period of 1. There's no official erratum for this AFAIK.
+ */
+static unsigned int hsw_limit_period(struct perf_event *event, unsigned int left)
+{
+	return max(left, 2U);
+}
+
 
 /*
  * Broadwell:
@@ -2495,7 +2504,7 @@ hsw_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
  * Therefore the effective (average) period matches the requested period,
  * despite coarser hardware granularity.
  */
-static unsigned bdw_limit_period(struct perf_event *event, unsigned left)
+static unsigned int bdw_limit_period(struct perf_event *event, unsigned left)
 {
 	if ((event->hw.config & INTEL_ARCH_EVENT_MASK) ==
 			X86_CONFIG(.event=0xc0, .umask=0x01)) {
@@ -3265,6 +3274,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		x86_pmu.cpu_events = hsw_events_attrs;
+		x86_pmu.limit_period = hsw_limit_period;
 		x86_pmu.lbr_double_abort = true;
 		pr_cont("Haswell events, ");
 		break;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: perf: fuzzer, lots of warnings
  2015-07-02 17:59   ` Vince Weaver
  2015-07-02 18:30     ` Ingo Molnar
@ 2015-07-02 19:48     ` Stephane Eranian
  2015-07-03  4:46     ` Vince Weaver
  2 siblings, 0 replies; 8+ messages in thread
From: Stephane Eranian @ 2015-07-02 19:48 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Peter Zijlstra, LKML, Ingo Molnar, Arnaldo Carvalho de Melo

On Thu, Jul 2, 2015 at 7:59 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
>
> On Thu, 2 Jul 2015, Peter Zijlstra wrote:
>
> > On Thu, Jul 02, 2015 at 11:42:43AM -0400, Vince Weaver wrote:
> > > (#4)
> > >
> > > [ 1638.702578] WARNING: CPU: 2 PID: 25543 at arch/x86/kernel/cpu/perf_event_intel_lbr.c:349 intel_pmu_lbr_disable+0xbc/0xd0()
> > >
> > >     WARN_ON_ONCE(cpuc->lbr_users < 0);
> > >
> > > [ 1638.857921]  [<ffffffff8102e56c>] intel_pmu_lbr_disable+0xbc/0xd0
> > > [ 1638.864874]  [<ffffffff81031812>] intel_pmu_disable_event+0xc2/0x130
> > > [ 1638.872125]  [<ffffffff81030a49>] ? __intel_pmu_disable_all+0x49/0x70
> > > [ 1638.879455]  [<ffffffff81029175>] x86_pmu_stop+0x75/0xd0
> > > [ 1638.885560]  [<ffffffff81159f1a>] perf_ioctl+0x40a/0x470
> > > [ 1638.891650]  [<ffffffff811f199a>] do_vfs_ioctl+0x2ba/0x4a0
> > > [ 1638.897945]  [<ffffffff811dd9ac>] ? vfs_read+0x11c/0x130
> > > [ 1638.904069]  [<ffffffff811f27bd>] ? poll_select_set_timeout+0x4d/0x70
> > > [ 1638.911417]  [<ffffffff811f1bf9>] SyS_ioctl+0x79/0x90
> > > [ 1638.917295]  [<ffffffff816a64b2>] entry_SYSCALL_64_fastpath+0x16/0x7a
> >
> > This might maybe help:
> >
> > http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.net
>
> ah, OK, I missed that one somehow.  It would also explain the
>
> [ 3971.156004] Uhhuh. NMI received for unknown reason 21 on CPU 2.
> [ 3971.162812] Do you have a strange power saving mode enabled?
> [ 3971.169363] Dazed and confused, but trying to continue
>
> messages I am getting.
>
>
> I'm also still getting a lot of
>         perfevents: irq loop stuck!

 Last I saw this was on WSM. I also suspect this may not happen if the
event is only measured at the user level.

> messages, I thought the workaround for that had gone in for 4.2 but I
> guess not.
>
> Vince

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: fuzzer, lots of warnings
  2015-07-02 17:59   ` Vince Weaver
  2015-07-02 18:30     ` Ingo Molnar
  2015-07-02 19:48     ` Stephane Eranian
@ 2015-07-03  4:46     ` Vince Weaver
  2 siblings, 0 replies; 8+ messages in thread
From: Vince Weaver @ 2015-07-03  4:46 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, 2 Jul 2015, Vince Weaver wrote:

> On Thu, 2 Jul 2015, Peter Zijlstra wrote:
> 
> > On Thu, Jul 02, 2015 at 11:42:43AM -0400, Vince Weaver wrote:
> > > (#4)
> > > 
> > > [ 1638.702578] WARNING: CPU: 2 PID: 25543 at arch/x86/kernel/cpu/perf_event_intel_lbr.c:349 intel_pmu_lbr_disable+0xbc/0xd0()
> > > 
> > > 	WARN_ON_ONCE(cpuc->lbr_users < 0);
> > > 
> > > [ 1638.857921]  [<ffffffff8102e56c>] intel_pmu_lbr_disable+0xbc/0xd0
> > > [ 1638.864874]  [<ffffffff81031812>] intel_pmu_disable_event+0xc2/0x130
> > > [ 1638.872125]  [<ffffffff81030a49>] ? __intel_pmu_disable_all+0x49/0x70
> > > [ 1638.879455]  [<ffffffff81029175>] x86_pmu_stop+0x75/0xd0
> > > [ 1638.885560]  [<ffffffff81159f1a>] perf_ioctl+0x40a/0x470
> > > [ 1638.891650]  [<ffffffff811f199a>] do_vfs_ioctl+0x2ba/0x4a0
> > > [ 1638.897945]  [<ffffffff811dd9ac>] ? vfs_read+0x11c/0x130
> > > [ 1638.904069]  [<ffffffff811f27bd>] ? poll_select_set_timeout+0x4d/0x70
> > > [ 1638.911417]  [<ffffffff811f1bf9>] SyS_ioctl+0x79/0x90
> > > [ 1638.917295]  [<ffffffff816a64b2>] entry_SYSCALL_64_fastpath+0x16/0x7a
> > 
> > This might maybe help:
> > 
> > http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.net

I applied that patch and while the lbr_disable warning hasn't re-appeared 
I am still getting a lot of NMI dazed and confused messages.

Though that might be normal, I think I've always gotten NMI dazed and 
confused messages (although a bit more frequently on the core2 box than 
the haswell one).

Vince

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: fuzzer, lots of warnings
  2015-07-02 18:30     ` Ingo Molnar
@ 2015-07-03  4:48       ` Vince Weaver
  2015-07-06 16:36         ` Vince Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2015-07-03  4:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vince Weaver, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, 2 Jul 2015, Ingo Molnar wrote:

> 
> * Vince Weaver <vincent.weaver@maine.edu> wrote:
> 
> > I'm also still getting a lot of 
> > 	perfevents: irq loop stuck!
> > messages, I thought the workaround for that had gone in for 4.2 but I 
> > guess not.
> 
> Hm, so I was waiting for your feedback regarding the precise period cutoff to use, 
> and I guess that's where the patch got lost.
> 
> Does the value of 2 below work for you?

I still get the messages when using 2.  I'll see if I can narrow down the 
cutoff.  I know the messages definitely go away when using 128.

Vince

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: fuzzer, lots of warnings
  2015-07-03  4:48       ` Vince Weaver
@ 2015-07-06 16:36         ` Vince Weaver
  0 siblings, 0 replies; 8+ messages in thread
From: Vince Weaver @ 2015-07-06 16:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Fri, 3 Jul 2015, Vince Weaver wrote:

> On Thu, 2 Jul 2015, Ingo Molnar wrote:
> 
> > 
> > * Vince Weaver <vincent.weaver@maine.edu> wrote:
> > 
> > > I'm also still getting a lot of 
> > > 	perfevents: irq loop stuck!
> > > messages, I thought the workaround for that had gone in for 4.2 but I 
> > > guess not.
> > 
> > Hm, so I was waiting for your feedback regarding the precise period cutoff to use, 
> > and I guess that's where the patch got lost.
> > 
> > Does the value of 2 below work for you?
> 
> I still get the messages when using 2.  I'll see if I can narrow down the 
> cutoff.  I know the messages definitely go away when using 128.

>From what I can tell, a value of 2 will still generate "irq loop stuck", 
but 3 appears to be safe, but I'm still fuzzing to be sure.  Previously I 
ran a kernel with 5 for a while and that also did not generate any 
messages.

Vince



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-07-06 16:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-02 15:42 perf: fuzzer, lots of warnings Vince Weaver
2015-07-02 16:47 ` Peter Zijlstra
2015-07-02 17:59   ` Vince Weaver
2015-07-02 18:30     ` Ingo Molnar
2015-07-03  4:48       ` Vince Weaver
2015-07-06 16:36         ` Vince Weaver
2015-07-02 19:48     ` Stephane Eranian
2015-07-03  4:46     ` Vince Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).