linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* WARNING: at arch/x86/kernel/cpu/perf_event.c:989
@ 2012-02-02 21:17 Markus Trippelsdorf
  2012-02-02 21:32 ` Markus Trippelsdorf
  0 siblings, 1 reply; 19+ messages in thread
From: Markus Trippelsdorf @ 2012-02-02 21:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar

The following warning is easily reproducible after todays
'perf-urgent-for-mingo' merge by just running "perf top":

------------[ cut here ]------------
WARNING: at arch/x86/kernel/cpu/perf_event.c:989 x86_pmu_start+0xa5/0x100()
Hardware name: System Product Name
Pid: 0, comm: swapper/2 Not tainted 3.3.0-rc2-00068-g24b36da-dirty #22
Call Trace:
 <IRQ>  [<ffffffff81054ba0>] ? warn_slowpath_common+0x60/0xa0
 [<ffffffff8103e1a5>] ? x86_pmu_start+0xa5/0x100
 [<ffffffff8109f062>] ? perf_adjust_freq_unthr_context.part.82+0xe2/0x140
 [<ffffffff8109f260>] ? perf_event_task_tick+0x1a0/0x280
 [<ffffffff81078be3>] ? scheduler_tick+0xc3/0x140
 [<ffffffff810615b5>] ? update_process_times+0x55/0x80
 [<ffffffff8108904d>] ? tick_sched_timer+0x4d/0xa0
 [<ffffffff81072d1b>] ? __run_hrtimer.isra.32+0x3b/0xc0
 [<ffffffff810735e5>] ? hrtimer_interrupt+0xc5/0x1e0
 [<ffffffff81045de3>] ? smp_apic_timer_interrupt+0x63/0xa0
 [<ffffffff8147768b>] ? apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff81039100>] ? default_idle+0x20/0x40
 [<ffffffff81039172>] ? amd_e400_idle+0x52/0x100
 [<ffffffff81030156>] ? cpu_idle+0xb6/0xe0
---[ end trace 7dc3992f562261b6 ]---


-- 
Markus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-02 21:17 WARNING: at arch/x86/kernel/cpu/perf_event.c:989 Markus Trippelsdorf
@ 2012-02-02 21:32 ` Markus Trippelsdorf
  2012-02-02 22:22   ` Stephane Eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Markus Trippelsdorf @ 2012-02-02 21:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Stephane Eranian, Paul Mackerras, Ingo Molnar

On 2012.02.02 at 22:17 +0100, Markus Trippelsdorf wrote:
> The following warning is easily reproducible after todays
> 'perf-urgent-for-mingo' merge by just running "perf top":
> 
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/cpu/perf_event.c:989 x86_pmu_start+0xa5/0x100()
> Hardware name: System Product Name
> Pid: 0, comm: swapper/2 Not tainted 3.3.0-rc2-00068-g24b36da-dirty #22
> Call Trace:
>  <IRQ>  [<ffffffff81054ba0>] ? warn_slowpath_common+0x60/0xa0
>  [<ffffffff8103e1a5>] ? x86_pmu_start+0xa5/0x100
>  [<ffffffff8109f062>] ? perf_adjust_freq_unthr_context.part.82+0xe2/0x140
>  [<ffffffff8109f260>] ? perf_event_task_tick+0x1a0/0x280
>  [<ffffffff81078be3>] ? scheduler_tick+0xc3/0x140
>  [<ffffffff810615b5>] ? update_process_times+0x55/0x80
>  [<ffffffff8108904d>] ? tick_sched_timer+0x4d/0xa0
>  [<ffffffff81072d1b>] ? __run_hrtimer.isra.32+0x3b/0xc0
>  [<ffffffff810735e5>] ? hrtimer_interrupt+0xc5/0x1e0
>  [<ffffffff81045de3>] ? smp_apic_timer_interrupt+0x63/0xa0
>  [<ffffffff8147768b>] ? apic_timer_interrupt+0x6b/0x70
>  <EOI>  [<ffffffff81039100>] ? default_idle+0x20/0x40
>  [<ffffffff81039172>] ? amd_e400_idle+0x52/0x100
>  [<ffffffff81030156>] ? cpu_idle+0xb6/0xe0
> ---[ end trace 7dc3992f562261b6 ]---

Most likely caused by e050e3f0a71b :
 perf: Fix broken interrupt rate throttling

CCing Stephane.

-- 
Markus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-02 21:32 ` Markus Trippelsdorf
@ 2012-02-02 22:22   ` Stephane Eranian
  2012-02-04 13:51     ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Eranian @ 2012-02-02 22:22 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar

Thanks for the bug report.
I did fix a couple of those in intel_ds.c and intel_lbr.c, obviously
there is another one in x86_pmu_start().
I'll post a patch to remove that warning.


On Thu, Feb 2, 2012 at 10:32 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2012.02.02 at 22:17 +0100, Markus Trippelsdorf wrote:
>> The following warning is easily reproducible after todays
>> 'perf-urgent-for-mingo' merge by just running "perf top":
>>
>> ------------[ cut here ]------------
>> WARNING: at arch/x86/kernel/cpu/perf_event.c:989 x86_pmu_start+0xa5/0x100()
>> Hardware name: System Product Name
>> Pid: 0, comm: swapper/2 Not tainted 3.3.0-rc2-00068-g24b36da-dirty #22
>> Call Trace:
>>  <IRQ>  [<ffffffff81054ba0>] ? warn_slowpath_common+0x60/0xa0
>>  [<ffffffff8103e1a5>] ? x86_pmu_start+0xa5/0x100
>>  [<ffffffff8109f062>] ? perf_adjust_freq_unthr_context.part.82+0xe2/0x140
>>  [<ffffffff8109f260>] ? perf_event_task_tick+0x1a0/0x280
>>  [<ffffffff81078be3>] ? scheduler_tick+0xc3/0x140
>>  [<ffffffff810615b5>] ? update_process_times+0x55/0x80
>>  [<ffffffff8108904d>] ? tick_sched_timer+0x4d/0xa0
>>  [<ffffffff81072d1b>] ? __run_hrtimer.isra.32+0x3b/0xc0
>>  [<ffffffff810735e5>] ? hrtimer_interrupt+0xc5/0x1e0
>>  [<ffffffff81045de3>] ? smp_apic_timer_interrupt+0x63/0xa0
>>  [<ffffffff8147768b>] ? apic_timer_interrupt+0x6b/0x70
>>  <EOI>  [<ffffffff81039100>] ? default_idle+0x20/0x40
>>  [<ffffffff81039172>] ? amd_e400_idle+0x52/0x100
>>  [<ffffffff81030156>] ? cpu_idle+0xb6/0xe0
>> ---[ end trace 7dc3992f562261b6 ]---
>
> Most likely caused by e050e3f0a71b :
>  perf: Fix broken interrupt rate throttling
>
> CCing Stephane.
>
> --
> Markus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-02 22:22   ` Stephane Eranian
@ 2012-02-04 13:51     ` Ingo Molnar
  2012-02-04 18:09       ` Stephane Eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-02-04 13:51 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Markus Trippelsdorf, linux-kernel, Peter Zijlstra, Paul Mackerras


there's yet another one triggering at:

[89214.962603] ------------[ cut here ]------------
[89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
[89214.975825] Hardware name: X8DTN
[89214.979268] Modules linked in:
[89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
[89214.988865] Call Trace:
[89214.991533]  <IRQ>  [<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
[89214.998379]  [<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
[89215.004428]  [<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
[89215.010042]  [<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
[89215.018123]  [<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
[89215.024463]  [<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
[89215.030595]  [<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
[89215.036296]  [<ffffffff810720b0>] update_process_times+0x5e/0x6f
[89215.042512]  [<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
[89215.048387]  [<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
[89215.054087]  [<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-04 13:51     ` Ingo Molnar
@ 2012-02-04 18:09       ` Stephane Eranian
  2012-02-06  9:54         ` Stephane Eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Eranian @ 2012-02-04 18:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Markus Trippelsdorf, linux-kernel, Peter Zijlstra, Paul Mackerras

I am working on it. It is hard to reproduce for me.

On Sat, Feb 4, 2012 at 2:51 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> there's yet another one triggering at:
>
> [89214.962603] ------------[ cut here ]------------
> [89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
> [89214.975825] Hardware name: X8DTN
> [89214.979268] Modules linked in:
> [89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
> [89214.988865] Call Trace:
> [89214.991533]  <IRQ>  [<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
> [89214.998379]  [<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
> [89215.004428]  [<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
> [89215.010042]  [<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
> [89215.018123]  [<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
> [89215.024463]  [<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
> [89215.030595]  [<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
> [89215.036296]  [<ffffffff810720b0>] update_process_times+0x5e/0x6f
> [89215.042512]  [<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
> [89215.048387]  [<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
> [89215.054087]  [<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c
>
> Thanks,
>
>        Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-04 18:09       ` Stephane Eranian
@ 2012-02-06  9:54         ` Stephane Eranian
  2012-02-06 10:12           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Eranian @ 2012-02-06  9:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Markus Trippelsdorf, linux-kernel, Peter Zijlstra, Paul Mackerras

On Sat, Feb 4, 2012 at 7:09 PM, Stephane Eranian <eranian@google.com> wrote:
> I am working on it. It is hard to reproduce for me.
>
What did you run to trigger this warning? What system is this on?

> On Sat, Feb 4, 2012 at 2:51 PM, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> there's yet another one triggering at:
>>
>> [89214.962603] ------------[ cut here ]------------
>> [89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
>> [89214.975825] Hardware name: X8DTN
>> [89214.979268] Modules linked in:
>> [89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
>> [89214.988865] Call Trace:
>> [89214.991533]  <IRQ>  [<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
>> [89214.998379]  [<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
>> [89215.004428]  [<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
>> [89215.010042]  [<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
>> [89215.018123]  [<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
>> [89215.024463]  [<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
>> [89215.030595]  [<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
>> [89215.036296]  [<ffffffff810720b0>] update_process_times+0x5e/0x6f
>> [89215.042512]  [<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
>> [89215.048387]  [<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
>> [89215.054087]  [<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c
>>
>> Thanks,
>>
>>        Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06  9:54         ` Stephane Eranian
@ 2012-02-06 10:12           ` Eric Dumazet
  2012-02-06 10:40             ` Stephane Eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-02-06 10:12 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Ingo Molnar, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras

Le lundi 06 février 2012 à 10:54 +0100, Stephane Eranian a écrit :
> On Sat, Feb 4, 2012 at 7:09 PM, Stephane Eranian <eranian@google.com> wrote:
> > I am working on it. It is hard to reproduce for me.
> >
> What did you run to trigger this warning? What system is this on?
> 
> > On Sat, Feb 4, 2012 at 2:51 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >>
> >> there's yet another one triggering at:
> >>
> >> [89214.962603] ------------[ cut here ]------------
> >> [89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
> >> [89214.975825] Hardware name: X8DTN
> >> [89214.979268] Modules linked in:
> >> [89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
> >> [89214.988865] Call Trace:
> >> [89214.991533]  <IRQ>  [<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
> >> [89214.998379]  [<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
> >> [89215.004428]  [<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
> >> [89215.010042]  [<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
> >> [89215.018123]  [<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
> >> [89215.024463]  [<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
> >> [89215.030595]  [<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
> >> [89215.036296]  [<ffffffff810720b0>] update_process_times+0x5e/0x6f
> >> [89215.042512]  [<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
> >> [89215.048387]  [<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
> >> [89215.054087]  [<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c
> >>
> >> Thanks,
> >>
> >>        Ingo

Stephane, I trigger this as well very easily on my machine, 32bit
kernel, using the following :


perf record -a -g hackbench 10 thread 4000


[ 1205.338006] ------------[ cut here ]------------
[ 1205.338028] WARNING: at arch/x86/kernel/cpu/perf_event.c:989
x86_pmu_start+0xba/0xf0()
[ 1205.338044] Hardware name: ProLiant BL460c G1
[ 1205.338053] Modules linked in: xt_hashlimit af_packet tg3 bonding
[ 1205.338076] Pid: 0, comm: swapper/5 Not tainted
3.3.0-rc2-00172-g23783f8 #55
[ 1205.338090] Call Trace:
[ 1205.338100]  [<c0609e06>] ? printk+0x1d/0x1f
[ 1205.338111]  [<c022b012>] warn_slowpath_common+0x72/0xa0
[ 1205.338123]  [<c021072a>] ? x86_pmu_start+0xba/0xf0
[ 1205.338134]  [<c021072a>] ? x86_pmu_start+0xba/0xf0
[ 1205.338145]  [<c022b062>] warn_slowpath_null+0x22/0x30
[ 1205.338157]  [<c021072a>] x86_pmu_start+0xba/0xf0
[ 1205.338170]  [<c02aba7b>] perf_adjust_freq_unthr_context.part.75
+0xfb/0x180
[ 1205.338185]  [<c02abd21>] perf_event_task_tick+0x221/0x290
[ 1205.338199]  [<c0258dee>] ? update_cpu_load+0xbe/0xf0
[ 1205.338210]  [<c0259728>] scheduler_tick+0x98/0xf0
[ 1205.338222]  [<c0239c6a>] update_process_times+0x5a/0x70
[ 1205.338235]  [<c026bb10>] tick_sched_timer+0x60/0x1f0
[ 1205.338248]  [<c024f1c0>] ? __remove_hrtimer+0x40/0xa0
[ 1205.338260]  [<c024f3f7>] __run_hrtimer+0x67/0x1e0
[ 1205.338270]  [<c026bab0>] ? tick_init_highres+0x20/0x20
[ 1205.338297]  [<c02501b0>] hrtimer_interrupt+0xe0/0x260
[ 1205.338323]  [<c025c207>] ? sched_clock_cpu+0xd7/0x160
[ 1205.338350]  [<c025c841>] ? set_next_entity+0x31/0x70
[ 1205.338380]  [<c0610804>] smp_apic_timer_interrupt+0x54/0x88
[ 1205.338407]  [<c060fcda>] apic_timer_interrupt+0x2a/0x30
[ 1205.338439]  [<c0209efc>] ? mwait_idle+0x7c/0x1d0
[ 1205.338470]  [<c0201606>] cpu_idle+0x66/0xa0
[ 1205.338495]  [<c060500f>] start_secondary+0x1bf/0x1c5
[ 1205.338520] ---[ end trace 94f790d96c8679f1 ]---
[ 1211.617012] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 1211.617049] Do you have a strange power saving mode enabled?
[ 1211.617075] Dazed and confused, but trying to continue
[ 1211.970013] Uhhuh. NMI received for unknown reason 31 on CPU 7.
[ 1211.970050] Do you have a strange power saving mode enabled?
[ 1211.970076] Dazed and confused, but trying to continue
[ 1214.440012] Uhhuh. NMI received for unknown reason 21 on CPU 3.
[ 1214.440048] Do you have a strange power saving mode enabled?
[ 1214.440074] Dazed and confused, but trying to continue
[ 1214.634012] Uhhuh. NMI received for unknown reason 31 on CPU 4.
[ 1214.634047] Do you have a strange power saving mode enabled?
[ 1214.634073] Dazed and confused, but trying to continue
[ 1216.568016] Uhhuh. NMI received for unknown reason 31 on CPU 1.
[ 1216.568052] Do you have a strange power saving mode enabled?
[ 1216.568078] Dazed and confused, but trying to continue
[ 1217.309010] Uhhuh. NMI received for unknown reason 31 on CPU 2.
[ 1217.309044] Do you have a strange power saving mode enabled?
[ 1217.309070] Dazed and confused, but trying to continue




processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz
stepping	: 6



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 10:12           ` Eric Dumazet
@ 2012-02-06 10:40             ` Stephane Eranian
  2012-02-06 11:11               ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Eranian @ 2012-02-06 10:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras

On Mon, Feb 6, 2012 at 11:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 06 février 2012 à 10:54 +0100, Stephane Eranian a écrit :
>> On Sat, Feb 4, 2012 at 7:09 PM, Stephane Eranian <eranian@google.com> wrote:
>> > I am working on it. It is hard to reproduce for me.
>> >
>> What did you run to trigger this warning? What system is this on?
>>
>> > On Sat, Feb 4, 2012 at 2:51 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >>
>> >> there's yet another one triggering at:
>> >>
>> >> [89214.962603] ------------[ cut here ]------------
>> >> [89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
>> >> [89214.975825] Hardware name: X8DTN
>> >> [89214.979268] Modules linked in:
>> >> [89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
>> >> [89214.988865] Call Trace:
>> >> [89214.991533]  <IRQ>  [<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
>> >> [89214.998379]  [<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
>> >> [89215.004428]  [<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
>> >> [89215.010042]  [<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
>> >> [89215.018123]  [<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
>> >> [89215.024463]  [<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
>> >> [89215.030595]  [<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
>> >> [89215.036296]  [<ffffffff810720b0>] update_process_times+0x5e/0x6f
>> >> [89215.042512]  [<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
>> >> [89215.048387]  [<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
>> >> [89215.054087]  [<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c
>> >>
>> >> Thanks,
>> >>
>> >>        Ingo
>
> Stephane, I trigger this as well very easily on my machine, 32bit
> kernel, using the following :
>
>
> perf record -a -g hackbench 10 thread 4000
>
I tried that on my Nehalem 64-bit running 3.3.0-rc2 where I reverted
that chunck of commit 84f2b9b:

--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -986,9 +986,6 @@ static void x86_pmu_start(struct perf_event
*event, int flags)
        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
        int idx = event->hw.idx;

-       if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
-               return;
-
        if (WARN_ON_ONCE(idx == -1))
                return;

I have an explanation for the other two WARN_ON_ONCE() but not for that
one. Friday, I was able to track this down to a situation where from unthr
we call pmu->stop() but because the event is already marked as not active
in cpuc->active_mask,  PERF_HES_STOPPED is not set, then
x86_pmu_start() complains. It happens during frequency adjustments and
not unthrottling.

This is odd because the only place where cpuc->active_mask is cleared
(for the event) is x86_pmu_stop(). So looks like we get into a situation where
cpuc->active_mask[b] == 0 && event->state != HES_STOPPED. But I don't
know where this could happen.


>
> [ 1205.338006] ------------[ cut here ]------------
> [ 1205.338028] WARNING: at arch/x86/kernel/cpu/perf_event.c:989
> x86_pmu_start+0xba/0xf0()
> [ 1205.338044] Hardware name: ProLiant BL460c G1
> [ 1205.338053] Modules linked in: xt_hashlimit af_packet tg3 bonding
> [ 1205.338076] Pid: 0, comm: swapper/5 Not tainted
> 3.3.0-rc2-00172-g23783f8 #55
> [ 1205.338090] Call Trace:
> [ 1205.338100]  [<c0609e06>] ? printk+0x1d/0x1f
> [ 1205.338111]  [<c022b012>] warn_slowpath_common+0x72/0xa0
> [ 1205.338123]  [<c021072a>] ? x86_pmu_start+0xba/0xf0
> [ 1205.338134]  [<c021072a>] ? x86_pmu_start+0xba/0xf0
> [ 1205.338145]  [<c022b062>] warn_slowpath_null+0x22/0x30
> [ 1205.338157]  [<c021072a>] x86_pmu_start+0xba/0xf0
> [ 1205.338170]  [<c02aba7b>] perf_adjust_freq_unthr_context.part.75
> +0xfb/0x180
> [ 1205.338185]  [<c02abd21>] perf_event_task_tick+0x221/0x290
> [ 1205.338199]  [<c0258dee>] ? update_cpu_load+0xbe/0xf0
> [ 1205.338210]  [<c0259728>] scheduler_tick+0x98/0xf0
> [ 1205.338222]  [<c0239c6a>] update_process_times+0x5a/0x70
> [ 1205.338235]  [<c026bb10>] tick_sched_timer+0x60/0x1f0
> [ 1205.338248]  [<c024f1c0>] ? __remove_hrtimer+0x40/0xa0
> [ 1205.338260]  [<c024f3f7>] __run_hrtimer+0x67/0x1e0
> [ 1205.338270]  [<c026bab0>] ? tick_init_highres+0x20/0x20
> [ 1205.338297]  [<c02501b0>] hrtimer_interrupt+0xe0/0x260
> [ 1205.338323]  [<c025c207>] ? sched_clock_cpu+0xd7/0x160
> [ 1205.338350]  [<c025c841>] ? set_next_entity+0x31/0x70
> [ 1205.338380]  [<c0610804>] smp_apic_timer_interrupt+0x54/0x88
> [ 1205.338407]  [<c060fcda>] apic_timer_interrupt+0x2a/0x30
> [ 1205.338439]  [<c0209efc>] ? mwait_idle+0x7c/0x1d0
> [ 1205.338470]  [<c0201606>] cpu_idle+0x66/0xa0
> [ 1205.338495]  [<c060500f>] start_secondary+0x1bf/0x1c5
> [ 1205.338520] ---[ end trace 94f790d96c8679f1 ]---
> [ 1211.617012] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> [ 1211.617049] Do you have a strange power saving mode enabled?
> [ 1211.617075] Dazed and confused, but trying to continue
> [ 1211.970013] Uhhuh. NMI received for unknown reason 31 on CPU 7.
> [ 1211.970050] Do you have a strange power saving mode enabled?
> [ 1211.970076] Dazed and confused, but trying to continue
> [ 1214.440012] Uhhuh. NMI received for unknown reason 21 on CPU 3.
> [ 1214.440048] Do you have a strange power saving mode enabled?
> [ 1214.440074] Dazed and confused, but trying to continue
> [ 1214.634012] Uhhuh. NMI received for unknown reason 31 on CPU 4.
> [ 1214.634047] Do you have a strange power saving mode enabled?
> [ 1214.634073] Dazed and confused, but trying to continue
> [ 1216.568016] Uhhuh. NMI received for unknown reason 31 on CPU 1.
> [ 1216.568052] Do you have a strange power saving mode enabled?
> [ 1216.568078] Dazed and confused, but trying to continue
> [ 1217.309010] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 1217.309044] Do you have a strange power saving mode enabled?
> [ 1217.309070] Dazed and confused, but trying to continue
>
>
>
>
> processor       : 7
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz
> stepping        : 6
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 10:40             ` Stephane Eranian
@ 2012-02-06 11:11               ` Eric Dumazet
  2012-02-06 14:13                 ` Stephane Eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-02-06 11:11 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Ingo Molnar, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras

Le lundi 06 février 2012 à 11:40 +0100, Stephane Eranian a écrit :
> On Mon, Feb 6, 2012 at 11:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le lundi 06 février 2012 à 10:54 +0100, Stephane Eranian a écrit :
> >> On Sat, Feb 4, 2012 at 7:09 PM, Stephane Eranian <eranian@google.com> wrote:
> >> > I am working on it. It is hard to reproduce for me.
> >> >
> >> What did you run to trigger this warning? What system is this on?
> >>
> >> > On Sat, Feb 4, 2012 at 2:51 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >> >>
> >> >> there's yet another one triggering at:
> >> >>
> >> >> [89214.962603] ------------[ cut here ]------------
> >> >> [89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
> >> >> [89214.975825] Hardware name: X8DTN
> >> >> [89214.979268] Modules linked in:
> >> >> [89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
> >> >> [89214.988865] Call Trace:
> >> >> [89214.991533]  <IRQ>  [<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
> >> >> [89214.998379]  [<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
> >> >> [89215.004428]  [<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
> >> >> [89215.010042]  [<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
> >> >> [89215.018123]  [<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
> >> >> [89215.024463]  [<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
> >> >> [89215.030595]  [<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
> >> >> [89215.036296]  [<ffffffff810720b0>] update_process_times+0x5e/0x6f
> >> >> [89215.042512]  [<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
> >> >> [89215.048387]  [<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
> >> >> [89215.054087]  [<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c
> >> >>
> >> >> Thanks,
> >> >>
> >> >>        Ingo
> >
> > Stephane, I trigger this as well very easily on my machine, 32bit
> > kernel, using the following :
> >
> >
> > perf record -a -g hackbench 10 thread 4000
> >
> I tried that on my Nehalem 64-bit running 3.3.0-rc2 where I reverted
> that chunck of commit 84f2b9b:
> 
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -986,9 +986,6 @@ static void x86_pmu_start(struct perf_event
> *event, int flags)
>         struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>         int idx = event->hw.idx;
> 
> -       if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> -               return;
> -
>         if (WARN_ON_ONCE(idx == -1))
>                 return;
> 
> I have an explanation for the other two WARN_ON_ONCE() but not for that
> one. Friday, I was able to track this down to a situation where from unthr
> we call pmu->stop() but because the event is already marked as not active
> in cpuc->active_mask,  PERF_HES_STOPPED is not set, then
> x86_pmu_start() complains. It happens during frequency adjustments and
> not unthrottling.
> 
> This is odd because the only place where cpuc->active_mask is cleared
> (for the event) is x86_pmu_stop(). So looks like we get into a situation where
> cpuc->active_mask[b] == 0 && event->state != HES_STOPPED. But I don't
> know where this could happen.
> 
> 

I forgot to say my other dev machine, Nehalem 64bit, doesnt trigger the
WARN_ON_ONCE with exact same kernel version, unless I force threads on
same socket...

perf record -a -g taskset 5555 hackbench 10 thread 40000

[95402.265521] ------------[ cut here ]------------
[95402.265528] WARNING: at arch/x86/kernel/cpu/perf_event.c:989
x86_pmu_start+0xdc/0x110()
[95402.265530] Hardware name: ProLiant BL460c G6
[95402.265531] Modules linked in: ipmi_devintf nfsd exportfs ipmi_si
hpilo bnx2x crc32c libcrc32c mdio [last unloaded: scsi_wait_scan]
[95402.265540] Pid: 0, comm: swapper/5 Not tainted 3.3.0-rc2+ #633
[95402.265541] Call Trace:
[95402.265543]  <IRQ>  [<ffffffff81037b9f>] warn_slowpath_common
+0x7f/0xc0
[95402.265549]  [<ffffffff81037bfa>] warn_slowpath_null+0x1a/0x20
[95402.265551]  [<ffffffff810127fc>] x86_pmu_start+0xdc/0x110
[95402.265559]  [<ffffffff810e78fa>]
perf_adjust_freq_unthr_context.part.75+0xda/0x150
[95402.265562]  [<ffffffff810e7b71>] perf_event_task_tick+0x201/0x2b0
[95402.265566]  [<ffffffff8106bf49>] scheduler_tick+0xe9/0x160
[95402.265571]  [<ffffffff8104964e>] update_process_times+0x6e/0x90
[95402.265575]  [<ffffffff8108c084>] tick_sched_timer+0x64/0xc0
[95402.265579]  [<ffffffff8105ffb4>] __run_hrtimer+0x84/0x1f0
[95402.265581]  [<ffffffff8108c020>] ? tick_nohz_handler+0xf0/0xf0
[95402.265584]  [<ffffffff810608e3>] hrtimer_interrupt+0xf3/0x220
[95402.265588]  [<ffffffff8170c049>] smp_apic_timer_interrupt+0x69/0x99
[95402.265593]  [<ffffffff8170af8b>] apic_timer_interrupt+0x6b/0x70
[95402.265594]  <EOI>  [<ffffffff8100ab2e>] ? mwait_idle+0xae/0x280
[95402.265601]  [<ffffffff810011df>] cpu_idle+0x8f/0xd0
[95402.265605]  [<ffffffff816f3ccc>] start_secondary+0x1d6/0x1da
[95402.265607] ---[ end trace 5e345a2582bb0ea3 ]---



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 11:11               ` Eric Dumazet
@ 2012-02-06 14:13                 ` Stephane Eranian
  2012-02-06 14:29                   ` Peter Zijlstra
  2012-02-06 14:38                   ` Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: Stephane Eranian @ 2012-02-06 14:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras

Ok, I found the problem!

it comes from perf_adjust_freq_unthr_context() vs perf_adjust_period().
The latter can under certain condition stop and restart the event. So we
had:

      stop()
      if (delta > 0) {
           perf_adjust_period() {
               if (period > 8*...) {
                   stop()
                   ...
                   start()
               }
           }
       }
       start()

Could have a double stop() and double start(), thus triggering the warning in
x86_pmu_start().

Will post a patch shortly to fix this.



On Mon, Feb 6, 2012 at 12:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 06 février 2012 à 11:40 +0100, Stephane Eranian a écrit :
>> On Mon, Feb 6, 2012 at 11:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Le lundi 06 février 2012 à 10:54 +0100, Stephane Eranian a écrit :
>> >> On Sat, Feb 4, 2012 at 7:09 PM, Stephane Eranian <eranian@google.com> wrote:
>> >> > I am working on it. It is hard to reproduce for me.
>> >> >
>> >> What did you run to trigger this warning? What system is this on?
>> >>
>> >> > On Sat, Feb 4, 2012 at 2:51 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >> >>
>> >> >> there's yet another one triggering at:
>> >> >>
>> >> >> [89214.962603] ------------[ cut here ]------------
>> >> >> [89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
>> >> >> [89214.975825] Hardware name: X8DTN
>> >> >> [89214.979268] Modules linked in:
>> >> >> [89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
>> >> >> [89214.988865] Call Trace:
>> >> >> [89214.991533]  <IRQ>  [<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
>> >> >> [89214.998379]  [<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
>> >> >> [89215.004428]  [<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
>> >> >> [89215.010042]  [<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
>> >> >> [89215.018123]  [<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
>> >> >> [89215.024463]  [<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
>> >> >> [89215.030595]  [<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
>> >> >> [89215.036296]  [<ffffffff810720b0>] update_process_times+0x5e/0x6f
>> >> >> [89215.042512]  [<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
>> >> >> [89215.048387]  [<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
>> >> >> [89215.054087]  [<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >>        Ingo
>> >
>> > Stephane, I trigger this as well very easily on my machine, 32bit
>> > kernel, using the following :
>> >
>> >
>> > perf record -a -g hackbench 10 thread 4000
>> >
>> I tried that on my Nehalem 64-bit running 3.3.0-rc2 where I reverted
>> that chunck of commit 84f2b9b:
>>
>> --- a/arch/x86/kernel/cpu/perf_event.c
>> +++ b/arch/x86/kernel/cpu/perf_event.c
>> @@ -986,9 +986,6 @@ static void x86_pmu_start(struct perf_event
>> *event, int flags)
>>         struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>         int idx = event->hw.idx;
>>
>> -       if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
>> -               return;
>> -
>>         if (WARN_ON_ONCE(idx == -1))
>>                 return;
>>
>> I have an explanation for the other two WARN_ON_ONCE() but not for that
>> one. Friday, I was able to track this down to a situation where from unthr
>> we call pmu->stop() but because the event is already marked as not active
>> in cpuc->active_mask,  PERF_HES_STOPPED is not set, then
>> x86_pmu_start() complains. It happens during frequency adjustments and
>> not unthrottling.
>>
>> This is odd because the only place where cpuc->active_mask is cleared
>> (for the event) is x86_pmu_stop(). So looks like we get into a situation where
>> cpuc->active_mask[b] == 0 && event->state != HES_STOPPED. But I don't
>> know where this could happen.
>>
>>
>
> I forgot to say my other dev machine, Nehalem 64bit, doesnt trigger the
> WARN_ON_ONCE with exact same kernel version, unless I force threads on
> same socket...
>
> perf record -a -g taskset 5555 hackbench 10 thread 40000
>
> [95402.265521] ------------[ cut here ]------------
> [95402.265528] WARNING: at arch/x86/kernel/cpu/perf_event.c:989
> x86_pmu_start+0xdc/0x110()
> [95402.265530] Hardware name: ProLiant BL460c G6
> [95402.265531] Modules linked in: ipmi_devintf nfsd exportfs ipmi_si
> hpilo bnx2x crc32c libcrc32c mdio [last unloaded: scsi_wait_scan]
> [95402.265540] Pid: 0, comm: swapper/5 Not tainted 3.3.0-rc2+ #633
> [95402.265541] Call Trace:
> [95402.265543]  <IRQ>  [<ffffffff81037b9f>] warn_slowpath_common
> +0x7f/0xc0
> [95402.265549]  [<ffffffff81037bfa>] warn_slowpath_null+0x1a/0x20
> [95402.265551]  [<ffffffff810127fc>] x86_pmu_start+0xdc/0x110
> [95402.265559]  [<ffffffff810e78fa>]
> perf_adjust_freq_unthr_context.part.75+0xda/0x150
> [95402.265562]  [<ffffffff810e7b71>] perf_event_task_tick+0x201/0x2b0
> [95402.265566]  [<ffffffff8106bf49>] scheduler_tick+0xe9/0x160
> [95402.265571]  [<ffffffff8104964e>] update_process_times+0x6e/0x90
> [95402.265575]  [<ffffffff8108c084>] tick_sched_timer+0x64/0xc0
> [95402.265579]  [<ffffffff8105ffb4>] __run_hrtimer+0x84/0x1f0
> [95402.265581]  [<ffffffff8108c020>] ? tick_nohz_handler+0xf0/0xf0
> [95402.265584]  [<ffffffff810608e3>] hrtimer_interrupt+0xf3/0x220
> [95402.265588]  [<ffffffff8170c049>] smp_apic_timer_interrupt+0x69/0x99
> [95402.265593]  [<ffffffff8170af8b>] apic_timer_interrupt+0x6b/0x70
> [95402.265594]  <EOI>  [<ffffffff8100ab2e>] ? mwait_idle+0xae/0x280
> [95402.265601]  [<ffffffff810011df>] cpu_idle+0x8f/0xd0
> [95402.265605]  [<ffffffff816f3ccc>] start_secondary+0x1d6/0x1da
> [95402.265607] ---[ end trace 5e345a2582bb0ea3 ]---
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 14:13                 ` Stephane Eranian
@ 2012-02-06 14:29                   ` Peter Zijlstra
  2012-02-06 14:32                     ` Stephane Eranian
  2012-02-06 14:38                   ` Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2012-02-06 14:29 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Eric Dumazet, Ingo Molnar, Markus Trippelsdorf, linux-kernel,
	Paul Mackerras

On Mon, 2012-02-06 at 15:13 +0100, Stephane Eranian wrote:
> Ok, I found the problem!
> 
> it comes from perf_adjust_freq_unthr_context() vs perf_adjust_period().
> The latter can under certain condition stop and restart the event. So we
> had:
> 
>       stop()
>       if (delta > 0) {
>            perf_adjust_period() {
>                if (period > 8*...) {
>                    stop()
>                    ...
>                    start()
>                }
>            }
>        }
>        start()
> 
> Could have a double stop() and double start(), thus triggering the warning in
> x86_pmu_start().
> 
> Will post a patch shortly to fix this. 

Nice, thanks for looking at this!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 14:29                   ` Peter Zijlstra
@ 2012-02-06 14:32                     ` Stephane Eranian
  2012-02-06 15:02                       ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Eranian @ 2012-02-06 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric Dumazet, Ingo Molnar, Markus Trippelsdorf, linux-kernel,
	Paul Mackerras

On Mon, Feb 6, 2012 at 3:29 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2012-02-06 at 15:13 +0100, Stephane Eranian wrote:
>> Ok, I found the problem!
>>
>> it comes from perf_adjust_freq_unthr_context() vs perf_adjust_period().
>> The latter can under certain condition stop and restart the event. So we
>> had:
>>
>>       stop()
>>       if (delta > 0) {
>>            perf_adjust_period() {
>>                if (period > 8*...) {
>>                    stop()
>>                    ...
>>                    start()
>>                }
>>            }
>>        }
>>        start()
>>
>> Could have a double stop() and double start(), thus triggering the warning in
>> x86_pmu_start().
>>
>> Will post a patch shortly to fix this.
>
> Nice, thanks for looking at this!
>
I am wondering why we stop and restart the code in perf_adjust_period()
when it's called from __perf_event_overflow(). Isn't it supposed to be stopped
already by the model specific interrupt handler. Looks like we do stop/start,
just to get the reload aspect of start. Is that right?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 14:13                 ` Stephane Eranian
  2012-02-06 14:29                   ` Peter Zijlstra
@ 2012-02-06 14:38                   ` Ingo Molnar
  2012-02-06 14:41                     ` Stephane Eranian
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-02-06 14:38 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Eric Dumazet, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras


* Stephane Eranian <eranian@google.com> wrote:

> Ok, I found the problem!
> 
> it comes from perf_adjust_freq_unthr_context() vs perf_adjust_period().
> The latter can under certain condition stop and restart the event. So we
> had:
> 
>       stop()
>       if (delta > 0) {
>            perf_adjust_period() {
>                if (period > 8*...) {
>                    stop()
>                    ...
>                    start()
>                }
>            }
>        }
>        start()
> 
> Could have a double stop() and double start(), thus triggering the warning in
> x86_pmu_start().
> 
> Will post a patch shortly to fix this.

Please send a patch against tip:master (or perf/urgent), on top 
of:

  84f2b9b2edc0: perf: Remove deprecated WARN_ON_ONCE()

Note that I updated that patch to removed one more WARN_ON() 
than you originally sent - please add it back as appropriate, in 
the delta fix patch.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 14:38                   ` Ingo Molnar
@ 2012-02-06 14:41                     ` Stephane Eranian
  2012-02-06 15:47                       ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Eranian @ 2012-02-06 14:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras

On Mon, Feb 6, 2012 at 3:38 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> Ok, I found the problem!
>>
>> it comes from perf_adjust_freq_unthr_context() vs perf_adjust_period().
>> The latter can under certain condition stop and restart the event. So we
>> had:
>>
>>       stop()
>>       if (delta > 0) {
>>            perf_adjust_period() {
>>                if (period > 8*...) {
>>                    stop()
>>                    ...
>>                    start()
>>                }
>>            }
>>        }
>>        start()
>>
>> Could have a double stop() and double start(), thus triggering the warning in
>> x86_pmu_start().
>>
>> Will post a patch shortly to fix this.
>
> Please send a patch against tip:master (or perf/urgent), on top
> of:
>
>  84f2b9b2edc0: perf: Remove deprecated WARN_ON_ONCE()
>
> Note that I updated that patch to removed one more WARN_ON()
> than you originally sent - please add it back as appropriate, in
> the delta fix patch.
>
I have already added the warning back. Will send against tip:master

> Thanks,
>
>        Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 14:32                     ` Stephane Eranian
@ 2012-02-06 15:02                       ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2012-02-06 15:02 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Eric Dumazet, Ingo Molnar, Markus Trippelsdorf, linux-kernel,
	Paul Mackerras

On Mon, 2012-02-06 at 15:32 +0100, Stephane Eranian wrote:
> I am wondering why we stop and restart the code in perf_adjust_period()
> when it's called from __perf_event_overflow(). Isn't it supposed to be stopped
> already by the model specific interrupt handler. Looks like we do stop/start,
> just to get the reload aspect of start. Is that right? 

Yes it is in order to deal with the case where an excessively long
period is programmed and we want to force load the new period without
having to wait for the old one to complete.

I hit that case several times with the adaptive code and events that
have very uneven rates.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 14:41                     ` Stephane Eranian
@ 2012-02-06 15:47                       ` Ingo Molnar
  2012-02-06 20:44                         ` Stephane Eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-02-06 15:47 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Eric Dumazet, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras


* Stephane Eranian <eranian@google.com> wrote:

> On Mon, Feb 6, 2012 at 3:38 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Stephane Eranian <eranian@google.com> wrote:
> >
> >> Ok, I found the problem!
> >>
> >> it comes from perf_adjust_freq_unthr_context() vs perf_adjust_period().
> >> The latter can under certain condition stop and restart the event. So we
> >> had:
> >>
> >>       stop()
> >>       if (delta > 0) {
> >>            perf_adjust_period() {
> >>                if (period > 8*...) {
> >>                    stop()
> >>                    ...
> >>                    start()
> >>                }
> >>            }
> >>        }
> >>        start()
> >>
> >> Could have a double stop() and double start(), thus triggering the warning in
> >> x86_pmu_start().
> >>
> >> Will post a patch shortly to fix this.
> >
> > Please send a patch against tip:master (or perf/urgent), on top
> > of:
> >
> >  84f2b9b2edc0: perf: Remove deprecated WARN_ON_ONCE()
> >
> > Note that I updated that patch to removed one more WARN_ON()
> > than you originally sent - please add it back as appropriate, in
> > the delta fix patch.
> >
> I have already added the warning back. Will send against tip:master

Great - thanks!

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 15:47                       ` Ingo Molnar
@ 2012-02-06 20:44                         ` Stephane Eranian
  2012-02-06 20:58                           ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Eranian @ 2012-02-06 20:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Markus Trippelsdorf, linux-kernel, Peter Zijlstra,
	Paul Mackerras

Ok, I fixed the problem this time but in further testing I uncovered
another issue:


while true; do perf record -a -g hackbench 10 thread 4000; done &
while true; do perf record -a -g hackbench 10 thread 4000; done &

We hit:

[  765.034354] WARNING: at arch/x86/kernel/cpu/perf_event_intel.c:1029
intel_pmu_handle_irq+0x1df/0x2c0()
[  765.034356] Hardware name: System Product Name
[  765.034358] Modules linked in: binfmt_misc psmouse serio_raw
asus_atk0110 usbhid hid ahci pata_jmicron libahci r8169
[  765.034369] Pid: 8903, comm: hackbench Tainted: G        W
3.3.0-rc2-tip #35
[  765.034371] Call Trace:
[  765.034373]  <NMI>  [<ffffffff8104c15f>] warn_slowpath_common+0x7f/0xc0
[  765.034381]  [<ffffffff8104c1ba>] warn_slowpath_null+0x1a/0x20
[  765.034384]  [<ffffffff81027a0f>] intel_pmu_handle_irq+0x1df/0x2c0
[  765.034388]  [<ffffffff8148a606>] ? sock_aio_write+0x6/0x170
[  765.034392]  [<ffffffff81559171>] perf_event_nmi_handler+0x21/0x30
[  765.034395]  [<ffffffff81558a29>] do_nmi+0x129/0x3a0
[  765.034398]  [<ffffffff81557f7d>] restart_nmi+0x1a/0x1e
[  765.034402]  [<ffffffff8103abea>] ? native_write_msr_safe+0xa/0x10
[  765.034405]  [<ffffffff8103abea>] ? native_write_msr_safe+0xa/0x10
[  765.034408]  [<ffffffff8103abea>] ? native_write_msr_safe+0xa/0x10
[  765.034410]  <<EOE>>  <IRQ>  [<ffffffff81027447>]
intel_pmu_disable_event+0x87/0x100
[  765.034416]  [<ffffffff81022a16>] x86_pmu_stop+0x86/0xe0
[  765.034420]  [<ffffffff8110bc3b>]
perf_adjust_freq_unthr_context.part.67+0xeb/0x180
[  765.034423]  [<ffffffff8110bf50>] perf_event_task_tick+0x280/0x300
[  765.034427]  [<ffffffff8107f36f>] scheduler_tick+0xdf/0x150
[  765.034431]  [<ffffffff8105d66e>] update_process_times+0x6e/0x90
[  765.034435]  [<ffffffff810a0a34>] tick_sched_timer+0x64/0xc0
[  765.034438]  [<ffffffff81073286>] __run_hrtimer+0x76/0x1f0
[  765.034441]  [<ffffffff810a09d0>] ? tick_nohz_handler+0xf0/0xf0
[  765.034447]  [<ffffffff8101a3b9>] ? read_tsc+0x9/0x20
[  765.034450]  [<ffffffff81073b93>] hrtimer_interrupt+0xe3/0x200
[  765.034454]  [<ffffffff81560bdc>] ? call_softirq+0x1c/0x30
[  765.034457]  [<ffffffff81561529>] smp_apic_timer_interrupt+0x69/0x99
[  765.034461]  [<ffffffff815601de>] apic_timer_interrupt+0x6e/0x80
[  765.034462]  <EOI>  [<ffffffff811af082>] ? fsnotify+0x1d2/0x2b0
[  765.034470]  [<ffffffff81171085>] ? rw_verify_area+0x75/0xf0
[  765.034473]  [<ffffffff81171071>] ? rw_verify_area+0x61/0xf0
[  765.034475]  [<ffffffff811713a8>] vfs_write+0x88/0x180
[  765.034478]  [<ffffffff811716fa>] sys_write+0x4a/0x90
[  765.034481]  [<ffffffff8155f729>] system_call_fastpath+0x16/0x1b


Which is pointing to the Intel specific:

intel_pmu_handle_irq()
again:
        intel_pmu_ack_status(status);
        if (++loops > 100) {
                WARN_ONCE(1, "perfevents: irq loop stuck!\n");
                perf_event_print_debug();
                intel_pmu_reset();
                goto done;
        }

We are in a constant interrupt loop for a reason I don't yet understand.
Sure we have two counters going, bu given the callstack, one is being
stopped.

I know we talked about that a couple of weeks back, but after all, it may
be that it is not possible to run the frequency adjustment code without
stopping the entire PMU because of risk of interrupts. But it is not clear
to me what's causing this at this point.

I suggest we still apply my patch to fix the x86_pmu_start() first, then
I will submit a second patch to fix that one. Agreed?

On Mon, Feb 6, 2012 at 4:47 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> On Mon, Feb 6, 2012 at 3:38 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Stephane Eranian <eranian@google.com> wrote:
>> >
>> >> Ok, I found the problem!
>> >>
>> >> it comes from perf_adjust_freq_unthr_context() vs perf_adjust_period().
>> >> The latter can under certain condition stop and restart the event. So we
>> >> had:
>> >>
>> >>       stop()
>> >>       if (delta > 0) {
>> >>            perf_adjust_period() {
>> >>                if (period > 8*...) {
>> >>                    stop()
>> >>                    ...
>> >>                    start()
>> >>                }
>> >>            }
>> >>        }
>> >>        start()
>> >>
>> >> Could have a double stop() and double start(), thus triggering the warning in
>> >> x86_pmu_start().
>> >>
>> >> Will post a patch shortly to fix this.
>> >
>> > Please send a patch against tip:master (or perf/urgent), on top
>> > of:
>> >
>> >  84f2b9b2edc0: perf: Remove deprecated WARN_ON_ONCE()
>> >
>> > Note that I updated that patch to removed one more WARN_ON()
>> > than you originally sent - please add it back as appropriate, in
>> > the delta fix patch.
>> >
>> I have already added the warning back. Will send against tip:master
>
> Great - thanks!
>
>        Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 20:44                         ` Stephane Eranian
@ 2012-02-06 20:58                           ` Peter Zijlstra
  2012-02-06 21:25                             ` Stephane Eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2012-02-06 20:58 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Ingo Molnar, Eric Dumazet, Markus Trippelsdorf, linux-kernel,
	Paul Mackerras

On Mon, 2012-02-06 at 21:44 +0100, Stephane Eranian wrote:
> Which is pointing to the Intel specific:
> 
> intel_pmu_handle_irq()
> again:
>         intel_pmu_ack_status(status);
>         if (++loops > 100) {
>                 WARN_ONCE(1, "perfevents: irq loop stuck!\n");
>                 perf_event_print_debug();
>                 intel_pmu_reset();
>                 goto done;
>         }
> 
> We are in a constant interrupt loop for a reason I don't yet understand.
> Sure we have two counters going, bu given the callstack, one is being
> stopped.
> 
> I know we talked about that a couple of weeks back, but after all, it may
> be that it is not possible to run the frequency adjustment code without
> stopping the entire PMU because of risk of interrupts. But it is not clear
> to me what's causing this at this point.

There's Errata's on this, see AAK157 (SNB) BD106 (WSM) AAK157 (NHM).

Ingo and I talked about a 'fix' for that based on the text in the SNB
errata which explains why the reset works. Ingo I was under the
impression you were going to implement that?

> I suggest we still apply my patch to fix the x86_pmu_start() first, then
> I will submit a second patch to fix that one. Agreed? 

Yes.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989
  2012-02-06 20:58                           ` Peter Zijlstra
@ 2012-02-06 21:25                             ` Stephane Eranian
  0 siblings, 0 replies; 19+ messages in thread
From: Stephane Eranian @ 2012-02-06 21:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Eric Dumazet, Markus Trippelsdorf, linux-kernel,
	Paul Mackerras

On Mon, Feb 6, 2012 at 9:58 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2012-02-06 at 21:44 +0100, Stephane Eranian wrote:
>> Which is pointing to the Intel specific:
>>
>> intel_pmu_handle_irq()
>> again:
>>         intel_pmu_ack_status(status);
>>         if (++loops > 100) {
>>                 WARN_ONCE(1, "perfevents: irq loop stuck!\n");
>>                 perf_event_print_debug();
>>                 intel_pmu_reset();
>>                 goto done;
>>         }
>>
>> We are in a constant interrupt loop for a reason I don't yet understand.
>> Sure we have two counters going, bu given the callstack, one is being
>> stopped.
>>
>> I know we talked about that a couple of weeks back, but after all, it may
>> be that it is not possible to run the frequency adjustment code without
>> stopping the entire PMU because of risk of interrupts. But it is not clear
>> to me what's causing this at this point.
>
> There's Errata's on this, see AAK157 (SNB) BD106 (WSM) AAK157 (NHM).
>
Arg, I had forgotten about those.....
I'll check on this further with Intel.

> Ingo and I talked about a 'fix' for that based on the text in the SNB
> errata which explains why the reset works. Ingo I was under the
> impression you were going to implement that?
>
>> I suggest we still apply my patch to fix the x86_pmu_start() first, then
>> I will submit a second patch to fix that one. Agreed?
>
> Yes.
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-02-06 21:25 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-02 21:17 WARNING: at arch/x86/kernel/cpu/perf_event.c:989 Markus Trippelsdorf
2012-02-02 21:32 ` Markus Trippelsdorf
2012-02-02 22:22   ` Stephane Eranian
2012-02-04 13:51     ` Ingo Molnar
2012-02-04 18:09       ` Stephane Eranian
2012-02-06  9:54         ` Stephane Eranian
2012-02-06 10:12           ` Eric Dumazet
2012-02-06 10:40             ` Stephane Eranian
2012-02-06 11:11               ` Eric Dumazet
2012-02-06 14:13                 ` Stephane Eranian
2012-02-06 14:29                   ` Peter Zijlstra
2012-02-06 14:32                     ` Stephane Eranian
2012-02-06 15:02                       ` Peter Zijlstra
2012-02-06 14:38                   ` Ingo Molnar
2012-02-06 14:41                     ` Stephane Eranian
2012-02-06 15:47                       ` Ingo Molnar
2012-02-06 20:44                         ` Stephane Eranian
2012-02-06 20:58                           ` Peter Zijlstra
2012-02-06 21:25                             ` Stephane Eranian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).