linux-kernel.vger.kernel.org archive mirror
* [PATCH] perf/x86/intel: restrict period on Nehalem
@ 2019-08-19 23:13 Josh Hunt
  2019-08-23  8:07 ` Peter Zijlstra
  2019-08-31  9:06 ` [tip: perf/urgent] perf/x86/intel: Restrict " tip-bot2 for Josh Hunt
  0 siblings, 2 replies; 3+ messages in thread
From: Josh Hunt @ 2019-08-19 23:13 UTC
  To: linux-kernel, tglx
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	bpuranda, Josh Hunt

We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
some cases when using perf:

perfevents: irq loop stuck!
WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
...
RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
...
Call Trace:
<NMI>
? perf_event_nmi_handler+0x2e/0x50
? intel_pmu_save_and_restart+0x50/0x50
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x6e/0x120
default_do_nmi+0x3e/0x100
do_nmi+0x102/0x160
end_repeat_nmi+0x16/0x50
...
? native_write_msr+0x6/0x20
? native_write_msr+0x6/0x20
</NMI>
intel_pmu_enable_event+0x1ce/0x1f0
x86_pmu_start+0x78/0xa0
x86_pmu_enable+0x252/0x310
__perf_event_task_sched_in+0x181/0x190
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
finish_task_switch+0x158/0x260
__schedule+0x2f6/0x840
? hrtimer_start_range_ns+0x153/0x210
schedule+0x32/0x80
schedule_hrtimeout_range_clock+0x8a/0x100
? hrtimer_init+0x120/0x120
ep_poll+0x2f7/0x3a0
? wake_up_q+0x60/0x60
do_epoll_wait+0xa9/0xc0
__x64_sys_epoll_wait+0x1a/0x20
do_syscall_64+0x4e/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fdeb1e96c03
...
---[ end trace 7a8f0b2beff82ee0 ]---

CPU#0: ctrl:       0000000000000000
CPU#0: status:     0000000400000000
CPU#0: overflow:   0000000000000000
CPU#0: fixed:      0000000000000bb0
CPU#0: pebs:       0000000000000000
CPU#0: debugctl:   0000000000000000
CPU#0: active:     0000000600000000
CPU#0:   gen-PMC0 ctrl:  0000000000000000
CPU#0:   gen-PMC0 count: 0000000000000000
CPU#0:   gen-PMC0 left:  0000000000000000
CPU#0:   gen-PMC1 ctrl:  0000000000000000
CPU#0:   gen-PMC1 count: 0000000000000000
CPU#0:   gen-PMC1 left:  0000000000000000
CPU#0:   gen-PMC2 ctrl:  0000000000000000
CPU#0:   gen-PMC2 count: 0000000000000000
CPU#0:   gen-PMC2 left:  0000000000000000
CPU#0:   gen-PMC3 ctrl:  0000000000000000
CPU#0:   gen-PMC3 count: 0000000000000000
CPU#0:   gen-PMC3 left:  0000000000000000
CPU#0: fixed-PMC0 count: 0000000000000000
CPU#0: fixed-PMC1 count: 0000ffffd22ebd19
CPU#0: fixed-PMC2 count: 0000fffffffffff1
core: clearing PMU state on CPU#0
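
For readers decoding the dump above, here is a minimal sketch of how the
masks and counter values can be read. It is not kernel code. It assumes that
fixed counters occupy bits 32 and up in the status/active masks and that the
counters are 48 bits wide, as the 0000ffff... values suggest. The helper
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed counter N is reported at bit 32 + N of the mask. */
#define FIXED_BASE_BIT 32

/* Returns nonzero if fixed counter `idx` is flagged in `mask`. */
static int fixed_bit_set(uint64_t mask, unsigned int idx)
{
	assert(idx < 32);
	return (int)((mask >> (FIXED_BASE_BIT + idx)) & 1);
}

/* Events remaining before a `width`-bit up-counter wraps to zero. */
static uint64_t events_until_overflow(uint64_t count, unsigned int width)
{
	assert(width > 0 && width < 64);
	uint64_t mask = (1ULL << width) - 1;

	/* Two's complement of the count within `width` bits. */
	return (~count & mask) + 1;
}
```

Under those assumptions, status 0000000400000000 flags fixed counter 2,
active 0000000600000000 covers fixed counters 1 and 2, and
events_until_overflow(0x0000fffffffffff1, 48) gives 15, so fixed-PMC2 was
only 15 events away from its next overflow.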

I found that 32 was the lowest period limit I could set without the
problem recurring. The idea for the patch and the approach for finding the
target value were suggested by Ingo and Thomas.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reported-by: Bhupesh Purandare <bpuranda@akamai.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/lkml/20150501070226.GB18957@gmail.com/
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1908122133310.7324@nanos.tec.linutronix.de/
---
 arch/x86/events/intel/core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 648260b5f367..e4c2cb65ea50 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3572,6 +3572,11 @@ static u64 bdw_limit_period(struct perf_event *event, u64 left)
 	return left;
 }
 
+static u64 nhm_limit_period(struct perf_event *event, u64 left)
+{
+	return max(left, 32ULL);
+}
+
 PMU_FORMAT_ATTR(event,	"config:0-7"	);
 PMU_FORMAT_ATTR(umask,	"config:8-15"	);
 PMU_FORMAT_ATTR(edge,	"config:18"	);
@@ -4606,6 +4611,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
 		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
 		x86_pmu.extra_regs = intel_nehalem_extra_regs;
+		x86_pmu.limit_period = nhm_limit_period;
 
 		mem_attr = nhm_mem_events_attrs;
 
-- 
2.7.4
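
A userspace sketch of what the clamp changes, for illustration only. It
models nhm_limit_period() without the event argument and assumes the counter
is programmed with the negated remaining period truncated to a 48-bit width.
The function names and the width are assumptions, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NHM_MIN_PERIOD 32ULL	/* lower bound taken from the patch */

/* Hypothetical stand-in for nhm_limit_period(): never go below 32. */
static uint64_t nhm_limit_period_model(uint64_t left)
{
	return left > NHM_MIN_PERIOD ? left : NHM_MIN_PERIOD;
}

/* Start value for a `width`-bit counter that overflows after `left` events. */
static uint64_t counter_start_value(uint64_t left, unsigned int width)
{
	assert(width > 0 && width < 64);
	uint64_t mask = (1ULL << width) - 1;

	/* Unsigned negation, then truncation to the counter width. */
	return (0 - left) & mask;
}
```

With left = 15 and a 48-bit counter, counter_start_value() yields
0x0000fffffffffff1, the fixed-PMC2 value seen in the dump. After the clamp,
the same request becomes 32 events and the start value is
0x0000ffffffffffe0. Periods of 32 or more pass through unchanged.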



* Re: [PATCH] perf/x86/intel: restrict period on Nehalem
  2019-08-19 23:13 [PATCH] perf/x86/intel: restrict period on Nehalem Josh Hunt
@ 2019-08-23  8:07 ` Peter Zijlstra
  2019-08-31  9:06 ` [tip: perf/urgent] perf/x86/intel: Restrict " tip-bot2 for Josh Hunt
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2019-08-23  8:07 UTC
  To: Josh Hunt
  Cc: linux-kernel, tglx, mingo, acme, alexander.shishkin, jolsa,
	namhyung, bpuranda

On Mon, Aug 19, 2019 at 07:13:31PM -0400, Josh Hunt wrote:
> We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
> some cases when using perf:
> 
> perfevents: irq loop stuck!

> I found that 32 was the lowest period limit I could set without the
> problem recurring. The idea for the patch and the approach for finding the
> target value were suggested by Ingo and Thomas.
> 
> Signed-off-by: Josh Hunt <johunt@akamai.com>
> Reported-by: Bhupesh Purandare <bpuranda@akamai.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Suggested-by: Ingo Molnar <mingo@redhat.com>
> Link: https://lore.kernel.org/lkml/20150501070226.GB18957@gmail.com/
> Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1908122133310.7324@nanos.tec.linutronix.de/

Thanks!


* [tip: perf/urgent] perf/x86/intel: Restrict period on Nehalem
  2019-08-19 23:13 [PATCH] perf/x86/intel: restrict period on Nehalem Josh Hunt
  2019-08-23  8:07 ` Peter Zijlstra
@ 2019-08-31  9:06 ` tip-bot2 for Josh Hunt
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Josh Hunt @ 2019-08-31  9:06 UTC
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	acme, Josh Hunt, bpuranda, mingo, jolsa, tglx, namhyung,
	alexander.shishkin, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     44d3bbb6f5e501b873218142fe08cdf62a4ac1f3
Gitweb:        https://git.kernel.org/tip/44d3bbb6f5e501b873218142fe08cdf62a4ac1f3
Author:        Josh Hunt <johunt@akamai.com>
AuthorDate:    Mon, 19 Aug 2019 19:13:31 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 30 Aug 2019 14:27:47 +02:00

perf/x86/intel: Restrict period on Nehalem

We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
some cases when using perf:

perfevents: irq loop stuck!
WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
...
RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
...
Call Trace:
<NMI>
? perf_event_nmi_handler+0x2e/0x50
? intel_pmu_save_and_restart+0x50/0x50
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x6e/0x120
default_do_nmi+0x3e/0x100
do_nmi+0x102/0x160
end_repeat_nmi+0x16/0x50
...
? native_write_msr+0x6/0x20
? native_write_msr+0x6/0x20
</NMI>
intel_pmu_enable_event+0x1ce/0x1f0
x86_pmu_start+0x78/0xa0
x86_pmu_enable+0x252/0x310
__perf_event_task_sched_in+0x181/0x190
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
finish_task_switch+0x158/0x260
__schedule+0x2f6/0x840
? hrtimer_start_range_ns+0x153/0x210
schedule+0x32/0x80
schedule_hrtimeout_range_clock+0x8a/0x100
? hrtimer_init+0x120/0x120
ep_poll+0x2f7/0x3a0
? wake_up_q+0x60/0x60
do_epoll_wait+0xa9/0xc0
__x64_sys_epoll_wait+0x1a/0x20
do_syscall_64+0x4e/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fdeb1e96c03
...
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: acme@kernel.org
Cc: Josh Hunt <johunt@akamai.com>
Cc: bpuranda@akamai.com
Cc: mingo@redhat.com
Cc: jolsa@redhat.com
Cc: tglx@linutronix.de
Cc: namhyung@kernel.org
Cc: alexander.shishkin@linux.intel.com
Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
---
 arch/x86/events/intel/core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 648260b..e4c2cb6 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3572,6 +3572,11 @@ static u64 bdw_limit_period(struct perf_event *event, u64 left)
 	return left;
 }
 
+static u64 nhm_limit_period(struct perf_event *event, u64 left)
+{
+	return max(left, 32ULL);
+}
+
 PMU_FORMAT_ATTR(event,	"config:0-7"	);
 PMU_FORMAT_ATTR(umask,	"config:8-15"	);
 PMU_FORMAT_ATTR(edge,	"config:18"	);
@@ -4606,6 +4611,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
 		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
 		x86_pmu.extra_regs = intel_nehalem_extra_regs;
+		x86_pmu.limit_period = nhm_limit_period;
 
 		mem_attr = nhm_mem_events_attrs;
 

