From: Josh Hunt <johunt@akamai.com>
To: linux-kernel@vger.kernel.org, tglx@linutronix.de
Cc: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, jolsa@redhat.com,
	namhyung@kernel.org, bpuranda@akamai.com,
	Josh Hunt <johunt@akamai.com>
Subject: [PATCH] perf/x86/intel: restrict period on Nehalem
Date: Mon, 19 Aug 2019 19:13:31 -0400
Message-ID: <1566256411-18820-1-git-send-email-johunt@akamai.com>

We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
some cases when using perf:

perfevents: irq loop stuck!
WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
...
RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
...
Call Trace:
<NMI>
? perf_event_nmi_handler+0x2e/0x50
? intel_pmu_save_and_restart+0x50/0x50
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x6e/0x120
default_do_nmi+0x3e/0x100
do_nmi+0x102/0x160
end_repeat_nmi+0x16/0x50
...
? native_write_msr+0x6/0x20
? native_write_msr+0x6/0x20
</NMI>
intel_pmu_enable_event+0x1ce/0x1f0
x86_pmu_start+0x78/0xa0
x86_pmu_enable+0x252/0x310
__perf_event_task_sched_in+0x181/0x190
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
finish_task_switch+0x158/0x260
__schedule+0x2f6/0x840
? hrtimer_start_range_ns+0x153/0x210
schedule+0x32/0x80
schedule_hrtimeout_range_clock+0x8a/0x100
? hrtimer_init+0x120/0x120
ep_poll+0x2f7/0x3a0
? wake_up_q+0x60/0x60
do_epoll_wait+0xa9/0xc0
__x64_sys_epoll_wait+0x1a/0x20
do_syscall_64+0x4e/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fdeb1e96c03
...
---[ end trace 7a8f0b2beff82ee0 ]---

CPU#0: ctrl:       0000000000000000
CPU#0: status:     0000000400000000
CPU#0: overflow:   0000000000000000
CPU#0: fixed:      0000000000000bb0
CPU#0: pebs:       0000000000000000
CPU#0: debugctl:   0000000000000000
CPU#0: active:     0000000600000000
CPU#0:   gen-PMC0 ctrl:  0000000000000000
CPU#0:   gen-PMC0 count: 0000000000000000
CPU#0:   gen-PMC0 left:  0000000000000000
CPU#0:   gen-PMC1 ctrl:  0000000000000000
CPU#0:   gen-PMC1 count: 0000000000000000
CPU#0:   gen-PMC1 left:  0000000000000000
CPU#0:   gen-PMC2 ctrl:  0000000000000000
CPU#0:   gen-PMC2 count: 0000000000000000
CPU#0:   gen-PMC2 left:  0000000000000000
CPU#0:   gen-PMC3 ctrl:  0000000000000000
CPU#0:   gen-PMC3 count: 0000000000000000
CPU#0:   gen-PMC3 left:  0000000000000000
CPU#0: fixed-PMC0 count: 0000000000000000
CPU#0: fixed-PMC1 count: 0000ffffd22ebd19
CPU#0: fixed-PMC2 count: 0000fffffffffff1
core: clearing PMU state on CPU#0
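
One possible reading of the dump above, as a rough userspace sketch and not
kernel code. It assumes that fixed counter i reports overflow at bit 32+i of
the status word and that the counters are 48 bits wide (inferred from the
count values shown). The helper names are hypothetical. Under those
assumptions, status 0000000400000000 flags fixed counter 2, and its count of
0000fffffffffff1 is 15 events away from wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed counter i is reported at bit 32+i of the status word. */
#define FIXED_IDX_BASE	32
/* Assumption: 48-bit counters, inferred from the values in the dump. */
#define COUNTER_BITS	48
#define COUNTER_MASK	((1ULL << COUNTER_BITS) - 1)

/* Hypothetical helper: is the overflow bit of fixed counter i set? */
static int fixed_counter_overflowed(uint64_t status, int i)
{
	return (int)((status >> (FIXED_IDX_BASE + i)) & 1);
}

/* Hypothetical helper: events left before the counter wraps to zero. */
static uint64_t events_until_overflow(uint64_t count)
{
	return ((1ULL << COUNTER_BITS) - (count & COUNTER_MASK)) & COUNTER_MASK;
}

/* Self-check against the values printed in the dump. */
static void decode_dump_selfcheck(void)
{
	assert(fixed_counter_overflowed(0x0000000400000000ULL, 2));
	assert(!fixed_counter_overflowed(0x0000000400000000ULL, 1));
	assert(events_until_overflow(0x0000fffffffffff1ULL) == 15);
}
```

If that reading holds, the counter was rearmed with a very short period,
which fits the period floor this patch adds.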

Add a limit_period callback for Nehalem that enforces a minimum period
of 32. That was the lowest limit I could set without the problem
recurring. The idea for the patch and the approach for finding the
target value were suggested by Ingo and Thomas.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reported-by: Bhupesh Purandare <bpuranda@akamai.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/lkml/20150501070226.GB18957@gmail.com/
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.1908122133310.7324@nanos.tec.linutronix.de/
---
 arch/x86/events/intel/core.c | 6 ++++++
 1 file changed, 6 insertions(+)
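
For illustration only, the effect of the new callback can be sketched in
userspace as below. The function names are hypothetical stand-ins, and the
kernel's max() macro is replaced by a plain comparison: any requested period
below 32 is raised to 32, and larger periods pass through unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's max(left, 32ULL). */
static uint64_t nhm_limit_period_sketch(uint64_t left)
{
	return left < 32 ? 32 : left;
}

/* Self-check: short periods are raised to 32, others are untouched. */
static void nhm_limit_period_selfcheck(void)
{
	assert(nhm_limit_period_sketch(1) == 32);
	assert(nhm_limit_period_sketch(32) == 32);
	assert(nhm_limit_period_sketch(100000) == 100000);
}
```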

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 648260b5f367..e4c2cb65ea50 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3572,6 +3572,11 @@ static u64 bdw_limit_period(struct perf_event *event, u64 left)
 	return left;
 }
 
+static u64 nhm_limit_period(struct perf_event *event, u64 left)
+{
+	return max(left, 32ULL);
+}
+
 PMU_FORMAT_ATTR(event,	"config:0-7"	);
 PMU_FORMAT_ATTR(umask,	"config:8-15"	);
 PMU_FORMAT_ATTR(edge,	"config:18"	);
@@ -4606,6 +4611,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
 		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
 		x86_pmu.extra_regs = intel_nehalem_extra_regs;
+		x86_pmu.limit_period = nhm_limit_period;
 
 		mem_attr = nhm_mem_events_attrs;
 
-- 
2.7.4