linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
@ 2019-04-01 21:46 Lendacky, Thomas
  2019-04-01 21:46 ` [RFC PATCH v3 1/3] x86/perf/amd: Resolve race condition when disabling PMC Lendacky, Thomas
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Lendacky, Thomas @ 2019-04-01 21:46 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
	Jiri Olsa

This patch series addresses issues with increased NMI latency in newer
AMD processors that can result in unknown NMI messages when PMC counters
are active.

The following fixes are included in this series:

- Resolve a race condition when disabling an overflowed PMC counter,
  specifically when updating the PMC counter with a new value.
- Resolve handling of active PMC counter overflows in the perf NMI
  handler and when to report that the NMI is not related to a PMC.
- Remove earlier workaround for spurious NMIs by re-ordering the
  PMC stop sequence to disable the PMC first and then remove the PMC
  bit from the active_mask bitmap. As part of disabling the PMC, the
  code will wait for an overflow to be reset.

The last patch re-works the order of when the PMC is removed from the
active_mask. There was a comment from a long time ago about having
to clear the bit in active_mask before disabling the counter because
the perf NMI handler could re-enable the PMC again. Looking at the
handler today, I don't see that as possible, hence the reordering. The
question will be whether the Intel PMC support will now have issues.
There is still support for using x86_pmu_handle_irq() in the Intel
core.c file. Did Intel have any issues with spurious NMIs in the past?
Peter Z, any thoughts on this?

Also, I couldn't completely get rid of the "running" bit because it
is used by arch/x86/events/intel/p4.c. An old commit message seems
to indicate that the p4 code suffered from spurious interrupts:
03e22198d237 ("perf, x86: Handle in flight NMIs on P4 platform").
So maybe that partially answers my previous question...

---

Changes from v2 (based on feedback from Peter Z):
- Simplified AMD specific disable_all callback by calling the common
  x86_pmu_disable_all() function and then checking and waiting for the
  reset of any overflowed PMCs.
- Removed erroneous check for no active counters in the NMI latency
  mitigation patch, which effectively nullified commit 63e6be6d98e1.
- Reworked x86_pmu_stop() in order to remove 63e6be6d98e1.

Changes from v1 (based on feedback from Peter Z):
- Created an AMD specific disable_all callback function to handle the
  disabling of the counters and resolve the race condition.
- Created an AMD specific handle_irq callback function that invokes the
  common x86_pmu_handle_irq() function and then performs the NMI latency
  mitigation.
- Take into account the possibility of non-perf NMI sources when applying
  the mitigation.

This patch series is based off of the perf/core branch of tip:
  https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core

  Commit c978b9460fe1 ("Merge tag 'perf-core-for-mingo-5.1-20190225' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core")


* [RFC PATCH v3 1/3] x86/perf/amd: Resolve race condition when disabling PMC
  2019-04-01 21:46 [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Lendacky, Thomas
@ 2019-04-01 21:46 ` Lendacky, Thomas
  2019-04-01 21:46 ` [RFC PATCH v3 2/3] x86/perf/amd: Resolve NMI latency issues for active PMCs Lendacky, Thomas
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Lendacky, Thomas @ 2019-04-01 21:46 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
	Jiri Olsa

On AMD processors, the detection of an overflowed counter in the NMI
handler relies on the current value of the counter. So, for example, to
check for overflow on a 48-bit counter, bit 47 is checked to see if it
is 1 (not overflowed) or 0 (overflowed).
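
For reference, the overflow check at the heart of the fix below boils
down to the following (cntval_bits is 48 on these processors):

	u64 counter;

	rdmsrl(x86_pmu_event_addr(idx), counter);
	if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
		return;		/* bit 47 set: not overflowed */
	/* bit 47 clear: the counter overflowed and an NMI is due */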

There is currently a race condition present when disabling and then
updating the PMC. Increased NMI latency in newer AMD processors makes this
race condition more pronounced. If the counter value has overflowed, it is
possible to update the PMC value before the NMI handler can run. The
updated PMC value is not an overflowed value, so when the perf NMI handler
does run, it will not find an overflowed counter. This may appear as an
unknown NMI resulting in either a panic or a series of messages, depending
on how the kernel is configured.
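
A rough sketch of the window (one possible interleaving, assuming a
48-bit counter):

	counter overflows, bit 47 goes to 0, an NMI is raised
	  ... NMI delivery is delayed by NMI latency ...
	the PMC is disabled and updated with a new value (bit 47 back to 1)
	the NMI finally arrives
	the perf NMI handler reads the PMC, sees bit 47 set: not overflowed
	=> nothing is handled and the NMI is reported as unknown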

To eliminate this race condition, the PMC value must be checked after
disabling the counter. Add an AMD function, amd_pmu_disable_all(), that
will wait for the NMI handler to reset any active and overflowed counter
after calling x86_pmu_disable_all().

Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/events/amd/core.c |   64 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 61 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 7d2d7c801dba..beb132593622 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -3,6 +3,7 @@
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/delay.h>
 #include <asm/apicdef.h>
 
 #include "../perf_event.h"
@@ -429,6 +430,63 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 }
 
+/*
+ * When a PMC counter overflows, an NMI is used to process the event and
+ * reset the counter. NMI latency can result in the counter being updated
+ * before the NMI can run, which can result in what appear to be spurious
+ * NMIs. This function is intended to wait for the NMI to run and reset
+ * the counter to avoid possible unhandled NMI messages.
+ */
+#define OVERFLOW_WAIT_COUNT	50
+static void amd_pmu_wait_on_overflow(int idx)
+{
+	unsigned int i;
+	u64 counter;
+
+	/*
+	 * Wait for the counter to be reset if it has overflowed. This loop
+	 * should exit very, very quickly, but just in case, don't wait
+	 * forever...
+	 */
+	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
+		rdmsrl(x86_pmu_event_addr(idx), counter);
+		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
+			break;
+
+		/* Might be in IRQ context, so can't sleep */
+		udelay(1);
+	}
+}
+
+static void amd_pmu_disable_all(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int idx;
+
+	x86_pmu_disable_all();
+
+	/*
+	 * This shouldn't be called from NMI context, but add a safeguard here
+	 * to return, since if we're in NMI context we can't wait for an NMI
+	 * to reset an overflowed counter value.
+	 */
+	if (in_nmi())
+		return;
+
+	/*
+	 * Check each counter for overflow and wait for it to be reset by the
+	 * NMI if it has overflowed. This relies on the fact that all active
+	 * counters are always enabled when this function is called and
+	 * ARCH_PERFMON_EVENTSEL_INT is always set.
+	 */
+	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+		if (!test_bit(idx, cpuc->active_mask))
+			continue;
+
+		amd_pmu_wait_on_overflow(idx);
+	}
+}
+
 static struct event_constraint *
 amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 			  struct perf_event *event)
@@ -622,7 +680,7 @@ static ssize_t amd_event_sysfs_show(char *page, u64 config)
 static __initconst const struct x86_pmu amd_pmu = {
 	.name			= "AMD",
 	.handle_irq		= x86_pmu_handle_irq,
-	.disable_all		= x86_pmu_disable_all,
+	.disable_all		= amd_pmu_disable_all,
 	.enable_all		= x86_pmu_enable_all,
 	.enable			= x86_pmu_enable_event,
 	.disable		= x86_pmu_disable_event,
@@ -732,7 +790,7 @@ void amd_pmu_enable_virt(void)
 	cpuc->perf_ctr_virt_mask = 0;
 
 	/* Reload all events */
-	x86_pmu_disable_all();
+	amd_pmu_disable_all();
 	x86_pmu_enable_all(0);
 }
 EXPORT_SYMBOL_GPL(amd_pmu_enable_virt);
@@ -750,7 +808,7 @@ void amd_pmu_disable_virt(void)
 	cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
 
 	/* Reload all events */
-	x86_pmu_disable_all();
+	amd_pmu_disable_all();
 	x86_pmu_enable_all(0);
 }
 EXPORT_SYMBOL_GPL(amd_pmu_disable_virt);



* [RFC PATCH v3 2/3] x86/perf/amd: Resolve NMI latency issues for active PMCs
  2019-04-01 21:46 [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Lendacky, Thomas
  2019-04-01 21:46 ` [RFC PATCH v3 1/3] x86/perf/amd: Resolve race condition when disabling PMC Lendacky, Thomas
@ 2019-04-01 21:46 ` Lendacky, Thomas
  2019-04-01 21:46 ` [RFC PATCH v3 3/3] x86/perf/amd: Remove need to check "running" bit in NMI handler Lendacky, Thomas
  2019-04-02 13:03 ` [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Peter Zijlstra
  3 siblings, 0 replies; 14+ messages in thread
From: Lendacky, Thomas @ 2019-04-01 21:46 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
	Jiri Olsa

On AMD processors, the detection of an overflowed PMC counter in the NMI
handler relies on the current value of the PMC. So, for example, to check
for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
overflowed) or 0 (overflowed).

When the perf NMI handler executes it does not know in advance which PMC
counters have overflowed. As such, the NMI handler will process all active
PMC counters that have overflowed. NMI latency in newer AMD processors can
result in multiple overflowed PMC counters being processed in one NMI and
then a subsequent NMI, that does not appear to be a back-to-back NMI, not
finding any PMC counters that have overflowed. This may appear to be an
unhandled NMI resulting in either a panic or a series of messages,
depending on how the kernel was configured.
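
As a concrete example with two active PMCs:

	PMC0 overflows and NMI #1 is raised
	PMC1 overflows; its NMI is delayed and not yet pending at the LAPIC
	NMI #1 runs; the perf handler finds and handles both PMC0 and PMC1
	the delayed NMI arrives only after NMI #1 completes, so it is not
	  treated as a back-to-back NMI
	the second NMI runs; no overflowed PMC is found => unknown NMI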

To mitigate this issue, add an AMD handle_irq callback function,
amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
function and upon return perform some additional processing that will
indicate if the NMI has been handled or would have been handled had an
earlier NMI not handled the overflowed PMC. Whenever an overflowed PMC is
handled, a per-CPU variable is set to the lesser of the number of active
PMCs and 2. This indicates the number of NMIs that can still legitimately
occur. The value of 2 covers the case where an NMI does not arrive at the
LAPIC in time to be collapsed into an already pending NMI. Each time the
handler is called without having handled an overflowed counter, the
per-CPU value is checked. If the value is non-zero, it is decremented and
the handler reports the NMI as handled. If the value is zero, the handler
reports that it did not handle the NMI.

Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/events/amd/core.c |   56 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index beb132593622..93ef53713359 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -5,9 +5,12 @@
 #include <linux/slab.h>
 #include <linux/delay.h>
 #include <asm/apicdef.h>
+#include <asm/nmi.h>
 
 #include "../perf_event.h"
 
+static DEFINE_PER_CPU(unsigned int, perf_nmi_counter);
+
 static __initconst const u64 amd_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -487,6 +490,57 @@ static void amd_pmu_disable_all(void)
 	}
 }
 
+/*
+ * Because of NMI latency, if multiple PMC counters are active or other sources
+ * of NMIs are received, the perf NMI handler can handle one or more overflowed
+ * PMC counters outside of the NMI associated with the PMC overflow. If the NMI
+ * doesn't arrive at the LAPIC in time to become a pending NMI, then the kernel
+ * back-to-back NMI support won't be active. This PMC handler needs to take into
+ * account that this can occur, otherwise this could result in unknown NMI
+ * messages being issued. Examples of this is PMC overflow while in the NMI
+ * handler when multiple PMCs are active or PMC overflow while handling some
+ * other source of an NMI.
+ *
+ * Attempt to mitigate this by using the number of active PMCs to determine
+ * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset
+ * any PMCs. The per-CPU perf_nmi_counter variable is set to the lesser of the
+ * number of active PMCs and 2. The value of 2 is used in case an NMI does not
+ * arrive at the LAPIC in time to be collapsed into an already pending NMI.
+ */
+static int amd_pmu_handle_irq(struct pt_regs *regs)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int active, handled;
+
+	/*
+	 * Obtain the active count before calling x86_pmu_handle_irq() since
+	 * it is possible that x86_pmu_handle_irq() may make a counter
+	 * inactive (through x86_pmu_stop).
+	 */
+	active = __bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX);
+
+	/* Process any counter overflows */
+	handled = x86_pmu_handle_irq(regs);
+
+	/*
+	 * If a counter was handled, record the number of possible remaining
+	 * NMIs that can occur.
+	 */
+	if (handled) {
+		this_cpu_write(perf_nmi_counter,
+			       min_t(unsigned int, 2, active));
+
+		return handled;
+	}
+
+	if (!this_cpu_read(perf_nmi_counter))
+		return NMI_DONE;
+
+	this_cpu_dec(perf_nmi_counter);
+
+	return NMI_HANDLED;
+}
+
 static struct event_constraint *
 amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 			  struct perf_event *event)
@@ -679,7 +733,7 @@ static ssize_t amd_event_sysfs_show(char *page, u64 config)
 
 static __initconst const struct x86_pmu amd_pmu = {
 	.name			= "AMD",
-	.handle_irq		= x86_pmu_handle_irq,
+	.handle_irq		= amd_pmu_handle_irq,
 	.disable_all		= amd_pmu_disable_all,
 	.enable_all		= x86_pmu_enable_all,
 	.enable			= x86_pmu_enable_event,



* [RFC PATCH v3 3/3] x86/perf/amd: Remove need to check "running" bit in NMI handler
  2019-04-01 21:46 [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Lendacky, Thomas
  2019-04-01 21:46 ` [RFC PATCH v3 1/3] x86/perf/amd: Resolve race condition when disabling PMC Lendacky, Thomas
  2019-04-01 21:46 ` [RFC PATCH v3 2/3] x86/perf/amd: Resolve NMI latency issues for active PMCs Lendacky, Thomas
@ 2019-04-01 21:46 ` Lendacky, Thomas
  2019-04-02 13:03 ` [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Peter Zijlstra
  3 siblings, 0 replies; 14+ messages in thread
From: Lendacky, Thomas @ 2019-04-01 21:46 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
	Jiri Olsa

Spurious interrupt support was added to perf in commit:
63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")

The two previous patches (resolving the race condition when disabling a
PMC and NMI latency mitigation) allow for the removal of this older
spurious interrupt support.

Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
is cleared before disabling the PMC, which sets up a race condition. This
race condition was mitigated by introducing the running bitmap. That race
condition can be eliminated by first disabling the PMC, waiting for PMC
reset on overflow and then clearing the bit for the PMC in the active_mask
bitmap. The NMI handler will not re-enable a disabled counter.
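
Condensed, the ordering change in x86_pmu_stop() is:

	/* old: clear the active bit first, then disable */
	if (__test_and_clear_bit(hwc->idx, cpuc->active_mask))
		x86_pmu.disable(event);

	/*
	 * new: disable first (for AMD this also waits for an overflowed
	 * counter to be reset), then clear the active bit
	 */
	if (test_bit(hwc->idx, cpuc->active_mask)) {
		x86_pmu.disable(event);
		__clear_bit(hwc->idx, cpuc->active_mask);
	}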

If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
mitigation support will guard against any unhandled NMI messages.

Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/events/amd/core.c |   19 ++++++++++++++++++-
 arch/x86/events/core.c     |   13 +++----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 93ef53713359..a5e33db2f17a 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -490,6 +490,23 @@ static void amd_pmu_disable_all(void)
 	}
 }
 
+static void amd_pmu_disable_event(struct perf_event *event)
+{
+	x86_pmu_disable_event(event);
+
+	/*
+	 * This can be called from NMI context (via x86_pmu_stop). The counter
+	 * may have overflowed, but either way, we'll never see it get reset
+	 * by the NMI if we're already in the NMI. And the NMI latency support
+	 * below will take care of any pending NMI that might have been
+	 * generated by the overflow.
+	 */
+	if (in_nmi())
+		return;
+
+	amd_pmu_wait_on_overflow(event->hw.idx);
+}
+
 /*
  * Because of NMI latency, if multiple PMC counters are active or other sources
  * of NMIs are received, the perf NMI handler can handle one or more overflowed
@@ -737,7 +754,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.disable_all		= amd_pmu_disable_all,
 	.enable_all		= x86_pmu_enable_all,
 	.enable			= x86_pmu_enable_event,
-	.disable		= x86_pmu_disable_event,
+	.disable		= amd_pmu_disable_event,
 	.hw_config		= amd_pmu_hw_config,
 	.schedule_events	= x86_schedule_events,
 	.eventsel		= MSR_K7_EVNTSEL0,
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e2b1447192a8..81911e11a15d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1349,8 +1349,9 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (__test_and_clear_bit(hwc->idx, cpuc->active_mask)) {
+	if (test_bit(hwc->idx, cpuc->active_mask)) {
 		x86_pmu.disable(event);
+		__clear_bit(hwc->idx, cpuc->active_mask);
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
@@ -1447,16 +1448,8 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
-		if (!test_bit(idx, cpuc->active_mask)) {
-			/*
-			 * Though we deactivated the counter some cpus
-			 * might still deliver spurious interrupts still
-			 * in flight. Catch them:
-			 */
-			if (__test_and_clear_bit(idx, cpuc->running))
-				handled++;
+		if (!test_bit(idx, cpuc->active_mask))
 			continue;
-		}
 
 		event = cpuc->events[idx];
 



* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-01 21:46 [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Lendacky, Thomas
                   ` (2 preceding siblings ...)
  2019-04-01 21:46 ` [RFC PATCH v3 3/3] x86/perf/amd: Remove need to check "running" bit in NMI handler Lendacky, Thomas
@ 2019-04-02 13:03 ` Peter Zijlstra
  2019-04-02 13:09   ` Lendacky, Thomas
  2019-04-02 13:22   ` Cyrill Gorcunov
  3 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2019-04-02 13:03 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: x86, linux-kernel, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
	Jiri Olsa, gorcunov, Vince Weaver, Stephane Eranian

On Mon, Apr 01, 2019 at 09:46:33PM +0000, Lendacky, Thomas wrote:
> This patch series addresses issues with increased NMI latency in newer
> AMD processors that can result in unknown NMI messages when PMC counters
> are active.
> 
> The following fixes are included in this series:
> 
> - Resolve a race condition when disabling an overflowed PMC counter,
>   specifically when updating the PMC counter with a new value.
> - Resolve handling of active PMC counter overflows in the perf NMI
>   handler and when to report that the NMI is not related to a PMC.
> - Remove earlier workaround for spurious NMIs by re-ordering the
>   PMC stop sequence to disable the PMC first and then remove the PMC
>   bit from the active_mask bitmap. As part of disabling the PMC, the
>   code will wait for an overflow to be reset.
> 
> The last patch re-works the order of when the PMC is removed from the
> active_mask. There was a comment from a long time ago about having
> to clear the bit in active_mask before disabling the counter because
> the perf NMI handler could re-enable the PMC again. Looking at the
> handler today, I don't see that as possible, hence the reordering. The
> question will be whether the Intel PMC support will now have issues.
> There is still support for using x86_pmu_handle_irq() in the Intel
> core.c file. Did Intel have any issues with spurious NMIs in the past?
> Peter Z, any thoughts on this?

I can't remember :/ I suppose we'll see if anything pops up after these
here patches. At least then we get a chance to properly document things.

> Also, I couldn't completely get rid of the "running" bit because it
> is used by arch/x86/events/intel/p4.c. An old commit message seems
> to indicate that the p4 code suffered from spurious interrupts:
> 03e22198d237 ("perf, x86: Handle in flight NMIs on P4 platform").
> So maybe that partially answers my previous question...

Yeah, the P4 code is magic, and I don't have any such machines left, nor,
I think, does Cyrill, who wrote much of that.

I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
but maybe I'm wrong.

Ideally we'd find a willing victim to maintain that thing, or possibly
just delete it, dunno if anybody still cares.


Anyway, I like these patches, but I cannot apply since you send them
base64 encoded and my script chokes on that.


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-02 13:03 ` [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Peter Zijlstra
@ 2019-04-02 13:09   ` Lendacky, Thomas
  2019-04-02 13:22   ` Cyrill Gorcunov
  1 sibling, 0 replies; 14+ messages in thread
From: Lendacky, Thomas @ 2019-04-02 13:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
	Jiri Olsa, gorcunov, Vince Weaver, Stephane Eranian

On 4/2/19 8:03 AM, Peter Zijlstra wrote:
> On Mon, Apr 01, 2019 at 09:46:33PM +0000, Lendacky, Thomas wrote:
>> This patch series addresses issues with increased NMI latency in newer
>> AMD processors that can result in unknown NMI messages when PMC counters
>> are active.
>>
>> The following fixes are included in this series:
>>
>> - Resolve a race condition when disabling an overflowed PMC counter,
>>    specifically when updating the PMC counter with a new value.
>> - Resolve handling of active PMC counter overflows in the perf NMI
>>    handler and when to report that the NMI is not related to a PMC.
>> - Remove earlier workaround for spurious NMIs by re-ordering the
>>    PMC stop sequence to disable the PMC first and then remove the PMC
>>    bit from the active_mask bitmap. As part of disabling the PMC, the
>>    code will wait for an overflow to be reset.
>>
>> The last patch re-works the order of when the PMC is removed from the
>> active_mask. There was a comment from a long time ago about having
>> to clear the bit in active_mask before disabling the counter because
>> the perf NMI handler could re-enable the PMC again. Looking at the
>> handler today, I don't see that as possible, hence the reordering. The
>> question will be whether the Intel PMC support will now have issues.
>> There is still support for using x86_pmu_handle_irq() in the Intel
>> core.c file. Did Intel have any issues with spurious NMIs in the past?
>> Peter Z, any thoughts on this?
> 
> I can't remember :/ I suppose we'll see if anything pops up after these
> here patches. At least then we get a chance to properly document things.
> 
>> Also, I couldn't completely get rid of the "running" bit because it
>> is used by arch/x86/events/intel/p4.c. An old commit message seems
>> to indicate that the p4 code suffered from spurious interrupts:
>> 03e22198d237 ("perf, x86: Handle in flight NMIs on P4 platform").
>> So maybe that partially answers my previous question...
> 
> Yeah, the P4 code is magic, and I don't have any such machines left, nor,
> I think, does Cyrill, who wrote much of that.
> 
> I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
> but maybe I'm wrong.
> 
> Ideally we'd find a willing victim to maintain that thing, or possibly
> just delete it, dunno if anybody still cares.
> 
> 
> Anyway, I like these patches, but I cannot apply since you send them
> base64 encoded and my script chokes on that.

Hmmm, I'm using stgit and it's either that or our mail system that is
causing the base64 encoding. It happened once before, let me re-send using
straight git. I'll remove the RFC on the re-send, too.
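
(For what it's worth, if the mailer turns out not to be the culprit, git
can be told to avoid base64 explicitly; a possible knob, assuming a
recent enough git:

	git config sendemail.transferEncoding 8bit

or --transfer-encoding=8bit on the git send-email command line.)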

Thanks,
Tom
> 


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-02 13:03 ` [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Peter Zijlstra
  2019-04-02 13:09   ` Lendacky, Thomas
@ 2019-04-02 13:22   ` Cyrill Gorcunov
  2019-04-02 14:53     ` Vince Weaver
  1 sibling, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2019-04-02 13:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lendacky, Thomas, x86, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim,
	Thomas Gleixner, Jiri Olsa, Vince Weaver, Stephane Eranian

On Tue, Apr 02, 2019 at 03:03:02PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 01, 2019 at 09:46:33PM +0000, Lendacky, Thomas wrote:
> > This patch series addresses issues with increased NMI latency in newer
> > AMD processors that can result in unknown NMI messages when PMC counters
> > are active.
> > 
> > The following fixes are included in this series:
> > 
> > - Resolve a race condition when disabling an overflowed PMC counter,
> >   specifically when updating the PMC counter with a new value.
> > - Resolve handling of active PMC counter overflows in the perf NMI
> >   handler and when to report that the NMI is not related to a PMC.
> > - Remove earlier workaround for spurious NMIs by re-ordering the
> >   PMC stop sequence to disable the PMC first and then remove the PMC
> >   bit from the active_mask bitmap. As part of disabling the PMC, the
> >   code will wait for an overflow to be reset.
> > 
> > The last patch re-works the order of when the PMC is removed from the
> > active_mask. There was a comment from a long time ago about having
> > to clear the bit in active_mask before disabling the counter because
> > the perf NMI handler could re-enable the PMC again. Looking at the
> > handler today, I don't see that as possible, hence the reordering. The
> > question will be whether the Intel PMC support will now have issues.
> > There is still support for using x86_pmu_handle_irq() in the Intel
> > core.c file. Did Intel have any issues with spurious NMIs in the past?
> > Peter Z, any thoughts on this?
> 
> I can't remember :/ I suppose we'll see if anything pops up after these
> here patches. At least then we get a chance to properly document things.
> 
> > Also, I couldn't completely get rid of the "running" bit because it
> > is used by arch/x86/events/intel/p4.c. An old commit message seems
> > to indicate that the p4 code suffered from spurious interrupts:
> > 03e22198d237 ("perf, x86: Handle in flight NMIs on P4 platform").
> > So maybe that partially answers my previous question...
> 
> Yeah, the P4 code is magic, and I don't have any such machines left, nor,
> I think, does Cyrill, who wrote much of that.

It was so long ago :) What I remember off the top of my head is that some
of the counters were broken at the hardware level, so I had to use only one
counter instead of the two present in the system. And there were spurious
NMIs too. I think we can move this "running" bit to a per-cpu variable
declared inside the p4 code only, and so get rid of it from cpu_hw_events?
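
Something like this, say (just a sketch, name and placement hypothetical):

	/* private to arch/x86/events/intel/p4.c */
	static DEFINE_PER_CPU(unsigned long,
			      p4_running[BITS_TO_LONGS(X86_PMC_IDX_MAX)]);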

> I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
> but maybe I'm wrong.

No, you're correct. p4 was crashing many times before we managed to make
it more or less stable. The main problem, though, is actually finding a
working p4 box.

> Ideally we'd find a willing victim to maintain that thing, or possibly
> just delete it, dunno if anybody still cares.

As for me, I would rather mark this p4pmu code as deprecated until there
is a *real* need for its support.

> 
> Anyway, I like these patches, but I cannot apply since you send them
> base64 encoded and my script chokes on that.


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-02 13:22   ` Cyrill Gorcunov
@ 2019-04-02 14:53     ` Vince Weaver
  2019-04-02 15:09       ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Vince Weaver @ 2019-04-02 14:53 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Peter Zijlstra, Lendacky, Thomas, x86, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
	Stephane Eranian

On Tue, 2 Apr 2019, Cyrill Gorcunov wrote:
> On Tue, Apr 02, 2019 at 03:03:02PM +0200, Peter Zijlstra wrote:
> > I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
> > but maybe I'm wrong.
> 
> No, you're correct. p4 was crashing many times before we managed to make
> it more or less stable. The main problem, though, is actually finding a
> working p4 box.

I do have a functioning p4 system I can test on.
I can easily run the fuzzer and report crashes, but I only have limited 
time/skills to actually fix the problems it turns up.

One nice thing is that as of Linux 5.0 *finally* the fuzzer can run 
indefinitely on most modern Intel chips without crashing (still triggers a 
few warnings).  So finally we have the ability to tell when a new crash is 
a regression and potentially can bisect it.  Although obviously this 
doesn't necessarily apply to the p4.

I do think the number of people trying to run perf on a p4 is probably 
pretty small these days.

Vince


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-02 14:53     ` Vince Weaver
@ 2019-04-02 15:09       ` Cyrill Gorcunov
  2019-04-02 21:13         ` Vince Weaver
  0 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2019-04-02 15:09 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Lendacky, Thomas, x86, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
	Stephane Eranian

On Tue, Apr 02, 2019 at 10:53:38AM -0400, Vince Weaver wrote:
> On Tue, 2 Apr 2019, Cyrill Gorcunov wrote:
> > On Tue, Apr 02, 2019 at 03:03:02PM +0200, Peter Zijlstra wrote:
> > > I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
> > > but maybe I'm wrong.
> > 
> > No, you're correct. p4 was crashing many times before we managed to make
> > it more or less stable. The main problem, though, is actually finding a
> > working p4 box.
> 
> I do have a functioning p4 system I can test on.
> I can easily run the fuzzer and report crashes, but I only have limited 
> time/skills to actually fix the problems it turns up.

You know, running the fuzzer on p4 might be worth it in any case. As to
potential problems to fix -- I could try to find some time slot for that,
though I'm still quite limited too 'cause of many other duties :(

> One nice thing is that as of Linux 5.0 *finally* the fuzzer can run 
> indefinitely on most modern Intel chips without crashing (still triggers a 
> few warnings).  So finally we have the ability to tell when a new crash is 
> a regression and potentially can bisect it.  Although obviously this 
> doesn't necessarily apply to the p4.
> 
> I do think the number of people trying to run perf on a p4 is probably 
> pretty small these days.

Quite agree. Moreover, the current p4 perf code doesn't cover all potential
functionality (for example, cascaded counters) and nobody has ever complained
about it, I think exactly because not that many p4 boxes are left. Putting
effort into developing this perf module doesn't look like a good investment;
better to stick with more modern cpus and deprecate p4 in small steps.


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-02 15:09       ` Cyrill Gorcunov
@ 2019-04-02 21:13         ` Vince Weaver
  2019-04-02 21:31           ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Vince Weaver @ 2019-04-02 21:13 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Peter Zijlstra, Lendacky, Thomas, x86, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
	Stephane Eranian

On Tue, 2 Apr 2019, Cyrill Gorcunov wrote:

> You know, running the fuzzer on p4 might be worth it in any case. As to
> potential problems to fix -- I could try to find some time slot for that,
> though I'm still quite limited too 'cause of many other duties :(


Well I fired up the Pentium 4
	/dev/sda1 has gone 1457 days without being checked

and eventually got it up to date (running "git pull" in a 5-year old Linux 
tree plus running "apt-get dist-upgrade" at the same time was maybe a 
mistake on a system with 1GB of RAM).

Anyway I have it fuzzing current git and surprisingly while it's hit a few 
WARNINGs and some NMI dazed+confused messages it hasn't actually crashed 
yet.  Not sure if I want to let it fuzz overnight if I'm not here though.

Shame on Intel though for not providing perf JSON files for the 
Pentium 4 event names.

Vince


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-02 21:13         ` Vince Weaver
@ 2019-04-02 21:31           ` Cyrill Gorcunov
  2019-04-03 14:15             ` Vince Weaver
  0 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2019-04-02 21:31 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Lendacky, Thomas, x86, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
	Stephane Eranian

On Tue, Apr 02, 2019 at 05:13:15PM -0400, Vince Weaver wrote:
> 
> > You know, running fuzzer on p4 might worth in anycase. As to potential
> > problems to fix -- i could try find some time slot for, still quite
> > limited too 'cause of many other duties :(
> 
> 
> Well I fired up the Pentium 4
> 	/dev/sda1 has gone 1457 days without being checked

Heh :)

> and eventually got it up to date (running "git pull" in a 5-year old Linux 
> tree plus running "apt-get dist-upgrade" at the same time was maybe a 
> mistake on a system with 1GB of RAM).
> 
> Anyway I have it fuzzing current git and surprisingly while it's hit a few 
> WARNINGs and some NMI dazed+confused messages it hasn't actually crashed 
> yet.  Not sure if I want to let it fuzz overnight if I'm not here though.

Wow, sounds impressive! Ping me if you find something.

> 
> Shame on Intel though for not providing perf JSON files for the 
> Pentium 4 event names.

Mind pointing me to where the json events should live? I could try to
convert the names.


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-02 21:31           ` Cyrill Gorcunov
@ 2019-04-03 14:15             ` Vince Weaver
  2019-04-03 14:27               ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Vince Weaver @ 2019-04-03 14:15 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Peter Zijlstra, Lendacky, Thomas, x86, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
	Stephane Eranian

On Wed, 3 Apr 2019, Cyrill Gorcunov wrote:

> > Shame on Intel though for not providing perf JSON files for the 
> > Pentium 4 event names.
> 
> Mind pointing me to where the json events should live? I could try to
> convert the names.

I was mostly joking about that.  But the event lists are in the kernel 
tree in
	tools/perf/pmu-events/arch/x86/
I don't think anything older than Nehalem is included.

After letting the fuzzer run a bit longer I did manage to get it to
hard-lock with no messages in the log, though eventually, while I was
fiddling with alt-sysrq over the serial port, I did get this to trigger.
Though if I'm going to start reporting p4 crashes I should start a 
separate thread.

[ 2352.198361] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [perf_fuzzer:27005]
[ 2352.257304] CPU: 1 PID: 27005 Comm: perf_fuzzer Tainted: G        W         5.1.0-rc3+ #6
[ 2352.265458] Hardware name: LENOVO 88088NU/LENOVO, BIOS 2JKT37AUS 07/12/2007
[ 2352.272407] RIP: 0010:smp_call_function_single+0xc9/0xf0
[ 2352.277700] Code: 8b 4c 24 38 65 48 33 0c 25 28 00 00 00 75 34 c9 c3 48 89 d1 48 89 f2 48 89 e6 e8 a2 fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 <8b> 54 24 18 83 e2 01 75 f5 eb ca 8b 05 06 c4 45 01 85 c0 75 88 0f
[ 2352.296415] RSP: 0018:ffffc90004d0fb80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 2352.303961] RAX: 0000000000000000 RBX: ffff888039962500 RCX: ffff88803e520d80
[ 2352.311071] RDX: 0000000000000001 RSI: ffffc90004d0fb80 RDI: ffffc90004d0fb80
[ 2352.318180] RBP: ffffc90004d0fbc0 R08: 0000000000000000 R09: 0000000000000000
[ 2352.325291] R10: ffff888036cc8010 R11: ffff888039978e98 R12: ffffffff8115ec70
[ 2352.332412] R13: 0000000000000001 R14: ffff88803aa15108 R15: ffff88803d5fdb70
[ 2352.339524] FS:  0000000000000000(0000) GS:ffff88803e500000(0000) knlGS:0000000000000000
[ 2352.347587] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2352.353313] CR2: 00007f9d2e56d3b4 CR3: 000000000200e000 CR4: 00000000000006e0
[ 2352.360428] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2352.367540] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 2352.374649] Call Trace:
[ 2352.377104]  ? perf_cgroup_attach+0x70/0x70
[ 2352.381276]  ? slab_destroy+0xa5/0x120
[ 2352.385016]  ? perf_cgroup_attach+0x70/0x70
[ 2352.389186]  task_function_call+0x49/0x80
[ 2352.393186]  ? bpf_jit_compile+0x30/0x30
[ 2352.397095]  event_function_call+0x85/0x100
[ 2352.401265]  ? perf_swevent_hrtimer+0x150/0x150
[ 2352.405781]  perf_remove_from_context+0x20/0x60
[ 2352.410295]  perf_event_release_kernel+0x75/0x2e0
[ 2352.414983]  perf_release+0xc/0x10
[ 2352.418373]  __fput+0xaf/0x1f0
[ 2352.421425]  task_work_run+0x7e/0xa0
[ 2352.424990]  do_exit+0x2c6/0xb40
[ 2352.428213]  ? event_function_local.constprop.132+0xe0/0xe0
[ 2352.433767]  ? visit_groups_merge+0xcd/0x180
[ 2352.438027]  do_group_exit+0x3a/0xa0
[ 2352.441598]  get_signal+0x123/0x6c0
[ 2352.445080]  ? __perf_event_task_sched_in+0xed/0x1a0
[ 2352.450030]  do_signal+0x30/0x6a0
[ 2352.453334]  ? finish_task_switch+0x10a/0x290
[ 2352.457685]  ? __schedule+0x207/0x800
[ 2352.461336]  exit_to_usermode_loop+0x5d/0xc0
[ 2352.465593]  prepare_exit_to_usermode+0x53/0x80
[ 2352.470110]  retint_user+0x8/0x8



* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-03 14:15             ` Vince Weaver
@ 2019-04-03 14:27               ` Cyrill Gorcunov
  2019-04-03 15:00                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2019-04-03 14:27 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Lendacky, Thomas, x86, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
	Stephane Eranian

On Wed, Apr 03, 2019 at 10:15:33AM -0400, Vince Weaver wrote:
> > Mind pointing me to where the json events should live? I could try to
> > convert the names.
> 
> I was mostly joking about that.  But the event lists are in the kernel 
> tree in
> 	tools/perf/pmu-events/arch/x86/
>
> I don't think anything older than Nehalem is included.

I see ;)

> After letting the fuzzer run a bit longer I did manage to get it to
> hard-lock with no messages in the log, though eventually, while I was
> fiddling with alt-sysrq over the serial port, I did get this to trigger.
> Though if I'm going to start reporting p4 crashes I should start a 
> separate thread.

Oh. Will try to take a look, thanks a lot!


* Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
  2019-04-03 14:27               ` Cyrill Gorcunov
@ 2019-04-03 15:00                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2019-04-03 15:00 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Lendacky, Thomas, x86, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa,
	Stephane Eranian

On Wed, Apr 03, 2019 at 05:27:54PM +0300, Cyrill Gorcunov wrote:
> On Wed, Apr 03, 2019 at 10:15:33AM -0400, Vince Weaver wrote:
> > > Mind pointing me to where the json events should live? I could try to
> > > convert the names.
> > 
> > I was mostly joking about that.  But the event lists are in the kernel 
> > tree in
> > 	tools/perf/pmu-events/arch/x86/
> >
> > I don't think anything older than Nehalem is included.
> 
> I see ;)
> 
> > After letting the fuzzer run a bit longer I did manage to get it to
> > hard-lock with no messages in the log, though eventually, while I was
> > fiddling with alt-sysrq over the serial port, I did get this to trigger.
> > Though if I'm going to start reporting p4 crashes I should start a 
> > separate thread.
> 
> Oh. Will try to take a look, thanks a lot!

Btw, Vince, is there a chance to fetch the last sequence the
fuzzer inserted into the events input?


Thread overview: 14+ messages
2019-04-01 21:46 [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Lendacky, Thomas
2019-04-01 21:46 ` [RFC PATCH v3 1/3] x86/perf/amd: Resolve race condition when disabling PMC Lendacky, Thomas
2019-04-01 21:46 ` [RFC PATCH v3 2/3] x86/perf/amd: Resolve NMI latency issues for active PMCs Lendacky, Thomas
2019-04-01 21:46 ` [RFC PATCH v3 3/3] x86/perf/amd: Remove need to check "running" bit in NMI handler Lendacky, Thomas
2019-04-02 13:03 ` [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency Peter Zijlstra
2019-04-02 13:09   ` Lendacky, Thomas
2019-04-02 13:22   ` Cyrill Gorcunov
2019-04-02 14:53     ` Vince Weaver
2019-04-02 15:09       ` Cyrill Gorcunov
2019-04-02 21:13         ` Vince Weaver
2019-04-02 21:31           ` Cyrill Gorcunov
2019-04-03 14:15             ` Vince Weaver
2019-04-03 14:27               ` Cyrill Gorcunov
2019-04-03 15:00                 ` Cyrill Gorcunov
