* [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support
@ 2022-03-17  6:28 Sandipan Das
  2022-03-17  6:28 ` [PATCH 1/7] x86/cpufeatures: Add PerfMonV2 feature bit Sandipan Das
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

Add support for using AMD Performance Monitoring Version 2
(PerfMonV2) features on upcoming processors. A new CPU feature
bit is introduced for PerfMonV2 detection. New MSR definitions
are added to make use of an alternative PMC management scheme
based on the new PMC global control and status registers.

The global control register provides the ability to start and
stop multiple PMCs at the same time. This makes it possible
to enable or disable all counters with a single MSR write
instead of writing to the individual PMC control registers
iteratively under x86_pmu_{enable,disable}(). The effects
can be seen when counting the same events across multiple
PMCs.
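
For illustration, here is a minimal sketch (not part of this series)
contrasting the two schemes. x86_pmu_config_addr() and
ARCH_PERFMON_EVENTSEL_ENABLE are existing kernel helpers;
MSR_AMD64_PERF_CNTR_GLOBAL_CTL is added by patch 2 and the actual
implementation lives in patches 5 and 6.

/* Legacy scheme: one MSR read-modify-write per active PMC */
static void sketch_legacy_disable_all(struct cpu_hw_events *cpuc)
{
	u64 val;
	int idx;

	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
		if (!test_bit(idx, cpuc->active_mask))
			continue;

		rdmsrl(x86_pmu_config_addr(idx), val);
		wrmsrl(x86_pmu_config_addr(idx), val & ~ARCH_PERFMON_EVENTSEL_ENABLE);
	}
}

/* PerfMonV2 scheme: all PMCs stop on a single MSR write */
static void sketch_global_disable_all(void)
{
	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
}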

E.g.

  $ sudo perf stat -e "{cycles,instructions,cycles,instructions}" sleep 1

Before:

   Performance counter stats for 'sleep 1':
  
             1013281      cycles
             1452859      instructions              #    1.43  insn per cycle
             1023462      cycles
             1461724      instructions              #    1.43  insn per cycle

         1.001644276 seconds time elapsed
  
         0.001948000 seconds user
         0.000000000 seconds sys

After:

   Performance counter stats for 'sleep 1':
  
              999165      cycles
             1440456      instructions              #    1.44  insn per cycle
              999165      cycles
             1440456      instructions              #    1.44  insn per cycle
  
         1.001879504 seconds time elapsed
  
         0.001817000 seconds user
         0.000000000 seconds sys

No additional failures are seen upon running the following:
  * perf built-in test suite
  * perf_event_tests suite
  * rr test suite

Sandipan Das (7):
  x86/cpufeatures: Add PerfMonV2 feature bit
  x86/msr: Add PerfCntrGlobal* registers
  perf/x86/amd/core: Detect PerfMonV2 support
  perf/x86/amd/core: Detect available counters
  perf/x86/amd/core: Add PerfMonV2 counter control
  perf/x86/amd/core: Add PerfMonV2 overflow handling
  kvm: x86/cpuid: Fix Architectural Performance Monitoring support

 arch/x86/events/amd/core.c         | 217 ++++++++++++++++++++++++++++-
 arch/x86/include/asm/cpufeatures.h |   2 +-
 arch/x86/include/asm/msr-index.h   |   5 +
 arch/x86/include/asm/perf_event.h  |   8 ++
 arch/x86/kernel/cpu/scattered.c    |   1 +
 arch/x86/kvm/cpuid.c               |   5 +
 6 files changed, 230 insertions(+), 8 deletions(-)

-- 
2.32.0



* [PATCH 1/7] x86/cpufeatures: Add PerfMonV2 feature bit
  2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
@ 2022-03-17  6:28 ` Sandipan Das
  2022-03-17  6:28 ` [PATCH 2/7] x86/msr: Add PerfCntrGlobal* registers Sandipan Das
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

CPUID Fn80000022, i.e. ExtPerfMonAndDbg, advertises some new
performance monitoring features for upcoming AMD processors.

Bit 0 of EAX indicates support for Performance Monitoring
Version 2 (PerfMonV2) features. If found to be set during
PMU initialization, the EBX bits of the same CPUID function
can be used to determine the number of available PMCs for
different PMU types. Additionally, Core PMCs can be managed
using new global control and status registers.

For better utilization of feature words, PerfMonV2 is added
as a scattered feature bit.
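
As a rough sketch (hypothetical, not part of this patch), the raw
CPUID check below is equivalent to testing the new scattered bit
with boot_cpu_has(X86_FEATURE_PERFMON_V2):

/* Sketch: raw CPUID form of the PerfMonV2 check */
static bool sketch_has_perfmon_v2(void)
{
	if (boot_cpu_data.extended_cpuid_level < 0x80000022)
		return false;

	/* ExtPerfMonAndDbg: EAX bit 0 indicates PerfMonV2 */
	return cpuid_eax(0x80000022) & BIT(0);
}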

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 2 +-
 arch/x86/kernel/cpu/scattered.c    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 65d147974f8d..7af0ee914e2d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -201,7 +201,7 @@
 #define X86_FEATURE_INVPCID_SINGLE	( 7*32+ 7) /* Effectively INVPCID && CR4.PCIDE=1 */
 #define X86_FEATURE_HW_PSTATE		( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK	( 7*32+ 9) /* AMD ProcFeedbackInterface */
-/* FREE!                                ( 7*32+10) */
+#define X86_FEATURE_PERFMON_V2		( 7*32+10) /* AMD Performance Monitoring Version 2 */
 #define X86_FEATURE_PTI			( 7*32+11) /* Kernel Page Table Isolation enabled */
 #define X86_FEATURE_RETPOLINE		( 7*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
 #define X86_FEATURE_RETPOLINE_LFENCE	( 7*32+13) /* "" Use LFENCE for Spectre variant 2 */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 21d1f062895a..8c0d9b5426f7 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -42,6 +42,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.32.0



* [PATCH 2/7] x86/msr: Add PerfCntrGlobal* registers
  2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
  2022-03-17  6:28 ` [PATCH 1/7] x86/cpufeatures: Add PerfMonV2 feature bit Sandipan Das
@ 2022-03-17  6:28 ` Sandipan Das
  2022-03-17 11:25   ` Peter Zijlstra
  2022-03-17  6:28 ` [PATCH 3/7] perf/x86/amd/core: Detect PerfMonV2 support Sandipan Das
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

Add MSR definitions that will be used to enable the new AMD
Performance Monitoring Version 2 (PerfMonV2) features. These
include:

  * Performance Counter Global Control (PerfCntrGlobalCtl)
  * Performance Counter Global Status (PerfCntrGlobalStatus)
  * Performance Counter Global Status Clear (PerfCntrGlobalStatusClr)

The new Performance Counter Global Control and Status MSRs
provide an interface for enabling or disabling multiple
counters at the same time and for testing overflow without
probing the individual registers for each PMC.

The availability of these registers is indicated through the
PerfMonV2 feature bit of CPUID Fn80000022[EAX].
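
A minimal usage sketch follows; it assumes each Core PMC maps to the
bit of its index in these MSRs (as the later patches in this series
do), and num_counters is a stand-in for the number of Core PMCs:

	u64 status;

	/* Enable or disable all Core PMCs with a single write */
	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, GENMASK_ULL(num_counters - 1, 0));
	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);

	/* Test for overflows without probing each PERF_CTR register */
	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);

	/* Status is read-only; acknowledge by writing 1s to the Clr MSR */
	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);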

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/include/asm/msr-index.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a4a39c3e0f19..61d1a55e15b8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -504,6 +504,11 @@
 #define AMD_CPPC_DES_PERF(x)		(((x) & 0xff) << 16)
 #define AMD_CPPC_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)
 
+/* AMD Performance Counter Global Status and Control MSRs */
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR	0xc0000302
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL		0xc0000301
+
 /* Fam 17h MSRs */
 #define MSR_F17H_IRPERF			0xc00000e9
 
-- 
2.32.0



* [PATCH 3/7] perf/x86/amd/core: Detect PerfMonV2 support
  2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
  2022-03-17  6:28 ` [PATCH 1/7] x86/cpufeatures: Add PerfMonV2 feature bit Sandipan Das
  2022-03-17  6:28 ` [PATCH 2/7] x86/msr: Add PerfCntrGlobal* registers Sandipan Das
@ 2022-03-17  6:28 ` Sandipan Das
  2022-03-17 11:27   ` Peter Zijlstra
  2022-03-17  6:28 ` [PATCH 4/7] perf/x86/amd/core: Detect available counters Sandipan Das
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

AMD Performance Monitoring Version 2 (PerfMonV2) introduces
some new Core PMU features such as detection of the number
of available PMCs and managing PMCs using global registers
namely, PerfCntrGlobalCtl and PerfCntrGlobalStatus.

Clearing PerfCntrGlobalCtl and PerfCntrGlobalStatus ensures
that all PMCs are inactive and have no pending overflows
when CPUs are onlined or offlined.

The PMU version (x86_pmu.version) now indicates PerfMonV2
support and will be used to bypass the new features on
unsupported processors.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9687a8aef01c..a074af97faa9 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -18,6 +18,9 @@ static unsigned long perf_nmi_window;
 #define AMD_MERGE_EVENT ((0xFULL << 32) | 0xFFULL)
 #define AMD_MERGE_EVENT_ENABLE (AMD_MERGE_EVENT | ARCH_PERFMON_EVENTSEL_ENABLE)
 
+/* PMC Enable and Overflow bits for PerfCntrGlobal* registers */
+static u64 amd_pmu_global_cntr_mask __read_mostly;
+
 static __initconst const u64 amd_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -510,6 +513,19 @@ static struct amd_nb *amd_alloc_nb(int cpu)
 	return nb;
 }
 
+static void amd_pmu_cpu_reset(int cpu)
+{
+	if (x86_pmu.version < 2)
+		return;
+
+	/* Clear enable bits i.e. PerfCntrGlobalCtl.PerfCntrEn */
+	wrmsrl_on_cpu(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+
+	/* Clear overflow bits i.e. PerfCntrGlobalStatus.PerfCntrOvfl */
+	wrmsrl_on_cpu(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+		      amd_pmu_global_cntr_mask);
+}
+
 static int amd_pmu_cpu_prepare(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
@@ -555,6 +571,8 @@ static void amd_pmu_cpu_starting(int cpu)
 
 	cpuc->amd_nb->nb_id = nb_id;
 	cpuc->amd_nb->refcnt++;
+
+	amd_pmu_cpu_reset(cpu);
 }
 
 static void amd_pmu_cpu_dead(int cpu)
@@ -574,6 +592,8 @@ static void amd_pmu_cpu_dead(int cpu)
 
 		cpuhw->amd_nb = NULL;
 	}
+
+	amd_pmu_cpu_reset(cpu);
 }
 
 /*
@@ -957,6 +977,15 @@ static int __init amd_core_pmu_init(void)
 	x86_pmu.eventsel	= MSR_F15H_PERF_CTL;
 	x86_pmu.perfctr		= MSR_F15H_PERF_CTR;
 	x86_pmu.num_counters	= AMD64_NUM_COUNTERS_CORE;
+
+	/* Check for Performance Monitoring v2 support */
+	if (boot_cpu_has(X86_FEATURE_PERFMON_V2)) {
+		/* Update PMU version for later usage */
+		x86_pmu.version = 2;
+
+		amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
+	}
+
 	/*
 	 * AMD Core perfctr has separate MSRs for the NB events, see
 	 * the amd/uncore.c driver.
-- 
2.32.0



* [PATCH 4/7] perf/x86/amd/core: Detect available counters
  2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
                   ` (2 preceding siblings ...)
  2022-03-17  6:28 ` [PATCH 3/7] perf/x86/amd/core: Detect PerfMonV2 support Sandipan Das
@ 2022-03-17  6:28 ` Sandipan Das
  2022-03-17 11:32   ` Peter Zijlstra
  2022-03-17  6:28 ` [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control Sandipan Das
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use CPUID Fn80000022[EBX] to detect the number
of Core PMCs. This offers more flexibility if the counts
change across processor families.
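
As a worked example (the counter count is hypothetical), a part
reporting 6 Core PMCs returns 6 in CPUID Fn80000022 EBX[3:0], giving
a global counter mask of 0x3f:

	/* Sketch: deriving the counter mask from CPUID Fn80000022[EBX] */
	unsigned int ebx = cpuid_ebx(0x80000022);
	unsigned int num_core_pmc = ebx & GENMASK(3, 0);	/* e.g. 6 */
	u64 cntr_mask = GENMASK_ULL(num_core_pmc - 1, 0);	/* e.g. 0x3f */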

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c        | 5 +++++
 arch/x86/include/asm/perf_event.h | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index a074af97faa9..05d79afe5173 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -980,9 +980,14 @@ static int __init amd_core_pmu_init(void)
 
 	/* Check for Performance Monitoring v2 support */
 	if (boot_cpu_has(X86_FEATURE_PERFMON_V2)) {
+		int ebx = cpuid_ebx(EXT_PERFMON_DEBUG_FEATURES);
+
 		/* Update PMU version for later usage */
 		x86_pmu.version = 2;
 
+		/* Find the number of available Core PMCs */
+		x86_pmu.num_counters = EXT_PERFMON_DEBUG_NUM_CORE_PMC(ebx);
+
 		amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
 	}
 
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8fc1b5003713..d7dfef3e998d 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -365,6 +365,14 @@ struct pebs_xmm {
 	u64 xmm[16*2];	/* two entries for each register */
 };
 
+/*
+ * AMD Extended Performance Monitoring and Debug cpuid feature detection
+ */
+#define EXT_PERFMON_DEBUG_FEATURES		0x80000022
+
+/* Extended Performance Monitoring and Debug EBX feature bits */
+#define EXT_PERFMON_DEBUG_NUM_CORE_PMC(ebx)	((ebx) & GENMASK(3, 0))
+
 /*
  * IBS cpuid feature detection
  */
-- 
2.32.0



* [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control
  2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
                   ` (3 preceding siblings ...)
  2022-03-17  6:28 ` [PATCH 4/7] perf/x86/amd/core: Detect available counters Sandipan Das
@ 2022-03-17  6:28 ` Sandipan Das
  2022-03-17 11:46   ` Peter Zijlstra
  2022-03-17  6:28 ` [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling Sandipan Das
  2022-03-17  6:28 ` [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support Sandipan Das
  6 siblings, 1 reply; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use a new scheme to manage the Core PMCs using
the new global control and status registers. This will be
bypassed on unsupported hardware (x86_pmu.version < 2).

Currently, all PMCs have dedicated control (PERF_CTL) and
counter (PERF_CTR) registers. For a given PMC, the enable
(En) bit of its PERF_CTL register is used to start or stop
counting.

The Performance Counter Global Control (PerfCntrGlobalCtl)
register has enable (PerfCntrEn) bits for each PMC. For a
PMC to start counting, both PERF_CTL and PerfCntrGlobalCtl
enable bits must be set. If either of those are cleared,
the PMC stops counting.

In x86_pmu_{en,dis}able_all(), the PERF_CTL registers of
all active PMCs are written to in a loop. Ideally, PMCs
counting the same event that were started and stopped at
the same time should record the same counts. Due to delays
in between writes to the PERF_CTL registers across loop
iterations, the PMCs cannot be enabled or disabled at the
same instant and hence, record slightly different counts.
This is fixed by enabling or disabling all active PMCs at
the same time with a single write to the PerfCntrGlobalCtl
register.
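
The sketch below is an assumption drawn from the description above
(not code from this patch); it spells out the two-level gating under
which a PMC actually counts:

static bool sketch_pmc_is_counting(int idx)
{
	u64 ctl, global_ctl;

	rdmsrl(x86_pmu_config_addr(idx), ctl);
	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, global_ctl);

	/* Both PERF_CTL.En and PerfCntrGlobalCtl.PerfCntrEn must be set */
	return (ctl & ARCH_PERFMON_EVENTSEL_ENABLE) &&
	       (global_ctl & BIT_ULL(idx));
}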

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c | 58 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 05d79afe5173..532e9bd76bf1 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -596,6 +596,11 @@ static void amd_pmu_cpu_dead(int cpu)
 	amd_pmu_cpu_reset(cpu);
 }
 
+static inline void amd_pmu_set_global_ctl(u64 ctl)
+{
+	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, ctl);
+}
+
 /*
  * When a PMC counter overflows, an NMI is used to process the event and
  * reset the counter. NMI latency can result in the counter being updated
@@ -625,12 +630,32 @@ static void amd_pmu_wait_on_overflow(int idx)
 	}
 }
 
+static void amd_pmu_global_enable_all(int added)
+{
+	amd_pmu_set_global_ctl(amd_pmu_global_cntr_mask);
+}
+
+DEFINE_STATIC_CALL(amd_pmu_enable_all, x86_pmu_enable_all);
+
+static void amd_pmu_enable_all(int added)
+{
+	static_call(amd_pmu_enable_all)(added);
+}
+
+static void amd_pmu_global_disable_all(void)
+{
+	/* Disable all PMCs */
+	amd_pmu_set_global_ctl(0);
+}
+
+DEFINE_STATIC_CALL(amd_pmu_disable_all, x86_pmu_disable_all);
+
 static void amd_pmu_disable_all(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int idx;
 
-	x86_pmu_disable_all();
+	static_call(amd_pmu_disable_all)();
 
 	/*
 	 * This shouldn't be called from NMI context, but add a safeguard here
@@ -671,6 +696,28 @@ static void amd_pmu_disable_event(struct perf_event *event)
 	amd_pmu_wait_on_overflow(event->hw.idx);
 }
 
+static void amd_pmu_global_enable_event(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	/*
+	 * Testing cpu_hw_events.enabled should be skipped in this case unlike
+	 * in x86_pmu_enable_event().
+	 *
+	 * Since cpu_hw_events.enabled is set only after returning from
+	 * x86_pmu_start(), the PMCs must be programmed and kept ready.
+	 * Counting starts only after x86_pmu_enable_all() is called.
+	 */
+	__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
+}
+
+DEFINE_STATIC_CALL(amd_pmu_enable_event, x86_pmu_enable_event);
+
+static void amd_pmu_enable_event(struct perf_event *event)
+{
+	static_call(amd_pmu_enable_event)(event);
+}
+
 /*
  * Because of NMI latency, if multiple PMC counters are active or other sources
  * of NMIs are received, the perf NMI handler can handle one or more overflowed
@@ -929,8 +976,8 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.name			= "AMD",
 	.handle_irq		= amd_pmu_handle_irq,
 	.disable_all		= amd_pmu_disable_all,
-	.enable_all		= x86_pmu_enable_all,
-	.enable			= x86_pmu_enable_event,
+	.enable_all		= amd_pmu_enable_all,
+	.enable			= amd_pmu_enable_event,
 	.disable		= amd_pmu_disable_event,
 	.hw_config		= amd_pmu_hw_config,
 	.schedule_events	= x86_schedule_events,
@@ -989,6 +1036,11 @@ static int __init amd_core_pmu_init(void)
 		x86_pmu.num_counters = EXT_PERFMON_DEBUG_NUM_CORE_PMC(ebx);
 
 		amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
+
+		/* Update PMC handling functions */
+		static_call_update(amd_pmu_enable_all, amd_pmu_global_enable_all);
+		static_call_update(amd_pmu_disable_all, amd_pmu_global_disable_all);
+		static_call_update(amd_pmu_enable_event, amd_pmu_global_enable_event);
 	}
 
 	/*
-- 
2.32.0



* [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling
  2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
                   ` (4 preceding siblings ...)
  2022-03-17  6:28 ` [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control Sandipan Das
@ 2022-03-17  6:28 ` Sandipan Das
  2022-03-17 12:01   ` Peter Zijlstra
  2022-03-22  7:06   ` Like Xu
  2022-03-17  6:28 ` [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support Sandipan Das
  6 siblings, 2 replies; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

If AMD Performance Monitoring Version 2 (PerfMonV2) is
supported, use a new scheme to process Core PMC overflows
in the NMI handler using the new global control and status
registers. This will be bypassed on unsupported hardware
(x86_pmu.version < 2).

In x86_pmu_handle_irq(), overflows are detected by testing
the contents of the PERF_CTR register for each active PMC in
a loop. The new scheme instead inspects the overflow bits of
the global status register.

The Performance Counter Global Status (PerfCntrGlobalStatus)
register has overflow (PerfCntrOvfl) bits for each PMC. This
is, however, a read-only MSR. To acknowledge that overflows
have been processed, the NMI handler must clear the bits by
writing to the PerfCntrGlobalStatusClr register.

In x86_pmu_handle_irq(), PMCs counting the same event that
are started and stopped at the same time record slightly
different counts due to delays in between reads from the
PERF_CTR registers. This is fixed by stopping and starting
all active PMCs at the same time with a single write to the
Performance Counter Global Control (PerfCntrGlobalCtl) register
upon entering and before exiting the NMI handler.
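
In short, the acknowledgment protocol looks like this (a sketch of
what the handler added below does):

	u64 status;

	/* PerfCntrGlobalStatus is read-only */
	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);

	/* ... handle the overflowed PMCs indicated by the set bits ... */

	/* Write 1s to PerfCntrGlobalStatusClr to clear the handled bits */
	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);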

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c | 125 +++++++++++++++++++++++++++++++++++--
 1 file changed, 121 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 532e9bd76bf1..fbbba981d0bd 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -7,6 +7,7 @@
 #include <linux/delay.h>
 #include <linux/jiffies.h>
 #include <asm/apicdef.h>
+#include <asm/apic.h>
 #include <asm/nmi.h>
 
 #include "../perf_event.h"
@@ -601,6 +602,45 @@ static inline void amd_pmu_set_global_ctl(u64 ctl)
 	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, ctl);
 }
 
+static inline u64 amd_pmu_get_global_overflow(void)
+{
+	u64 status;
+
+	/* PerfCntrGlobalStatus is read-only */
+	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
+
+	return status & amd_pmu_global_cntr_mask;
+}
+
+static inline void amd_pmu_ack_global_overflow(u64 status)
+{
+	/*
+	 * PerfCntrGlobalStatus is read-only but an overflow acknowledgment
+	 * mechanism exists; writing 1 to a bit in PerfCntrGlobalStatusClr
+	 * clears the same bit in PerfCntrGlobalStatus
+	 */
+
+	/* Only allow modifications to PerfCntrGlobalStatus.PerfCntrOvfl */
+	status &= amd_pmu_global_cntr_mask;
+	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
+}
+
+static bool amd_pmu_legacy_has_overflow(int idx)
+{
+	u64 counter;
+
+	rdmsrl(x86_pmu_event_addr(idx), counter);
+
+	return !(counter & BIT_ULL(x86_pmu.cntval_bits - 1));
+}
+
+static bool amd_pmu_global_has_overflow(int idx)
+{
+	return amd_pmu_get_global_overflow() & BIT_ULL(idx);
+}
+
+DEFINE_STATIC_CALL(amd_pmu_has_overflow, amd_pmu_legacy_has_overflow);
+
 /*
  * When a PMC counter overflows, an NMI is used to process the event and
  * reset the counter. NMI latency can result in the counter being updated
@@ -613,7 +653,6 @@ static inline void amd_pmu_set_global_ctl(u64 ctl)
 static void amd_pmu_wait_on_overflow(int idx)
 {
 	unsigned int i;
-	u64 counter;
 
 	/*
 	 * Wait for the counter to be reset if it has overflowed. This loop
@@ -621,8 +660,7 @@ static void amd_pmu_wait_on_overflow(int idx)
 	 * forever...
 	 */
 	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
-		rdmsrl(x86_pmu_event_addr(idx), counter);
-		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
+		if (!static_call(amd_pmu_has_overflow)(idx))
 			break;
 
 		/* Might be in IRQ context, so can't sleep */
@@ -718,6 +756,83 @@ static void amd_pmu_enable_event(struct perf_event *event)
 	static_call(amd_pmu_enable_event)(event);
 }
 
+static int amd_pmu_global_handle_irq(struct pt_regs *regs)
+{
+	struct perf_sample_data data;
+	struct cpu_hw_events *cpuc;
+	struct hw_perf_event *hwc;
+	struct perf_event *event;
+	u64 val, status, mask;
+	int handled = 0, idx;
+
+	status = amd_pmu_get_global_overflow();
+
+	/* Check if any overflows are pending */
+	if (!status)
+		return 0;
+
+	/* Stop counting */
+	amd_pmu_global_disable_all();
+
+	cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/*
+	 * Some chipsets need to unmask the LVTPC in a particular spot
+	 * inside the nmi handler.  As a result, the unmasking was
+	 * pushed into all the nmi handlers.
+	 *
+	 * This generic handler doesn't seem to have any issues where
+	 * the unmasking occurs so it was left at the top.
+	 *
+	 * N.B. Taken from x86_pmu_handle_irq()
+	 */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
+	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+		if (!test_bit(idx, cpuc->active_mask))
+			continue;
+
+		event = cpuc->events[idx];
+		hwc = &event->hw;
+		val = x86_perf_event_update(event);
+		mask = BIT_ULL(idx);
+
+		if (!(status & mask))
+			continue;
+
+		/* Event overflow */
+		handled++;
+		perf_sample_data_init(&data, 0, hwc->last_period);
+
+		if (!x86_perf_event_set_period(event))
+			continue;
+
+		if (perf_event_overflow(event, &data, regs))
+			x86_pmu_stop(event, 0);
+
+		status &= ~mask;
+	}
+
+	/*
+	 * It should never be the case that some overflows are not handled as
+	 * the corresponding PMCs are expected to be inactive according to the
+	 * active_mask
+	 */
+	WARN_ON(status > 0);
+
+	/* Clear overflow bits */
+	amd_pmu_ack_global_overflow(~status);
+
+	inc_irq_stat(apic_perf_irqs);
+
+	/* Resume counting */
+	amd_pmu_global_enable_all(0);
+
+	return handled;
+}
+
+DEFINE_STATIC_CALL(amd_pmu_handle_irq, x86_pmu_handle_irq);
+
 /*
  * Because of NMI latency, if multiple PMC counters are active or other sources
  * of NMIs are received, the perf NMI handler can handle one or more overflowed
@@ -741,7 +856,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
 	int handled;
 
 	/* Process any counter overflows */
-	handled = x86_pmu_handle_irq(regs);
+	handled = static_call(amd_pmu_handle_irq)(regs);
 
 	/*
 	 * If a counter was handled, record a timestamp such that un-handled
@@ -1041,6 +1156,8 @@ static int __init amd_core_pmu_init(void)
 		static_call_update(amd_pmu_enable_all, amd_pmu_global_enable_all);
 		static_call_update(amd_pmu_disable_all, amd_pmu_global_disable_all);
 		static_call_update(amd_pmu_enable_event, amd_pmu_global_enable_event);
+		static_call_update(amd_pmu_has_overflow, amd_pmu_global_has_overflow);
+		static_call_update(amd_pmu_handle_irq, amd_pmu_global_handle_irq);
 	}
 
 	/*
-- 
2.32.0



* [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support
  2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
                   ` (5 preceding siblings ...)
  2022-03-17  6:28 ` [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling Sandipan Das
@ 2022-03-17  6:28 ` Sandipan Das
  2022-03-17 12:07   ` Peter Zijlstra
  2022-03-22  7:31   ` Like Xu
  6 siblings, 2 replies; 22+ messages in thread
From: Sandipan Das @ 2022-03-17  6:28 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	eranian, ananth.narayan, ravi.bangoria, santosh.shukla,
	sandipan.das

CPUID 0xA provides information on Architectural Performance
Monitoring features on some x86 processors. It advertises a
PMU version which Qemu uses to determine the availability of
additional MSRs to manage the PMCs.

Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
the same, the kernel constructs return values based on the
x86_pmu_capability irrespective of the vendor.

This CPUID function and additional MSRs are not supported on
AMD processors. If PerfMonV2 is detected, the PMU version is
set to 2 and guest startup breaks because of an attempt to
access a non-existent MSR. Return zeros to avoid this.

Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
Reported-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/kvm/cpuid.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b8f8d268d058..1d9ca5726167 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -865,6 +865,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		union cpuid10_eax eax;
 		union cpuid10_edx edx;
 
+		if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+			break;
+		}
+
 		perf_get_x86_pmu_capability(&cap);
 
 		/*
-- 
2.32.0



* Re: [PATCH 2/7] x86/msr: Add PerfCntrGlobal* registers
  2022-03-17  6:28 ` [PATCH 2/7] x86/msr: Add PerfCntrGlobal* registers Sandipan Das
@ 2022-03-17 11:25   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2022-03-17 11:25 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Thu, Mar 17, 2022 at 11:58:31AM +0530, Sandipan Das wrote:
> Add MSR definitions that will be used to enable the new AMD
> Performance Monitoring Version 2 (PerfMonV2) features. These
> include:
> 
>   * Performance Counter Global Control (PerfCntrGlobalCtl)
>   * Performance Counter Global Status (PerfCntrGlobalStatus)
>   * Performance Counter Global Status Clear (PerfCntrGlobalStatusClr)
> 
> The new Performance Counter Global Control and Status MSRs
> provide an interface for enabling or disabling multiple
> counters at the same time and for testing overflow without
> probing the individual registers for each PMC.
> 
> The availability of these registers is indicated through the
> PerfMonV2 feature bit of CPUID Fn80000022[EAX].
> 
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>  arch/x86/include/asm/msr-index.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index a4a39c3e0f19..61d1a55e15b8 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -504,6 +504,11 @@
>  #define AMD_CPPC_DES_PERF(x)		(((x) & 0xff) << 16)
>  #define AMD_CPPC_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)
>  
> +/* AMD Performance Counter Global Status and Control MSRs */
> +#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
> +#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR	0xc0000302
> +#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL		0xc0000301

My OCD would suggest ordering them by number :-)


* Re: [PATCH 3/7] perf/x86/amd/core: Detect PerfMonV2 support
  2022-03-17  6:28 ` [PATCH 3/7] perf/x86/amd/core: Detect PerfMonV2 support Sandipan Das
@ 2022-03-17 11:27   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2022-03-17 11:27 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Thu, Mar 17, 2022 at 11:58:32AM +0530, Sandipan Das wrote:
> AMD Performance Monitoring Version 2 (PerfMonV2) introduces
> some new Core PMU features such as detection of the number
> of available PMCs and managing PMCs using global registers
> namely, PerfCntrGlobalCtl and PerfCntrGlobalStatus.
> 
> Clearing PerfCntrGlobalCtl and PerfCntrGlobalStatus ensures
> that all PMCs are inactive and have no pending overflows
> when CPUs are onlined or offlined.
> 
> The PMU version (x86_pmu.version) now indicates PerfMonV2
> support and will be used to bypass the new features on
> unsupported processors.
> 
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---

> +static void amd_pmu_cpu_reset(int cpu)
> +{
> +	if (x86_pmu.version < 2)
> +		return;
> +
> +	/* Clear enable bits i.e. PerfCntrGlobalCtl.PerfCntrEn */
> +	wrmsrl_on_cpu(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
> +
> +	/* Clear overflow bits i.e. PerfCntrGlobalStatus.PerfCntrOvfl */
> +	wrmsrl_on_cpu(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
> +		      amd_pmu_global_cntr_mask);
> +}

I think these can be wrmsrl(); both starting and dead run on the target
cpu.


* Re: [PATCH 4/7] perf/x86/amd/core: Detect available counters
  2022-03-17  6:28 ` [PATCH 4/7] perf/x86/amd/core: Detect available counters Sandipan Das
@ 2022-03-17 11:32   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2022-03-17 11:32 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Thu, Mar 17, 2022 at 11:58:33AM +0530, Sandipan Das wrote:
> If AMD Performance Monitoring Version 2 (PerfMonV2) is
> supported, use CPUID Fn80000022[EBX] to detect the number
> of Core PMCs. This offers more flexibility if the counts
> change across processor families.
> 
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>  arch/x86/events/amd/core.c        | 5 +++++
>  arch/x86/include/asm/perf_event.h | 8 ++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index a074af97faa9..05d79afe5173 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -980,9 +980,14 @@ static int __init amd_core_pmu_init(void)
>  
>  	/* Check for Performance Monitoring v2 support */
>  	if (boot_cpu_has(X86_FEATURE_PERFMON_V2)) {
> +		int ebx = cpuid_ebx(EXT_PERFMON_DEBUG_FEATURES);
> +
>  		/* Update PMU version for later usage */
>  		x86_pmu.version = 2;
>  
> +		/* Find the number of available Core PMCs */
> +		x86_pmu.num_counters = EXT_PERFMON_DEBUG_NUM_CORE_PMC(ebx);
> +
>  		amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
>  	}

I prefer using unions like cpuid10_ebx. So much easier to read than
all this shouting.


* Re: [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control
  2022-03-17  6:28 ` [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control Sandipan Das
@ 2022-03-17 11:46   ` Peter Zijlstra
  2022-03-18  8:02     ` Sandipan Das
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-03-17 11:46 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Thu, Mar 17, 2022 at 11:58:34AM +0530, Sandipan Das wrote:
> @@ -625,12 +630,32 @@ static void amd_pmu_wait_on_overflow(int idx)
>  	}
>  }
>  
> +static void amd_pmu_global_enable_all(int added)
> +{
> +	amd_pmu_set_global_ctl(amd_pmu_global_cntr_mask);
> +}
> +
> +DEFINE_STATIC_CALL(amd_pmu_enable_all, x86_pmu_enable_all);
> +
> +static void amd_pmu_enable_all(int added)
> +{
> +	static_call(amd_pmu_enable_all)(added);
> +}
> +
> +static void amd_pmu_global_disable_all(void)
> +{
> +	/* Disable all PMCs */
> +	amd_pmu_set_global_ctl(0);
> +}
> +
> +DEFINE_STATIC_CALL(amd_pmu_disable_all, x86_pmu_disable_all);
> +
>  static void amd_pmu_disable_all(void)
>  {
>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	int idx;
>  
> -	x86_pmu_disable_all();
> +	static_call(amd_pmu_disable_all)();
>  
>  	/*
>  	 * This shouldn't be called from NMI context, but add a safeguard here
> @@ -671,6 +696,28 @@ static void amd_pmu_disable_event(struct perf_event *event)
>  	amd_pmu_wait_on_overflow(event->hw.idx);
>  }
>  
> +static void amd_pmu_global_enable_event(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	/*
> +	 * Testing cpu_hw_events.enabled should be skipped in this case unlike
> +	 * in x86_pmu_enable_event().
> +	 *
> +	 * Since cpu_hw_events.enabled is set only after returning from
> +	 * x86_pmu_start(), the PMCs must be programmed and kept ready.
> +	 * Counting starts only after x86_pmu_enable_all() is called.
> +	 */
> +	__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
> +}
> +
> +DEFINE_STATIC_CALL(amd_pmu_enable_event, x86_pmu_enable_event);
> +
> +static void amd_pmu_enable_event(struct perf_event *event)
> +{
> +	static_call(amd_pmu_enable_event)(event);
> +}
> +
>  /*
>   * Because of NMI latency, if multiple PMC counters are active or other sources
>   * of NMIs are received, the perf NMI handler can handle one or more overflowed
> @@ -929,8 +976,8 @@ static __initconst const struct x86_pmu amd_pmu = {
>  	.name			= "AMD",
>  	.handle_irq		= amd_pmu_handle_irq,
>  	.disable_all		= amd_pmu_disable_all,
> -	.enable_all		= x86_pmu_enable_all,
> -	.enable			= x86_pmu_enable_event,
> +	.enable_all		= amd_pmu_enable_all,
> +	.enable			= amd_pmu_enable_event,
>  	.disable		= amd_pmu_disable_event,
>  	.hw_config		= amd_pmu_hw_config,
>  	.schedule_events	= x86_schedule_events,
> @@ -989,6 +1036,11 @@ static int __init amd_core_pmu_init(void)
>  		x86_pmu.num_counters = EXT_PERFMON_DEBUG_NUM_CORE_PMC(ebx);
>  
>  		amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
> +
> +		/* Update PMC handling functions */
> +		static_call_update(amd_pmu_enable_all, amd_pmu_global_enable_all);
> +		static_call_update(amd_pmu_disable_all, amd_pmu_global_disable_all);
> +		static_call_update(amd_pmu_enable_event, amd_pmu_global_enable_event);
>  	}


This makes no sense to me...

First and foremost, *please* tell me your shiny new hardware fixed the
terrible behaviour that requires the wait_on_overflow hacks in
amd_pmu_disable_all().

Second, all these x86_pmu methods are already static_calls per
arch/x86/events/core.c. So what you want to do is something like:

	x86_pmu = amd_pmu;
	if (amd_v2) {
		x86_pmu.disable_all = amd_v2_disable_all;
		x86_pmu.enable_all  = amd_v2_enable_all;
	}

And leave it at that.



* Re: [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling
  2022-03-17  6:28 ` [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling Sandipan Das
@ 2022-03-17 12:01   ` Peter Zijlstra
  2022-03-17 17:45     ` Stephane Eranian
  2022-03-18  8:18     ` Sandipan Das
  2022-03-22  7:06   ` Like Xu
  1 sibling, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2022-03-17 12:01 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Thu, Mar 17, 2022 at 11:58:35AM +0530, Sandipan Das wrote:

> +static inline u64 amd_pmu_get_global_overflow(void)
> +{
> +	u64 status;
> +
> +	/* PerfCntrGlobalStatus is read-only */
> +	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
> +
> +	return status & amd_pmu_global_cntr_mask;
> +}
> +
> +static inline void amd_pmu_ack_global_overflow(u64 status)
> +{
> +	/*
> +	 * PerfCntrGlobalStatus is read-only but an overflow acknowledgment
> +	 * mechanism exists; writing 1 to a bit in PerfCntrGlobalStatusClr
> +	 * clears the same bit in PerfCntrGlobalStatus
> +	 */
> +
> +	/* Only allow modifications to PerfCntrGlobalStatus.PerfCntrOvfl */
> +	status &= amd_pmu_global_cntr_mask;
> +	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
> +}
> +
> +static bool amd_pmu_legacy_has_overflow(int idx)
> +{
> +	u64 counter;
> +
> +	rdmsrl(x86_pmu_event_addr(idx), counter);
> +
> +	return !(counter & BIT_ULL(x86_pmu.cntval_bits - 1));
> +}
> +
> +static bool amd_pmu_global_has_overflow(int idx)
> +{
> +	return amd_pmu_get_global_overflow() & BIT_ULL(idx);
> +}
> +
> +DEFINE_STATIC_CALL(amd_pmu_has_overflow, amd_pmu_legacy_has_overflow);
> +
>  /*
>   * When a PMC counter overflows, an NMI is used to process the event and
>   * reset the counter. NMI latency can result in the counter being updated
> @@ -613,7 +653,6 @@ static inline void amd_pmu_set_global_ctl(u64 ctl)
>  static void amd_pmu_wait_on_overflow(int idx)
>  {
>  	unsigned int i;
> -	u64 counter;
>  
>  	/*
>  	 * Wait for the counter to be reset if it has overflowed. This loop
> @@ -621,8 +660,7 @@ static void amd_pmu_wait_on_overflow(int idx)
>  	 * forever...
>  	 */
>  	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
> -		rdmsrl(x86_pmu_event_addr(idx), counter);
> -		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
> +		if (!static_call(amd_pmu_has_overflow)(idx))
>  			break;
>  
>  		/* Might be in IRQ context, so can't sleep */

This scares me... please tell me you fixed that mess.

> @@ -718,6 +756,83 @@ static void amd_pmu_enable_event(struct perf_event *event)
>  	static_call(amd_pmu_enable_event)(event);
>  }
>  
> +static int amd_pmu_global_handle_irq(struct pt_regs *regs)
> +{
> +	struct perf_sample_data data;
> +	struct cpu_hw_events *cpuc;
> +	struct hw_perf_event *hwc;
> +	struct perf_event *event;
> +	u64 val, status, mask;
> +	int handled = 0, idx;
> +
> +	status = amd_pmu_get_global_overflow();
> +
> +	/* Check if any overflows are pending */
> +	if (!status)
> +		return 0;
> +
> +	/* Stop counting */
> +	amd_pmu_global_disable_all();


This seems weird to me, I'd first disable it, then read status. MSR
access is expensive, you want to shut down the PMU asap.

Also, this is written like PMI would not be the primary NMI source,
which seems somewhat unlikely.

> +
> +	cpuc = this_cpu_ptr(&cpu_hw_events);
> +
> +	/*
> +	 * Some chipsets need to unmask the LVTPC in a particular spot
> +	 * inside the nmi handler.  As a result, the unmasking was
> +	 * pushed into all the nmi handlers.
> +	 *
> +	 * This generic handler doesn't seem to have any issues where
> +	 * the unmasking occurs so it was left at the top.
> +	 *
> +	 * N.B. Taken from x86_pmu_handle_irq()
> +	 */

Please write an AMD specific comment here. Note how 'recent' Intel chips
ended up pushing this to the end of the handler. Verify with your
hardware team where they want this and write as much of the rationale as
you're allowed to share in the comment.

> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +
> +	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
> +		if (!test_bit(idx, cpuc->active_mask))
> +			continue;
> +
> +		event = cpuc->events[idx];
> +		hwc = &event->hw;
> +		val = x86_perf_event_update(event);
> +		mask = BIT_ULL(idx);
> +
> +		if (!(status & mask))
> +			continue;
> +
> +		/* Event overflow */
> +		handled++;
> +		perf_sample_data_init(&data, 0, hwc->last_period);
> +
> +		if (!x86_perf_event_set_period(event))
> +			continue;
> +
> +		if (perf_event_overflow(event, &data, regs))
> +			x86_pmu_stop(event, 0);
> +
> +		status &= ~mask;
> +	}
> +
> +	/*
> +	 * It should never be the case that some overflows are not handled as
> +	 * the corresponding PMCs are expected to be inactive according to the
> +	 * active_mask
> +	 */
> +	WARN_ON(status > 0);
> +
> +	/* Clear overflow bits */
> +	amd_pmu_ack_global_overflow(~status);
> +
> +	inc_irq_stat(apic_perf_irqs);
> +
> +	/* Resume counting */
> +	amd_pmu_global_enable_all(0);

I think this is broken vs perf_pmu_{dis,en}able(), note how
intel_pmu_handle_irq() saves/restores the enable state.

> +
> +	return handled;
> +}
> +
> +DEFINE_STATIC_CALL(amd_pmu_handle_irq, x86_pmu_handle_irq);
> +
>  /*
>   * Because of NMI latency, if multiple PMC counters are active or other sources
>   * of NMIs are received, the perf NMI handler can handle one or more overflowed
> @@ -741,7 +856,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
>  	int handled;
>  
>  	/* Process any counter overflows */
> -	handled = x86_pmu_handle_irq(regs);
> +	handled = static_call(amd_pmu_handle_irq)(regs);
>  
>  	/*
>  	 * If a counter was handled, record a timestamp such that un-handled
> @@ -1041,6 +1156,8 @@ static int __init amd_core_pmu_init(void)
>  		static_call_update(amd_pmu_enable_all, amd_pmu_global_enable_all);
>  		static_call_update(amd_pmu_disable_all, amd_pmu_global_disable_all);
>  		static_call_update(amd_pmu_enable_event, amd_pmu_global_enable_event);
> +		static_call_update(amd_pmu_has_overflow, amd_pmu_global_has_overflow);
> +		static_call_update(amd_pmu_handle_irq, amd_pmu_global_handle_irq);
>  	}

Same, all this static_call() stuff is misguided.

Also, if you feel like it, you can create amd_pmu_v2.


* Re: [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support
  2022-03-17  6:28 ` [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support Sandipan Das
@ 2022-03-17 12:07   ` Peter Zijlstra
  2022-03-18  7:59     ` Sandipan Das
  2022-03-22  7:31   ` Like Xu
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-03-17 12:07 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Thu, Mar 17, 2022 at 11:58:36AM +0530, Sandipan Das wrote:
> CPUID 0xA provides information on Architectural Performance
> Monitoring features on some x86 processors. It advertises a
> PMU version which Qemu uses to determine the availability of
> additional MSRs to manage the PMCs.
> 
> Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
> the same, the kernel constructs return values based on the
> x86_pmu_capability irrespective of the vendor.
> 
> This CPUID function and additional MSRs are not supported on
> AMD processors. If PerfMonV2 is detected, the PMU version is
> set to 2 and guest startup breaks because of an attempt to
> access a non-existent MSR. Return zeros to avoid this.
> 
> Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
> Reported-by: Vasant Hegde <vasant.hegde@amd.com>
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>  arch/x86/kvm/cpuid.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b8f8d268d058..1d9ca5726167 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -865,6 +865,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  		union cpuid10_eax eax;
>  		union cpuid10_edx edx;
>  
> +		if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> +			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> +			break;
> +		}
> +

Because actually implementing perfmon-v2 would've been too convenient,
right? IIRC you're very close to actually supporting perfmon-v2
capability wise here.


* Re: [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling
  2022-03-17 12:01   ` Peter Zijlstra
@ 2022-03-17 17:45     ` Stephane Eranian
  2022-03-18  8:18     ` Sandipan Das
  1 sibling, 0 replies; 22+ messages in thread
From: Stephane Eranian @ 2022-03-17 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sandipan Das, linux-kernel, linux-perf-users, x86, bp,
	dave.hansen, acme, mark.rutland, alexander.shishkin, namhyung,
	jolsa, tglx, mingo, pbonzini, jmattson, like.xu.linux,
	ananth.narayan, ravi.bangoria, santosh.shukla

On Thu, Mar 17, 2022 at 5:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 17, 2022 at 11:58:35AM +0530, Sandipan Das wrote:
>
> > +static inline u64 amd_pmu_get_global_overflow(void)
> > +{
> > +     u64 status;
> > +
> > +     /* PerfCntrGlobalStatus is read-only */
> > +     rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
> > +
> > +     return status & amd_pmu_global_cntr_mask;
> > +}
> > +
> > +static inline void amd_pmu_ack_global_overflow(u64 status)
> > +{
> > +     /*
> > +      * PerfCntrGlobalStatus is read-only but an overflow acknowledgment
> > +      * mechanism exists; writing 1 to a bit in PerfCntrGlobalStatusClr
> > +      * clears the same bit in PerfCntrGlobalStatus
> > +      */
> > +
> > +     /* Only allow modifications to PerfCntrGlobalStatus.PerfCntrOvfl */
> > +     status &= amd_pmu_global_cntr_mask;
> > +     wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
> > +}
> > +
> > +static bool amd_pmu_legacy_has_overflow(int idx)
> > +{
> > +     u64 counter;
> > +
> > +     rdmsrl(x86_pmu_event_addr(idx), counter);
> > +
> > +     return !(counter & BIT_ULL(x86_pmu.cntval_bits - 1));
> > +}
> > +
> > +static bool amd_pmu_global_has_overflow(int idx)
> > +{
> > +     return amd_pmu_get_global_overflow() & BIT_ULL(idx);
> > +}
> > +
> > +DEFINE_STATIC_CALL(amd_pmu_has_overflow, amd_pmu_legacy_has_overflow);
> > +
> >  /*
> >   * When a PMC counter overflows, an NMI is used to process the event and
> >   * reset the counter. NMI latency can result in the counter being updated
> > @@ -613,7 +653,6 @@ static inline void amd_pmu_set_global_ctl(u64 ctl)
> >  static void amd_pmu_wait_on_overflow(int idx)
> >  {
> >       unsigned int i;
> > -     u64 counter;
> >
> >       /*
> >        * Wait for the counter to be reset if it has overflowed. This loop
> > @@ -621,8 +660,7 @@ static void amd_pmu_wait_on_overflow(int idx)
> >        * forever...
> >        */
> >       for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
> > -             rdmsrl(x86_pmu_event_addr(idx), counter);
> > -             if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
> > +             if (!static_call(amd_pmu_has_overflow)(idx))
> >                       break;
> >
> >               /* Might be in IRQ context, so can't sleep */
>
> This scares me... please tell me you fixed that mess.
>
> > @@ -718,6 +756,83 @@ static void amd_pmu_enable_event(struct perf_event *event)
> >       static_call(amd_pmu_enable_event)(event);
> >  }
> >
> > +static int amd_pmu_global_handle_irq(struct pt_regs *regs)
> > +{
> > +     struct perf_sample_data data;
> > +     struct cpu_hw_events *cpuc;
> > +     struct hw_perf_event *hwc;
> > +     struct perf_event *event;
> > +     u64 val, status, mask;
> > +     int handled = 0, idx;
> > +
> > +     status = amd_pmu_get_global_overflow();
> > +
> > +     /* Check if any overflows are pending */
> > +     if (!status)
> > +             return 0;
> > +
> > +     /* Stop counting */
> > +     amd_pmu_global_disable_all();
>
>
> This seems weird to me, I'd first disable it, then read status. MSR
> access is expensive, you want to shut down the PMU asap.
>
> Also, this is written like PMI would not be the primary NMI source,
> which seems somewhat unlikely.
>
> > +
> > +     cpuc = this_cpu_ptr(&cpu_hw_events);
> > +
> > +     /*
> > +      * Some chipsets need to unmask the LVTPC in a particular spot
> > +      * inside the nmi handler.  As a result, the unmasking was
> > +      * pushed into all the nmi handlers.
> > +      *
> > +      * This generic handler doesn't seem to have any issues where
> > +      * the unmasking occurs so it was left at the top.
> > +      *
> > +      * N.B. Taken from x86_pmu_handle_irq()
> > +      */
>
> Please write an AMD specific comment here. Note how 'recent' Intel chips
> ended up pushing this to the end of the handler. Verify with your
> hardware team where they want this and write as much of the rationale as
> you're allowed to share in the comment.
>
> > +     apic_write(APIC_LVTPC, APIC_DM_NMI);
> > +
> > +     for (idx = 0; idx < x86_pmu.num_counters; idx++) {
> > +             if (!test_bit(idx, cpuc->active_mask))
> > +                     continue;
> > +
> > +             event = cpuc->events[idx];
> > +             hwc = &event->hw;
> > +             val = x86_perf_event_update(event);
> > +             mask = BIT_ULL(idx);
> > +
> > +             if (!(status & mask))
> > +                     continue;
> > +
> > +             /* Event overflow */
> > +             handled++;
> > +             perf_sample_data_init(&data, 0, hwc->last_period);
> > +
> > +             if (!x86_perf_event_set_period(event))
> > +                     continue;
> > +
> > +             if (perf_event_overflow(event, &data, regs))
> > +                     x86_pmu_stop(event, 0);
> > +
> > +             status &= ~mask;
> > +     }
> > +
> > +     /*
> > +      * It should never be the case that some overflows are not handled as
> > +      * the corresponding PMCs are expected to be inactive according to the
> > +      * active_mask
> > +      */
> > +     WARN_ON(status > 0);
> > +
> > +     /* Clear overflow bits */
> > +     amd_pmu_ack_global_overflow(~status);
> > +
> > +     inc_irq_stat(apic_perf_irqs);
> > +
> > +     /* Resume counting */
> > +     amd_pmu_global_enable_all(0);
>
> I think this is broken vs perf_pmu_{dis,en}able(), note how
> intel_pmu_handle_irq() saves/restores the enable state.
>
> > +
> > +     return handled;
> > +}
> > +
> > +DEFINE_STATIC_CALL(amd_pmu_handle_irq, x86_pmu_handle_irq);
> > +
> >  /*
> >   * Because of NMI latency, if multiple PMC counters are active or other sources
> >   * of NMIs are received, the perf NMI handler can handle one or more overflowed
> > @@ -741,7 +856,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
> >       int handled;
> >
> >       /* Process any counter overflows */
> > -     handled = x86_pmu_handle_irq(regs);
> > +     handled = static_call(amd_pmu_handle_irq)(regs);
> >
> >       /*
> >        * If a counter was handled, record a timestamp such that un-handled
> > @@ -1041,6 +1156,8 @@ static int __init amd_core_pmu_init(void)
> >               static_call_update(amd_pmu_enable_all, amd_pmu_global_enable_all);
> >               static_call_update(amd_pmu_disable_all, amd_pmu_global_disable_all);
> >               static_call_update(amd_pmu_enable_event, amd_pmu_global_enable_event);
> > +             static_call_update(amd_pmu_has_overflow, amd_pmu_global_has_overflow);
> > +             static_call_update(amd_pmu_handle_irq, amd_pmu_global_handle_irq);
> >       }
>
> Same, all this static_call() stuff is misguided.
>
> Also, if you feel like it, you can create amd_pmu_v2.

Given the number of overrides, that would also make more sense to me.


* Re: [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support
  2022-03-17 12:07   ` Peter Zijlstra
@ 2022-03-18  7:59     ` Sandipan Das
  0 siblings, 0 replies; 22+ messages in thread
From: Sandipan Das @ 2022-03-18  7:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla


On 3/17/2022 5:37 PM, Peter Zijlstra wrote:
> On Thu, Mar 17, 2022 at 11:58:36AM +0530, Sandipan Das wrote:
>> CPUID 0xA provides information on Architectural Performance
>> Monitoring features on some x86 processors. It advertises a
>> PMU version which Qemu uses to determine the availability of
>> additional MSRs to manage the PMCs.
>>
>> Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
>> the same, the kernel constructs return values based on the
>> x86_pmu_capability irrespective of the vendor.
>>
>> This CPUID function and additional MSRs are not supported on
>> AMD processors. If PerfMonV2 is detected, the PMU version is
>> set to 2 and guest startup breaks because of an attempt to
>> access a non-existent MSR. Return zeros to avoid this.
>>
>> Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
>> Reported-by: Vasant Hegde <vasant.hegde@amd.com>
>> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>> ---
>>  arch/x86/kvm/cpuid.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index b8f8d268d058..1d9ca5726167 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -865,6 +865,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>>  		union cpuid10_eax eax;
>>  		union cpuid10_edx edx;
>>  
>> +		if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
>> +			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
>> +			break;
>> +		}
>> +
> 
> Because actually implementing perfmon-v2 would've been too convenient,
> right? IIRC you're very close to actually supporting perfmon-v2
> capability-wise here.

That's in the works as well. However, in this case, Qemu will have
to test CPUID Fn8000_0022 EAX bit 0 to see if the global control and
status registers are available.
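
For illustration, a userspace-style probe of that bit could look like the
sketch below (the helper name is made up, and it assumes a GCC/Clang
toolchain that provides <cpuid.h>):

  #include <cpuid.h>
  #include <stdbool.h>

  /* True if CPUID Fn8000_0022 EAX[0] (PerfMonV2) is set */
  static bool amd_perfmon_v2_supported(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx))
                  return false;

          return eax & 0x1;
  }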

- Sandipan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control
  2022-03-17 11:46   ` Peter Zijlstra
@ 2022-03-18  8:02     ` Sandipan Das
  2022-03-18 10:52       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Sandipan Das @ 2022-03-18  8:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla


On 3/17/2022 5:16 PM, Peter Zijlstra wrote:
> On Thu, Mar 17, 2022 at 11:58:34AM +0530, Sandipan Das wrote:
>> @@ -625,12 +630,32 @@ static void amd_pmu_wait_on_overflow(int idx)
>>  	}
>>  }
>>  
>> +static void amd_pmu_global_enable_all(int added)
>> +{
>> +	amd_pmu_set_global_ctl(amd_pmu_global_cntr_mask);
>> +}
>> +
>> +DEFINE_STATIC_CALL(amd_pmu_enable_all, x86_pmu_enable_all);
>> +
>> +static void amd_pmu_enable_all(int added)
>> +{
>> +	static_call(amd_pmu_enable_all)(added);
>> +}
>> +
>> +static void amd_pmu_global_disable_all(void)
>> +{
>> +	/* Disable all PMCs */
>> +	amd_pmu_set_global_ctl(0);
>> +}
>> +
>> +DEFINE_STATIC_CALL(amd_pmu_disable_all, x86_pmu_disable_all);
>> +
>>  static void amd_pmu_disable_all(void)
>>  {
>>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>  	int idx;
>>  
>> -	x86_pmu_disable_all();
>> +	static_call(amd_pmu_disable_all)();
>>  
>>  	/*
>>  	 * This shouldn't be called from NMI context, but add a safeguard here
>> @@ -671,6 +696,28 @@ static void amd_pmu_disable_event(struct perf_event *event)
>>  	amd_pmu_wait_on_overflow(event->hw.idx);
>>  }
>>  
>> +static void amd_pmu_global_enable_event(struct perf_event *event)
>> +{
>> +	struct hw_perf_event *hwc = &event->hw;
>> +
>> +	/*
>> +	 * Testing cpu_hw_events.enabled should be skipped in this case unlike
>> +	 * in x86_pmu_enable_event().
>> +	 *
>> +	 * Since cpu_hw_events.enabled is set only after returning from
>> +	 * x86_pmu_start(), the PMCs must be programmed and kept ready.
>> +	 * Counting starts only after x86_pmu_enable_all() is called.
>> +	 */
>> +	__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
>> +}
>> +
>> +DEFINE_STATIC_CALL(amd_pmu_enable_event, x86_pmu_enable_event);
>> +
>> +static void amd_pmu_enable_event(struct perf_event *event)
>> +{
>> +	static_call(amd_pmu_enable_event)(event);
>> +}
>> +
>>  /*
>>   * Because of NMI latency, if multiple PMC counters are active or other sources
>>   * of NMIs are received, the perf NMI handler can handle one or more overflowed
>> @@ -929,8 +976,8 @@ static __initconst const struct x86_pmu amd_pmu = {
>>  	.name			= "AMD",
>>  	.handle_irq		= amd_pmu_handle_irq,
>>  	.disable_all		= amd_pmu_disable_all,
>> -	.enable_all		= x86_pmu_enable_all,
>> -	.enable			= x86_pmu_enable_event,
>> +	.enable_all		= amd_pmu_enable_all,
>> +	.enable			= amd_pmu_enable_event,
>>  	.disable		= amd_pmu_disable_event,
>>  	.hw_config		= amd_pmu_hw_config,
>>  	.schedule_events	= x86_schedule_events,
>> @@ -989,6 +1036,11 @@ static int __init amd_core_pmu_init(void)
>>  		x86_pmu.num_counters = EXT_PERFMON_DEBUG_NUM_CORE_PMC(ebx);
>>  
>>  		amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
>> +
>> +		/* Update PMC handling functions */
>> +		static_call_update(amd_pmu_enable_all, amd_pmu_global_enable_all);
>> +		static_call_update(amd_pmu_disable_all, amd_pmu_global_disable_all);
>> +		static_call_update(amd_pmu_enable_event, amd_pmu_global_enable_event);
>>  	}
> 
> 
> This makes no sense to me...
> 
> First and foremost, *please* tell me your shiny new hardware fixed the
> terrible behaviour that requires the wait_on_overflow hacks in
> amd_pmu_disable_all().

Unfortunately, that workaround is still required.

> 
> Second, all these x86_pmu methods are already static_calls per
> arch/x86/events/core.c. So what you want to do is something like:
> 
> 	x86_pmu = amd_pmu;
> 	if (amd_v2) {
> 		x86_pmu.disable_all = amd_v2_disable_all;
> 		x86_pmu.enable_all  = amd_v2_enable_all;
> 	}
> 
> And leave it at that.
> 

Sure. I'll clean this up.

- Sandipan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling
  2022-03-17 12:01   ` Peter Zijlstra
  2022-03-17 17:45     ` Stephane Eranian
@ 2022-03-18  8:18     ` Sandipan Das
  1 sibling, 0 replies; 22+ messages in thread
From: Sandipan Das @ 2022-03-18  8:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla


On 3/17/2022 5:31 PM, Peter Zijlstra wrote:
> On Thu, Mar 17, 2022 at 11:58:35AM +0530, Sandipan Das wrote:
> 
>> +static inline u64 amd_pmu_get_global_overflow(void)
>> +{
>> +	u64 status;
>> +
>> +	/* PerfCntrGlobalStatus is read-only */
>> +	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
>> +
>> +	return status & amd_pmu_global_cntr_mask;
>> +}
>> +
>> +static inline void amd_pmu_ack_global_overflow(u64 status)
>> +{
>> +	/*
>> +	 * PerfCntrGlobalStatus is read-only but an overflow acknowledgment
>> +	 * mechanism exists; writing 1 to a bit in PerfCntrGlobalStatusClr
>> +	 * clears the same bit in PerfCntrGlobalStatus
>> +	 */
>> +
>> +	/* Only allow modifications to PerfCntrGlobalStatus.PerfCntrOvfl */
>> +	status &= amd_pmu_global_cntr_mask;
>> +	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
>> +}
>> +
>> +static bool amd_pmu_legacy_has_overflow(int idx)
>> +{
>> +	u64 counter;
>> +
>> +	rdmsrl(x86_pmu_event_addr(idx), counter);
>> +
>> +	return !(counter & BIT_ULL(x86_pmu.cntval_bits - 1));
>> +}
>> +
>> +static bool amd_pmu_global_has_overflow(int idx)
>> +{
>> +	return amd_pmu_get_global_overflow() & BIT_ULL(idx);
>> +}
>> +
>> +DEFINE_STATIC_CALL(amd_pmu_has_overflow, amd_pmu_legacy_has_overflow);
>> +
>>  /*
>>   * When a PMC counter overflows, an NMI is used to process the event and
>>   * reset the counter. NMI latency can result in the counter being updated
>> @@ -613,7 +653,6 @@ static inline void amd_pmu_set_global_ctl(u64 ctl)
>>  static void amd_pmu_wait_on_overflow(int idx)
>>  {
>>  	unsigned int i;
>> -	u64 counter;
>>  
>>  	/*
>>  	 * Wait for the counter to be reset if it has overflowed. This loop
>> @@ -621,8 +660,7 @@ static void amd_pmu_wait_on_overflow(int idx)
>>  	 * forever...
>>  	 */
>>  	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
>> -		rdmsrl(x86_pmu_event_addr(idx), counter);
>> -		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
>> +		if (!static_call(amd_pmu_has_overflow)(idx))
>>  			break;
>>  
>>  		/* Might be in IRQ context, so can't sleep */
> 
> This scares me... please tell me you fixed that mess.
> 
>> @@ -718,6 +756,83 @@ static void amd_pmu_enable_event(struct perf_event *event)
>>  	static_call(amd_pmu_enable_event)(event);
>>  }
>>  
>> +static int amd_pmu_global_handle_irq(struct pt_regs *regs)
>> +{
>> +	struct perf_sample_data data;
>> +	struct cpu_hw_events *cpuc;
>> +	struct hw_perf_event *hwc;
>> +	struct perf_event *event;
>> +	u64 val, status, mask;
>> +	int handled = 0, idx;
>> +
>> +	status = amd_pmu_get_global_overflow();
>> +
>> +	/* Check if any overflows are pending */
>> +	if (!status)
>> +		return 0;
>> +
>> +	/* Stop counting */
>> +	amd_pmu_global_disable_all();
> 
> 
> This seems weird to me, I'd first disable it, then read status. MSR
> access is expensive, you want to shut down the PMU asap.
> 
> Also, this is written like PMI would not be the primary NMI source,
> which seems somewhat unlikely.
> 

Yes, PMI is the primary NMI source. Will fix this.
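
For what it's worth, a fragment of the reworked entry path along those
lines, reusing the helper names from this patch (untested sketch):

  /*
   * Stop the PMU before doing anything else; every further MSR
   * access is expensive and no new overflows should pile up while
   * the handler runs.
   */
  amd_pmu_global_disable_all();

  status = amd_pmu_get_global_overflow();
  if (!status) {
          /* Nothing pending: resume counting and bail out */
          amd_pmu_global_enable_all(0);
          return 0;
  }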

>> +
>> +	cpuc = this_cpu_ptr(&cpu_hw_events);
>> +
>> +	/*
>> +	 * Some chipsets need to unmask the LVTPC in a particular spot
>> +	 * inside the nmi handler.  As a result, the unmasking was
>> +	 * pushed into all the nmi handlers.
>> +	 *
>> +	 * This generic handler doesn't seem to have any issues where
>> +	 * the unmasking occurs so it was left at the top.
>> +	 *
>> +	 * N.B. Taken from x86_pmu_handle_irq()
>> +	 */
> 
> Please write an AMD-specific comment here. Note how 'recent' Intel chips
> ended up pushing this to the end of the handler. Verify with your
> hardware team where they want this and write as much of the rationale as
> you're allowed to share in the comment.
> 

Sure. I'll follow up on this.

>> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +
>> +	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
>> +		if (!test_bit(idx, cpuc->active_mask))
>> +			continue;
>> +
>> +		event = cpuc->events[idx];
>> +		hwc = &event->hw;
>> +		val = x86_perf_event_update(event);
>> +		mask = BIT_ULL(idx);
>> +
>> +		if (!(status & mask))
>> +			continue;
>> +
>> +		/* Event overflow */
>> +		handled++;
>> +		perf_sample_data_init(&data, 0, hwc->last_period);
>> +
>> +		if (!x86_perf_event_set_period(event))
>> +			continue;
>> +
>> +		if (perf_event_overflow(event, &data, regs))
>> +			x86_pmu_stop(event, 0);
>> +
>> +		status &= ~mask;
>> +	}
>> +
>> +	/*
>> +	 * It should never be the case that some overflows are not handled as
>> +	 * the corresponding PMCs are expected to be inactive according to the
>> +	 * active_mask
>> +	 */
>> +	WARN_ON(status > 0);
>> +
>> +	/* Clear overflow bits */
>> +	amd_pmu_ack_global_overflow(~status);
>> +
>> +	inc_irq_stat(apic_perf_irqs);
>> +
>> +	/* Resume counting */
>> +	amd_pmu_global_enable_all(0);
> 
> I think this is broken vs perf_pmu_{dis,en}able(), note how
> intel_pmu_handle_irq() saves/restores the enable state.
> 

Yes, it is. Will fix it.
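
A rough sketch of the save/restore pattern being referred to, modelled on
intel_pmu_handle_irq() (untested; local variable names assumed):

  struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
  int pmu_enabled;

  /*
   * Save the software enable state; the PMU may already be disabled
   * by a surrounding perf_pmu_disable() when the NMI arrives.
   */
  pmu_enabled = cpuc->enabled;
  cpuc->enabled = 0;

  amd_pmu_global_disable_all();

  /* ... process counter overflows ... */

  /* Only resume counting if the PMU was enabled on entry */
  cpuc->enabled = pmu_enabled;
  if (pmu_enabled)
          amd_pmu_global_enable_all(0);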

>> +
>> +	return handled;
>> +}
>> +
>> +DEFINE_STATIC_CALL(amd_pmu_handle_irq, x86_pmu_handle_irq);
>> +
>>  /*
>>   * Because of NMI latency, if multiple PMC counters are active or other sources
>>   * of NMIs are received, the perf NMI handler can handle one or more overflowed
>> @@ -741,7 +856,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
>>  	int handled;
>>  
>>  	/* Process any counter overflows */
>> -	handled = x86_pmu_handle_irq(regs);
>> +	handled = static_call(amd_pmu_handle_irq)(regs);
>>  
>>  	/*
>>  	 * If a counter was handled, record a timestamp such that un-handled
>> @@ -1041,6 +1156,8 @@ static int __init amd_core_pmu_init(void)
>>  		static_call_update(amd_pmu_enable_all, amd_pmu_global_enable_all);
>>  		static_call_update(amd_pmu_disable_all, amd_pmu_global_disable_all);
>>  		static_call_update(amd_pmu_enable_event, amd_pmu_global_enable_event);
>> +		static_call_update(amd_pmu_has_overflow, amd_pmu_global_has_overflow);
>> +		static_call_update(amd_pmu_handle_irq, amd_pmu_global_handle_irq);
>>  	}
> 
> Same, all this static_call() stuff is misguided.
> 
> Also, if you feel like it, you can create amd_pmu_v2.

Sure.

- Sandipan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control
  2022-03-18  8:02     ` Sandipan Das
@ 2022-03-18 10:52       ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2022-03-18 10:52 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, dave.hansen, acme,
	mark.rutland, alexander.shishkin, namhyung, jolsa, tglx, mingo,
	pbonzini, jmattson, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Fri, Mar 18, 2022 at 01:32:04PM +0530, Sandipan Das wrote:
> On 3/17/2022 5:16 PM, Peter Zijlstra wrote:

> > First and foremost, *please* tell me your shiny new hardware fixed the
> > terrible behaviour that requires the wait_on_overflow hacks in
> > amd_pmu_disable_all().
> 
> Unfortunately, that workaround is still required.

So much sadness :-(

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling
  2022-03-17  6:28 ` [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling Sandipan Das
  2022-03-17 12:01   ` Peter Zijlstra
@ 2022-03-22  7:06   ` Like Xu
  2022-03-22  8:37     ` Sandipan Das
  1 sibling, 1 reply; 22+ messages in thread
From: Like Xu @ 2022-03-22  7:06 UTC (permalink / raw)
  To: Sandipan Das
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, linux-kernel,
	linux-perf-users, x86

On 17/3/2022 2:28 pm, Sandipan Das wrote:
> +		val = x86_perf_event_update(event);

The variable 'val' is set but not used.

> +		mask = BIT_ULL(idx);
> +
> +		if (!(status & mask))

Needed here?

> +			continue;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support
  2022-03-17  6:28 ` [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support Sandipan Das
  2022-03-17 12:07   ` Peter Zijlstra
@ 2022-03-22  7:31   ` Like Xu
  1 sibling, 0 replies; 22+ messages in thread
From: Like Xu @ 2022-03-22  7:31 UTC (permalink / raw)
  To: Sandipan Das
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, kvm list,
	linux-kernel, linux-perf-users, x86

On 17/3/2022 2:28 pm, Sandipan Das wrote:
> CPUID 0xA provides information on Architectural Performance
> Monitoring features on some x86 processors. It advertises a
> PMU version which Qemu uses to determine the availability of
> additional MSRs to manage the PMCs.
> 
> Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
> the same, the kernel constructs return values based on the
> x86_pmu_capability irrespective of the vendor.
> 
> This CPUID function and additional MSRs are not supported on
> AMD processors. If PerfMonV2 is detected, the PMU version is
> set to 2 and guest startup breaks because of an attempt to
> access a non-existent MSR. Return zeros to avoid this.
> 
> Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
> Reported-by: Vasant Hegde <vasant.hegde@amd.com>

The new patch 3/7 introduces this issue (and more KVM issues)
by setting "x86_pmu.version = 2", so this is not a fix in the strictest sense.

Btw, would you like my help with virtualizing AMD PerfMonV2?

> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>   arch/x86/kvm/cpuid.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b8f8d268d058..1d9ca5726167 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -865,6 +865,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>   		union cpuid10_eax eax;
>   		union cpuid10_edx edx;
>   
> +		if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> +			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> +			break;
> +		}
> +
>   		perf_get_x86_pmu_capability(&cap);
>   
>   		/*

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling
  2022-03-22  7:06   ` Like Xu
@ 2022-03-22  8:37     ` Sandipan Das
  0 siblings, 0 replies; 22+ messages in thread
From: Sandipan Das @ 2022-03-22  8:37 UTC (permalink / raw)
  To: Like Xu
  Cc: peterz, bp, dave.hansen, acme, mark.rutland, alexander.shishkin,
	namhyung, jolsa, tglx, mingo, pbonzini, jmattson, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, linux-kernel,
	linux-perf-users, x86


On 3/22/2022 12:36 PM, Like Xu wrote:
> On 17/3/2022 2:28 pm, Sandipan Das wrote:
>> +        val = x86_perf_event_update(event);
> 
> The variable 'val' is set but not used.
> 
>> +        mask = BIT_ULL(idx);
>> +
>> +        if (!(status & mask))
> 
> Needed here?
> 

If you are referring to this previous usage of 'val' within
x86_pmu_handle_irq():

  if (val & (1ULL << (x86_pmu.cntval_bits - 1)))

then no, it is not. Instead of looking at bit 47 of the raw counter value,
one can look at the overflow bits of the global status register
to determine if a counter overflow has occurred.

Will remove 'val' as you suggested since it is no longer needed
for any decision making.
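
In other words, the two checks below are meant to be interchangeable ways
of spotting an overflowed counter (illustrative fragment only):

  /* Legacy: a counter that is still counting has its top (sign) bit set */
  overflowed = !(val & BIT_ULL(x86_pmu.cntval_bits - 1));

  /* PerfMonV2: read the per-counter bit from PerfCntrGlobalStatus */
  overflowed = !!(status & BIT_ULL(idx));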

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-03-22  8:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-17  6:28 [PATCH 0/7] perf/x86/amd/core: Add AMD PerfMonV2 support Sandipan Das
2022-03-17  6:28 ` [PATCH 1/7] x86/cpufeatures: Add PerfMonV2 feature bit Sandipan Das
2022-03-17  6:28 ` [PATCH 2/7] x86/msr: Add PerfCntrGlobal* registers Sandipan Das
2022-03-17 11:25   ` Peter Zijlstra
2022-03-17  6:28 ` [PATCH 3/7] perf/x86/amd/core: Detect PerfMonV2 support Sandipan Das
2022-03-17 11:27   ` Peter Zijlstra
2022-03-17  6:28 ` [PATCH 4/7] perf/x86/amd/core: Detect available counters Sandipan Das
2022-03-17 11:32   ` Peter Zijlstra
2022-03-17  6:28 ` [PATCH 5/7] perf/x86/amd/core: Add PerfMonV2 counter control Sandipan Das
2022-03-17 11:46   ` Peter Zijlstra
2022-03-18  8:02     ` Sandipan Das
2022-03-18 10:52       ` Peter Zijlstra
2022-03-17  6:28 ` [PATCH 6/7] perf/x86/amd/core: Add PerfMonV2 overflow handling Sandipan Das
2022-03-17 12:01   ` Peter Zijlstra
2022-03-17 17:45     ` Stephane Eranian
2022-03-18  8:18     ` Sandipan Das
2022-03-22  7:06   ` Like Xu
2022-03-22  8:37     ` Sandipan Das
2022-03-17  6:28 ` [PATCH 7/7] kvm: x86/cpuid: Fix Architectural Performance Monitoring support Sandipan Das
2022-03-17 12:07   ` Peter Zijlstra
2022-03-18  7:59     ` Sandipan Das
2022-03-22  7:31   ` Like Xu
