* [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support
@ 2023-04-10 10:50 Like Xu
  2023-04-10 10:50 ` [PATCH V5 01/10] KVM: x86/pmu: Expose reprogram_counters() in pmu.h Like Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

Starting with Zen4, the core PMU on AMD platforms such as Genoa and
Ryzen 7000 supports PerfMonV2, which remains compatible with legacy
PERFCTR_CORE behavior and MSR addresses.

If you don't have access to the hardware specification, the host perf
commits d6d0c7f681fd..7685665c390d provide a quick overview. The main
change is the addition of three MSRs equivalent to their Intel V2
counterparts: global_ctrl, global_status, and global_status_clear.

It is worth noting that this feature is very attractive for reducing the
overhead of PMU virtualization: multiple MSR accesses to individual
counters are replaced by a single access to a global register, with an
additional accuracy gain when multiple guest counters are in use.
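
As a hedged illustration (kernel-style C; the MSR index comes from the
AMD PPR and is defined as MSR_AMD64_PERF_CNTR_GLOBAL_CTL later in this
series, while the helper names here are hypothetical), the guest-side
win looks like:

  /* Stop every core counter with a single MSR write, i.e. a single
   * VM-exit, instead of one write per PERF_CTLn. */
  static inline void pmu_stop_all_counters(void)
  {
          wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
  }

  /* Restart a set of counters with one more write; bit n of the mask
   * enables PerfCtrN. */
  static inline void pmu_start_counters(u64 enable_mask)
  {
          wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, enable_mask);
  }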

All related test cases pass on a Genoa box.
Please feel free to run more tests, add more, or share comments.

Patches 0001-0007 could be applied earlier, which may help reduce
the burden on industrious reviewers.

Previous:
https://lore.kernel.org/kvm/20230214050757.9623-1-likexu@tencent.com/

V4 -> V5 Changelog:
- Avoid pronouns in the changelogs and comments; (Sean)
- Drop the assumption that KVM can blindly set v2 without changes; (Sean)
- Grab host CPUID and clear here (instead of setting); (Sean)
- Clarify behaviours as spec-defined vs. observed on hardware; (Sean)
- Drop the use of the intermediate "entry"; (Sean)
- Use BUILD_BUG_ON() to avoid potential null-pointer deref bug; (Sean)
- Add a patch to cap nr_arch_gp_counters in the common flow; (Sean)
- Add a sanity check for pmu->nr_arch_gp_counters; (Sean)
- Rewrite changelogs so they don't depend on the shortlog; (Sean)
- State what the patch actually does, not "should do"; (Sean)
- Drop the useless multiple line comment; (Sean)
- Apply a better short log; (Sean)
- Drop the performance blurb; (Sean)
- Drop the "The", i.e. just "AMD PerfMonV2 defines ..."; (Sean)
- s/hanlders/handlers; (Sean)
- s/intel/Intel; (Sean)
- Drop useless message on pmc_is_globally_enabled(); (Sean)
- Tweak "return 1" to follow the patterns for other MSR helpers; (Sean)
- Add assumptions about reusing global_ovf_ctrl_mask; (Sean)

Like Xu (10):
  KVM: x86/pmu: Expose reprogram_counters() in pmu.h
  KVM: x86/pmu: Return #GP if user sets the GLOBAL_STATUS reserved bits
  KVM: x86/pmu: Make part of the Intel v2 PMU MSRs handling x86 generic
  KVM: x86: Explicitly zero cpuid "0xa" leaf when PMU is disabled
  KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  KVM: x86/pmu: Forget PERFCTR_CORE if the min num of counters isn't met
  KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap
  KVM: x86/cpuid: Add a KVM-only leaf to redirect AMD PerfMonV2 flag
  KVM: x86/svm/pmu: Add AMD PerfMonV2 support
  KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022

 arch/x86/include/asm/kvm-x86-pmu-ops.h |  1 -
 arch/x86/kvm/cpuid.c                   | 30 +++++++++-
 arch/x86/kvm/pmu.c                     | 83 +++++++++++++++++++++++---
 arch/x86/kvm/pmu.h                     | 32 +++++++++-
 arch/x86/kvm/reverse_cpuid.h           |  7 +++
 arch/x86/kvm/svm/pmu.c                 | 67 +++++++++++++++------
 arch/x86/kvm/svm/svm.c                 | 19 +++++-
 arch/x86/kvm/vmx/pmu_intel.c           | 32 ++--------
 arch/x86/kvm/x86.c                     | 10 ++++
 9 files changed, 221 insertions(+), 60 deletions(-)


base-commit: dfdeda67ea2dac57d2d7506d65cfe5a0878ad285
-- 
2.40.0



* [PATCH V5 01/10] KVM: x86/pmu: Expose reprogram_counters() in pmu.h
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 02/10] KVM: x86/pmu: Return #GP if user sets the GLOBAL_STATUS reserved bits Like Xu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

Move reprogram_counters() from vmx/pmu_intel.c to pmu.h as straight
code movement, so that the helper can be reused by the incoming
AMD PMU v2 support.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.h           | 12 ++++++++++++
 arch/x86/kvm/vmx/pmu_intel.c | 12 ------------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5c7bbf03b599..986563aeeef8 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -201,6 +201,18 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
 	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 }
 
+static inline void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
+{
+	int bit;
+
+	if (!diff)
+		return;
+
+	for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
+		set_bit(bit, pmu->reprogram_pmi);
+	kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
+}
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 741efe2c497b..1f9c3e916a21 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -73,18 +73,6 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 	}
 }
 
-static void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
-{
-	int bit;
-
-	if (!diff)
-		return;
-
-	for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
-		set_bit(bit, pmu->reprogram_pmi);
-	kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
-}
-
 static bool intel_hw_event_available(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-- 
2.40.0



* [PATCH V5 02/10] KVM: x86/pmu: Return #GP if user sets the GLOBAL_STATUS reserved bits
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
  2023-04-10 10:50 ` [PATCH V5 01/10] KVM: x86/pmu: Expose reprogram_counters() in pmu.h Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 03/10] KVM: x86/pmu: Make part of the Intel v2 PMU MSRs handling x86 generic Like Xu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

Return #GP if KVM user space attempts to set a reserved bit for the
guest. If user space sets reserved bits when restoring the
MSR_CORE_PERF_GLOBAL_STATUS register, these bits will be erroneously
returned when the guest reads this register, and cannot be cleared from
inside the guest, which confuses the guest's PMI handler.

Note, reusing global_ovf_ctrl_mask as the global_status mask will break
if KVM supports higher versions of the Intel arch PMU.
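
For reference, a hedged userspace sketch of the host-initiated restore
path being guarded here (plain KVM ioctl usage; vcpu_fd is assumed to be
an open vCPU file descriptor):

  #include <linux/kvm.h>
  #include <string.h>
  #include <sys/ioctl.h>

  /* With this patch, a value with reserved bits set is rejected by the
   * set_msr handler, so KVM_SET_MSRS reports 0 MSRs processed. */
  static int restore_global_status(int vcpu_fd, __u64 val)
  {
          struct { struct kvm_msrs hdr; struct kvm_msr_entry e; } m;

          memset(&m, 0, sizeof(m));
          m.hdr.nmsrs = 1;
          m.e.index = 0x38e;      /* MSR_CORE_PERF_GLOBAL_STATUS */
          m.e.data = val;
          return ioctl(vcpu_fd, KVM_SET_MSRS, &m.hdr);
  }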

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 1f9c3e916a21..343b3182b7f4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -399,7 +399,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			reprogram_fixed_counters(pmu, data);
 		break;
 	case MSR_CORE_PERF_GLOBAL_STATUS:
-		if (!msr_info->host_initiated)
+		/*
+		 * Caution, the assumption here is that some of the bits (such as
+		 * ASCI, CTR_FREEZE, and LBR_FREEZE) are not yet supported by KVM.
+		 */
+		if (!msr_info->host_initiated || (data & pmu->global_ovf_ctrl_mask))
 			return 1; /* RO MSR */
 
 		pmu->global_status = data;
-- 
2.40.0



* [PATCH V5 03/10] KVM: x86/pmu: Make part of the Intel v2 PMU MSRs handling x86 generic
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
  2023-04-10 10:50 ` [PATCH V5 01/10] KVM: x86/pmu: Expose reprogram_counters() in pmu.h Like Xu
  2023-04-10 10:50 ` [PATCH V5 02/10] KVM: x86/pmu: Return #GP if user sets the GLOBAL_STATUS reserved bits Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 04/10] KVM: x86: Explicitly zero cpuid "0xa" leaf when PMU is disabled Like Xu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

AMD PerfMonV2 defines three registers with the same semantics as their
Intel v2 PMU counterparts: the GLOBAL_CTRL, GLOBAL_STATUS and
GLOBAL_OVF_CTRL MSRs. For better code reuse, extract this part of the
handling into generic x86 code as straight code movement.

Specifically, the kvm_pmu_set/get_msr() handlers for GLOBAL_STATUS,
GLOBAL_CTRL and GLOBAL_OVF_CTRL defined for Intel are moved to the
generic pmu.c, and the .pmc_is_globally_enabled callback is removed,
which paves the way for introducing the AMD PerfMonV2 code later.

The new eponymous pmc_is_globally_enabled() works as-is for the legacy
AMD vPMU, since its version is 1 (no global_ctrl MSR, so all counters
are treated as enabled). Note that the vendor-specific *_is_valid_msr()
helpers continue to be used to avoid cross-vendor MSR access.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/include/asm/kvm-x86-pmu-ops.h |  1 -
 arch/x86/kvm/pmu.c                     | 61 ++++++++++++++++++++++----
 arch/x86/kvm/pmu.h                     | 17 ++++++-
 arch/x86/kvm/svm/pmu.c                 |  9 ----
 arch/x86/kvm/vmx/pmu_intel.c           | 14 +-----
 5 files changed, 70 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index c17e3e96fc1d..6c98f4bb4228 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -13,7 +13,6 @@ BUILD_BUG_ON(1)
  * at the call sites.
  */
 KVM_X86_PMU_OP(hw_event_available)
-KVM_X86_PMU_OP(pmc_is_enabled)
 KVM_X86_PMU_OP(pmc_idx_to_pmc)
 KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
 KVM_X86_PMU_OP(msr_idx_to_pmc)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 597a8f8f90b9..69d0a3ae7b45 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -93,11 +93,6 @@ void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
 #undef __KVM_X86_PMU_OP
 }
 
-static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
-{
-	return static_call(kvm_x86_pmu_pmc_is_enabled)(pmc);
-}
-
 static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
 {
 	struct kvm_pmu *pmu = container_of(irq_work, struct kvm_pmu, irq_work);
@@ -577,13 +572,63 @@ static void kvm_pmu_mark_pmc_in_use(struct kvm_vcpu *vcpu, u32 msr)
 
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
-	return static_call(kvm_x86_pmu_get_msr)(vcpu, msr_info);
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	u32 msr = msr_info->index;
+
+	switch (msr) {
+	case MSR_CORE_PERF_GLOBAL_STATUS:
+		msr_info->data = pmu->global_status;
+		break;
+	case MSR_CORE_PERF_GLOBAL_CTRL:
+		msr_info->data = pmu->global_ctrl;
+		break;
+	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+		msr_info->data = 0;
+		break;
+	default:
+		return static_call(kvm_x86_pmu_get_msr)(vcpu, msr_info);
+	}
+
+	return 0;
 }
 
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
-	kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index);
-	return static_call(kvm_x86_pmu_set_msr)(vcpu, msr_info);
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	u32 msr = msr_info->index;
+	u64 data = msr_info->data;
+	u64 diff;
+
+	switch (msr) {
+	case MSR_CORE_PERF_GLOBAL_STATUS:
+		if (!msr_info->host_initiated || (data & pmu->global_ovf_ctrl_mask))
+			return 1; /* RO MSR */
+
+		pmu->global_status = data;
+		break;
+	case MSR_CORE_PERF_GLOBAL_CTRL:
+		if (!kvm_valid_perf_global_ctrl(pmu, data))
+			return 1;
+
+		if (pmu->global_ctrl != data) {
+			diff = pmu->global_ctrl ^ data;
+			pmu->global_ctrl = data;
+			reprogram_counters(pmu, diff);
+		}
+		break;
+	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+		if (data & pmu->global_ovf_ctrl_mask)
+			return 1;
+
+		if (!msr_info->host_initiated)
+			pmu->global_status &= ~data;
+		break;
+	default:
+		kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index);
+		return static_call(kvm_x86_pmu_set_msr)(vcpu, msr_info);
+	}
+
+	return 0;
 }
 
 /* refresh PMU settings. This function generally is called when underlying
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 986563aeeef8..dd7c7d4ffe3b 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -20,7 +20,6 @@
 
 struct kvm_pmu_ops {
 	bool (*hw_event_available)(struct kvm_pmc *pmc);
-	bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
 	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
 		unsigned int idx, u64 *mask);
@@ -213,6 +212,22 @@ static inline void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
 	kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
 }
 
+/*
+ * Check if a PMC is enabled by comparing it against global_ctrl bits.
+ *
+ * If the current version of vPMU doesn't have global_ctrl MSR,
+ * all vPMCs are enabled (return TRUE).
+ */
+static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+	if (pmu->version < 2)
+		return true;
+
+	return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
+}
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 5fa939e411d8..70143275e0a7 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -78,14 +78,6 @@ static bool amd_hw_event_available(struct kvm_pmc *pmc)
 	return true;
 }
 
-/* check if a PMC is enabled by comparing it against global_ctrl bits. Because
- * AMD CPU doesn't have global_ctrl MSR, all PMCs are enabled (return TRUE).
- */
-static bool amd_pmc_is_enabled(struct kvm_pmc *pmc)
-{
-	return true;
-}
-
 static bool amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -220,7 +212,6 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 
 struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.hw_event_available = amd_hw_event_available,
-	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
 	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 343b3182b7f4..99d07ccb1869 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -95,17 +95,6 @@ static bool intel_hw_event_available(struct kvm_pmc *pmc)
 	return true;
 }
 
-/* check if a PMC is enabled by comparing it with globl_ctrl bits. */
-static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
-{
-	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-
-	if (!intel_pmu_has_perf_global_ctrl(pmu))
-		return true;
-
-	return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
-}
-
 static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -793,7 +782,7 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
 		pmc = intel_pmc_idx_to_pmc(pmu, bit);
 
 		if (!pmc || !pmc_speculative_in_use(pmc) ||
-		    !intel_pmc_is_enabled(pmc) || !pmc->perf_event)
+		    !pmc_is_globally_enabled(pmc) || !pmc->perf_event)
 			continue;
 
 		/*
@@ -808,7 +797,6 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
 
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.hw_event_available = intel_hw_event_available,
-	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
 	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
-- 
2.40.0



* [PATCH V5 04/10] KVM: x86: Explicitly zero cpuid "0xa" leaf when PMU is disabled
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
                   ` (2 preceding siblings ...)
  2023-04-10 10:50 ` [PATCH V5 03/10] KVM: x86/pmu: Make part of the Intel v2 PMU MSRs handling x86 generic Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met Like Xu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

Add an explicit !enable_pmu check, as relying on kvm_pmu_cap being
zeroed isn't obvious. Even though KVM zero-pads kvm_pmu_cap when
!enable_pmu, so the subsequent CPUID leaf assignments already yield
zeros, the explicit check documents the intent.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b736ddb42088..52a7acf2c965 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -970,7 +970,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		union cpuid10_eax eax;
 		union cpuid10_edx edx;
 
-		if (!static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
+		if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 			break;
 		}
-- 
2.40.0



* [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
                   ` (3 preceding siblings ...)
  2023-04-10 10:50 ` [PATCH V5 04/10] KVM: x86: Explicitly zero cpuid "0xa" leaf when PMU is disabled Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-11  5:36   ` Jim Mattson
  2023-04-10 10:50 ` [PATCH V5 06/10] KVM: x86/pmu: Forget PERFCTR_CORE if the min " Like Xu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

Disable PMU support when running on AMD and perf reports fewer than four
general purpose counters. All AMD PMUs must define at least four counters
due to AMD's legacy architecture hardcoding the number of counters
without providing a way to enumerate the number of counters to software,
e.g. from AMD's APM:

 The legacy architecture defines four performance counters (PerfCtrn)
 and corresponding event-select registers (PerfEvtSeln).

Virtualizing fewer than four counters can lead to guest instability as
software expects four counters to be available.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index dd7c7d4ffe3b..002b527360f4 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 			enable_pmu = false;
 	}
 
+	if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
+		enable_pmu = false;
+
 	if (!enable_pmu) {
 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
 		return;
-- 
2.40.0



* [PATCH V5 06/10] KVM: x86/pmu: Forget PERFCTR_CORE if the min num of counters isn't met
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
                   ` (4 preceding siblings ...)
  2023-04-10 10:50 ` [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 07/10] KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap Like Xu
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

Add a sanity check on the number of counters enumerated by perf, and
explicitly drop PERFCTR_CORE support if the minimum number isn't met.
E.g. if KVM needs 6 counters and perf says there are 4, then something
is wrong and enumerating 6 to a guest is only going to cause more issues.

Opportunistically, use kvm_cpu_cap_check_and_set() to simplify the host
capability check before setting the PERFCTR_CORE flag.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/svm/svm.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7584eb85410b..683f1b480fcb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4928,9 +4928,18 @@ static __init void svm_set_cpu_caps(void)
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
 
-	/* AMD PMU PERFCTR_CORE CPUID */
-	if (enable_pmu && boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
-		kvm_cpu_cap_set(X86_FEATURE_PERFCTR_CORE);
+	if (enable_pmu) {
+		/*
+		 * Enumerate support for PERFCTR_CORE if and only if KVM has
+		 * access to enough counters to virtualize "core" support,
+		 * otherwise limit vPMU support to the legacy number of counters.
+		 */
+		if (kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS_CORE)
+			kvm_pmu_cap.num_counters_gp = min(AMD64_NUM_COUNTERS,
+							  kvm_pmu_cap.num_counters_gp);
+		else
+			kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE);
+	}
 
 	/* CPUID 0x8000001F (SME/SEV features) */
 	sev_set_cpu_caps();
-- 
2.40.0



* [PATCH V5 07/10] KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
                   ` (5 preceding siblings ...)
  2023-04-10 10:50 ` [PATCH V5 06/10] KVM: x86/pmu: Forget PERFCTR_CORE if the min " Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 08/10] KVM: x86/cpuid: Add a KVM-only leaf to redirect AMD PerfMonV2 flag Like Xu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

Add a sanity check to limit the number of AMD guest counters, which
helps avoid a situation where KVM only has access to 4 counters but
user space sets guest X86_FEATURE_PERFCTR_CORE anyway.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/svm/pmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 70143275e0a7..825b9cc26ae5 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -170,6 +170,9 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 	else
 		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
 
+	pmu->nr_arch_gp_counters = min_t(unsigned int, pmu->nr_arch_gp_counters,
+					 kvm_pmu_cap.num_counters_gp);
+
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
 	pmu->reserved_bits = 0xfffffff000280000ull;
 	pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
-- 
2.40.0



* [PATCH V5 08/10] KVM: x86/cpuid: Add a KVM-only leaf to redirect AMD PerfMonV2 flag
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
                   ` (6 preceding siblings ...)
  2023-04-10 10:50 ` [PATCH V5 07/10] KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 09/10] KVM: x86/svm/pmu: Add AMD PerfMonV2 support Like Xu
  2023-04-10 10:50 ` [PATCH V5 10/10] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022 Like Xu
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

A KVM-only leaf for AMD's PerfMonV2 feature flag is defined to redirect
the kernel's scattered version to its architectural location, e.g. so that
KVM can query guest support via guest_cpuid_has().
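
For instance, vendor code can then query the flag without caring that
the kernel scatters it (a hedged sketch; patch 09 of this series uses
this exact pattern):

  if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2))
          pmu->version = 2;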

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/reverse_cpuid.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index a5717282bb9c..56cbdb24400a 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -15,6 +15,7 @@ enum kvm_only_cpuid_leafs {
 	CPUID_12_EAX	 = NCAPINTS,
 	CPUID_7_1_EDX,
 	CPUID_8000_0007_EDX,
+	CPUID_8000_0022_EAX,
 	NR_KVM_CPU_CAPS,
 
 	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
@@ -47,6 +48,9 @@ enum kvm_only_cpuid_leafs {
 /* CPUID level 0x80000007 (EDX). */
 #define KVM_X86_FEATURE_CONSTANT_TSC	KVM_X86_FEATURE(CPUID_8000_0007_EDX, 8)
 
+/* CPUID level 0x80000022 (EAX) */
+#define KVM_X86_FEATURE_PERFMON_V2	KVM_X86_FEATURE(CPUID_8000_0022_EAX, 0)
+
 struct cpuid_reg {
 	u32 function;
 	u32 index;
@@ -74,6 +78,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_7_1_EDX]       = {         7, 1, CPUID_EDX},
 	[CPUID_8000_0007_EDX] = {0x80000007, 0, CPUID_EDX},
 	[CPUID_8000_0021_EAX] = {0x80000021, 0, CPUID_EAX},
+	[CPUID_8000_0022_EAX] = {0x80000022, 0, CPUID_EAX},
 };
 
 /*
@@ -108,6 +113,8 @@ static __always_inline u32 __feature_translate(int x86_feature)
 		return KVM_X86_FEATURE_SGX_EDECCSSA;
 	else if (x86_feature == X86_FEATURE_CONSTANT_TSC)
 		return KVM_X86_FEATURE_CONSTANT_TSC;
+	else if (x86_feature == X86_FEATURE_PERFMON_V2)
+		return KVM_X86_FEATURE_PERFMON_V2;
 
 	return x86_feature;
 }
-- 
2.40.0



* [PATCH V5 09/10] KVM: x86/svm/pmu: Add AMD PerfMonV2 support
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
                   ` (7 preceding siblings ...)
  2023-04-10 10:50 ` [PATCH V5 08/10] KVM: x86/cpuid: Add a KVM-only leaf to redirect AMD PerfMonV2 flag Like Xu
@ 2023-04-10 10:50 ` Like Xu
  2023-04-10 10:50 ` [PATCH V5 10/10] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022 Like Xu
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Sandipan Das

From: Like Xu <likexu@tencent.com>

If AMD Performance Monitoring Version 2 (PerfMonV2) is detected by the
guest, it can use a new scheme to manage the core PMCs via the new
global control and status registers.

In addition to benefiting from the PerfMonV2 functionality in the same
way as the host (higher precision), the guest can also reduce the number
of VM-exits by lowering the total number of MSR accesses.

In terms of implementation details, amd_is_valid_msr() is resurrected
since the three newly added MSRs cannot be mapped to a single vPMC.
Emulating PerfMonV2 on hosts that lack it has also been ruled out for
accuracy reasons.
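
As a hedged sketch of the guest-side scheme (MSR names per this series;
simplified relative to any real PMI handler):

  static void pmu_handle_overflow(void)
  {
          u64 status;

          /* One read reports every counter that overflowed... */
          rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);

          /* ...handle each counter whose bit is set in 'status'... */

          /* ...and one write acks them all via the clear register. */
          wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
  }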

Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c     | 24 +++++++++++++++++-
 arch/x86/kvm/svm/pmu.c | 55 ++++++++++++++++++++++++++++++++++--------
 arch/x86/kvm/x86.c     | 10 ++++++++
 3 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 69d0a3ae7b45..7d2678f06863 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -577,11 +577,18 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 	switch (msr) {
 	case MSR_CORE_PERF_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 		msr_info->data = pmu->global_status;
 		break;
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+		/* Based on the observed HW. */
+		fallthrough;
 	case MSR_CORE_PERF_GLOBAL_CTRL:
 		msr_info->data = pmu->global_ctrl;
 		break;
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+		/* Based on the observed HW. */
+		fallthrough;
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		msr_info->data = 0;
 		break;
@@ -599,13 +606,26 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u64 data = msr_info->data;
 	u64 diff;
 
+	/*
+	 * Note, AMD ignores writes to reserved bits and read-only PMU MSRs,
+	 * whereas Intel generates #GP on attempts to write reserved/RO MSRs.
+	 */
 	switch (msr) {
 	case MSR_CORE_PERF_GLOBAL_STATUS:
 		if (!msr_info->host_initiated || (data & pmu->global_ovf_ctrl_mask))
 			return 1; /* RO MSR */
+		fallthrough;
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+		/* Per PPR, Read-only MSR. Writes are ignored. */
+		if (!msr_info->host_initiated)
+			break;
 
 		pmu->global_status = data;
 		break;
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+		/* Based on the observed HW. */
+		data &= ~pmu->global_ctrl_mask;
+		fallthrough;
 	case MSR_CORE_PERF_GLOBAL_CTRL:
 		if (!kvm_valid_perf_global_ctrl(pmu, data))
 			return 1;
@@ -619,7 +639,9 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		if (data & pmu->global_ovf_ctrl_mask)
 			return 1;
-
+		fallthrough;
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+		/* Based on the observed HW. */
 		if (!msr_info->host_initiated)
 			pmu->global_status &= ~data;
 		break;
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 825b9cc26ae5..56607a3f6a47 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -94,12 +94,6 @@ static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	return amd_pmc_idx_to_pmc(vcpu_to_pmu(vcpu), idx & ~(3u << 30));
 }
 
-static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
-{
-	/* All MSRs refer to exactly one PMC, so msr_idx_to_pmc is enough.  */
-	return false;
-}
-
 static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -111,6 +105,29 @@ static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 	return pmc;
 }
 
+static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	switch (msr) {
+	case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
+		return pmu->version > 0;
+	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
+		return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+		return pmu->version > 1;
+	default:
+		if (msr > MSR_F15H_PERF_CTR5 &&
+		    msr < MSR_F15H_PERF_CTL0 + 2 * pmu->nr_arch_gp_counters)
+			return pmu->version > 1;
+		break;
+	}
+
+	return amd_msr_idx_to_pmc(vcpu, msr);
+}
+
 static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -164,23 +181,39 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	union cpuid_0x80000022_ebx ebx;
 
-	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
+	pmu->version = 1;
+	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2)) {
+		pmu->version = 2;
+		/*
+		 * Note, PERFMON_V2 is also in 0x80000022.0x0, i.e. the guest
+		 * CPUID entry is guaranteed to be non-NULL.
+		 */
+		BUILD_BUG_ON(x86_feature_cpuid(X86_FEATURE_PERFMON_V2).function != 0x80000022 ||
+			     x86_feature_cpuid(X86_FEATURE_PERFMON_V2).index);
+		ebx.full = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0)->ebx;
+		pmu->nr_arch_gp_counters = ebx.split.num_core_pmc;
+	} else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
 		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
-	else
+	} else {
 		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
+	}
 
 	pmu->nr_arch_gp_counters = min_t(unsigned int, pmu->nr_arch_gp_counters,
 					 kvm_pmu_cap.num_counters_gp);
 
+	if (pmu->version > 1) {
+		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+		pmu->global_ovf_ctrl_mask = pmu->global_ctrl_mask;
+	}
+
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
 	pmu->reserved_bits = 0xfffffff000280000ull;
 	pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
-	pmu->version = 1;
 	/* not applicable to AMD; but clean them to prevent any fall out */
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->nr_arch_fixed_counters = 0;
-	pmu->global_status = 0;
 	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 }
 
@@ -211,6 +244,8 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 		pmc_stop_counter(pmc);
 		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
 	}
+
+	pmu->global_ctrl = pmu->global_status = 0;
 }
 
 struct kvm_pmu_ops amd_pmu_ops __initdata = {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a86ad45a53b8..a389ebd2ded1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1468,6 +1468,10 @@ static const u32 msrs_to_save_pmu[] = {
 	MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
 	MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
 	MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
+
+	MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
 };
 
 static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
@@ -7110,6 +7114,12 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		    kvm_pmu_cap.num_counters_fixed)
 			return;
 		break;
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+		if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
+			return;
+		break;
 	case MSR_IA32_XFD:
 	case MSR_IA32_XFD_ERR:
 		if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
-- 
2.40.0



* [PATCH V5 10/10] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022
  2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
                   ` (8 preceding siblings ...)
  2023-04-10 10:50 ` [PATCH V5 09/10] KVM: x86/svm/pmu: Add AMD PerfMonV2 support Like Xu
@ 2023-04-10 10:50 ` Like Xu
  9 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-04-10 10:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Sandipan Das

From: Like Xu <likexu@tencent.com>

CPUID leaf 0x80000022, i.e. ExtPerfMonAndDbg, advertises new
performance monitoring features for AMD processors.

Bit 0 of EAX indicates support for Performance Monitoring Version 2
(PerfMonV2) features. If found to be set during PMU initialization,
the EBX bits of the same CPUID function can be used to determine
the number of available PMCs for different PMU types.

Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so that
guests can make use of the PerfMonV2 features.
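
As a hedged sketch of guest-side consumption (kernel cpuid() helper;
field layout per the AMD PPR, with EBX[3:0] being NumCorePmc):

  static unsigned int amd_pmu_v2_num_counters(void)
  {
          unsigned int eax, ebx, ecx, edx;

          cpuid(0x80000022, &eax, &ebx, &ecx, &edx);
          if (!(eax & 1))         /* EAX[0]: PerfMonV2 supported? */
                  return 0;
          return ebx & 0xf;       /* EBX[3:0]: number of core PMCs */
  }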

Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/cpuid.c   | 28 +++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c |  4 ++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 52a7acf2c965..9b25036ab042 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -756,6 +756,10 @@ void kvm_set_cpu_caps(void)
 		F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */
 	);
 
+	kvm_cpu_cap_init_kvm_defined(CPUID_8000_0022_EAX,
+		F(PERFMON_V2)
+	);
+
 	/*
 	 * Synthesize "LFENCE is serializing" into the AMD-defined entry in
 	 * KVM's supported CPUID if the feature is reported as supported by the
@@ -1150,7 +1154,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->edx = 0;
 		break;
 	case 0x80000000:
-		entry->eax = min(entry->eax, 0x80000021);
+		entry->eax = min(entry->eax, 0x80000022);
 		/*
 		 * Serializing LFENCE is reported in a multitude of ways, and
 		 * NullSegClearsBase is not reported in CPUID on Zen2; help
@@ -1255,6 +1259,28 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->ebx = entry->ecx = entry->edx = 0;
 		cpuid_entry_override(entry, CPUID_8000_0021_EAX);
 		break;
+	/* AMD Extended Performance Monitoring and Debug */
+	case 0x80000022: {
+		union cpuid_0x80000022_ebx ebx;
+
+		entry->ecx = entry->edx = 0;
+		if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
+			entry->eax = entry->ebx = 0;
+			break;
+		}
+
+		cpuid_entry_override(entry, CPUID_8000_0022_EAX);
+
+		if (kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
+			ebx.split.num_core_pmc = kvm_pmu_cap.num_counters_gp;
+		else if (kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE))
+			ebx.split.num_core_pmc = AMD64_NUM_COUNTERS_CORE;
+		else
+			ebx.split.num_core_pmc = AMD64_NUM_COUNTERS;
+
+		entry->ebx = ebx.full;
+		break;
+	}
 	/*Add support for Centaur's CPUID instruction*/
 	case 0xC0000000:
 		/*Just support up to 0xC0000004 now*/
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 683f1b480fcb..dc1a9104c274 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4939,6 +4939,10 @@ static __init void svm_set_cpu_caps(void)
 							  kvm_pmu_cap.num_counters_gp);
 		else
 			kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE);
+
+		if (kvm_pmu_cap.version != 2 ||
+		    !kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE))
+			kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2);
 	}
 
 	/* CPUID 0x8000001F (SME/SEV features) */
-- 
2.40.0



* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-10 10:50 ` [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met Like Xu
@ 2023-04-11  5:36   ` Jim Mattson
  2023-04-11  6:17     ` Like Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Jim Mattson @ 2023-04-11  5:36 UTC (permalink / raw)
  To: Like Xu; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> Disable PMU support when running on AMD and perf reports fewer than four
> general purpose counters. All AMD PMUs must define at least four counters
> due to AMD's legacy architecture hardcoding the number of counters
> without providing a way to enumerate the number of counters to software,
> e.g. from AMD's APM:
>
>  The legacy architecture defines four performance counters (PerfCtrn)
>  and corresponding event-select registers (PerfEvtSeln).
>
> Virtualizing fewer than four counters can lead to guest instability as
> software expects four counters to be available.

I'm confused. Isn't zero less than four?

> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/kvm/pmu.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index dd7c7d4ffe3b..002b527360f4 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>                         enable_pmu = false;
>         }
>
> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
> +               enable_pmu = false;
> +
>         if (!enable_pmu) {
>                 memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>                 return;
> --
> 2.40.0
>


* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-11  5:36   ` Jim Mattson
@ 2023-04-11  6:17     ` Like Xu
  2023-04-11 12:58       ` Jim Mattson
  0 siblings, 1 reply; 19+ messages in thread
From: Like Xu @ 2023-04-11  6:17 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

On 11/4/2023 1:36 pm, Jim Mattson wrote:
> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> Disable PMU support when running on AMD and perf reports fewer than four
>> general purpose counters. All AMD PMUs must define at least four counters
>> due to AMD's legacy architecture hardcoding the number of counters
>> without providing a way to enumerate the number of counters to software,
>> e.g. from AMD's APM:
>>
>>   The legacy architecture defines four performance counters (PerfCtrn)
>>   and corresponding event-select registers (PerfEvtSeln).
>>
>> Virtualizing fewer than four counters can lead to guest instability as
>> software expects four counters to be available.
> 
> I'm confused. Isn't zero less than four?

As I understand it, you are saying that virtualizing zero counters is
also reasonable. If so, the above statement could be refined as:

	Virtualizing fewer than four counters when the vPMU is enabled may
	lead to guest instability, as software expects at least four counters
	to be available; thus the vPMU is disabled if the minimum number of
	KVM-supported counters is not reached during initialization.

Jim, does this help, or could you explain more about your confusion?

> 
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>>   arch/x86/kvm/pmu.h | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>> index dd7c7d4ffe3b..002b527360f4 100644
>> --- a/arch/x86/kvm/pmu.h
>> +++ b/arch/x86/kvm/pmu.h
>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>>                          enable_pmu = false;
>>          }
>>
>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
>> +               enable_pmu = false;
>> +
>>          if (!enable_pmu) {
>>                  memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>>                  return;
>> --
>> 2.40.0
>>


* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-11  6:17     ` Like Xu
@ 2023-04-11 12:58       ` Jim Mattson
  2023-04-11 13:17         ` Like Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Jim Mattson @ 2023-04-11 12:58 UTC (permalink / raw)
  To: Like Xu; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 11/4/2023 1:36 pm, Jim Mattson wrote:
> > On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
> >>
> >> From: Like Xu <likexu@tencent.com>
> >>
> >> Disable PMU support when running on AMD and perf reports fewer than four
> >> general purpose counters. All AMD PMUs must define at least four counters
> >> due to AMD's legacy architecture hardcoding the number of counters
> >> without providing a way to enumerate the number of counters to software,
> >> e.g. from AMD's APM:
> >>
> >>   The legacy architecture defines four performance counters (PerfCtrn)
> >>   and corresponding event-select registers (PerfEvtSeln).
> >>
> >> Virtualizing fewer than four counters can lead to guest instability as
> >> software expects four counters to be available.
> >
> > I'm confused. Isn't zero less than four?
>
> As I understand it, you are saying that virtualizing zero counters is
> also reasonable. If so, the above statement could be refined as:
>
>         Virtualizing fewer than four counters when the vPMU is enabled
>         may lead to guest instability, as software expects at least four
>         counters to be available; thus the vPMU is disabled if the
>         minimum number of KVM-supported counters is not reached during
>         initialization.
>
> Jim, does this help, or could you explain more about your confusion?

You say that "fewer than four counters can lead to guest instability
as software expects four counters to be available." Your solution is
to disable the PMU, which leaves zero counters available. Zero is less
than four. Hence, by your claim, disabling the PMU can lead to guest
instability. I don't see how this is an improvement over one, two, or
three counters.

> >
> >> Suggested-by: Sean Christopherson <seanjc@google.com>
> >> Signed-off-by: Like Xu <likexu@tencent.com>
> >> ---
> >>   arch/x86/kvm/pmu.h | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> >> index dd7c7d4ffe3b..002b527360f4 100644
> >> --- a/arch/x86/kvm/pmu.h
> >> +++ b/arch/x86/kvm/pmu.h
> >> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
> >>                          enable_pmu = false;
> >>          }
> >>
> >> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
> >> +               enable_pmu = false;
> >> +
> >>          if (!enable_pmu) {
> >>                  memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
> >>                  return;
> >> --
> >> 2.40.0
> >>


* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-11 12:58       ` Jim Mattson
@ 2023-04-11 13:17         ` Like Xu
  2023-04-11 14:58           ` Jim Mattson
  0 siblings, 1 reply; 19+ messages in thread
From: Like Xu @ 2023-04-11 13:17 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

On 11/4/2023 8:58 pm, Jim Mattson wrote:
> On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> On 11/4/2023 1:36 pm, Jim Mattson wrote:
>>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>>>
>>>> From: Like Xu <likexu@tencent.com>
>>>>
>>>> Disable PMU support when running on AMD and perf reports fewer than four
>>>> general purpose counters. All AMD PMUs must define at least four counters
>>>> due to AMD's legacy architecture hardcoding the number of counters
>>>> without providing a way to enumerate the number of counters to software,
>>>> e.g. from AMD's APM:
>>>>
>>>>    The legacy architecture defines four performance counters (PerfCtrn)
>>>>    and corresponding event-select registers (PerfEvtSeln).
>>>>
>>>> Virtualizing fewer than four counters can lead to guest instability as
>>>> software expects four counters to be available.
>>>
>>> I'm confused. Isn't zero less than four?
>>
>> As I understand it, you are saying that virtualizing zero counters is
>> also reasonable. If so, the above statement could be refined as:
>>
>>          Virtualizing fewer than four counters when the vPMU is enabled
>>          may lead to guest instability, as software expects at least
>>          four counters to be available; thus the vPMU is disabled if
>>          the minimum number of KVM-supported counters is not reached
>>          during initialization.
>>
>> Jim, does this help, or could you explain more about your confusion?
> 
> You say that "fewer than four counters can lead to guest instability
> as software expects four counters to be available." Your solution is
> to disable the PMU, which leaves zero counters available. Zero is less
> than four. Hence, by your claim, disabling the PMU can lead to guest
> instability. I don't see how this is an improvement over one, two, or
> three counters.

As you know, the AMD PMU lacks an architected method (such as CPUID) to
indicate that the VM does not have any PMU counters available on the
current platform. Guests like Linux tend to check whether their first
counter exists and works properly to infer that the other PMU counters
exist.

If KVM chooses to emulate more than one but fewer than four counters,
then the AMD guest PMU agent may assume that all four legacy counters
are present (that's what the APM specifies), which requires the legacy
code to add #GP error handling for counters that should exist but
actually don't.

So at Sean's suggestion, we took a conservative approach. If KVM detects
fewer than 4 counters, we consider KVM (under the current configuration
and platform) incapable of emulating the most basic AMD PMU capability.
A large number of legacy instances are ready for 0 or 4+ counters, not
2 or 3.

Does this help? I wouldn't mind a better move.

> 
>>>
>>>> Suggested-by: Sean Christopherson <seanjc@google.com>
>>>> Signed-off-by: Like Xu <likexu@tencent.com>
>>>> ---
>>>>    arch/x86/kvm/pmu.h | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>>>> index dd7c7d4ffe3b..002b527360f4 100644
>>>> --- a/arch/x86/kvm/pmu.h
>>>> +++ b/arch/x86/kvm/pmu.h
>>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>>>>                           enable_pmu = false;
>>>>           }
>>>>
>>>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
>>>> +               enable_pmu = false;
>>>> +
>>>>           if (!enable_pmu) {
>>>>                   memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>>>>                   return;
>>>> --
>>>> 2.40.0
>>>>


* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-11 13:17         ` Like Xu
@ 2023-04-11 14:58           ` Jim Mattson
  2023-04-19  9:34             ` Like Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Jim Mattson @ 2023-04-11 14:58 UTC (permalink / raw)
  To: Like Xu; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

On Tue, Apr 11, 2023 at 6:18 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 11/4/2023 8:58 pm, Jim Mattson wrote:
> > On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
> >>
> >> On 11/4/2023 1:36 pm, Jim Mattson wrote:
> >>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
> >>>>
> >>>> From: Like Xu <likexu@tencent.com>
> >>>>
> >>>> Disable PMU support when running on AMD and perf reports fewer than four
> >>>> general purpose counters. All AMD PMUs must define at least four counters
> >>>> due to AMD's legacy architecture hardcoding the number of counters
> >>>> without providing a way to enumerate the number of counters to software,
> >>>> e.g. from AMD's APM:
> >>>>
> >>>>    The legacy architecture defines four performance counters (PerfCtrn)
> >>>>    and corresponding event-select registers (PerfEvtSeln).
> >>>>
> >>>> Virtualizing fewer than four counters can lead to guest instability as
> >>>> software expects four counters to be available.
> >>>
> >>> I'm confused. Isn't zero less than four?
> >>
> >> As I understand it, you are saying that virtualizing zero counters is
> >> also reasonable. If so, the above statement could be refined as:
> >>
> >>          Virtualizing fewer than four counters when the vPMU is enabled
> >>          may lead to guest instability, as software expects at least
> >>          four counters to be available; thus the vPMU is disabled if
> >>          the minimum number of KVM-supported counters is not reached
> >>          during initialization.
> >>
> >> Jim, does this help, or could you explain more about your confusion?
> >
> > You say that "fewer than four counters can lead to guest instability
> > as software expects four counters to be available." Your solution is
> > to disable the PMU, which leaves zero counters available. Zero is less
> > than four. Hence, by your claim, disabling the PMU can lead to guest
> > instability. I don't see how this is an improvement over one, two, or
> > three counters.
>
> As you know, the AMD PMU lacks an architected method (such as CPUID) to
> indicate that the VM does not have any PMU counters available on the
> current platform. Guests like Linux tend to check whether their first
> counter exists and works properly to infer that the other PMU counters
> exist.

"Guests like Linux," or just Linux? What do you mean by "tend"? When
do they perform this check, and when do they not?

> If KVM chooses to emulate more than one but fewer than four counters,
> then the AMD guest PMU agent may assume that all four legacy counters
> are present (that's what the APM specifies), which requires the legacy
> code to add #GP error handling for counters that should exist but
> actually don't.

I would argue that regardless of the number of counters emulated, a
guest PMU agent may assume that the 4 legacy counters are present,
since that's what the APM specifies.

> So at Sean's suggestion, we took a conservative approach. If KVM detects
> fewer than 4 counters, we consider KVM (under the current configuration
> and platform) incapable of emulating the most basic AMD PMU capability.
> A large number of legacy instances are ready for 0 or 4+ counters, not
> 2 or 3.

Which specific guest operating systems is this change intended for?

> Does this help? I wouldn't mind a better move.

Which AMD platforms have less than 4 counters available?

>
> >
> >>>
> >>>> Suggested-by: Sean Christopherson <seanjc@google.com>
> >>>> Signed-off-by: Like Xu <likexu@tencent.com>
> >>>> ---
> >>>>    arch/x86/kvm/pmu.h | 3 +++
> >>>>    1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> >>>> index dd7c7d4ffe3b..002b527360f4 100644
> >>>> --- a/arch/x86/kvm/pmu.h
> >>>> +++ b/arch/x86/kvm/pmu.h
> >>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
> >>>>                           enable_pmu = false;
> >>>>           }
> >>>>
> >>>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)

Does this actually guarantee that the requisite number of counters is
available and will always be available while the guest is running?
What happens if some other client of the host perf subsystem requests
a CPU-pinned counter after this check?

> >>>> +               enable_pmu = false;
> >>>> +
> >>>>           if (!enable_pmu) {
> >>>>                   memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
> >>>>                   return;
> >>>> --
> >>>> 2.40.0
> >>>>


* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-11 14:58           ` Jim Mattson
@ 2023-04-19  9:34             ` Like Xu
  2023-05-24 23:37               ` Sean Christopherson
  0 siblings, 1 reply; 19+ messages in thread
From: Like Xu @ 2023-04-19  9:34 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

Jim, sorry for the late reply.

On 11/4/2023 10:58 pm, Jim Mattson wrote:
> On Tue, Apr 11, 2023 at 6:18 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> On 11/4/2023 8:58 pm, Jim Mattson wrote:
>>> On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
>>>>
>>>> On 11/4/2023 1:36 pm, Jim Mattson wrote:
>>>>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>>>>>
>>>>>> From: Like Xu <likexu@tencent.com>
>>>>>>
>>>>>> Disable PMU support when running on AMD and perf reports fewer than four
>>>>>> general purpose counters. All AMD PMUs must define at least four counters
>>>>>> due to AMD's legacy architecture hardcoding the number of counters
>>>>>> without providing a way to enumerate the number of counters to software,
>>>>>> e.g. from AMD's APM:
>>>>>>
>>>>>>     The legacy architecture defines four performance counters (PerfCtrn)
>>>>>>     and corresponding event-select registers (PerfEvtSeln).
>>>>>>
>>>>>> Virtualizing fewer than four counters can lead to guest instability as
>>>>>> software expects four counters to be available.
>>>>>
>>>>> I'm confused. Isn't zero less than four?
>>>>
> >>>> As I understand it, you are saying that virtualizing zero counters is
> >>>> also reasonable. If so, the above statement could be refined as:
>>>>
>>>>           Virtualizing fewer than four counters when vPMU is enabled may lead to guest
>>>> instability
>>>>           as software expects at least four counters to be available, thus the vPMU is
>>>> disabled if the
>>>>           minimum number of KVM supported counters is not reached during initialization.
>>>>
> >>>> Jim, does this help you or could you explain more about your confusion?
>>>
>>> You say that "fewer than four counters can lead to guest instability
>>> as software expects four counters to be available." Your solution is
>>> to disable the PMU, which leaves zero counters available. Zero is less
>>> than four. Hence, by your claim, disabling the PMU can lead to guest
>>> instability. I don't see how this is an improvement over one, two, or
>>> three counters.
>>
>> As you know, the AMD PMU lacks an architected method (such as CPUID) to
>> indicate that the VM does not have any PMU counters available on the
>> current platform. Guests like Linux tend to check whether the first
>> counter exists and works properly to infer that other PMU counters exist.
> 
> "Guests like Linux," or just Linux? What do you mean by "tend"? When
> do they perform this check, and when do they not?

We do not know how closed-source guests will detect the presence of
PMU counters.

For upstream Linux guests, such a check is implemented in check_hw_exists(),
which probes the counters one by one; a failure on the first counter is
typical, after which the kernel disables the PMU on its side.
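
A minimal sketch of that probe, loosely modeled on check_hw_exists() in
arch/x86/events/core.c (the real function also copes with BIOS and
hypervisor quirks, so treat this as an illustration rather than the
actual implementation):

	/*
	 * Write a known value to a counter MSR and read it back; if the
	 * access faults or the value doesn't stick, assume the counter
	 * is absent and disable the PMU kernel-wide.
	 */
	static bool counter_works(unsigned int msr_ctr)
	{
		u64 val_new, val = 0xabcdUL;

		if (wrmsrl_safe(msr_ctr, val))
			return false;	/* write faulted, e.g. #GP */
		if (rdmsrl_safe(msr_ctr, &val_new))
			return false;	/* read faulted */

		/* Event selects are disabled here, so the value must stick. */
		return val == val_new;
	}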

The key point is that the KVM implementation cannot rely on assumptions
about the guest kernel version. Considering that the above check was added
very early, existing Linux guest instances will most likely check the first
counter and error out (a guest could also probe all of the possible
counters and use a bitmap with holes to track any functional counters).

> 
>> If KVM chooses to emulate more than one but fewer than four counters, then
>> the AMD guest PMU agent may assume that all four legacy counters are
>> present (it's what the APM specifies), which requires the legacy code
>> to add #GP error handling for counters that should exist but actually do not.
> 
> I would argue that regardless of the number of counters emulated, a
> guest PMU agent may assume that the 4 legacy counters are present,
> since that's what the APM specifies.

I certainly agree. For example, a particular CPU model may be stated in the
spec to have certain features (e.g. an uncore PMU) that KVM does not or
chooses not to emulate, for security reasons (e.g. side-channel attacks).
That does violate the defined behavior of the hardware spec, just as setting
enable_pmu to false here yields a configuration that is not possible on
almost any real hardware today.

> 
>> So at Sean's suggestion, we took a conservative approach. If KVM detects
>> fewer than 4 counters, we conclude that KVM (under the current configuration
>> and platform) is not capable of emulating the most basic AMD PMU capability.
>> A large number of legacy guest instances are prepared to handle 0 or 4+
>> counters, not 2 or 3.
> 
> Which specific guest operating systems is this change intended for?
> 
>> Does this help you? I wouldn't mind a better move.
> 
> Which AMD platforms have fewer than 4 counters available?

All of this is for an L2 Linux guest, as the PMU in an L1 Linux guest will be disabled by L0.

> 
>>
>>>
>>>>>
>>>>>> Suggested-by: Sean Christopherson <seanjc@google.com>
>>>>>> Signed-off-by: Like Xu <likexu@tencent.com>
>>>>>> ---
>>>>>>     arch/x86/kvm/pmu.h | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>>>>>> index dd7c7d4ffe3b..002b527360f4 100644
>>>>>> --- a/arch/x86/kvm/pmu.h
>>>>>> +++ b/arch/x86/kvm/pmu.h
>>>>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>>>>>>                            enable_pmu = false;
>>>>>>            }
>>>>>>
>>>>>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
> 
> Does this actually guarantee that the requisite number of counters are
> available and will always be available while the guest is running?

Not 100%; the scheduling of physical counters depends on the host perf scheduler.

I have noticed that many cloud vendors want to make sure that hardware
resources are given exclusively to VMs, but for upstream, the availability
of resources should depend entirely on the host administrator, and a VMM
may take away access to a resource at any time, just as it does with vCPU
time slices.

Any attempt in the direction of guaranteed exclusive use will be thwarted.

> What happens if some other client of the host perf subsystem requests
> a CPU-pinned counter after this check?

Normal perf usage does not grab the counters allocated for KVM; the NMI
watchdog may be one exception, but it can be moved to other timer hardware
such as the HPET.

Of interest is that some eBPF programs that access the PMU hardware
directly use the same in-kernel interface that the perf subsystem presents
to KVM.
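
For context, a rough sketch of that interface as KVM consumes it,
loosely modeled on pmc_reprogram_counter() in arch/x86/kvm/pmu.c (the
attr fields shown are illustrative, not the exact set KVM programs):

	#include <linux/perf_event.h>

	/*
	 * Ask perf for a pinned hardware counter on behalf of a guest PMC.
	 * perf, not the caller, decides whether a physical counter can be
	 * scheduled; on conflict the event goes into an error state and
	 * stops counting instead of evicting other users.
	 */
	static struct perf_event *request_guest_counter(u64 config,
							perf_overflow_handler_t cb,
							void *pmc)
	{
		struct perf_event_attr attr = {
			.type		= PERF_TYPE_RAW,
			.size		= sizeof(attr),
			.config		= config,
			.pinned		= true,	/* fail rather than multiplex */
			.exclude_host	= 1,	/* count only while the guest runs */
		};

		/* cpu == -1, task == current: follow the vCPU thread. */
		return perf_event_create_kernel_counter(&attr, -1, current, cb, pmc);
	}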

> 
>>>>>> +               enable_pmu = false;
>>>>>> +
>>>>>>            if (!enable_pmu) {
>>>>>>                    memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>>>>>>                    return;
>>>>>> --
>>>>>> 2.40.0
>>>>>>
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-04-19  9:34             ` Like Xu
@ 2023-05-24 23:37               ` Sean Christopherson
  2023-05-29 14:25                 ` Like Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2023-05-24 23:37 UTC (permalink / raw)
  To: Like Xu; +Cc: Jim Mattson, Paolo Bonzini, kvm, linux-kernel

On Wed, Apr 19, 2023, Like Xu wrote:
> Jim, sorry for the late reply.
> 
> On 11/4/2023 10:58 pm, Jim Mattson wrote:
> > > > > Jim, does this help you or could you explain more about your confusion?
> > > > 
> > > > You say that "fewer than four counters can lead to guest instability
> > > > as software expects four counters to be available." Your solution is
> > > > to disable the PMU, which leaves zero counters available. Zero is less
> > > > than four. Hence, by your claim, disabling the PMU can lead to guest
> > > > instability. I don't see how this is an improvement over one, two, or
> > > > three counters.

KVM can't do the right thing regardless.  I would rather have KVM explicitly tell
userspace that it can't support a vPMU than carry on with a bogus and
unexpected setup.

> > Does this actually guarantee that the requisite number of counters are
> > available and will always be available while the guest is running?
> 
> Not 100%; the scheduling of physical counters depends on the host perf scheduler.

Or put differently, the same thing that happens on Intel.  kvm_pmu_cap.num_counters_gp
is the number of counters reported by perf when KVM loads, i.e. barring oddities,
it's the number of counters present in the host.  Most importantly, if perf doesn't
find the expected number of counters, perf will bail and use software only events,
and then clear all of x86_pmu.

In other words, KVM's new sanity check *should* be a nop with respect to current
behavior.  If we're concerned about "unnecessarily" hiding the PMU when there are
1-3 counters, I'd be ok with a WARN_ON_ONCE().

Actually, looking more closely, there's unaddressed feedback from v4[*].  Folding
that in, we can enable the sanity check for both Intel and AMD, though that's a
bit of a lie since Intel will be '1'.  But the code looks pretty!

	if (enable_pmu) {
		perf_get_x86_pmu_capability(&kvm_pmu_cap);

		/*
		 * WARN if perf did NOT disable the hardware PMU even though the
		 * architecturally required GP counters aren't present, i.e. if
		 * there is a non-zero number of counters, but fewer than what
		 * is architecturally required.
		 */
		if (!kvm_pmu_cap.num_counters_gp ||
		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < pmu_ops->MIN_NR_GP_COUNTERS))
			enable_pmu = false;
		else if (is_intel && !kvm_pmu_cap.version)
			enable_pmu = false;
	}

	if (!enable_pmu) {
		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
		return;
	}

[*] https://lore.kernel.org/all/ZC9ijgZBaz6p1ecw@google.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met
  2023-05-24 23:37               ` Sean Christopherson
@ 2023-05-29 14:25                 ` Like Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Like Xu @ 2023-05-29 14:25 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Jim Mattson, Paolo Bonzini, kvm, linux-kernel

On 25/5/2023 7:37 am, Sean Christopherson wrote:
> On Wed, Apr 19, 2023, Like Xu wrote:
>> Jim, sorry for the late reply.
>>
>> On 11/4/2023 10:58 pm, Jim Mattson wrote:
>>>>>> Jim, does this help you or could you explain more about your confusion?
>>>>>
>>>>> You say that "fewer than four counters can lead to guest instability
>>>>> as software expects four counters to be available." Your solution is
>>>>> to disable the PMU, which leaves zero counters available. Zero is less
>>>>> than four. Hence, by your claim, disabling the PMU can lead to guest
>>>>> instability. I don't see how this is an improvement over one, two, or
>>>>> three counters.
> 
> KVM can't do the right thing regardless.  I would rather have KVM explicitly tell
> userspace that it can't support a vPMU than carry on with a bogus and
> unexpected setup.
> 
>>> Does this actually guarantee that the requisite number of counters are
>>> available and will always be available while the guest is running?
>>
>> Not 100%; the scheduling of physical counters depends on the host perf scheduler.
> 
> Or put differently, the same thing that happens on Intel.  kvm_pmu_cap.num_counters_gp
> is the number of counters reported by perf when KVM loads, i.e. barring oddities,
> it's the number of counters present in the host.  Most importantly, if perf doesn't
> find the expected number of counters, perf will bail and use software only events,
> and then clear all of x86_pmu.
> 
> In other words, KVM's new sanity check *should* be a nop with respect to current
> behavior.  If we're concerned about "unnecessarily" hiding the PMU when there are
> 1-3 counters, I'd be ok with a WARN_ON_ONCE().
> 
> Actually, looking more closely, there's unaddressed feedback from v4[*].  Folding
> that in, we can enable the sanity check for both Intel and AMD, though that's a
> bit of a lie since Intel will be '1'.  But the code looks pretty!

The diff below looks good to me. Please confirm that the other patches are
in good shape so that we can iterate on the next version. Thanks.

> 
> 	if (enable_pmu) {
> 		perf_get_x86_pmu_capability(&kvm_pmu_cap);
> 
> 		/*
> 		 * WARN if perf did NOT disable the hardware PMU even though the
> 		 * architecturally required GP counters aren't present, i.e. if
> 		 * there is a non-zero number of counters, but fewer than what
> 		 * is architecturally required.
> 		 */
> 		if (!kvm_pmu_cap.num_counters_gp ||
> 		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < pmu_ops->MIN_NR_GP_COUNTERS))
> 			enable_pmu = false;
> 		else if (is_intel && !kvm_pmu_cap.version)
> 			enable_pmu = false;
> 	}
> 
> 	if (!enable_pmu) {
> 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
> 		return;
> 	}
> 
> [*] https://lore.kernel.org/all/ZC9ijgZBaz6p1ecw@google.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-05-29 14:25 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-10 10:50 [PATCH V5 00/10] KVM: x86: Add AMD Guest PerfMonV2 PMU support Like Xu
2023-04-10 10:50 ` [PATCH V5 01/10] KVM: x86/pmu: Expose reprogram_counters() in pmu.h Like Xu
2023-04-10 10:50 ` [PATCH V5 02/10] KVM: x86/pmu: Return #GP if user sets the GLOBAL_STATUS reserved bits Like Xu
2023-04-10 10:50 ` [PATCH V5 03/10] KVM: x86/pmu: Make part of the Intel v2 PMU MSRs handling x86 generic Like Xu
2023-04-10 10:50 ` [PATCH V5 04/10] KVM: x86: Explicitly zero cpuid "0xa" leaf when PMU is disabled Like Xu
2023-04-10 10:50 ` [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met Like Xu
2023-04-11  5:36   ` Jim Mattson
2023-04-11  6:17     ` Like Xu
2023-04-11 12:58       ` Jim Mattson
2023-04-11 13:17         ` Like Xu
2023-04-11 14:58           ` Jim Mattson
2023-04-19  9:34             ` Like Xu
2023-05-24 23:37               ` Sean Christopherson
2023-05-29 14:25                 ` Like Xu
2023-04-10 10:50 ` [PATCH V5 06/10] KVM: x86/pmu: Forget PERFCTR_CORE if the min " Like Xu
2023-04-10 10:50 ` [PATCH V5 07/10] KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap Like Xu
2023-04-10 10:50 ` [PATCH V5 08/10] KVM: x86/cpuid: Add a KVM-only leaf to redirect AMD PerfMonV2 flag Like Xu
2023-04-10 10:50 ` [PATCH V5 09/10] KVM: x86/svm/pmu: Add AMD PerfMonV2 support Like Xu
2023-04-10 10:50 ` [PATCH V5 10/10] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022 Like Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).