* [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes
@ 2022-03-02 11:13 Like Xu
  2022-03-02 11:13 ` [PATCH v2 01/12] KVM: x86/pmu: Update comments for AMD gp counters Like Xu
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

Hi,

This is a follow-up to [0]. By keeping the same eventsel semantics
for gp and fixed counters, the reprogramming code can be made more
symmetrical, simpler and even faster [1], and it also fixes the
obsolete amd_event_mapping issue [2].

One of the notable changes is that we ended up removing the
reprogram_{gp, fixed}_counter() functions and replacing them with a
merged reprogram_counter(), where KVM programs pmc->perf_event
with only the PERF_TYPE_RAW type for any type of counter
(suggested by Jim as well). PeterZ confirmed the idea: "think so;
the HARDWARE is just a convenience wrapper over RAW IIRC".

Practically, this change drops guest pmu support on hosts without
X86_FEATURE_ARCH_PERFMON (the oldest Pentium 4 machines), where
PERF_TYPE_HARDWARE was intentionally introduced so that such hosts
could map the architectural guest PMU events to their own.
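
As a point of reference, an illustrative sketch (not code from this
series; the 0xC0/0x00 encoding assumes the x86 "instructions retired"
event) of the two perf_event_attr encodings involved:

  #include <linux/perf_event.h>

  /* HARDWARE: generic id, resolved through the arch perfmon_event_map[] */
  struct perf_event_attr hw_attr = {
          .type   = PERF_TYPE_HARDWARE,
          .config = PERF_COUNT_HW_INSTRUCTIONS,
  };

  /* RAW: event select/unit mask bits are passed through to hardware as-is */
  struct perf_event_attr raw_attr = {
          .type   = PERF_TYPE_RAW,
          .config = 0xC0,         /* event select 0xC0, unit mask 0x00 */
  };

On hosts whose event map wires PERF_COUNT_HW_INSTRUCTIONS to 0xC0/0x00,
both attrs program the same counter, which is why HARDWARE can be
treated as a convenience wrapper over RAW.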

Some code refactoring helps to review key changes more easily.
Patches are based on top of kvm/master (ec756e40e271).

The 11th patch removes the call trace from the commit message, while we
still think that kvm->arch.pmu_event_filter requires SRCU protection in
terms of pmu_event_filter functionality, similar to "kvm->arch.msr_filter".

Patch 0012 tries to expose some reserved bits for Milan use only; the
related discussion can be found at [3], thanks to Jim and Ravikumar.

Please see each commit for more details, and feel free to comment.

[0] https://lore.kernel.org/kvm/20211116122030.4698-1-likexu@tencent.com/
[1] https://lore.kernel.org/kvm/ebfac3c7-fbc6-78a5-50c5-005ea11cc6ca@gmail.com/
[2] https://lore.kernel.org/kvm/20220117085307.93030-1-likexu@tencent.com/
[3] https://lore.kernel.org/kvm/c7b418f5-7014-d322-ea47-2d4ee9c2748c@gmail.com/

Thanks,

v1: https://lore.kernel.org/kvm/20220221115201.22208-1-likexu@tencent.com/
v1 -> v2 Changelog:
- Get all the vPMC tests I have on hand to pass;
- Update obsolete AMD PMU comments; (Jim)
- Fix a careless [-Wconstant-logical-operand] warning in 0009; (0day)
- Add "u64 new_config" to reuse perf_event for fixed CTRs; (0day)
- Add patch 0012 to draw reviewers' attention;

Like Xu (12):
  KVM: x86/pmu: Update comments for AMD gp counters
  KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics
  KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()
  KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()
  KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()
  KVM: x86/pmu: Use only the uniformly exported interface
    reprogram_counter()
  KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,
    fixed}_counter()
  perf: x86/core: Add interface to query perfmon_event_map[] directly
  KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()
  KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context
  KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU
  KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292

 arch/x86/events/core.c            |  11 ++
 arch/x86/include/asm/perf_event.h |   6 +
 arch/x86/kvm/pmu.c                | 182 ++++++++++++------------------
 arch/x86/kvm/pmu.h                |   6 +-
 arch/x86/kvm/svm/pmu.c            |  57 ++++------
 arch/x86/kvm/vmx/pmu_intel.c      |  64 ++++++-----
 6 files changed, 149 insertions(+), 177 deletions(-)

-- 
2.35.1



* [PATCH v2 01/12] KVM: x86/pmu: Update comments for AMD gp counters
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 02/12] KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics Like Xu
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Update the obsolete comment to state that AMD platforms have two base
MSR addresses and two different maximum numbers of gp counters,
depending on the X86_FEATURE_PERFCTR_CORE feature.
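
For reference, an illustrative sketch of how the two layouts line up
(not part of this patch; loosely modeled on what amd_pmu_refresh()
already does with the guest CPUID):

  if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
          /* family 15h and later: six counters based at 0xC0010200 */
          pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;     /* 6 */
          /* PERF_CTLn/PERF_CTRn pairs start at MSR_F15H_PERF_CTL */
  } else {
          /* legacy K7 layout: four counters based at 0xC0010000 */
          pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;          /* 4 */
          /* PERFEVTSELn/PERFCTRn pairs start at MSR_K7_EVNTSEL0 */
  }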

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b1a02993782b..3f09af678b2c 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -34,7 +34,9 @@
  *   However AMD doesn't support fixed-counters;
  * - There are three types of index to access perf counters (PMC):
  *     1. MSR (named msr): For example Intel has MSR_IA32_PERFCTRn and AMD
- *        has MSR_K7_PERFCTRn.
+ *        has MSR_K7_PERFCTRn and, for families 15H and later,
+ *        MSR_F15H_PERF_CTRn, where MSR_F15H_PERF_CTR[0-3] are
+ *        aliased to MSR_K7_PERFCTRn.
  *     2. MSR Index (named idx): This normally is used by RDPMC instruction.
  *        For instance AMD RDPMC instruction uses 0000_0003h in ECX to access
  *        C001_0007h (MSR_K7_PERCTR3). Intel has a similar mechanism, except
@@ -46,7 +48,8 @@
  *        between pmc and perf counters is as the following:
  *        * Intel: [0 .. INTEL_PMC_MAX_GENERIC-1] <=> gp counters
  *                 [INTEL_PMC_IDX_FIXED .. INTEL_PMC_IDX_FIXED + 2] <=> fixed
- *        * AMD:   [0 .. AMD64_NUM_COUNTERS-1] <=> gp counters
+ *        * AMD:   [0 .. AMD64_NUM_COUNTERS-1] and, for families 15H
+ *          and later, [0 .. AMD64_NUM_COUNTERS_CORE-1] <=> gp counters
  */
 
 static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
-- 
2.35.1



* [PATCH v2 02/12] KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
  2022-03-02 11:13 ` [PATCH v2 01/12] KVM: x86/pmu: Update comments for AMD gp counters Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 03/12] KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter() Like Xu
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Checking the kvm->arch.pmu_event_filter policy in both the gp and fixed
code paths was somewhat redundant, so the common parts are extracted
into check_pmu_event_filter(), which reduces the code footprint and
improves readability.
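
For background, a minimal userspace sketch of installing such a filter
("vm_fd" and the raw 0xC0 event encoding are placeholders for
illustration only):

  #include <linux/kvm.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>

  static void set_filter(int vm_fd)
  {
          struct kvm_pmu_event_filter *f;

          f = calloc(1, sizeof(*f) + sizeof(__u64));
          f->action = KVM_PMU_EVENT_ALLOW;        /* or KVM_PMU_EVENT_DENY */
          f->nevents = 1;
          f->events[0] = 0xC0;    /* event select 0xC0, unit mask 0x00 */
          ioctl(vm_fd, KVM_SET_PMU_EVENT_FILTER, f);
          free(f);
  }

With an ALLOW filter, check_pmu_event_filter() permits a gp counter only
if its eventsel, masked by AMD64_RAW_EVENT_MASK_NB, is found in events[].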

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/pmu.c | 61 +++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3f09af678b2c..fda963161951 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -182,13 +182,43 @@ static int cmp_u64(const void *a, const void *b)
 	return *(__u64 *)a - *(__u64 *)b;
 }
 
+static bool check_pmu_event_filter(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu_event_filter *filter;
+	struct kvm *kvm = pmc->vcpu->kvm;
+	bool allow_event = true;
+	__u64 key;
+	int idx;
+
+	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
+	if (!filter)
+		goto out;
+
+	if (pmc_is_gp(pmc)) {
+		key = pmc->eventsel & AMD64_RAW_EVENT_MASK_NB;
+		if (bsearch(&key, filter->events, filter->nevents,
+			    sizeof(__u64), cmp_u64))
+			allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
+		else
+			allow_event = filter->action == KVM_PMU_EVENT_DENY;
+	} else {
+		idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+		if (filter->action == KVM_PMU_EVENT_DENY &&
+		    test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
+			allow_event = false;
+		if (filter->action == KVM_PMU_EVENT_ALLOW &&
+		    !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
+			allow_event = false;
+	}
+
+out:
+	return allow_event;
+}
+
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 {
 	u64 config;
 	u32 type = PERF_TYPE_RAW;
-	struct kvm *kvm = pmc->vcpu->kvm;
-	struct kvm_pmu_event_filter *filter;
-	bool allow_event = true;
 
 	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
 		printk_once("kvm pmu: pin control bit is ignored\n");
@@ -200,17 +230,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
 		return;
 
-	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
-	if (filter) {
-		__u64 key = eventsel & AMD64_RAW_EVENT_MASK_NB;
-
-		if (bsearch(&key, filter->events, filter->nevents,
-			    sizeof(__u64), cmp_u64))
-			allow_event = filter->action == KVM_PMU_EVENT_ALLOW;
-		else
-			allow_event = filter->action == KVM_PMU_EVENT_DENY;
-	}
-	if (!allow_event)
+	if (!check_pmu_event_filter(pmc))
 		return;
 
 	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
@@ -245,23 +265,14 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 {
 	unsigned en_field = ctrl & 0x3;
 	bool pmi = ctrl & 0x8;
-	struct kvm_pmu_event_filter *filter;
-	struct kvm *kvm = pmc->vcpu->kvm;
 
 	pmc_pause_counter(pmc);
 
 	if (!en_field || !pmc_is_enabled(pmc))
 		return;
 
-	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
-	if (filter) {
-		if (filter->action == KVM_PMU_EVENT_DENY &&
-		    test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
-			return;
-		if (filter->action == KVM_PMU_EVENT_ALLOW &&
-		    !test_bit(idx, (ulong *)&filter->fixed_counter_bitmap))
-			return;
-	}
+	if (!check_pmu_event_filter(pmc))
+		return;
 
 	if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
 		return;
-- 
2.35.1



* [PATCH v2 03/12] KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
  2022-03-02 11:13 ` [PATCH v2 01/12] KVM: x86/pmu: Update comments for AMD gp counters Like Xu
  2022-03-02 11:13 ` [PATCH v2 02/12] KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 04/12] KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter() Like Xu
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Passing the reference "struct kvm_pmc *pmc" when creating
pmc->perf_event is sufficient. This change helps to simplify the
calling convention by replacing reprogram_{gp, fixed}_counter()
with reprogram_counter() seamlessly.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           | 17 +++++------------
 arch/x86/kvm/pmu.h           |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c | 32 ++++++++++++++++++--------------
 3 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index fda963161951..0ce33a2798cd 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -288,18 +288,13 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 }
 EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
 
-void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx)
+void reprogram_counter(struct kvm_pmc *pmc)
 {
-	struct kvm_pmc *pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, pmc_idx);
-
-	if (!pmc)
-		return;
-
 	if (pmc_is_gp(pmc))
 		reprogram_gp_counter(pmc, pmc->eventsel);
 	else {
-		int idx = pmc_idx - INTEL_PMC_IDX_FIXED;
-		u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
+		int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+		u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);
 
 		reprogram_fixed_counter(pmc, ctrl, idx);
 	}
@@ -318,8 +313,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 			clear_bit(bit, pmu->reprogram_pmi);
 			continue;
 		}
-
-		reprogram_counter(pmu, bit);
+		reprogram_counter(pmc);
 	}
 
 	/*
@@ -505,13 +499,12 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 
 static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
 {
-	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	u64 prev_count;
 
 	prev_count = pmc->counter;
 	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
 
-	reprogram_counter(pmu, pmc->idx);
+	reprogram_counter(pmc);
 	if (pmc->counter < prev_count)
 		__kvm_perf_overflow(pmc, false);
 }
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7a7b8d5b775e..b529c54dc309 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -142,7 +142,7 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
 
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
 void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
-void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
+void reprogram_counter(struct kvm_pmc *pmc);
 
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 4e5b1eeeb77c..20f2b5f5102b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -56,16 +56,32 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 	pmu->fixed_ctr_ctrl = data;
 }
 
+static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
+{
+	if (pmc_idx < INTEL_PMC_IDX_FIXED)
+		return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx,
+				  MSR_P6_EVNTSEL0);
+	else {
+		u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED;
+
+		return get_fixed_pmc(pmu, idx + MSR_CORE_PERF_FIXED_CTR0);
+	}
+}
+
 /* function is called when global control register has been updated. */
 static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
 {
 	int bit;
 	u64 diff = pmu->global_ctrl ^ data;
+	struct kvm_pmc *pmc;
 
 	pmu->global_ctrl = data;
 
-	for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
-		reprogram_counter(pmu, bit);
+	for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX) {
+		pmc = intel_pmc_idx_to_pmc(pmu, bit);
+		if (pmc)
+			reprogram_counter(pmc);
+	}
 }
 
 static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
@@ -101,18 +117,6 @@ static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
 	return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
 }
 
-static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
-{
-	if (pmc_idx < INTEL_PMC_IDX_FIXED)
-		return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx,
-				  MSR_P6_EVNTSEL0);
-	else {
-		u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED;
-
-		return get_fixed_pmc(pmu, idx + MSR_CORE_PERF_FIXED_CTR0);
-	}
-}
-
 static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-- 
2.35.1



* [PATCH v2 04/12] KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (2 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 03/12] KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter() Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 05/12] KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter() Like Xu
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Because reprogram_gp_counter() is bound to assign the requested
eventsel to pmc->eventsel, this assignment step can be moved forward
to the callers, thus simplifying the passing of parameters to
"struct kvm_pmc *pmc" only.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           | 7 +++----
 arch/x86/kvm/pmu.h           | 2 +-
 arch/x86/kvm/svm/pmu.c       | 3 ++-
 arch/x86/kvm/vmx/pmu_intel.c | 3 ++-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0ce33a2798cd..7b8a5f973a63 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -215,16 +215,15 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	return allow_event;
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+void reprogram_gp_counter(struct kvm_pmc *pmc)
 {
 	u64 config;
 	u32 type = PERF_TYPE_RAW;
+	u64 eventsel = pmc->eventsel;
 
 	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
 		printk_once("kvm pmu: pin control bit is ignored\n");
 
-	pmc->eventsel = eventsel;
-
 	pmc_pause_counter(pmc);
 
 	if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
@@ -291,7 +290,7 @@ EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
 void reprogram_counter(struct kvm_pmc *pmc)
 {
 	if (pmc_is_gp(pmc))
-		reprogram_gp_counter(pmc, pmc->eventsel);
+		reprogram_gp_counter(pmc);
 	else {
 		int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
 		u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b529c54dc309..4db50c290c62 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -140,7 +140,7 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
 	return sample_period;
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
+void reprogram_gp_counter(struct kvm_pmc *pmc);
 void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
 void reprogram_counter(struct kvm_pmc *pmc);
 
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index d4de52409335..7ff9ccaca0a4 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -265,7 +265,8 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (data == pmc->eventsel)
 			return 0;
 		if (!(data & pmu->reserved_bits)) {
-			reprogram_gp_counter(pmc, data);
+			pmc->eventsel = data;
+			reprogram_gp_counter(pmc);
 			return 0;
 		}
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 20f2b5f5102b..2eefde7e4b1a 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -448,7 +448,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (data == pmc->eventsel)
 				return 0;
 			if (!(data & pmu->reserved_bits)) {
-				reprogram_gp_counter(pmc, data);
+				pmc->eventsel = data;
+				reprogram_gp_counter(pmc);
 				return 0;
 			}
 		} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
-- 
2.35.1



* [PATCH v2 05/12] KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (3 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 04/12] KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter() Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 06/12] KVM: x86/pmu: Use only the uniformly exported interface reprogram_counter() Like Xu
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Since the caller of reprogram_fixed_counter() is bound to assign the
requested fixed_ctr_ctrl to pmu->fixed_ctr_ctrl, this assignment step
can be moved forward (the stale value needed to compute the diff is
saved ahead of time), thus simplifying the passing of parameters.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           | 13 ++++++-------
 arch/x86/kvm/pmu.h           |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++--------
 3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7b8a5f973a63..282e6e859c46 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -260,8 +260,11 @@ void reprogram_gp_counter(struct kvm_pmc *pmc)
 }
 EXPORT_SYMBOL_GPL(reprogram_gp_counter);
 
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
+void reprogram_fixed_counter(struct kvm_pmc *pmc)
 {
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
+	u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
 	unsigned en_field = ctrl & 0x3;
 	bool pmi = ctrl & 0x8;
 
@@ -291,12 +294,8 @@ void reprogram_counter(struct kvm_pmc *pmc)
 {
 	if (pmc_is_gp(pmc))
 		reprogram_gp_counter(pmc);
-	else {
-		int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
-		u8 ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, idx);
-
-		reprogram_fixed_counter(pmc, ctrl, idx);
-	}
+	else
+		reprogram_fixed_counter(pmc);
 }
 EXPORT_SYMBOL_GPL(reprogram_counter);
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4db50c290c62..70a982c3cdad 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -141,7 +141,7 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
 }
 
 void reprogram_gp_counter(struct kvm_pmc *pmc);
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
+void reprogram_fixed_counter(struct kvm_pmc *pmc);
 void reprogram_counter(struct kvm_pmc *pmc);
 
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 2eefde7e4b1a..3ddbfdd16cd0 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -37,23 +37,23 @@ static int fixed_pmc_events[] = {1, 0, 7};
 
 static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 {
+	struct kvm_pmc *pmc;
+	u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
 	int i;
 
+	pmu->fixed_ctr_ctrl = data;
 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
 		u8 new_ctrl = fixed_ctrl_field(data, i);
-		u8 old_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, i);
-		struct kvm_pmc *pmc;
-
-		pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
+		u8 old_ctrl = fixed_ctrl_field(old_fixed_ctr_ctrl, i);
 
 		if (old_ctrl == new_ctrl)
 			continue;
 
-		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
-		reprogram_fixed_counter(pmc, new_ctrl, i);
-	}
+		pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
 
-	pmu->fixed_ctr_ctrl = data;
+		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
+		reprogram_fixed_counter(pmc);
+	}
 }
 
 static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
-- 
2.35.1



* [PATCH v2 06/12] KVM: x86/pmu: Use only the uniformly exported interface reprogram_counter()
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (4 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 05/12] KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter() Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}_counter() Like Xu
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Since reprogram_counter() and reprogram_{gp, fixed}_counter() currently
take the same incoming parameter "struct kvm_pmc *pmc", the callers can
simplify the context by uniformly using the exported interface
reprogram_counter(), which makes reprogram_{gp, fixed}_counter() static
and eliminates their EXPORT_SYMBOL_GPL.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           | 6 ++----
 arch/x86/kvm/pmu.h           | 2 --
 arch/x86/kvm/svm/pmu.c       | 2 +-
 arch/x86/kvm/vmx/pmu_intel.c | 4 ++--
 4 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 282e6e859c46..5299488b002c 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -215,7 +215,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	return allow_event;
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc)
+static void reprogram_gp_counter(struct kvm_pmc *pmc)
 {
 	u64 config;
 	u32 type = PERF_TYPE_RAW;
@@ -258,9 +258,8 @@ void reprogram_gp_counter(struct kvm_pmc *pmc)
 			      (eventsel & HSW_IN_TX),
 			      (eventsel & HSW_IN_TX_CHECKPOINTED));
 }
-EXPORT_SYMBOL_GPL(reprogram_gp_counter);
 
-void reprogram_fixed_counter(struct kvm_pmc *pmc)
+static void reprogram_fixed_counter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
@@ -288,7 +287,6 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc)
 			      !(en_field & 0x1), /* exclude kernel */
 			      pmi, false, false);
 }
-EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
 
 void reprogram_counter(struct kvm_pmc *pmc)
 {
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 70a982c3cdad..201b99628423 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -140,8 +140,6 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
 	return sample_period;
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc);
-void reprogram_fixed_counter(struct kvm_pmc *pmc);
 void reprogram_counter(struct kvm_pmc *pmc);
 
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 7ff9ccaca0a4..a18bf636fbce 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -266,7 +266,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		if (!(data & pmu->reserved_bits)) {
 			pmc->eventsel = data;
-			reprogram_gp_counter(pmc);
+			reprogram_counter(pmc);
 			return 0;
 		}
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 3ddbfdd16cd0..19b78a9d9d47 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -52,7 +52,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 		pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
 
 		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
-		reprogram_fixed_counter(pmc);
+		reprogram_counter(pmc);
 	}
 }
 
@@ -449,7 +449,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 				return 0;
 			if (!(data & pmu->reserved_bits)) {
 				pmc->eventsel = data;
-				reprogram_gp_counter(pmc);
+				reprogram_counter(pmc);
 				return 0;
 			}
 		} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
-- 
2.35.1



* [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}_counter()
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (5 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 06/12] KVM: x86/pmu: Use only the uniformly exported interface reprogram_counter() Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-08  0:36   ` Jim Mattson
  2022-03-02 11:13 ` [PATCH v2 08/12] perf: x86/core: Add interface to query perfmon_event_map[] directly Like Xu
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu,
	Peter Zijlstra

From: Like Xu <likexu@tencent.com>

The code sketch for reprogram_{gp, fixed}_counter() is similar: the
fixed counter uses the PERF_TYPE_HARDWARE type, while the gp counter
can use either the PERF_TYPE_HARDWARE or the PERF_TYPE_RAW type,
depending on the pmc->eventsel value.

After commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
for fixed PMCs"), the pmc->eventsel of a fixed counter is also set up
with the same semantic value and will not be changed during guest
runtime. But essentially, "the HARDWARE is just a convenience
wrapper over RAW IIRC", as quoted from PeterZ. So it is pretty safe
to use the PERF_TYPE_RAW type alone to program both gp and fixed
counters naturally in reprogram_counter().

To make the gp and fixed counters more semantically symmetrical,
the selection of EVENTSEL_{USER, OS, INT} bits is temporarily translated
via fixed_ctr_ctrl before the pmc_reprogram_counter() call.
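
A worked example of this translation (illustrative only; the 0xC0
eventsel assumes Intel fixed counter 0, "instructions retired"):

  /*
   * For a fixed_ctr_ctrl field of 0xb (0b1011 = PMI | USR | OS):
   *
   *      ctrl & 0x1  ->  eventsel |= ARCH_PERFMON_EVENTSEL_OS;
   *      ctrl & 0x2  ->  eventsel |= ARCH_PERFMON_EVENTSEL_USR;
   *      ctrl & 0x8  ->  eventsel |= ARCH_PERFMON_EVENTSEL_INT;
   *
   * Combined with the pmc->eventsel already set up for fixed PMCs
   * (0xC0), pmc_reprogram_counter() then receives the same
   * PERF_TYPE_RAW config that a gp counter programmed with
   * EVENTSEL 0xC0 | OS | USR | INT would produce.
   */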

Cc: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           | 128 +++++++++++++----------------------
 arch/x86/kvm/vmx/pmu_intel.c |   2 +-
 2 files changed, 47 insertions(+), 83 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5299488b002c..00e1660c10ca 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -215,85 +215,60 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	return allow_event;
 }
 
-static void reprogram_gp_counter(struct kvm_pmc *pmc)
-{
-	u64 config;
-	u32 type = PERF_TYPE_RAW;
-	u64 eventsel = pmc->eventsel;
-
-	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
-		printk_once("kvm pmu: pin control bit is ignored\n");
-
-	pmc_pause_counter(pmc);
-
-	if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
-		return;
-
-	if (!check_pmu_event_filter(pmc))
-		return;
-
-	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
-			  ARCH_PERFMON_EVENTSEL_INV |
-			  ARCH_PERFMON_EVENTSEL_CMASK |
-			  HSW_IN_TX |
-			  HSW_IN_TX_CHECKPOINTED))) {
-		config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
-		if (config != PERF_COUNT_HW_MAX)
-			type = PERF_TYPE_HARDWARE;
-	}
-
-	if (type == PERF_TYPE_RAW)
-		config = eventsel & AMD64_RAW_EVENT_MASK;
-
-	if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
-		return;
-
-	pmc_release_perf_event(pmc);
-
-	pmc->current_config = eventsel;
-	pmc_reprogram_counter(pmc, type, config,
-			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
-			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-			      eventsel & ARCH_PERFMON_EVENTSEL_INT,
-			      (eventsel & HSW_IN_TX),
-			      (eventsel & HSW_IN_TX_CHECKPOINTED));
-}
-
-static void reprogram_fixed_counter(struct kvm_pmc *pmc)
+static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-	int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
-	u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
-	unsigned en_field = ctrl & 0x3;
-	bool pmi = ctrl & 0x8;
 
-	pmc_pause_counter(pmc);
+	if (pmc_is_fixed(pmc))
+		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+			pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
 
-	if (!en_field || !pmc_is_enabled(pmc))
-		return;
-
-	if (!check_pmu_event_filter(pmc))
-		return;
-
-	if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
-		return;
-
-	pmc_release_perf_event(pmc);
-
-	pmc->current_config = (u64)ctrl;
-	pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
-			      kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
-			      !(en_field & 0x2), /* exclude user */
-			      !(en_field & 0x1), /* exclude kernel */
-			      pmi, false, false);
+	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
 }
 
 void reprogram_counter(struct kvm_pmc *pmc)
 {
-	if (pmc_is_gp(pmc))
-		reprogram_gp_counter(pmc);
-	else
-		reprogram_fixed_counter(pmc);
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	u64 eventsel = pmc->eventsel;
+	u64 new_config = eventsel;
+	u8 fixed_ctr_ctrl;
+
+	pmc_pause_counter(pmc);
+
+	if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
+		return;
+
+	if (!check_pmu_event_filter(pmc))
+		return;
+
+	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
+		printk_once("kvm pmu: pin control bit is ignored\n");
+
+	if (pmc_is_fixed(pmc)) {
+		fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+						  pmc->idx - INTEL_PMC_IDX_FIXED);
+		if (fixed_ctr_ctrl & 0x1)
+			eventsel |= ARCH_PERFMON_EVENTSEL_OS;
+		if (fixed_ctr_ctrl & 0x2)
+			eventsel |= ARCH_PERFMON_EVENTSEL_USR;
+		if (fixed_ctr_ctrl & 0x8)
+			eventsel |= ARCH_PERFMON_EVENTSEL_INT;
+		new_config = (u64)fixed_ctr_ctrl;
+	}
+
+	if (pmc->current_config == new_config && pmc_resume_counter(pmc))
+		return;
+
+	pmc_release_perf_event(pmc);
+
+	pmc->current_config = new_config;
+	pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
+			(eventsel & AMD64_RAW_EVENT_MASK),
+			!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
+			!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
+			eventsel & ARCH_PERFMON_EVENTSEL_INT,
+			(eventsel & HSW_IN_TX),
+			(eventsel & HSW_IN_TX_CHECKPOINTED));
 }
 EXPORT_SYMBOL_GPL(reprogram_counter);
 
@@ -451,17 +426,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
 	kvm_pmu_refresh(vcpu);
 }
 
-static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
-{
-	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-
-	if (pmc_is_fixed(pmc))
-		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
-			pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
-
-	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
-}
-
 /* Release perf_events for vPMCs that have been unused for a full time slice.  */
 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 19b78a9d9d47..d823fbe4e155 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -492,7 +492,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->reserved_bits = 0xffffffff00200000ull;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
-	if (!entry || !vcpu->kvm->arch.enable_pmu)
+	if (!entry || !vcpu->kvm->arch.enable_pmu || !boot_cpu_has(X86_FEATURE_ARCH_PERFMON))
 		return;
 	eax.full = entry->eax;
 	edx.full = entry->edx;
-- 
2.35.1



* [PATCH v2 08/12] perf: x86/core: Add interface to query perfmon_event_map[] directly
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (6 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}_counter() Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 09/12] KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config() Like Xu
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu,
	Peter Zijlstra

From: Like Xu <likexu@tencent.com>

Currently, we have [intel|knc|p4|p6]_perfmon_event_map on the Intel
platforms and amd_[f17h]_perfmon_event_map on the AMD platforms.

Early clumsy KVM code or other potential perf_event users may have
hard-coded these perfmon_maps (e.g., arch/x86/kvm/svm/pmu.c), so it
would not make sense to program a common hardware event based on the
generic "enum perf_hw_id" if the two tables do not match.

Let's provide an interface for callers outside the perf subsystem to
get the counter config based on the perfmon_event_map currently in
use; this also helps to save a few bytes.
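
A hedged usage sketch from the KVM side (the 0x00C0 value is only what
both vendors' maps typically contain for this event):

  /* query the active map instead of hard-coding a private copy */
  u64 config = perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);

  /*
   * On current Intel and AMD event maps this typically yields 0x00C0
   * (event select 0xC0, unit mask 0x00); an out-of-range hw_event
   * returns 0.
   */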

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/events/core.c            | 11 +++++++++++
 arch/x86/include/asm/perf_event.h |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e686c5e0537b..e760a1348c62 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2996,3 +2996,14 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	cap->events_mask_len	= x86_pmu.events_mask_len;
 }
 EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
+
+u64 perf_get_hw_event_config(int hw_event)
+{
+	int max = x86_pmu.max_events;
+
+	if (hw_event < max)
+		return x86_pmu.event_map(array_index_nospec(hw_event, max));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_get_hw_event_config);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8fc1b5003713..822927045406 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -477,6 +477,7 @@ struct x86_pmu_lbr {
 };
 
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
+extern u64 perf_get_hw_event_config(int hw_event);
 extern void perf_check_microcode(void);
 extern void perf_clear_dirty_counters(void);
 extern int x86_perf_rdpmc_index(struct perf_event *event);
@@ -486,6 +487,11 @@ static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	memset(cap, 0, sizeof(*cap));
 }
 
+static inline u64 perf_get_hw_event_config(int hw_event)
+{
+	return 0;
+}
+
 static inline void perf_events_lapic_init(void)	{ }
 static inline void perf_check_microcode(void) { }
 #endif
-- 
2.35.1



* [PATCH v2 09/12] KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config()
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (7 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 08/12] perf: x86/core: Add interface to query perfmon_event_map[] directly Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 10/12] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context Like Xu
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

With the help of perf_get_hw_event_config(), KVM can query the correct
EVENTSEL_{EVENT, UMASK} pair of a kernel-generic hw event directly from
the different *_perfmon_event_map[] tables by the kernel's pre-defined
perf_hw_id.

Also extend the bit range of the comparison field to
AMD64_RAW_EVENT_MASK_NB to stay correct should AMD ever define
EventSelect[11:8] in perfmon_event_map[] one day.
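
For reference, the bit ranges covered by AMD64_RAW_EVENT_MASK_NB (per
arch/x86/include/asm/perf_event.h):

  /*
   * AMD64_RAW_EVENT_MASK_NB =
   *      AMD64_EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK
   *
   *      bits  7:0       event select [7:0]
   *      bits 15:8       unit mask
   *      bits 35:32      event select [11:8] (AMD extension)
   *
   * so eventsel_match_perf_hw_id() keeps working even if an entry in
   * perfmon_event_map[] ever uses EventSelect[11:8].
   */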

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 00e1660c10ca..9fb7d29e5fdd 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -472,13 +472,8 @@ static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
 static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
 	unsigned int perf_hw_id)
 {
-	u64 old_eventsel = pmc->eventsel;
-	unsigned int config;
-
-	pmc->eventsel &= (ARCH_PERFMON_EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK);
-	config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
-	pmc->eventsel = old_eventsel;
-	return config == perf_hw_id;
+	return !((pmc->eventsel ^ perf_get_hw_event_config(perf_hw_id)) &
+		AMD64_RAW_EVENT_MASK_NB);
 }
 
 static inline bool cpl_is_matched(struct kvm_pmc *pmc)
-- 
2.35.1



* [PATCH v2 10/12] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (8 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 09/12] KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config() Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 11/12] KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU Like Xu
  2022-03-02 11:13 ` [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292 Like Xu
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

All gp or fixed counters are now reprogrammed using PERF_TYPE_RAW,
which means the table that maps a perf_hw_id to event select values is
no longer useful, at least for AMD.

For Intel, the logic to check whether a pmu event reported by the Intel
cpuid is available is still required, in which case pmc_perf_hw_id()
can be renamed to hw_event_is_unavail() and return a bool to replace
the "PERF_COUNT_HW_MAX+1" semantics.
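
An illustrative note on the Intel side (sketch; mirrors what
intel_pmu_refresh() already does):

  /*
   * available_event_types is derived as ~CPUID.0AH:EBX masked to the
   * reported bit width, so a *set* EBX bit means the architectural
   * event is NOT available. E.g. if the guest's EBX[0] is set (core
   * cycles unavailable), hw_event_is_unavail() returns true for a pmc
   * programmed with event select 0x3C / unit mask 0x00.
   */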

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           |  6 +++---
 arch/x86/kvm/pmu.h           |  2 +-
 arch/x86/kvm/svm/pmu.c       | 34 +++-------------------------------
 arch/x86/kvm/vmx/pmu_intel.c | 11 ++++-------
 4 files changed, 11 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 9fb7d29e5fdd..60f44252540a 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -114,9 +114,6 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		.config = config,
 	};
 
-	if (type == PERF_TYPE_HARDWARE && config >= PERF_COUNT_HW_MAX)
-		return;
-
 	attr.sample_period = get_sample_period(pmc, pmc->counter);
 
 	if (in_tx)
@@ -190,6 +187,9 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	__u64 key;
 	int idx;
 
+	if (kvm_x86_ops.pmu_ops->hw_event_is_unavail(pmc))
+		return false;
+
 	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
 	if (!filter)
 		goto out;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 201b99628423..a2b4037759a2 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -24,7 +24,7 @@ struct kvm_event_hw_type_mapping {
 };
 
 struct kvm_pmu_ops {
-	unsigned int (*pmc_perf_hw_id)(struct kvm_pmc *pmc);
+	bool (*hw_event_is_unavail)(struct kvm_pmc *pmc);
 	bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
 	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index a18bf636fbce..41c9b9e2aec2 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -33,18 +33,6 @@ enum index {
 	INDEX_ERROR,
 };
 
-/* duplicated from amd_perfmon_event_map, K7 and above should work. */
-static struct kvm_event_hw_type_mapping amd_event_mapping[] = {
-	[0] = { 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES },
-	[1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
-	[2] = { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES },
-	[3] = { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES },
-	[4] = { 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
-	[5] = { 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
-	[6] = { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
-	[7] = { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
-};
-
 static unsigned int get_msr_base(struct kvm_pmu *pmu, enum pmu_type type)
 {
 	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
@@ -138,25 +126,9 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
 	return &pmu->gp_counters[msr_to_index(msr)];
 }
 
-static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
+static bool amd_hw_event_is_unavail(struct kvm_pmc *pmc)
 {
-	u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
-	u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
-	int i;
-
-	/* return PERF_COUNT_HW_MAX as AMD doesn't have fixed events */
-	if (WARN_ON(pmc_is_fixed(pmc)))
-		return PERF_COUNT_HW_MAX;
-
-	for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++)
-		if (amd_event_mapping[i].eventsel == event_select
-		    && amd_event_mapping[i].unit_mask == unit_mask)
-			break;
-
-	if (i == ARRAY_SIZE(amd_event_mapping))
-		return PERF_COUNT_HW_MAX;
-
-	return amd_event_mapping[i].event_type;
+	return false;
 }
 
 /* check if a PMC is enabled by comparing it against global_ctrl bits. Because
@@ -322,7 +294,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 }
 
 struct kvm_pmu_ops amd_pmu_ops = {
-	.pmc_perf_hw_id = amd_pmc_perf_hw_id,
+	.hw_event_is_unavail = amd_hw_event_is_unavail,
 	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index d823fbe4e155..9b94674cc5fa 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -84,7 +84,7 @@ static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
 	}
 }
 
-static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
+static bool intel_hw_event_is_unavail(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
@@ -98,15 +98,12 @@ static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
 
 		/* disable event that reported as not present by cpuid */
 		if ((i < 7) && !(pmu->available_event_types & (1 << i)))
-			return PERF_COUNT_HW_MAX + 1;
+			return true;
 
 		break;
 	}
 
-	if (i == ARRAY_SIZE(intel_arch_events))
-		return PERF_COUNT_HW_MAX;
-
-	return intel_arch_events[i].event_type;
+	return false;
 }
 
 /* check if a PMC is enabled by comparing it with globl_ctrl bits. */
@@ -721,7 +718,7 @@ static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 }
 
 struct kvm_pmu_ops intel_pmu_ops = {
-	.pmc_perf_hw_id = intel_pmc_perf_hw_id,
+	.hw_event_is_unavail = intel_hw_event_is_unavail,
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
-- 
2.35.1



* [PATCH v2 11/12] KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (9 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 10/12] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 11:13 ` [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292 Like Xu
  11 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Similar to "kvm->arch.msr_filter", KVM should guarantee that vCPUs will
see either the previous filter or the new filter when user space calls
KVM_SET_PMU_EVENT_FILTER ioctl with the vCPU running so that guest
pmu events with identical settings in both the old and new filter have
deterministic behavior.
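
A hedged sketch of the resulting pattern (__filter_check() is a
placeholder helper; the updater side mirrors what
kvm_vm_ioctl_set_pmu_event_filter() already does):

  /* reader, on the counter reprogram path: */
  srcu_idx = srcu_read_lock(&kvm->srcu);
  filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
  if (filter)
          allow_event = __filter_check(filter, pmc);  /* placeholder */
  srcu_read_unlock(&kvm->srcu, srcu_idx);

  /* updater, in the KVM_SET_PMU_EVENT_FILTER ioctl: */
  mutex_lock(&kvm->lock);
  old = rcu_replace_pointer(kvm->arch.pmu_event_filter, new,
                            mutex_is_locked(&kvm->lock));
  mutex_unlock(&kvm->lock);
  synchronize_srcu_expedited(&kvm->srcu);
  kfree(old);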

Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/pmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 60f44252540a..17c61c990282 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -185,11 +185,12 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	struct kvm *kvm = pmc->vcpu->kvm;
 	bool allow_event = true;
 	__u64 key;
-	int idx;
+	int idx, srcu_idx;
 
 	if (kvm_x86_ops.pmu_ops->hw_event_is_unavail(pmc))
 		return false;
 
+	srcu_idx = srcu_read_lock(&kvm->srcu);
 	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
 	if (!filter)
 		goto out;
@@ -212,6 +213,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	}
 
 out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	return allow_event;
 }
 
-- 
2.35.1



* [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292
  2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWARE and other minor fixes Like Xu
                   ` (10 preceding siblings ...)
  2022-03-02 11:13 ` [PATCH v2 11/12] KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU Like Xu
@ 2022-03-02 11:13 ` Like Xu
  2022-03-02 17:52   ` Jim Mattson
  11 siblings, 1 reply; 20+ messages in thread
From: Like Xu @ 2022-03-02 11:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

The AMD Family 19h Models 00h-0Fh processors may experience sampling
inaccuracies that cause certain performance counters to overcount
retire-based events. To count the non-FP affected PMC events correctly,
a patched guest with a target vCPU model would:

    - Use Core::X86::Msr::PERF_CTL2 to count the events, and
    - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
    - Program Core::X86::Msr::PERF_CTL2[20] to 0b.

To support this use in AMD guests, KVM should not reserve bit 43
for counter #2 only. Treatment of other cases remains unchanged.

The AMD hardware team clarified that the conditions under which the
overcounting can happen are quite rare. This change may make those
PMU driver developers who have read erratum #1292 less disappointed.
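
For context, a hedged guest-side sketch of the programming the
workaround describes (0xC0010204 is Core::X86::Msr::PERF_CTL2 per the
PPR; the raw address is used here only for illustration):

  u64 ctl;

  rdmsrl(0xC0010204, ctl);        /* PERF_CTL2 */
  ctl |= BIT_ULL(43);             /* PERF_CTL2[43] = 1b */
  ctl &= ~BIT_ULL(20);            /* PERF_CTL2[20] = 0b */
  wrmsrl(0xC0010204, ctl);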

Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/svm/pmu.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 41c9b9e2aec2..05b4e4f2bb66 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -18,6 +18,20 @@
 #include "pmu.h"
 #include "svm.h"
 
+/*
+ * As a workaround for "Retire Based Events May Overcount" (erratum 1292),
+ * some patched guests may set PERF_CTL2[43] to 1b and PERF_CTL2[20] to 0b
+ * to count the non-FP affected PMC events correctly.
+ *
+ * Note: tests show that the counter difference with and without the
+ * workaround is not significant; the host schedules CTR2 indiscriminately.
+ */
+static inline bool vcpu_overcount_retire_events(struct kvm_vcpu *vcpu)
+{
+	return guest_cpuid_family(vcpu) == 0x19 &&
+		guest_cpuid_model(vcpu) < 0x10;
+}
+
 enum pmu_type {
 	PMU_TYPE_COUNTER = 0,
 	PMU_TYPE_EVNTSEL,
@@ -224,6 +238,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	struct kvm_pmc *pmc;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u64 reserved_bits;
 
 	/* MSR_PERFCTRn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
@@ -236,7 +251,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	if (pmc) {
 		if (data == pmc->eventsel)
 			return 0;
-		if (!(data & pmu->reserved_bits)) {
+		reserved_bits = pmu->reserved_bits;
+		if (pmc->idx == 2 && vcpu_overcount_retire_events(vcpu))
+			reserved_bits &= ~BIT_ULL(43);
+		if (!(data & reserved_bits)) {
 			pmc->eventsel = data;
 			reprogram_counter(pmc);
 			return 0;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292
  2022-03-02 11:13 ` [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292 Like Xu
@ 2022-03-02 17:52   ` Jim Mattson
  2022-03-04  9:46     ` Like Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2022-03-02 17:52 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu

On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> The AMD Family 19h Models 00h-0Fh processors may experience sampling
> inaccuracies that cause certain performance counters to overcount
> retire-based events. To count the non-FP affected PMC events correctly,
> a patched guest with a target vCPU model would:
>
>     - Use Core::X86::Msr::PERF_CTL2 to count the events, and
>     - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>     - Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>
> To support this use in AMD guests, KVM should not reserve bit 43
> for counter #2 only. Treatment of other cases remains unchanged.
>
> The AMD hardware team clarified that the conditions under which the
> overcounting can happen are quite rare. This change may make those
> PMU driver developers who have read erratum #1292 less disappointed.
>
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Like Xu <likexu@tencent.com>

This seems unnecessarily convoluted. As I've said previously, KVM
should not ever synthesize a #GP for any value written to a
PerfEvtSeln MSR when emulating an AMD CPU.


* Re: [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292
  2022-03-02 17:52   ` Jim Mattson
@ 2022-03-04  9:46     ` Like Xu
  2022-03-04 19:06       ` Jim Mattson
  0 siblings, 1 reply; 20+ messages in thread
From: Like Xu @ 2022-03-04  9:46 UTC (permalink / raw)
  To: Jim Mattson, Paolo Bonzini
  Cc: kvm, Sean Christopherson, Wanpeng Li, Vitaly Kuznetsov,
	Joerg Roedel, linux-kernel

On 3/3/2022 1:52 am, Jim Mattson wrote:
> On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> The AMD Family 19h Models 00h-0Fh processors may experience sampling
>> inaccuracies that cause certain performance counters to overcount
>> retire-based events. To count the non-FP affected PMC events correctly,
>> a patched guest with a target vCPU model would:
>>
>>      - Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>      - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>      - Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>
>> To support this use in AMD guests, KVM should not reserve bit 43
>> for counter #2 only. Treatment of other cases remains unchanged.
>>
>> The AMD hardware team clarified that the conditions under which the
>> overcounting can happen are quite rare. This change may make those
>> PMU driver developers who have read erratum #1292 less disappointed.
>>
>> Reported-by: Jim Mattson <jmattson@google.com>
>> Signed-off-by: Like Xu <likexu@tencent.com>
> 
> This seems unnecessarily convoluted. As I've said previously, KVM
> should not ever synthesize a #GP for any value written to a
> PerfEvtSeln MSR when emulating an AMD CPU.

IMO, we should "never synthesize a #GP" for all AMD MSRs,
not just the AMD PMU MSRs, or else keep the status quo.

I agree with you on this AMD #GP transition, but we need at least one
kernel cycle to make such a radical change, we don't know Paolo's
attitude, and what's more, we haven't received a tidal wave of user
complaints.



* Re: [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292
  2022-03-04  9:46     ` Like Xu
@ 2022-03-04 19:06       ` Jim Mattson
  2022-03-08 11:25         ` Like Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2022-03-04 19:06 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel

On Fri, Mar 4, 2022 at 1:47 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 3/3/2022 1:52 am, Jim Mattson wrote:
> > On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
> >>
> >> From: Like Xu <likexu@tencent.com>
> >>
> >> The AMD Family 19h Models 00h-0Fh processors may experience sampling
> >> inaccuracies that cause certain performance counters to overcount
> >> retire-based events. To count the non-FP affected PMC events correctly,
> >> a patched guest with a target vCPU model would:
> >>
> >>      - Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >>      - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >>      - Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>
> >> To support this use in AMD guests, KVM should not reserve bit 43
> >> for counter #2 only. Treatment of other cases remains unchanged.
> >>
> >> The AMD hardware team clarified that the conditions under which the
> >> overcounting can happen are quite rare. This change may make those
> >> PMU driver developers who have read erratum #1292 less disappointed.
> >>
> >> Reported-by: Jim Mattson <jmattson@google.com>
> >> Signed-off-by: Like Xu <likexu@tencent.com>
> >
> > This seems unnecessarily convoluted. As I've said previously, KVM
> > should not ever synthesize a #GP for any value written to a
> > PerfEvtSeln MSR when emulating an AMD CPU.
>
> IMO, we should either "never synthesize a #GP" for all AMD MSRs,
> not just the AMD PMU MSRs, or keep the status quo.

Then, why are you proposing this change? :-)

We should continue to synthesize a #GP for an attempt to set "must be
zero" bits or for rule violations, like "address must be canonical."
However, we have absolutely no business making up our own hardware
specification. This is a bug, and it should be fixed, like any other
bug.
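
For example (illustrative only; the surrounding context is hypothetical,
but KVM already has checks of this kind):

	if (is_noncanonical_address(data, vcpu))
		return 1;	/* synthesize #GP, as real hardware would */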

> I agree with you on this AMD #GP transition, but we would need at least one
> kernel cycle to make such a radical change, we don't know Paolo's attitude
> yet, and what's more, we haven't received a tidal wave of user complaints.

Again, if this is your stance, why are you proposing this change? :-)

If you wait until you have a tidal wave of user complaints, you have
waited too long. It's much better to be proactive than reactive.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}counter()
  2022-03-02 11:13 ` [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}counter() Like Xu
@ 2022-03-08  0:36   ` Jim Mattson
  2022-03-08 11:43     ` Like Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2022-03-08  0:36 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel, Like Xu,
	Peter Zijlstra

On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> The code sketches for reprogram_{gp, fixed}_counter() are similar: the
> fixed counter uses the PERF_TYPE_HARDWARE type, while the gp counter
> can use either the PERF_TYPE_HARDWARE or PERF_TYPE_RAW type,
> depending on the pmc->eventsel value.
>
> After commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
> for fixed PMCs"), the pmc->eventsel of a fixed counter is also set up
> with the same semantic value and is not changed during guest runtime.
> Essentially, "the HARDWARE is just a convenience wrapper over RAW
> IIRC", to quote PeterZ. So it should be safe to use only the
> PERF_TYPE_RAW type to program both gp and fixed counters in
> reprogram_counter().
>
> To make the gp and fixed counters more semantically symmetrical,
> the selection of the EVENTSEL_{USR, OS, INT} bits is translated via
> fixed_ctr_ctrl before the pmc_reprogram_counter() call.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/kvm/pmu.c           | 128 +++++++++++++----------------------
>  arch/x86/kvm/vmx/pmu_intel.c |   2 +-
>  2 files changed, 47 insertions(+), 83 deletions(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 5299488b002c..00e1660c10ca 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -215,85 +215,60 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
>         return allow_event;
>  }
>
> -static void reprogram_gp_counter(struct kvm_pmc *pmc)
> -{
> -       u64 config;
> -       u32 type = PERF_TYPE_RAW;
> -       u64 eventsel = pmc->eventsel;
> -
> -       if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
> -               printk_once("kvm pmu: pin control bit is ignored\n");
> -
> -       pmc_pause_counter(pmc);
> -
> -       if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
> -               return;
> -
> -       if (!check_pmu_event_filter(pmc))
> -               return;
> -
> -       if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
> -                         ARCH_PERFMON_EVENTSEL_INV |
> -                         ARCH_PERFMON_EVENTSEL_CMASK |
> -                         HSW_IN_TX |
> -                         HSW_IN_TX_CHECKPOINTED))) {
> -               config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
> -               if (config != PERF_COUNT_HW_MAX)
> -                       type = PERF_TYPE_HARDWARE;
> -       }
> -
> -       if (type == PERF_TYPE_RAW)
> -               config = eventsel & AMD64_RAW_EVENT_MASK;
> -
> -       if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
> -               return;
> -
> -       pmc_release_perf_event(pmc);
> -
> -       pmc->current_config = eventsel;
> -       pmc_reprogram_counter(pmc, type, config,
> -                             !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> -                             !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> -                             eventsel & ARCH_PERFMON_EVENTSEL_INT,
> -                             (eventsel & HSW_IN_TX),
> -                             (eventsel & HSW_IN_TX_CHECKPOINTED));
> -}
> -
> -static void reprogram_fixed_counter(struct kvm_pmc *pmc)
> +static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
>  {
>         struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> -       int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
> -       u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
> -       unsigned en_field = ctrl & 0x3;
> -       bool pmi = ctrl & 0x8;
>
> -       pmc_pause_counter(pmc);
> +       if (pmc_is_fixed(pmc))
> +               return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
> +                       pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
>
> -       if (!en_field || !pmc_is_enabled(pmc))
> -               return;
> -
> -       if (!check_pmu_event_filter(pmc))
> -               return;
> -
> -       if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
> -               return;
> -
> -       pmc_release_perf_event(pmc);
> -
> -       pmc->current_config = (u64)ctrl;
> -       pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
> -                             kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
> -                             !(en_field & 0x2), /* exclude user */
> -                             !(en_field & 0x1), /* exclude kernel */
> -                             pmi, false, false);
> +       return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
>  }
>
>  void reprogram_counter(struct kvm_pmc *pmc)
>  {
> -       if (pmc_is_gp(pmc))
> -               reprogram_gp_counter(pmc);
> -       else
> -               reprogram_fixed_counter(pmc);
> +       struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> +       u64 eventsel = pmc->eventsel;
> +       u64 new_config = eventsel;
> +       u8 fixed_ctr_ctrl;
> +
> +       pmc_pause_counter(pmc);
> +
> +       if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
> +               return;
> +
> +       if (!check_pmu_event_filter(pmc))
> +               return;
> +
> +       if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
> +               printk_once("kvm pmu: pin control bit is ignored\n");
> +
> +       if (pmc_is_fixed(pmc)) {
> +               fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
> +                                                 pmc->idx - INTEL_PMC_IDX_FIXED);
> +               if (fixed_ctr_ctrl & 0x1)
> +                       eventsel |= ARCH_PERFMON_EVENTSEL_OS;
> +               if (fixed_ctr_ctrl & 0x2)
> +                       eventsel |= ARCH_PERFMON_EVENTSEL_USR;
> +               if (fixed_ctr_ctrl & 0x8)
> +                       eventsel |= ARCH_PERFMON_EVENTSEL_INT;
> +               new_config = (u64)fixed_ctr_ctrl;
> +       }
> +
> +       if (pmc->current_config == new_config && pmc_resume_counter(pmc))
> +               return;
> +
> +       pmc_release_perf_event(pmc);
> +
> +       pmc->current_config = new_config;
> +       pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
> +                       (eventsel & AMD64_RAW_EVENT_MASK),
> +                       !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> +                       !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> +                       eventsel & ARCH_PERFMON_EVENTSEL_INT,
> +                       (eventsel & HSW_IN_TX),
> +                       (eventsel & HSW_IN_TX_CHECKPOINTED));

It seems that this extremely long argument list was motivated by the
differences between the two original call sites. Now that you have
mocked up a full eventsel (with USR, OS, INT, IN_TX, and IN_TXCP bits)
for the fixed counters, why not pass the entire eventsel as the third
argument and drop all of the rest? Then, pmc_reprogram_counter() can
extract/check the bits of interest.
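
For illustration only, a sketch of the shape I mean (untested and from
memory, not the exact code):

static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 eventsel)
{
	struct perf_event_attr attr = {
		.type = type,
		.size = sizeof(attr),
		.config = eventsel & AMD64_RAW_EVENT_MASK,
		.exclude_user = !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
		.exclude_kernel = !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
	};
	bool intr = eventsel & ARCH_PERFMON_EVENTSEL_INT;

	/*
	 * The HSW_IN_TX/HSW_IN_TX_CHECKPOINTED checks would move in here
	 * as well; then assemble the rest of attr and call
	 * perf_event_create_kernel_counter() as today.
	 */
}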

>  }
>  EXPORT_SYMBOL_GPL(reprogram_counter);
>
> @@ -451,17 +426,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
>         kvm_pmu_refresh(vcpu);
>  }
>
> -static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
> -{
> -       struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> -
> -       if (pmc_is_fixed(pmc))
> -               return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
> -                       pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
> -
> -       return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
> -}
> -
>  /* Release perf_events for vPMCs that have been unused for a full time slice.  */
>  void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 19b78a9d9d47..d823fbe4e155 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -492,7 +492,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>         pmu->reserved_bits = 0xffffffff00200000ull;
>
>         entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
> -       if (!entry || !vcpu->kvm->arch.enable_pmu)
> +       if (!entry || !vcpu->kvm->arch.enable_pmu || !boot_cpu_has(X86_FEATURE_ARCH_PERFMON))

This change seems unrelated.

>                 return;
>         eax.full = entry->eax;
>         edx.full = entry->edx;
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292
  2022-03-04 19:06       ` Jim Mattson
@ 2022-03-08 11:25         ` Like Xu
  2022-03-08 16:14           ` Jim Mattson
  0 siblings, 1 reply; 20+ messages in thread
From: Like Xu @ 2022-03-08 11:25 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel

On 5/3/2022 3:06 am, Jim Mattson wrote:
> We should continue to synthesize a #GP for an attempt to set "must be
> zero" bits or for rule violations, like "address must be canonical."

Actually, I do stand in the same position as you.

> However, we have absolutely no business making up our own hardware
> specification. This is a bug, and it should be fixed, like any other
> bug.

The current virtual hardware interfaces do not strictly comply with the vendor
specifications: some diverge in the first step of enablement, and others have
to be compromised later due to various complexities.

KVM's behavior of synthesizing a #GP for AMD's "reserved without
qualification" bits is clearly a legacy technical decision (not sure if it was
intentional). We may need a larger, independent patch set to apply this
one-time surgery, including of course this PMU issue.

What do you think?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}counter()
  2022-03-08  0:36   ` Jim Mattson
@ 2022-03-08 11:43     ` Like Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Like Xu @ 2022-03-08 11:43 UTC (permalink / raw)
  To: Jim Mattson, Paolo Bonzini
  Cc: kvm, Sean Christopherson, Wanpeng Li, Vitaly Kuznetsov,
	Joerg Roedel, linux-kernel, Like Xu, Peter Zijlstra

On 8/3/2022 8:36 am, Jim Mattson wrote:
> On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> The code sketches for reprogram_{gp, fixed}_counter() are similar: the
>> fixed counter uses the PERF_TYPE_HARDWARE type, while the gp counter
>> can use either the PERF_TYPE_HARDWARE or PERF_TYPE_RAW type,
>> depending on the pmc->eventsel value.
>>
>> After commit 761875634a5e ("KVM: x86/pmu: Setup pmc->eventsel
>> for fixed PMCs"), the pmc->eventsel of a fixed counter is also set up
>> with the same semantic value and is not changed during guest runtime.
>> Essentially, "the HARDWARE is just a convenience wrapper over RAW
>> IIRC", to quote PeterZ. So it should be safe to use only the
>> PERF_TYPE_RAW type to program both gp and fixed counters in
>> reprogram_counter().
>>
>> To make the gp and fixed counters more semantically symmetrical,
>> the selection of the EVENTSEL_{USR, OS, INT} bits is translated via
>> fixed_ctr_ctrl before the pmc_reprogram_counter() call.
>>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Suggested-by: Jim Mattson <jmattson@google.com>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>>   arch/x86/kvm/pmu.c           | 128 +++++++++++++----------------------
>>   arch/x86/kvm/vmx/pmu_intel.c |   2 +-
>>   2 files changed, 47 insertions(+), 83 deletions(-)
>>
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 5299488b002c..00e1660c10ca 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -215,85 +215,60 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
>>          return allow_event;
>>   }
>>
>> -static void reprogram_gp_counter(struct kvm_pmc *pmc)
>> -{
>> -       u64 config;
>> -       u32 type = PERF_TYPE_RAW;
>> -       u64 eventsel = pmc->eventsel;
>> -
>> -       if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
>> -               printk_once("kvm pmu: pin control bit is ignored\n");
>> -
>> -       pmc_pause_counter(pmc);
>> -
>> -       if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
>> -               return;
>> -
>> -       if (!check_pmu_event_filter(pmc))
>> -               return;
>> -
>> -       if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
>> -                         ARCH_PERFMON_EVENTSEL_INV |
>> -                         ARCH_PERFMON_EVENTSEL_CMASK |
>> -                         HSW_IN_TX |
>> -                         HSW_IN_TX_CHECKPOINTED))) {
>> -               config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
>> -               if (config != PERF_COUNT_HW_MAX)
>> -                       type = PERF_TYPE_HARDWARE;
>> -       }
>> -
>> -       if (type == PERF_TYPE_RAW)
>> -               config = eventsel & AMD64_RAW_EVENT_MASK;
>> -
>> -       if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
>> -               return;
>> -
>> -       pmc_release_perf_event(pmc);
>> -
>> -       pmc->current_config = eventsel;
>> -       pmc_reprogram_counter(pmc, type, config,
>> -                             !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
>> -                             !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
>> -                             eventsel & ARCH_PERFMON_EVENTSEL_INT,
>> -                             (eventsel & HSW_IN_TX),
>> -                             (eventsel & HSW_IN_TX_CHECKPOINTED));
>> -}
>> -
>> -static void reprogram_fixed_counter(struct kvm_pmc *pmc)
>> +static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
>>   {
>>          struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>> -       int idx = pmc->idx - INTEL_PMC_IDX_FIXED;
>> -       u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
>> -       unsigned en_field = ctrl & 0x3;
>> -       bool pmi = ctrl & 0x8;
>>
>> -       pmc_pause_counter(pmc);
>> +       if (pmc_is_fixed(pmc))
>> +               return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
>> +                       pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
>>
>> -       if (!en_field || !pmc_is_enabled(pmc))
>> -               return;
>> -
>> -       if (!check_pmu_event_filter(pmc))
>> -               return;
>> -
>> -       if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
>> -               return;
>> -
>> -       pmc_release_perf_event(pmc);
>> -
>> -       pmc->current_config = (u64)ctrl;
>> -       pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
>> -                             kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
>> -                             !(en_field & 0x2), /* exclude user */
>> -                             !(en_field & 0x1), /* exclude kernel */
>> -                             pmi, false, false);
>> +       return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
>>   }
>>
>>   void reprogram_counter(struct kvm_pmc *pmc)
>>   {
>> -       if (pmc_is_gp(pmc))
>> -               reprogram_gp_counter(pmc);
>> -       else
>> -               reprogram_fixed_counter(pmc);
>> +       struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>> +       u64 eventsel = pmc->eventsel;
>> +       u64 new_config = eventsel;
>> +       u8 fixed_ctr_ctrl;
>> +
>> +       pmc_pause_counter(pmc);
>> +
>> +       if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
>> +               return;
>> +
>> +       if (!check_pmu_event_filter(pmc))
>> +               return;
>> +
>> +       if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
>> +               printk_once("kvm pmu: pin control bit is ignored\n");
>> +
>> +       if (pmc_is_fixed(pmc)) {
>> +               fixed_ctr_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl,
>> +                                                 pmc->idx - INTEL_PMC_IDX_FIXED);
>> +               if (fixed_ctr_ctrl & 0x1)
>> +                       eventsel |= ARCH_PERFMON_EVENTSEL_OS;
>> +               if (fixed_ctr_ctrl & 0x2)
>> +                       eventsel |= ARCH_PERFMON_EVENTSEL_USR;
>> +               if (fixed_ctr_ctrl & 0x8)
>> +                       eventsel |= ARCH_PERFMON_EVENTSEL_INT;
>> +               new_config = (u64)fixed_ctr_ctrl;
>> +       }
>> +
>> +       if (pmc->current_config == new_config && pmc_resume_counter(pmc))
>> +               return;
>> +
>> +       pmc_release_perf_event(pmc);
>> +
>> +       pmc->current_config = new_config;
>> +       pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
>> +                       (eventsel & AMD64_RAW_EVENT_MASK),
>> +                       !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
>> +                       !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
>> +                       eventsel & ARCH_PERFMON_EVENTSEL_INT,
>> +                       (eventsel & HSW_IN_TX),
>> +                       (eventsel & HSW_IN_TX_CHECKPOINTED));
> 
> It seems that this extremely long argument list was motivated by the
> differences between the two original call sites. Now that you have
> mocked up a full eventsel (with USR, OS, INT, IN_TX, and IN_TXCP bits)
> for the fixed counters, why not pass the entire eventsel as the third

I've thought about it.

I'm trying to pass in generic bits (EVENT_MASK, USR, OS, INT) to
pmc_reprogram_counter() and let the latter handle the implementation details
of assembling the "struct perf_event_attr" to talk carefully with the perf core.

I assume the fixed counters don't support the IN_TX* bits, and as you know,
I'm cleaning those two away in another patch.

In terms of code readability, I think the current version strikes a good balance.
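
Roughly, the callee keeps a semantic contract like today's helper (this is
just its current shape, quoted from memory, so treat it as a sketch):

static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
				  bool exclude_user, bool exclude_kernel,
				  bool intr, bool in_tx, bool in_tx_cp);

and it assembles the "struct perf_event_attr" from those bits internally.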

> argument and drop all of the rest? Then, pmc_reprogram_counter() can
> extract/check the bits of interest.
> 
>>   }
>>   EXPORT_SYMBOL_GPL(reprogram_counter);
>>
>> @@ -451,17 +426,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
>>          kvm_pmu_refresh(vcpu);
>>   }
>>
>> -static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
>> -{
>> -       struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>> -
>> -       if (pmc_is_fixed(pmc))
>> -               return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
>> -                       pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
>> -
>> -       return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
>> -}
>> -
>>   /* Release perf_events for vPMCs that have been unused for a full time slice.  */
>>   void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
>>   {
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 19b78a9d9d47..d823fbe4e155 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -492,7 +492,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>>          pmu->reserved_bits = 0xffffffff00200000ull;
>>
>>          entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
>> -       if (!entry || !vcpu->kvm->arch.enable_pmu)
>> +       if (!entry || !vcpu->kvm->arch.enable_pmu || !boot_cpu_has(X86_FEATURE_ARCH_PERFMON))
> 
> This change seems unrelated.

The intention of using the PERF_TYPE_HARDWARE type was to emulate an
architectural guest PMU on a host without an architectural PMU (the oldest
Pentium 4), for which the guest vPMC needs to be reprogrammed using the
kernel's generic perf_hw_id.

That is the original story of PERF_TYPE_HARDWARE, thanks to our history
teacher Paolo, who also suggested this way of dropping that kind of support.

> 
>>                  return;
>>          eax.full = entry->eax;
>>          edx.full = entry->edx;
>> --
>> 2.35.1
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292
  2022-03-08 11:25         ` Like Xu
@ 2022-03-08 16:14           ` Jim Mattson
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2022-03-08 16:14 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, kvm, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, linux-kernel

On Tue, Mar 8, 2022 at 3:25 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 5/3/2022 3:06 am, Jim Mattson wrote:
> > We should continue to synthesize a #GP for an attempt to set "must be
> > zero" bits or for rule violations, like "address must be canonical."
>
> Actually, I do stand in the same position as you.
>
> > However, we have absolutely no business making up our own hardware
> > specification. This is a bug, and it should be fixed, like any other
> > bug.
> The current virtual hardware interfaces do not strictly comply with the
> vendor specifications: some diverge in the first step of enablement, and
> others have to be compromised later due to various complexities.
>
> KVM's behavior of synthesizing a #GP for AMD's "reserved without
> qualification" bits is clearly a legacy technical decision (not sure if it
> was intentional). We may need a larger, independent patch set to apply
> this one-time surgery, including of course this PMU issue.
>
> What do you think?

The PMU issue needs to be fixed ASAP, since a Linux guest will set the
"host-only" bit on a CPU that doesn't support it, and Linux expects
the bit to be ignored and the remainder of the PerfEvtSeln to be
written. Currently, KVM synthesizes #GP and the PerfEvtSeln is not
written.
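
For instance, a one-line sketch (not the actual fix, and assuming the
reserved mask is still set up in amd_pmu_refresh()) that would already
unbreak this common case:

	/* Bit 41 (host-only) is ignored by hardware, so don't reserve it. */
	pmu->reserved_bits &= ~AMD64_EVENTSEL_HOSTONLY;

With that, the guest's write is accepted instead of synthesizing #GP.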

I don't believe it is necessary to fix all related issues at one time.
Incremental fixes should be fine.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-03-08 16:14 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-02 11:13 [PATCH v2 00/12] KVM: x86/pmu: Get rid of PERF_TYPE_HARDWAR and other minor fixes Like Xu
2022-03-02 11:13 ` [PATCH v2 01/12] KVM: x86/pmu: Update comments for AMD gp counters Like Xu
2022-03-02 11:13 ` [PATCH v2 02/12] KVM: x86/pmu: Extract check_pmu_event_filter() from the same semantics Like Xu
2022-03-02 11:13 ` [PATCH v2 03/12] KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter() Like Xu
2022-03-02 11:13 ` [PATCH v2 04/12] KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter() Like Xu
2022-03-02 11:13 ` [PATCH v2 05/12] KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter() Like Xu
2022-03-02 11:13 ` [PATCH v2 06/12] KVM: x86/pmu: Use only the uniformly exported interface reprogram_counter() Like Xu
2022-03-02 11:13 ` [PATCH v2 07/12] KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp, fixed}counter() Like Xu
2022-03-08  0:36   ` Jim Mattson
2022-03-08 11:43     ` Like Xu
2022-03-02 11:13 ` [PATCH v2 08/12] perf: x86/core: Add interface to query perfmon_event_map[] directly Like Xu
2022-03-02 11:13 ` [PATCH v2 09/12] KVM: x86/pmu: Replace pmc_perf_hw_id() with perf_get_hw_event_config() Like Xu
2022-03-02 11:13 ` [PATCH v2 10/12] KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context Like Xu
2022-03-02 11:13 ` [PATCH v2 11/12] KVM: x86/pmu: Protect kvm->arch.pmu_event_filter with SRCU Like Xu
2022-03-02 11:13 ` [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292 Like Xu
2022-03-02 17:52   ` Jim Mattson
2022-03-04  9:46     ` Like Xu
2022-03-04 19:06       ` Jim Mattson
2022-03-08 11:25         ` Like Xu
2022-03-08 16:14           ` Jim Mattson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).