* [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions
@ 2021-11-30  7:42 Like Xu
  2021-11-30  7:42 ` [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs Like Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 45+ messages in thread
From: Like Xu @ 2021-11-30  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu

Hi,

[ Jim is on holiday, so I'm here to continue this work. ]

Some cloud customers need accurate virtualization of two
basic PMU events on x86 hardware: "instructions retired" and
"branch instructions retired". The existing PMU virtualization code
fails to account for instructions (e.g., CPUID) that are emulated by KVM.
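
To make the gap concrete, here is a rough guest-side sketch (purely
illustrative; cpuid()/rdpmc()/wrmsr() are assumed guest helpers, not
part of this series or of any particular test suite):

	u64 before, after;
	unsigned int i, a, b, c, d;

	/* program GP counter 0 for "instructions retired" (event 0xc0) */
	wrmsr(MSR_P6_EVNTSEL0, ARCH_PERFMON_EVENTSEL_ENABLE |
	      ARCH_PERFMON_EVENTSEL_OS | ARCH_PERFMON_EVENTSEL_USR | 0xc0);
	before = rdpmc(0);
	for (i = 0; i < 1000000; i++)
		cpuid(0, &a, &b, &c, &d);	/* intercepted and emulated by KVM */
	after = rdpmc(0);
	/* today, (after - before) misses the one million emulated CPUIDs */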

Accurately virtualizing all PMU events for all microarchitectures is a
herculean task, so let's just stick to the two events covered by this set.

Eric Hankland wrote this code originally, but his plate is full, so Jim
and I volunteered to shepherd the changes through upstream acceptance.

Thanks,

v1 -> v2 Changelog:
- Include the patch set [1] and drop the intel_find_fixed_event(); [Paolo]
  (we will fix the misleading Intel CPUID events in another patch set)
- Drop checks for pmc->perf_event or event state or event type;
- Increase a counter once its umask bits and the first 8 select bits are matched;
- Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
- Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
- Add counter enable check for kvm_pmu_trigger_event();
- Add vcpu CPL check for kvm_pmu_trigger_event(); [Jim]

Previous:
https://lore.kernel.org/kvm/20211112235235.1125060-2-jmattson@google.com/

[1] https://lore.kernel.org/kvm/20211119064856.77948-1-likexu@tencent.com/

Jim Mattson (1):
  KVM: x86: Update vPMCs when retiring branch instructions

Like Xu (5):
  KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()
  KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  KVM: x86: Update vPMCs when retiring instructions

 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/emulate.c          |  55 ++++++++------
 arch/x86/kvm/kvm_emulate.h      |   1 +
 arch/x86/kvm/pmu.c              | 128 ++++++++++++++++++++++----------
 arch/x86/kvm/pmu.h              |   5 +-
 arch/x86/kvm/svm/pmu.c          |  19 ++---
 arch/x86/kvm/vmx/nested.c       |   7 +-
 arch/x86/kvm/vmx/pmu_intel.c    |  44 ++++++-----
 arch/x86/kvm/x86.c              |   5 ++
 9 files changed, 167 insertions(+), 98 deletions(-)

-- 
2.33.1


* [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
@ 2021-11-30  7:42 ` Like Xu
  2021-12-06 19:57   ` Jim Mattson
       [not found]   ` <CALMp9eTq8H_bJOVKwi_7j3Kum9RvW6o-G3zCLUFco1A1cvNrkQ@mail.gmail.com>
  2021-11-30  7:42 ` [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() Like Xu
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 45+ messages in thread
From: Like Xu @ 2021-11-30  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

The current pmc->eventsel for fixed counters is underutilised. The
pmc->eventsel can be set up for all known available fixed counters,
since we have a mapping between the fixed pmc index and the
intel_arch_events array.

For either gp or fixed counters, this will simplify the later checks
for consistency between eventsel and perf_hw_id.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 1b7456b2177b..b7ab5fd03681 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -459,6 +459,21 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return 1;
 }
 
+static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
+{
+	size_t size = ARRAY_SIZE(fixed_pmc_events);
+	struct kvm_pmc *pmc;
+	u32 event;
+	int i;
+
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+		pmc = &pmu->fixed_counters[i];
+		event = fixed_pmc_events[array_index_nospec(i, size)];
+		pmc->eventsel = (intel_arch_events[event].unit_mask << 8) |
+			intel_arch_events[event].eventsel;
+	}
+}
+
 static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -506,6 +521,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 			edx.split.bit_width_fixed, x86_pmu.bit_width_fixed);
 		pmu->counter_bitmask[KVM_PMC_FIXED] =
 			((u64)1 << edx.split.bit_width_fixed) - 1;
+		setup_fixed_pmc_eventsel(pmu);
 	}
 
 	pmu->global_ctrl = ((1ull << pmu->nr_arch_gp_counters) - 1) |
-- 
2.33.1


* [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
  2021-11-30  7:42 ` [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs Like Xu
@ 2021-11-30  7:42 ` Like Xu
  2021-12-09  3:52   ` Jim Mattson
  2022-02-05  1:55   ` Jim Mattson
  2021-11-30  7:42 ` [PATCH v2 3/6] KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event() Like Xu
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 45+ messages in thread
From: Like Xu @ 2021-11-30  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

The find_arch_event() returns a "unsigned int" value,
which is used by the pmc_reprogram_counter() to
program a PERF_TYPE_HARDWARE type perf_event.

The returned value is actually the kernel defined gernic
perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler
incoming parameters for better self-explanation.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           | 8 +-------
 arch/x86/kvm/pmu.h           | 3 +--
 arch/x86/kvm/svm/pmu.c       | 8 ++++----
 arch/x86/kvm/vmx/pmu_intel.c | 9 +++++----
 4 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 09873f6488f7..3b3ccf5b1106 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -174,7 +174,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 {
 	unsigned config, type = PERF_TYPE_RAW;
-	u8 event_select, unit_mask;
 	struct kvm *kvm = pmc->vcpu->kvm;
 	struct kvm_pmu_event_filter *filter;
 	int i;
@@ -206,17 +205,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	if (!allow_event)
 		return;
 
-	event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
-	unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
-
 	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
 			  ARCH_PERFMON_EVENTSEL_INV |
 			  ARCH_PERFMON_EVENTSEL_CMASK |
 			  HSW_IN_TX |
 			  HSW_IN_TX_CHECKPOINTED))) {
-		config = kvm_x86_ops.pmu_ops->find_arch_event(pmc_to_pmu(pmc),
-						      event_select,
-						      unit_mask);
+		config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
 		if (config != PERF_COUNT_HW_MAX)
 			type = PERF_TYPE_HARDWARE;
 	}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 59d6b76203d5..dd7dbb1c5048 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -24,8 +24,7 @@ struct kvm_event_hw_type_mapping {
 };
 
 struct kvm_pmu_ops {
-	unsigned (*find_arch_event)(struct kvm_pmu *pmu, u8 event_select,
-				    u8 unit_mask);
+	unsigned int (*pmc_perf_hw_id)(struct kvm_pmc *pmc);
 	unsigned (*find_fixed_event)(int idx);
 	bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 0cf05e4caa4c..fb0ce8cda8a7 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -138,10 +138,10 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
 	return &pmu->gp_counters[msr_to_index(msr)];
 }
 
-static unsigned amd_find_arch_event(struct kvm_pmu *pmu,
-				    u8 event_select,
-				    u8 unit_mask)
+static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
 {
+	u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
+	u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++)
@@ -323,7 +323,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 }
 
 struct kvm_pmu_ops amd_pmu_ops = {
-	.find_arch_event = amd_find_arch_event,
+	.pmc_perf_hw_id = amd_pmc_perf_hw_id,
 	.find_fixed_event = amd_find_fixed_event,
 	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b7ab5fd03681..67a0188ecdc5 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -68,10 +68,11 @@ static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
 		reprogram_counter(pmu, bit);
 }
 
-static unsigned intel_find_arch_event(struct kvm_pmu *pmu,
-				      u8 event_select,
-				      u8 unit_mask)
+static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
 {
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
+	u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++)
@@ -719,7 +720,7 @@ static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 }
 
 struct kvm_pmu_ops intel_pmu_ops = {
-	.find_arch_event = intel_find_arch_event,
+	.pmc_perf_hw_id = intel_pmc_perf_hw_id,
 	.find_fixed_event = intel_find_fixed_event,
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
-- 
2.33.1


* [PATCH v2 3/6] KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()
  2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
  2021-11-30  7:42 ` [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs Like Xu
  2021-11-30  7:42 ` [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() Like Xu
@ 2021-11-30  7:42 ` Like Xu
  2021-12-09  3:58   ` Jim Mattson
  2021-11-30  7:42 ` [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}() Like Xu
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-11-30  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Since we set the same semantic event value for the fixed counter in
pmc->eventsel, returning the perf_hw_id for the fixed counter via
find_fixed_event() can be painlessly replaced by pmc_perf_hw_id()
with the help of pmc_is_fixed() check.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c           |  2 +-
 arch/x86/kvm/pmu.h           |  1 -
 arch/x86/kvm/svm/pmu.c       | 11 ++++-------
 arch/x86/kvm/vmx/pmu_intel.c | 19 +++----------------
 4 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3b3ccf5b1106..b7a1ae28ab87 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -262,7 +262,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 
 	pmc->current_config = (u64)ctrl;
 	pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
-			      kvm_x86_ops.pmu_ops->find_fixed_event(idx),
+			      kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
 			      !(en_field & 0x2), /* exclude user */
 			      !(en_field & 0x1), /* exclude kernel */
 			      pmi, false, false);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index dd7dbb1c5048..c91d9725aafd 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -25,7 +25,6 @@ struct kvm_event_hw_type_mapping {
 
 struct kvm_pmu_ops {
 	unsigned int (*pmc_perf_hw_id)(struct kvm_pmc *pmc);
-	unsigned (*find_fixed_event)(int idx);
 	bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
 	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index fb0ce8cda8a7..12d8b301065a 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -144,6 +144,10 @@ static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
 	u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
 	int i;
 
+	/* return PERF_COUNT_HW_MAX as AMD doesn't have fixed events */
+	if (WARN_ON(pmc_is_fixed(pmc)))
+		return PERF_COUNT_HW_MAX;
+
 	for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++)
 		if (amd_event_mapping[i].eventsel == event_select
 		    && amd_event_mapping[i].unit_mask == unit_mask)
@@ -155,12 +159,6 @@ static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
 	return amd_event_mapping[i].event_type;
 }
 
-/* return PERF_COUNT_HW_MAX as AMD doesn't have fixed events */
-static unsigned amd_find_fixed_event(int idx)
-{
-	return PERF_COUNT_HW_MAX;
-}
-
 /* check if a PMC is enabled by comparing it against global_ctrl bits. Because
  * AMD CPU doesn't have global_ctrl MSR, all PMCs are enabled (return TRUE).
  */
@@ -324,7 +322,6 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 
 struct kvm_pmu_ops amd_pmu_ops = {
 	.pmc_perf_hw_id = amd_pmc_perf_hw_id,
-	.find_fixed_event = amd_find_fixed_event,
 	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 67a0188ecdc5..ad0e53b0d7bf 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -76,9 +76,9 @@ static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++)
-		if (intel_arch_events[i].eventsel == event_select
-		    && intel_arch_events[i].unit_mask == unit_mask
-		    && (pmu->available_event_types & (1 << i)))
+		if (intel_arch_events[i].eventsel == event_select &&
+		    intel_arch_events[i].unit_mask == unit_mask &&
+		    (pmc_is_fixed(pmc) || pmu->available_event_types & (1 << i)))
 			break;
 
 	if (i == ARRAY_SIZE(intel_arch_events))
@@ -87,18 +87,6 @@ static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
 	return intel_arch_events[i].event_type;
 }
 
-static unsigned intel_find_fixed_event(int idx)
-{
-	u32 event;
-	size_t size = ARRAY_SIZE(fixed_pmc_events);
-
-	if (idx >= size)
-		return PERF_COUNT_HW_MAX;
-
-	event = fixed_pmc_events[array_index_nospec(idx, size)];
-	return intel_arch_events[event].event_type;
-}
-
 /* check if a PMC is enabled by comparing it with globl_ctrl bits. */
 static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
 {
@@ -721,7 +709,6 @@ static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 
 struct kvm_pmu_ops intel_pmu_ops = {
 	.pmc_perf_hw_id = intel_pmc_perf_hw_id,
-	.find_fixed_event = intel_find_fixed_event,
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
-- 
2.33.1


* [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
                   ` (2 preceding siblings ...)
  2021-11-30  7:42 ` [PATCH v2 3/6] KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event() Like Xu
@ 2021-11-30  7:42 ` Like Xu
  2021-12-09  4:25   ` Jim Mattson
  2021-11-30  7:42 ` [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions Like Xu
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-11-30  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu

From: Like Xu <likexu@tencent.com>

Depending on whether intr should be triggered or not, KVM registers
two different event overflow callbacks in the perf_event context.

The code skeleton of these two functions is very similar, so
pmc->intr can be stored in the pmc by pmc_reprogram_counter(),
which yields a smaller instruction footprint and is friendlier
to the micro-architectural branch predictor.

__kvm_perf_overflow() can be called in non-NMI contexts, so a
flag is needed to distinguish the caller context and thus avoid
a check on kvm_is_in_guest(); otherwise we might get warnings
from suspicious RCU usage or check_preemption_disabled().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c              | 58 ++++++++++++++++-----------------
 2 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e41ad1ead721..6c2b2331ffeb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -495,6 +495,7 @@ struct kvm_pmc {
 	 */
 	u64 current_config;
 	bool is_paused;
+	bool intr;
 };
 
 struct kvm_pmu {
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b7a1ae28ab87..a20207ee4014 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -55,43 +55,41 @@ static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
 	kvm_pmu_deliver_pmi(vcpu);
 }
 
-static void kvm_perf_overflow(struct perf_event *perf_event,
-			      struct perf_sample_data *data,
-			      struct pt_regs *regs)
+static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 {
-	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 
-	if (!test_and_set_bit(pmc->idx, pmu->reprogram_pmi)) {
-		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
-		kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
-	}
+	/* Ignore counters that have been reprogrammed already. */
+	if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
+		return;
+
+	__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
+	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
+
+	if (!pmc->intr)
+		return;
+
+	/*
+	 * Inject PMI. If vcpu was in a guest mode during NMI PMI
+	 * can be ejected on a guest mode re-entry. Otherwise we can't
+	 * be sure that vcpu wasn't executing hlt instruction at the
+	 * time of vmexit and is not going to re-enter guest mode until
+	 * woken up. So we should wake it, but this is impossible from
+	 * NMI context. Do it from irq work instead.
+	 */
+	if (in_pmi && !kvm_is_in_guest())
+		irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
+	else
+		kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
 }
 
-static void kvm_perf_overflow_intr(struct perf_event *perf_event,
-				   struct perf_sample_data *data,
-				   struct pt_regs *regs)
+static void kvm_perf_overflow(struct perf_event *perf_event,
+			      struct perf_sample_data *data,
+			      struct pt_regs *regs)
 {
 	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
-	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-
-	if (!test_and_set_bit(pmc->idx, pmu->reprogram_pmi)) {
-		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
-		kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 
-		/*
-		 * Inject PMI. If vcpu was in a guest mode during NMI PMI
-		 * can be ejected on a guest mode re-entry. Otherwise we can't
-		 * be sure that vcpu wasn't executing hlt instruction at the
-		 * time of vmexit and is not going to re-enter guest mode until
-		 * woken up. So we should wake it, but this is impossible from
-		 * NMI context. Do it from irq work instead.
-		 */
-		if (!kvm_is_in_guest())
-			irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
-		else
-			kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
-	}
+	__kvm_perf_overflow(pmc, true);
 }
 
 static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
@@ -126,7 +124,6 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	}
 
 	event = perf_event_create_kernel_counter(&attr, -1, current,
-						 intr ? kvm_perf_overflow_intr :
 						 kvm_perf_overflow, pmc);
 	if (IS_ERR(event)) {
 		pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
@@ -138,6 +135,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	pmc_to_pmu(pmc)->event_count++;
 	clear_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
 	pmc->is_paused = false;
+	pmc->intr = intr;
 }
 
 static void pmc_pause_counter(struct kvm_pmc *pmc)
-- 
2.33.1


* [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions
  2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
                   ` (3 preceding siblings ...)
  2021-11-30  7:42 ` [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}() Like Xu
@ 2021-11-30  7:42 ` Like Xu
  2021-12-09  4:33   ` Jim Mattson
  2021-11-30  7:42 ` [PATCH v2 6/6] KVM: x86: Update vPMCs when retiring branch instructions Like Xu
  2021-12-09 19:30 ` [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Paolo Bonzini
  6 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-11-30  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu, Peter Zijlstra

From: Like Xu <likexu@tencent.com>

When KVM retires a guest instruction through emulation, increment any
vPMCs that are configured to monitor "instructions retired," and
update the sample period of those counters so that they will overflow
at the right time.

Signed-off-by: Eric Hankland <ehankland@google.com>
[jmattson:
  - Split the code to increment "branch instructions retired" into a
    separate commit.
  - Added 'static' to kvm_pmu_incr_counter() definition.
  - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
    PERF_EVENT_STATE_ACTIVE.
]
Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
[likexu:
  - Drop checks for pmc->perf_event or event state or event type
  - Increase a counter once its umask bits and the first 8 select bits are matched
  - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
  - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
  - Add counter enable and CPL check for kvm_pmu_trigger_event();
]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/pmu.h |  1 +
 arch/x86/kvm/x86.c |  3 +++
 3 files changed, 64 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a20207ee4014..8abdadb7e22a 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -482,6 +482,66 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 	kvm_pmu_reset(vcpu);
 }
 
+static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	u64 prev_count;
+
+	prev_count = pmc->counter;
+	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
+
+	reprogram_counter(pmu, pmc->idx);
+	if (pmc->counter < prev_count)
+		__kvm_perf_overflow(pmc, false);
+}
+
+static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
+	unsigned int perf_hw_id)
+{
+	u64 old_eventsel = pmc->eventsel;
+	unsigned int config;
+
+	pmc->eventsel &= (ARCH_PERFMON_EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK);
+	config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
+	pmc->eventsel = old_eventsel;
+	return config == perf_hw_id;
+}
+
+static inline bool cpl_is_matched(struct kvm_pmc *pmc)
+{
+	bool select_os, select_user;
+	u64 config = pmc->current_config;
+
+	if (pmc_is_gp(pmc)) {
+		select_os = config & ARCH_PERFMON_EVENTSEL_OS;
+		select_user = config & ARCH_PERFMON_EVENTSEL_USR;
+	} else {
+		select_os = config & 0x1;
+		select_user = config & 0x2;
+	}
+
+	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
+}
+
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+	int i;
+
+	for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
+		pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+		if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
+			continue;
+
+		/* Ignore checks for edge detect, pin control, invert and CMASK bits */
+		if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
+			kvm_pmu_incr_counter(pmc);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
+
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index c91d9725aafd..7a7b8d5b775e 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -157,6 +157,7 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a05a26471f19..83371be00771 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7978,6 +7978,8 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	if (unlikely(!r))
 		return 0;
 
+	kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
+
 	/*
 	 * rflags is the old, "raw" value of the flags.  The new value has
 	 * not been saved yet.
@@ -8240,6 +8242,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
+			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
 			kvm_rip_write(vcpu, ctxt->eip);
 			if (r && (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
 				r = kvm_vcpu_do_singlestep(vcpu);
-- 
2.33.1


* [PATCH v2 6/6] KVM: x86: Update vPMCs when retiring branch instructions
  2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
                   ` (4 preceding siblings ...)
  2021-11-30  7:42 ` [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions Like Xu
@ 2021-11-30  7:42 ` Like Xu
  2021-12-09  4:40   ` Jim Mattson
  2021-12-09 19:30 ` [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Paolo Bonzini
  6 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-11-30  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

From: Jim Mattson <jmattson@google.com>

When KVM retires a guest branch instruction through emulation,
increment any vPMCs that are configured to monitor "branch
instructions retired," and update the sample period of those counters
so that they will overflow at the right time.

Signed-off-by: Eric Hankland <ehankland@google.com>
[jmattson:
  - Split the code to increment "branch instructions retired" into a
    separate commit.
  - Moved/consolidated the calls to kvm_pmu_trigger_event() in the
    emulation of VMLAUNCH/VMRESUME to accommodate the evolution of
    that code.
]
Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/emulate.c     | 55 +++++++++++++++++++++-----------------
 arch/x86/kvm/kvm_emulate.h |  1 +
 arch/x86/kvm/vmx/nested.c  |  7 +++--
 arch/x86/kvm/x86.c         |  2 ++
 4 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 28b1a4e57827..166a145fc1e6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -175,6 +175,7 @@
 #define No16	    ((u64)1 << 53)  /* No 16 bit operand */
 #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
 #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
+#define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -191,8 +192,9 @@
 #define FASTOP_SIZE 8
 
 struct opcode {
-	u64 flags : 56;
-	u64 intercept : 8;
+	u64 flags;
+	u8 intercept;
+	u8 pad[7];
 	union {
 		int (*execute)(struct x86_emulate_ctxt *ctxt);
 		const struct opcode *group;
@@ -4364,10 +4366,10 @@ static const struct opcode group4[] = {
 static const struct opcode group5[] = {
 	F(DstMem | SrcNone | Lock,		em_inc),
 	F(DstMem | SrcNone | Lock,		em_dec),
-	I(SrcMem | NearBranch,			em_call_near_abs),
-	I(SrcMemFAddr | ImplicitOps,		em_call_far),
-	I(SrcMem | NearBranch,			em_jmp_abs),
-	I(SrcMemFAddr | ImplicitOps,		em_jmp_far),
+	I(SrcMem | NearBranch | IsBranch,       em_call_near_abs),
+	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
+	I(SrcMem | NearBranch | IsBranch,       em_jmp_abs),
+	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
 	I(SrcMem | Stack | TwoMemOp,		em_push), D(Undefined),
 };
 
@@ -4577,7 +4579,7 @@ static const struct opcode opcode_table[256] = {
 	I2bvIP(DstDI | SrcDX | Mov | String | Unaligned, em_in, ins, check_perm_in), /* insb, insw/insd */
 	I2bvIP(SrcSI | DstDX | String, em_out, outs, check_perm_out), /* outsb, outsw/outsd */
 	/* 0x70 - 0x7F */
-	X16(D(SrcImmByte | NearBranch)),
+	X16(D(SrcImmByte | NearBranch | IsBranch)),
 	/* 0x80 - 0x87 */
 	G(ByteOp | DstMem | SrcImm, group1),
 	G(DstMem | SrcImm, group1),
@@ -4596,7 +4598,7 @@ static const struct opcode opcode_table[256] = {
 	DI(SrcAcc | DstReg, pause), X7(D(SrcAcc | DstReg)),
 	/* 0x98 - 0x9F */
 	D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
-	I(SrcImmFAddr | No64, em_call_far), N,
+	I(SrcImmFAddr | No64 | IsBranch, em_call_far), N,
 	II(ImplicitOps | Stack, em_pushf, pushf),
 	II(ImplicitOps | Stack, em_popf, popf),
 	I(ImplicitOps, em_sahf), I(ImplicitOps, em_lahf),
@@ -4616,17 +4618,19 @@ static const struct opcode opcode_table[256] = {
 	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
 	/* 0xC0 - 0xC7 */
 	G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
-	I(ImplicitOps | NearBranch | SrcImmU16, em_ret_near_imm),
-	I(ImplicitOps | NearBranch, em_ret),
+	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
+	I(ImplicitOps | NearBranch | IsBranch, em_ret),
 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
 	G(ByteOp, group11), G(0, group11),
 	/* 0xC8 - 0xCF */
-	I(Stack | SrcImmU16 | Src2ImmByte, em_enter), I(Stack, em_leave),
-	I(ImplicitOps | SrcImmU16, em_ret_far_imm),
-	I(ImplicitOps, em_ret_far),
-	D(ImplicitOps), DI(SrcImmByte, intn),
-	D(ImplicitOps | No64), II(ImplicitOps, em_iret, iret),
+	I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
+	I(Stack | IsBranch, em_leave),
+	I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
+	I(ImplicitOps | IsBranch, em_ret_far),
+	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+	D(ImplicitOps | No64 | IsBranch),
+	II(ImplicitOps | IsBranch, em_iret, iret),
 	/* 0xD0 - 0xD7 */
 	G(Src2One | ByteOp, group2), G(Src2One, group2),
 	G(Src2CL | ByteOp, group2), G(Src2CL, group2),
@@ -4637,14 +4641,15 @@ static const struct opcode opcode_table[256] = {
 	/* 0xD8 - 0xDF */
 	N, E(0, &escape_d9), N, E(0, &escape_db), N, E(0, &escape_dd), N, N,
 	/* 0xE0 - 0xE7 */
-	X3(I(SrcImmByte | NearBranch, em_loop)),
-	I(SrcImmByte | NearBranch, em_jcxz),
+	X3(I(SrcImmByte | NearBranch | IsBranch, em_loop)),
+	I(SrcImmByte | NearBranch | IsBranch, em_jcxz),
 	I2bvIP(SrcImmUByte | DstAcc, em_in,  in,  check_perm_in),
 	I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
 	/* 0xE8 - 0xEF */
-	I(SrcImm | NearBranch, em_call), D(SrcImm | ImplicitOps | NearBranch),
-	I(SrcImmFAddr | No64, em_jmp_far),
-	D(SrcImmByte | ImplicitOps | NearBranch),
+	I(SrcImm | NearBranch | IsBranch, em_call),
+	D(SrcImm | ImplicitOps | NearBranch | IsBranch),
+	I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
+	D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
 	I2bvIP(SrcDX | DstAcc, em_in,  in,  check_perm_in),
 	I2bvIP(SrcAcc | DstDX, em_out, out, check_perm_out),
 	/* 0xF0 - 0xF7 */
@@ -4660,7 +4665,7 @@ static const struct opcode opcode_table[256] = {
 static const struct opcode twobyte_table[256] = {
 	/* 0x00 - 0x0F */
 	G(0, group6), GD(0, &group7), N, N,
-	N, I(ImplicitOps | EmulateOnUD, em_syscall),
+	N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
 	II(ImplicitOps | Priv, em_clts, clts), N,
 	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
 	N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
@@ -4691,8 +4696,8 @@ static const struct opcode twobyte_table[256] = {
 	IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
 	II(ImplicitOps | Priv, em_rdmsr, rdmsr),
 	IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
-	I(ImplicitOps | EmulateOnUD, em_sysenter),
-	I(ImplicitOps | Priv | EmulateOnUD, em_sysexit),
+	I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
+	I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
 	N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x40 - 0x4F */
@@ -4710,7 +4715,7 @@ static const struct opcode twobyte_table[256] = {
 	N, N, N, N,
 	N, N, N, GP(SrcReg | DstMem | ModRM | Mov, &pfx_0f_6f_0f_7f),
 	/* 0x80 - 0x8F */
-	X16(D(SrcImm | NearBranch)),
+	X16(D(SrcImm | NearBranch | IsBranch)),
 	/* 0x90 - 0x9F */
 	X16(D(ByteOp | DstMem | SrcNone | ModRM| Mov)),
 	/* 0xA0 - 0xA7 */
@@ -5224,6 +5229,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
 		ctxt->d |= opcode.flags;
 	}
 
+	ctxt->is_branch = opcode.flags & IsBranch;
+
 	/* Unrecognised? */
 	if (ctxt->d == 0)
 		return EMULATION_FAILED;
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 68b420289d7e..39eded2426ff 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -369,6 +369,7 @@ struct x86_emulate_ctxt {
 	struct fetch_cache fetch;
 	struct read_cache io_read;
 	struct read_cache mem_read;
+	bool is_branch;
 };
 
 /* Repeat String Operation Prefix */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 95f3823b3a9d..6e05d36aedd5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3504,10 +3504,13 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (evmptrld_status == EVMPTRLD_ERROR) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return 1;
-	} else if (CC(evmptrld_status == EVMPTRLD_VMFAIL)) {
-		return nested_vmx_failInvalid(vcpu);
 	}
 
+	kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
+
+	if (CC(evmptrld_status == EVMPTRLD_VMFAIL))
+		return nested_vmx_failInvalid(vcpu);
+
 	if (CC(!evmptr_is_valid(vmx->nested.hv_evmcs_vmptr) &&
 	       vmx->nested.current_vmptr == INVALID_GPA))
 		return nested_vmx_failInvalid(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83371be00771..c28226dcbe4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8243,6 +8243,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
 			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
+			if (ctxt->is_branch)
+				kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
 			kvm_rip_write(vcpu, ctxt->eip);
 			if (r && (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
 				r = kvm_vcpu_do_singlestep(vcpu);
-- 
2.33.1


* Re: [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  2021-11-30  7:42 ` [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs Like Xu
@ 2021-12-06 19:57   ` Jim Mattson
  2021-12-09 18:49     ` Paolo Bonzini
       [not found]   ` <CALMp9eTq8H_bJOVKwi_7j3Kum9RvW6o-G3zCLUFco1A1cvNrkQ@mail.gmail.com>
  1 sibling, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-06 19:57 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> The current pmc->eventsel for fixed counters is underutilised. The
> pmc->eventsel can be set up for all known available fixed counters,
> since we have a mapping between the fixed pmc index and the
> intel_arch_events array.
>
> For either gp or fixed counters, this will simplify the later checks
> for consistency between eventsel and perf_hw_id.
>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 1b7456b2177b..b7ab5fd03681 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -459,6 +459,21 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         return 1;
>  }
>
> +static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
> +{
> +       size_t size = ARRAY_SIZE(fixed_pmc_events);
> +       struct kvm_pmc *pmc;
> +       u32 event;
> +       int i;
> +
> +       for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> +               pmc = &pmu->fixed_counters[i];
> +               event = fixed_pmc_events[array_index_nospec(i, size)];
How do we know that i < size? For example, Ice Lake supports 4 fixed
counters, but fixed_pmc_events only has three entries.
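
(For reference, the table I'm looking at is, if I'm reading
pmu_intel.c correctly, roughly:

	static int fixed_pmc_events[] = {1, 0, 7};

i.e. indices into intel_arch_events[] for retired instructions, core
cycles and reference cycles, so ARRAY_SIZE() is 3.)
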
> +               pmc->eventsel = (intel_arch_events[event].unit_mask << 8) |
> +                       intel_arch_events[event].eventsel;
> +       }
> +}
> +
[Every now and then, gmail likes to revert your plain text setting,
just to keep you on your toes!]

* Re: [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
       [not found]   ` <CALMp9eTq8H_bJOVKwi_7j3Kum9RvW6o-G3zCLUFco1A1cvNrkQ@mail.gmail.com>
@ 2021-12-07  6:07     ` Like Xu
  2021-12-07 17:42       ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-12-07  6:07 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On 7/12/2021 3:50 am, Jim Mattson wrote:
> On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
> 
>> From: Like Xu <likexu@tencent.com>
>>
>> The current pmc->eventsel for fixed counters is underutilised. The
>> pmc->eventsel can be set up for all known available fixed counters,
>> since we have a mapping between the fixed pmc index and the
>> intel_arch_events array.
>>
>> For either gp or fixed counters, this will simplify the later checks
>> for consistency between eventsel and perf_hw_id.
>>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>>   arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 1b7456b2177b..b7ab5fd03681 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -459,6 +459,21 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr_info)
>>          return 1;
>>   }
>>
>> +static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
>> +{
>> +       size_t size = ARRAY_SIZE(fixed_pmc_events);
>> +       struct kvm_pmc *pmc;
>> +       u32 event;
>> +       int i;
>> +
>> +       for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
>> +               pmc = &pmu->fixed_counters[i];
>> +               event = fixed_pmc_events[array_index_nospec(i, size)];
>>
> 
> How do we know that i < size? For example, Ice Lake supports 4
> fixed counters, but fixed_pmc_events only has three entries.

With the help of the MAX_FIXED_COUNTERS macro, a fourth or
later fixed counter is currently not supported in KVM.
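
FWIW, the CPUID 0xA handling in KVM already clamps what is advertised
to user space; from memory (__do_cpuid_func() in arch/x86/kvm/cpuid.c,
so treat this as a sketch rather than the exact code), roughly:

	edx.split.num_counters_fixed =
		min(cap.num_counters_fixed, MAX_FIXED_COUNTERS);

where MAX_FIXED_COUNTERS is 3, matching ARRAY_SIZE(fixed_pmc_events).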

If user space sets a superset of the CPUID supported by KVM,
any PMU emulation failure is to be expected, right?

Waiting for more comments from you on this patch set.

> 
> 
>> +               pmc->eventsel = (intel_arch_events[event].unit_mask << 8) |
>> +                       intel_arch_events[event].eventsel;
>> +       }
>> +}
>> +
>>
>>
>>
> 

* Re: [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  2021-12-07  6:07     ` Like Xu
@ 2021-12-07 17:42       ` Jim Mattson
  0 siblings, 0 replies; 45+ messages in thread
From: Jim Mattson @ 2021-12-07 17:42 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On Mon, Dec 6, 2021 at 10:08 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 7/12/2021 3:50 am, Jim Mattson wrote:
> > On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
> >
> >> From: Like Xu <likexu@tencent.com>
> >>
> >> The current pmc->eventsel for fixed counters is underutilised. The
> >> pmc->eventsel can be set up for all known available fixed counters,
> >> since we have a mapping between the fixed pmc index and the
> >> intel_arch_events array.
> >>
> >> For either gp or fixed counters, this will simplify the later checks
> >> for consistency between eventsel and perf_hw_id.
> >>
> >> Signed-off-by: Like Xu <likexu@tencent.com>
> >> ---
> >>   arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++++++++++
> >>   1 file changed, 16 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> >> index 1b7456b2177b..b7ab5fd03681 100644
> >> --- a/arch/x86/kvm/vmx/pmu_intel.c
> >> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> >> @@ -459,6 +459,21 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu,
> >> struct msr_data *msr_info)
> >>          return 1;
> >>   }
> >>
> >> +static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
> >> +{
> >> +       size_t size = ARRAY_SIZE(fixed_pmc_events);
> >> +       struct kvm_pmc *pmc;
> >> +       u32 event;
> >> +       int i;
> >> +
> >> +       for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> >> +               pmc = &pmu->fixed_counters[i];
> >> +               event = fixed_pmc_events[array_index_nospec(i, size)];
> >>
> >
> > How do we know that i < size? For example, Ice Lake supports 4
> > fixed counters, but fixed_pmc_events only has three entries.
>
> With the help of the MAX_FIXED_COUNTERS macro, a fourth or
> later fixed counter is currently not supported in KVM.

Thanks for the hint. I see it now.

> If user space sets a superset of the CPUID supported by KVM,
> any PMU emulation failure is to be expected, right?

Actually, I would expect a misconfigured VM to elicit an error. I
don't see the advantage of mis-emulating an unsupported configuration.
But maybe that's just me.

> Waiting for more comments from you on this patch set.

I'll try to get to them this week. Thanks for following up while I was
on holiday.

> >
> >
> >> +               pmc->eventsel = (intel_arch_events[event].unit_mask << 8) |
> >> +                       intel_arch_events[event].eventsel;
> >> +       }
> >> +}
> >> +
> >>
> >>
> >>
> >

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2021-11-30  7:42 ` [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() Like Xu
@ 2021-12-09  3:52   ` Jim Mattson
  2021-12-09  8:11     ` Like Xu
  2021-12-09 19:24     ` Paolo Bonzini
  2022-02-05  1:55   ` Jim Mattson
  1 sibling, 2 replies; 45+ messages in thread
From: Jim Mattson @ 2021-12-09  3:52 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> The find_arch_event() returns a "unsigned int" value,
> which is used by the pmc_reprogram_counter() to
> program a PERF_TYPE_HARDWARE type perf_event.
>
> The returned value is actually the kernel defined generic

Typo: generic.

> perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler
> incoming parameters for better self-explanation.
>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/kvm/pmu.c           | 8 +-------
>  arch/x86/kvm/pmu.h           | 3 +--
>  arch/x86/kvm/svm/pmu.c       | 8 ++++----
>  arch/x86/kvm/vmx/pmu_intel.c | 9 +++++----
>  4 files changed, 11 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 09873f6488f7..3b3ccf5b1106 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -174,7 +174,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
>  void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>  {
>         unsigned config, type = PERF_TYPE_RAW;
> -       u8 event_select, unit_mask;
>         struct kvm *kvm = pmc->vcpu->kvm;
>         struct kvm_pmu_event_filter *filter;
>         int i;
> @@ -206,17 +205,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>         if (!allow_event)
>                 return;
>
> -       event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> -       unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
> -
>         if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
>                           ARCH_PERFMON_EVENTSEL_INV |
>                           ARCH_PERFMON_EVENTSEL_CMASK |
>                           HSW_IN_TX |
>                           HSW_IN_TX_CHECKPOINTED))) {

The mechanics of the change look fine, but I do have some questions,
for my own understanding.

Why don't we just use PERF_TYPE_RAW for guest counters all of the
time? What is the advantage of matching entries in a table so that we
can use PERF_TYPE_HARDWARE?

Why do the HSW_IN_TX* bits result in bypassing this clause, when these
bits are extracted as arguments to pmc_reprogram_counter below?

Reviewed-by: Jim Mattson <jmattson@google.com>

* Re: [PATCH v2 3/6] KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()
  2021-11-30  7:42 ` [PATCH v2 3/6] KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event() Like Xu
@ 2021-12-09  3:58   ` Jim Mattson
  0 siblings, 0 replies; 45+ messages in thread
From: Jim Mattson @ 2021-12-09  3:58 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> Since we set the same semantic event value for the fixed counter in
> pmc->eventsel, returning the perf_hw_id for the fixed counter via
> find_fixed_event() can be painlessly replaced by pmc_perf_hw_id()
> with the help of pmc_is_fixed() check.
>
> Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Jim Mattson <jmattson@google.com>

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-11-30  7:42 ` [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}() Like Xu
@ 2021-12-09  4:25   ` Jim Mattson
  2021-12-09  8:28     ` Like Xu
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-09  4:25 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> Depending on whether intr should be triggered or not, KVM registers
> two different event overflow callbacks in the perf_event context.
>
> The code skeleton of these two functions is very similar, so
> pmc->intr can be stored in the pmc by pmc_reprogram_counter(),
> which yields a smaller instruction footprint and is friendlier
> to the micro-architectural branch predictor.
>
> __kvm_perf_overflow() can be called in non-NMI contexts, so a
> flag is needed to distinguish the caller context and thus avoid
> a check on kvm_is_in_guest(); otherwise we might get warnings
> from suspicious RCU usage or check_preemption_disabled().
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/pmu.c              | 58 ++++++++++++++++-----------------
>  2 files changed, 29 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e41ad1ead721..6c2b2331ffeb 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -495,6 +495,7 @@ struct kvm_pmc {
>          */
>         u64 current_config;
>         bool is_paused;
> +       bool intr;
>  };
>
>  struct kvm_pmu {
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index b7a1ae28ab87..a20207ee4014 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -55,43 +55,41 @@ static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
>         kvm_pmu_deliver_pmi(vcpu);
>  }
>
> -static void kvm_perf_overflow(struct perf_event *perf_event,
> -                             struct perf_sample_data *data,
> -                             struct pt_regs *regs)
> +static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
>  {
> -       struct kvm_pmc *pmc = perf_event->overflow_handler_context;
>         struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>
> -       if (!test_and_set_bit(pmc->idx, pmu->reprogram_pmi)) {
> -               __set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
> -               kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
> -       }
> +       /* Ignore counters that have been reprogrammed already. */
> +       if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
> +               return;
> +
> +       __set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
> +       kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
> +
> +       if (!pmc->intr)
> +               return;
> +
> +       /*
> +        * Inject PMI. If vcpu was in a guest mode during NMI PMI
> +        * can be ejected on a guest mode re-entry. Otherwise we can't
> +        * be sure that vcpu wasn't executing hlt instruction at the
> +        * time of vmexit and is not going to re-enter guest mode until
> +        * woken up. So we should wake it, but this is impossible from
> +        * NMI context. Do it from irq work instead.
> +        */
> +       if (in_pmi && !kvm_is_in_guest())
> +               irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
> +       else
> +               kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
>  }
>
> -static void kvm_perf_overflow_intr(struct perf_event *perf_event,
> -                                  struct perf_sample_data *data,
> -                                  struct pt_regs *regs)
> +static void kvm_perf_overflow(struct perf_event *perf_event,
> +                             struct perf_sample_data *data,
> +                             struct pt_regs *regs)
>  {
>         struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> -       struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> -
> -       if (!test_and_set_bit(pmc->idx, pmu->reprogram_pmi)) {
> -               __set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
> -               kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
>
> -               /*
> -                * Inject PMI. If vcpu was in a guest mode during NMI PMI
> -                * can be ejected on a guest mode re-entry. Otherwise we can't
> -                * be sure that vcpu wasn't executing hlt instruction at the
> -                * time of vmexit and is not going to re-enter guest mode until
> -                * woken up. So we should wake it, but this is impossible from
> -                * NMI context. Do it from irq work instead.
> -                */
> -               if (!kvm_is_in_guest())
> -                       irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
> -               else
> -                       kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
> -       }
> +       __kvm_perf_overflow(pmc, true);
>  }
>
>  static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
> @@ -126,7 +124,6 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>         }
>
>         event = perf_event_create_kernel_counter(&attr, -1, current,
> -                                                intr ? kvm_perf_overflow_intr :
>                                                  kvm_perf_overflow, pmc);

Not your change, but if the event is counting anything based on
cycles, and the guest TSC is scaled to run at a different rate from
the host TSC, doesn't the initial value of the underlying hardware
counter have to be adjusted as well, so that the interrupt arrives
when the guest's counter overflows rather than when the host's counter
overflows?

Reviewed-by: Jim Mattson <jmattson@google.com>

* Re: [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions
  2021-11-30  7:42 ` [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions Like Xu
@ 2021-12-09  4:33   ` Jim Mattson
  2021-12-09  8:44     ` Like Xu
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-09  4:33 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu, Peter Zijlstra

On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> When KVM retires a guest instruction through emulation, increment any
> vPMCs that are configured to monitor "instructions retired," and
> update the sample period of those counters so that they will overflow
> at the right time.
>
> Signed-off-by: Eric Hankland <ehankland@google.com>
> [jmattson:
>   - Split the code to increment "branch instructions retired" into a
>     separate commit.
>   - Added 'static' to kvm_pmu_incr_counter() definition.
>   - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
>     PERF_EVENT_STATE_ACTIVE.
> ]
> Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> [likexu:
>   - Drop checks for pmc->perf_event or event state or event type
>   - Increase a counter once its umask bits and the first 8 select bits are matched
>   - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
>   - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
>   - Add counter enable and CPL check for kvm_pmu_trigger_event();
> ]
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---

> +void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
> +{
> +       struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +       struct kvm_pmc *pmc;
> +       int i;
> +
> +       for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
> +               pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
> +
> +               if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
> +                       continue;
> +
> +               /* Ignore checks for edge detect, pin control, invert and CMASK bits */

I don't understand how we can ignore these checks. Doesn't that
violate the architectural specification?

> +               if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
> +                       kvm_pmu_incr_counter(pmc);
> +       }
> +}
> +EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
> +

* Re: [PATCH v2 6/6] KVM: x86: Update vPMCs when retiring branch instructions
  2021-11-30  7:42 ` [PATCH v2 6/6] KVM: x86: Update vPMCs when retiring branch instructions Like Xu
@ 2021-12-09  4:40   ` Jim Mattson
  0 siblings, 0 replies; 45+ messages in thread
From: Jim Mattson @ 2021-12-09  4:40 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel

On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Jim Mattson <jmattson@google.com>
>
> When KVM retires a guest branch instruction through emulation,
> increment any vPMCs that are configured to monitor "branch
> instructions retired," and update the sample period of those counters
> so that they will overflow at the right time.
>
> Signed-off-by: Eric Hankland <ehankland@google.com>
> [jmattson:
>   - Split the code to increment "branch instructions retired" into a
>     separate commit.
>   - Moved/consolidated the calls to kvm_pmu_trigger_event() in the
>     emulation of VMLAUNCH/VMRESUME to accommodate the evolution of
>     that code.
> ]
> Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
> Signed-off-by: Jim Mattson <jmattson@google.com>

I'm not sure if this is appropriate, but...

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2021-12-09  3:52   ` Jim Mattson
@ 2021-12-09  8:11     ` Like Xu
  2021-12-09 19:24     ` Paolo Bonzini
  1 sibling, 0 replies; 45+ messages in thread
From: Like Xu @ 2021-12-09  8:11 UTC (permalink / raw)
  To: Jim Mattson, Peter Zijlstra (Intel OTC, Netherlander)
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On 9/12/2021 11:52 am, Jim Mattson wrote:
> On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> The find_arch_event() returns a "unsigned int" value,
>> which is used by the pmc_reprogram_counter() to
>> program a PERF_TYPE_HARDWARE type perf_event.
>>
>> The returned value is actually the kernel defined generic
> 
> Typo: generic.
> 
>> perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler
>> incoming parameters for better self-explanation.
>>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>>   arch/x86/kvm/pmu.c           | 8 +-------
>>   arch/x86/kvm/pmu.h           | 3 +--
>>   arch/x86/kvm/svm/pmu.c       | 8 ++++----
>>   arch/x86/kvm/vmx/pmu_intel.c | 9 +++++----
>>   4 files changed, 11 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 09873f6488f7..3b3ccf5b1106 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -174,7 +174,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
>>   void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>>   {
>>          unsigned config, type = PERF_TYPE_RAW;
>> -       u8 event_select, unit_mask;
>>          struct kvm *kvm = pmc->vcpu->kvm;
>>          struct kvm_pmu_event_filter *filter;
>>          int i;
>> @@ -206,17 +205,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>>          if (!allow_event)
>>                  return;
>>
>> -       event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
>> -       unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
>> -
>>          if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
>>                            ARCH_PERFMON_EVENTSEL_INV |
>>                            ARCH_PERFMON_EVENTSEL_CMASK |
>>                            HSW_IN_TX |
>>                            HSW_IN_TX_CHECKPOINTED))) {
> 
> The mechanics of the change look fine, but I do have some questions,
> for my own understanding.
> 
> Why don't we just use PERF_TYPE_RAW for guest counters all of the
> time? What is the advantage of matching entries in a table so that we
> can use PERF_TYPE_HARDWARE?

The first reason is that we need PERF_TYPE_HARDWARE for fixed counters.

And then we might wonder whether we can create a perf event faster
using PERF_TYPE_HARDWARE compared to PERF_TYPE_RAW.

But the (current) answer is no, and probably the opposite:

# The cost (nanosecond) of calling perf_event_create_kernel_counter()
PERF_TYPE_RAW
Max= 1072211
Min= 11122
Avg= 41681.7

PERF_TYPE_HARDWARE
Max= 46184215
Min= 16194
Avg= 250650
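
What is being timed in both cases is essentially the same
perf_event_create_kernel_counter() call; only the { type, config } pair
differs. A rough sketch (not the actual reprogram_gp_counter() code;
"use_raw", "eventsel" and "perf_hw_id" stand in for the existing logic):

	struct perf_event_attr attr = {
		.size = sizeof(attr),
		.exclude_host = 1,
	};

	if (use_raw) {
		attr.type = PERF_TYPE_RAW;
		attr.config = eventsel & AMD64_RAW_EVENT_MASK; /* select + umask bits */
	} else {
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = perf_hw_id; /* e.g. PERF_COUNT_HW_INSTRUCTIONS */
	}

	event = perf_event_create_kernel_counter(&attr, -1, current,
						 kvm_perf_overflow, pmc);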

So why don't we just use PERF_TYPE_RAW for all gp counters?

Hi Peter, do you have any comments that would invalidate this proposal?

> 
> Why do the HSW_IN_TX* bits result in bypassing this clause, when these
> bits are extracted as arguments to pmc_reprogram_counter below?

Once upon a time, PERF_TYPE_RAW was introduced in
perf with the comment "available TYPE space, raw is the max value",
which means, per my understanding, that it's our final type choice
for creating a valid perf_event when the HSW_IN_TX* bits are set,
and KVM needs to hack the other perf_event_attr fields for the
HSW_IN_TX feature with the help of the extracted arguments.

> 
> Reviewed-by: Jim Mattson <jmattson@google.com>
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-09  4:25   ` Jim Mattson
@ 2021-12-09  8:28     ` Like Xu
  2021-12-10  0:54       ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-12-09  8:28 UTC (permalink / raw)
  To: Jim Mattson, Andi Kleen, Kim Phillips
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On 9/12/2021 12:25 pm, Jim Mattson wrote:
> 
> Not your change, but if the event is counting anything based on
> cycles, and the guest TSC is scaled to run at a different rate from
> the host TSC, doesn't the initial value of the underlying hardware
> counter have to be adjusted as well, so that the interrupt arrives
> when the guest's counter overflows rather than when the host's counter
> overflows?

I've thought about this issue too and at least the Intel Specification
did not let me down on this detail:

	"The counter changes in the VMX non-root mode will follow
	VMM's use of the TSC offset or TSC scaling VMX controls"

I don't know whether AMD or real-world hardware
will live up to this expectation, and I'm pessimistic.

cc Andi and Kim.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions
  2021-12-09  4:33   ` Jim Mattson
@ 2021-12-09  8:44     ` Like Xu
  2021-12-09  9:23       ` Like Xu
  0 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-12-09  8:44 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu, Peter Zijlstra

On 9/12/2021 12:33 pm, Jim Mattson wrote:
> On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> When KVM retires a guest instruction through emulation, increment any
>> vPMCs that are configured to monitor "instructions retired," and
>> update the sample period of those counters so that they will overflow
>> at the right time.
>>
>> Signed-off-by: Eric Hankland <ehankland@google.com>
>> [jmattson:
>>    - Split the code to increment "branch instructions retired" into a
>>      separate commit.
>>    - Added 'static' to kvm_pmu_incr_counter() definition.
>>    - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
>>      PERF_EVENT_STATE_ACTIVE.
>> ]
>> Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
>> Signed-off-by: Jim Mattson <jmattson@google.com>
>> [likexu:
>>    - Drop checks for pmc->perf_event or event state or event type
>>    - Increase a counter once its umask bits and the first 8 select bits are matched
>>    - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
>>    - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
>>    - Add counter enable and CPL check for kvm_pmu_trigger_event();
>> ]
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
> 
>> +void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
>> +{
>> +       struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +       struct kvm_pmc *pmc;
>> +       int i;
>> +
>> +       for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
>> +               pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
>> +
>> +               if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
>> +                       continue;
>> +
>> +               /* Ignore checks for edge detect, pin control, invert and CMASK bits */
> 
> I don't understand how we can ignore these checks. Doesn't that
> violate the architectural specification?

OK, let's take a conservative approach in the V3.

> 
>> +               if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
>> +                       kvm_pmu_incr_counter(pmc);
>> +       }
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
>> +
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions
  2021-12-09  8:44     ` Like Xu
@ 2021-12-09  9:23       ` Like Xu
  0 siblings, 0 replies; 45+ messages in thread
From: Like Xu @ 2021-12-09  9:23 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu, Peter Zijlstra

>>> +               /* Ignore checks for edge detect, pin control, invert and 
>>> CMASK bits */
>>
>> I don't understand how we can ignore these checks. Doesn't that
>> violate the architectural specification?
> 
> OK, let's take a conservative approach in the V3.
> 

Hi Jim, does this version look good to you?

---

 From 4ad42d98ce26d324fa2f72c38fe2c42fe04f2d6d Mon Sep 17 00:00:00 2001
From: Like Xu <likexu@tencent.com>
Date: Tue, 30 Nov 2021 15:42:20 +0800
Subject: [PATCH 5/6] KVM: x86: Update vPMCs when retiring instructions

From: Like Xu <likexu@tencent.com>

When KVM retires a guest instruction through emulation, increment any
vPMCs that are configured to monitor "instructions retired," and
update the sample period of those counters so that they will overflow
at the right time.

Signed-off-by: Eric Hankland <ehankland@google.com>
[jmattson:
   - Split the code to increment "branch instructions retired" into a
     separate commit.
   - Added 'static' to kvm_pmu_incr_counter() definition.
   - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
     PERF_EVENT_STATE_ACTIVE.
]
Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
[likexu:
   - Drop checks for pmc->perf_event or event state or event type
   - Increase a counter only if its umask bits and the first 8 select bits are matched
   - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
   - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
   - Add counter enable and CPL check for kvm_pmu_trigger_event();
]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
---
  arch/x86/kvm/pmu.c | 73 ++++++++++++++++++++++++++++++++++++++++++----
  arch/x86/kvm/pmu.h |  1 +
  arch/x86/kvm/x86.c |  3 ++
  3 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a20207ee4014..db510dae3241 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -22,6 +22,14 @@
  /* This is enough to filter the vast majority of currently defined events. */
  #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300

+#define PMC_EVENTSEL_ARCH_MASK \
+	(ARCH_PERFMON_EVENTSEL_EVENT | \
+	 ARCH_PERFMON_EVENTSEL_UMASK | \
+	 ARCH_PERFMON_EVENTSEL_USR | \
+	 ARCH_PERFMON_EVENTSEL_OS | \
+	 ARCH_PERFMON_EVENTSEL_INT | \
+	 ARCH_PERFMON_EVENTSEL_ENABLE)
+
  /* NOTE:
   * - Each perf counter is defined as "struct kvm_pmc";
   * - There are two types of perf counters: general purpose (gp) and fixed.
@@ -203,11 +211,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
  	if (!allow_event)
  		return;

-	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
-			  ARCH_PERFMON_EVENTSEL_INV |
-			  ARCH_PERFMON_EVENTSEL_CMASK |
-			  HSW_IN_TX |
-			  HSW_IN_TX_CHECKPOINTED))) {
+	if (!(eventsel & ~PMC_EVENTSEL_ARCH_MASK)) {
  		config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
  		if (config != PERF_COUNT_HW_MAX)
  			type = PERF_TYPE_HARDWARE;
@@ -482,6 +486,65 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
  	kvm_pmu_reset(vcpu);
  }

+static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	u64 prev_count;
+
+	prev_count = pmc->counter;
+	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
+
+	reprogram_counter(pmu, pmc->idx);
+	if (pmc->counter < prev_count)
+		__kvm_perf_overflow(pmc, false);
+}
+
+static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
+	unsigned int perf_hw_id)
+{
+	if (pmc->eventsel & ~PMC_EVENTSEL_ARCH_MASK)
+		return false;
+
+	return kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc) == perf_hw_id;
+}
+
+static inline bool cpl_is_matched(struct kvm_pmc *pmc)
+{
+	bool select_os, select_user;
+	u64 config = pmc->current_config;
+
+	if (pmc_is_gp(pmc)) {
+		select_os = config & ARCH_PERFMON_EVENTSEL_OS;
+		select_user = config & ARCH_PERFMON_EVENTSEL_USR;
+	} else {
+		select_os = config & 0x1;
+		select_user = config & 0x2;
+	}
+
+	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
+}
+
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+	int i;
+
+	if (!pmu->version)
+		return;
+
+	for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
+		pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+		if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
+			continue;
+
+		if (eventsel_match_perf_hw_id(pmc, perf_hw_id) && cpl_is_matched(pmc))
+			kvm_pmu_incr_counter(pmc);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
+
  int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
  {
  	struct kvm_pmu_event_filter tmp, *filter;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index c91d9725aafd..7a7b8d5b775e 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -157,6 +157,7 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu);
  void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
  void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
  int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
+void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id);

  bool is_vmware_backdoor_pmc(u32 pmc_idx);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1aaf37e1bd0f..68b65e243eb3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7980,6 +7980,8 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
  	if (unlikely(!r))
  		return 0;

+	kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
+
  	/*
  	 * rflags is the old, "raw" value of the flags.  The new value has
  	 * not been saved yet.
@@ -8242,6 +8244,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t 
cr2_or_gpa,
  		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
  		if (!ctxt->have_exception ||
  		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
+			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
  			kvm_rip_write(vcpu, ctxt->eip);
  			if (r && (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
  				r = kvm_vcpu_do_singlestep(vcpu);
-- 
2.33.1
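
FWIW, a quick way to see the effect from inside the guest could be a
hypothetical test like the one below (not part of the series): count
PERF_COUNT_HW_INSTRUCTIONS around a CPUID loop, since CPUID is always
intercepted and retired through emulation, and compare the result
before and after this patch.

	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>
	#include <cpuid.h>

	int main(void)
	{
		struct perf_event_attr attr;
		unsigned int a, b, c, d;
		long long count;
		int fd, i;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
		attr.disabled = 1;
		attr.exclude_kernel = 1;

		fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
		if (fd < 0)
			return 1;

		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
		for (i = 0; i < 1000000; i++)
			__cpuid(0, a, b, c, d);	/* always exits to KVM */
		ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

		read(fd, &count, sizeof(count));
		printf("instructions retired: %lld\n", count);
		return 0;
	}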



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  2021-12-06 19:57   ` Jim Mattson
@ 2021-12-09 18:49     ` Paolo Bonzini
  2021-12-09 18:53       ` Jim Mattson
  2021-12-10 10:20       ` Like Xu
  0 siblings, 2 replies; 45+ messages in thread
From: Paolo Bonzini @ 2021-12-09 18:49 UTC (permalink / raw)
  To: Jim Mattson, Like Xu
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm, linux-kernel, Like Xu

On 12/6/21 20:57, Jim Mattson wrote:
>> +
>> +       for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
>> +               pmc = &pmu->fixed_counters[i];
>> +               event = fixed_pmc_events[array_index_nospec(i, size)];
> How do we know that i < size? For example, Ice Lake supports 4 fixed
> counters, but fixed_pmc_events only has three entries.

We don't, and it's a preexisting bug in intel_pmu_refresh.  Either we hack around it like

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 1b7456b2177b..6f03c8bf1bc2 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -500,8 +500,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
  		pmu->nr_arch_fixed_counters = 0;
  	} else {
  		pmu->nr_arch_fixed_counters =
-			min_t(int, edx.split.num_counters_fixed,
-			      x86_pmu.num_counters_fixed);
+			min3(ARRAY_SIZE(fixed_pmc_events),
+			     (size_t) edx.split.num_counters_fixed,
+			     (size_t) x86_pmu.num_counters_fixed);
  		edx.split.bit_width_fixed = min_t(int,
  			edx.split.bit_width_fixed, x86_pmu.bit_width_fixed);
  		pmu->counter_bitmask[KVM_PMC_FIXED] =

or we modify find_fixed_event and its caller to support PERF_TYPE_RAW
counters, and then add support for the IceLake TOPDOWN.SLOTS fixed
counter.

What's your preference?

Paolo

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  2021-12-09 18:49     ` Paolo Bonzini
@ 2021-12-09 18:53       ` Jim Mattson
  2021-12-09 18:57         ` Paolo Bonzini
  2021-12-10 10:20       ` Like Xu
  1 sibling, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-09 18:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Like Xu, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On Thu, Dec 9, 2021 at 10:49 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/6/21 20:57, Jim Mattson wrote:
> >> +
> >> +       for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> >> +               pmc = &pmu->fixed_counters[i];
> >> +               event = fixed_pmc_events[array_index_nospec(i, size)];
> > How do we know that i < size? For example, Ice Lake supports 4 fixed
> > counters, but fixed_pmc_events only has three entries.
>
> We don't, and it's a preexisting bug in intel_pmu_refresh.  Either we hack around it like
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 1b7456b2177b..6f03c8bf1bc2 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -500,8 +500,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>                 pmu->nr_arch_fixed_counters = 0;
>         } else {
>                 pmu->nr_arch_fixed_counters =
> -                       min_t(int, edx.split.num_counters_fixed,
> -                             x86_pmu.num_counters_fixed);
> +                       min3(ARRAY_SIZE(fixed_pmc_events),
> +                            (size_t) edx.split.num_counters_fixed,
> +                            (size_t) x86_pmu.num_counters_fixed);
>                 edx.split.bit_width_fixed = min_t(int,
>                         edx.split.bit_width_fixed, x86_pmu.bit_width_fixed);
>                 pmu->counter_bitmask[KVM_PMC_FIXED] =
>
> or we modify find_fixed_event and its caller to support PERF_TYPE_RAW
> counters, and then add support for the IceLake TOPDOWN.SLOTS fixed
> counter.
>
> What's your preference?

As Like points out, KVM_GET_SUPPORTED_CPUID indicates that only three
fixed counters are supported. So, per the KVM contract, if userspace
configures four in the guest cpuid info, all bets are off. I don't
like that contract, but changing it means introducing KVM_SET_CPUID3.
:-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  2021-12-09 18:53       ` Jim Mattson
@ 2021-12-09 18:57         ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2021-12-09 18:57 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Like Xu, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On 12/9/21 19:53, Jim Mattson wrote:
>> How do we know that i < size? For example, Ice Lake supports 4 fixed
>> counters, but fixed_pmc_events only has three entries.
> 
> We don't, and it's a preexisting bug in intel_pmu_refresh.
> As Like points out, KVM_GET_SUPPORTED_CPUID indicates that only three
> fixed counters are supported. So, per the KVM contract, if userspace
> configures four in the guest cpuid info, all bets are off.

Out of bounds accesses are not part of the contract though, even if 
squashed by an unorthodox use of array_index_nospec.  So I'll post my hack.

> I don't like that contract, but changing it means introducing KVM_SET_CPUID3.

And especially it means getting it right, which is the difficult part.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2021-12-09  3:52   ` Jim Mattson
  2021-12-09  8:11     ` Like Xu
@ 2021-12-09 19:24     ` Paolo Bonzini
  1 sibling, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2021-12-09 19:24 UTC (permalink / raw)
  To: Jim Mattson, Like Xu
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm, linux-kernel, Like Xu

On 12/9/21 04:52, Jim Mattson wrote:
> Why don't we just use PERF_TYPE_RAW for guest counters all of the
> time? What is the advantage of matching entries in a table so that we
> can use PERF_TYPE_HARDWARE?

In theory it's so that hosts without support for architectural PMU 
events can map the architectural PMU events to their own.  In practice, 
we can probably return early from intel_pmu_refresh if the host does not 
support X86_FEATURE_ARCH_PERFMON, because only the oldest Pentium 4 
toasters^Wprocessors probably have VMX but lack architectural PMU.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions
  2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
                   ` (5 preceding siblings ...)
  2021-11-30  7:42 ` [PATCH v2 6/6] KVM: x86: Update vPMCs when retiring branch instructions Like Xu
@ 2021-12-09 19:30 ` Paolo Bonzini
  2021-12-16 10:14   ` Like Xu
  6 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2021-12-09 19:30 UTC (permalink / raw)
  To: Like Xu
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On 11/30/21 08:42, Like Xu wrote:
> Hi,
> 
> [ Jim is on holiday, so I'm here to continue this work. ]
> 
> Some cloud customers need accurate virtualization of two
> basic PMU events on x86 hardware: "instructions retired" and
> "branch instructions retired". The existing PMU virtualization code
> fails to account for instructions (e.g, CPUID) that are emulated by KVM.
> 
> Accurately virtualizing all PMU events for all microarchitectures is a
> herculean task, let's just stick to the two events covered by this set.
> 
> Eric Hankland wrote this code originally, but his plate is full, so Jim
> and I volunteered to shepherd the changes through upstream acceptance.
> 
> Thanks,
> 
> v1 -> v2 Changelog:
> - Include the patch set [1] and drop the intel_find_fixed_event(); [Paolo]
>    (we will fix the misleading Intel CPUID events in another patch set)
> - Drop checks for pmc->perf_event or event state or event type;
> - Increase a counter once its umask bits and the first 8 select bits are matched;
> - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
> - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
> - Add counter enable check for kvm_pmu_trigger_event();
> - Add vcpu CPL check for kvm_pmu_trigger_event(); [Jim]
> 
> Previous:
> https://lore.kernel.org/kvm/20211112235235.1125060-2-jmattson@google.com/
> 
> [1] https://lore.kernel.org/kvm/20211119064856.77948-1-likexu@tencent.com/
> 
> Jim Mattson (1):
>    KVM: x86: Update vPMCs when retiring branch instructions
> 
> Like Xu (5):
>    KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
>    KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
>    KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()
>    KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
>    KVM: x86: Update vPMCs when retiring instructions
> 
>   arch/x86/include/asm/kvm_host.h |   1 +
>   arch/x86/kvm/emulate.c          |  55 ++++++++------
>   arch/x86/kvm/kvm_emulate.h      |   1 +
>   arch/x86/kvm/pmu.c              | 128 ++++++++++++++++++++++----------
>   arch/x86/kvm/pmu.h              |   5 +-
>   arch/x86/kvm/svm/pmu.c          |  19 ++---
>   arch/x86/kvm/vmx/nested.c       |   7 +-
>   arch/x86/kvm/vmx/pmu_intel.c    |  44 ++++++-----
>   arch/x86/kvm/x86.c              |   5 ++
>   9 files changed, 167 insertions(+), 98 deletions(-)
> 

Queued patches 1-4, thanks.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-09  8:28     ` Like Xu
@ 2021-12-10  0:54       ` Jim Mattson
  2021-12-10  9:35         ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-10  0:54 UTC (permalink / raw)
  To: Like Xu
  Cc: Andi Kleen, Kim Phillips, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, linux-kernel,
	Like Xu

On Thu, Dec 9, 2021 at 12:28 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 9/12/2021 12:25 pm, Jim Mattson wrote:
> >
> > Not your change, but if the event is counting anything based on
> > cycles, and the guest TSC is scaled to run at a different rate from
> > the host TSC, doesn't the initial value of the underlying hardware
> > counter have to be adjusted as well, so that the interrupt arrives
> > when the guest's counter overflows rather than when the host's counter
> > overflows?
>
> I've thought about this issue too and at least the Intel Specification
> did not let me down on this detail:
>
>         "The counter changes in the VMX non-root mode will follow
>         VMM's use of the TSC offset or TSC scaling VMX controls"

Where do you see this? I see similar text regarding TSC packets in the
section on Intel Processor Trace, but nothing about PMU counters
advancing at a scaled TSC frequency.

> Not knowing if AMD or the real world hardware
> will live up to this expectation and I'm pessimistic.
>
> cc Andi and Kim.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-10  0:54       ` Jim Mattson
@ 2021-12-10  9:35         ` Paolo Bonzini
  2021-12-10 10:11           ` Like Xu
  2021-12-10 22:55           ` Jim Mattson
  0 siblings, 2 replies; 45+ messages in thread
From: Paolo Bonzini @ 2021-12-10  9:35 UTC (permalink / raw)
  To: Jim Mattson, Like Xu
  Cc: Andi Kleen, Kim Phillips, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Joerg Roedel, kvm, linux-kernel, Like Xu

On 12/10/21 01:54, Jim Mattson wrote:
> On Thu, Dec 9, 2021 at 12:28 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> On 9/12/2021 12:25 pm, Jim Mattson wrote:
>>>
>>> Not your change, but if the event is counting anything based on
>>> cycles, and the guest TSC is scaled to run at a different rate from
>>> the host TSC, doesn't the initial value of the underlying hardware
>>> counter have to be adjusted as well, so that the interrupt arrives
>>> when the guest's counter overflows rather than when the host's counter
>>> overflows?
>>
>> I've thought about this issue too and at least the Intel Specification
>> did not let me down on this detail:
>>
>>          "The counter changes in the VMX non-root mode will follow
>>          VMM's use of the TSC offset or TSC scaling VMX controls"
> 
> Where do you see this? I see similar text regarding TSC packets in the
> section on Intel Processor Trace, but nothing about PMU counters
> advancing at a scaled TSC frequency.

Indeed it seems quite unlikely that PMU counters can count fractionally.

Even for tracing the SDM says "Like the value returned by RDTSC, TSC 
packets will include these adjustments, but other timing packets (such 
as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone 
TSC packets are typically generated only when generation of other timing 
packets (MTCs and CYCs) has ceased for a period of time", I'm not even 
sure it's a good thing that the values in TSC packets are scaled and offset.

Back to the PMU, for non-architectural counters it's not really possible 
to know if they count in cycles or not.  So it may not be a good idea to 
special case the architectural counters.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-10  9:35         ` Paolo Bonzini
@ 2021-12-10 10:11           ` Like Xu
  2021-12-10 22:55           ` Jim Mattson
  1 sibling, 0 replies; 45+ messages in thread
From: Like Xu @ 2021-12-10 10:11 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: Andi Kleen, Kim Phillips, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Joerg Roedel, kvm, linux-kernel, Like Xu

On 10/12/2021 5:35 pm, Paolo Bonzini wrote:
> On 12/10/21 01:54, Jim Mattson wrote:
>> On Thu, Dec 9, 2021 at 12:28 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>>
>>> On 9/12/2021 12:25 pm, Jim Mattson wrote:
>>>>
>>>> Not your change, but if the event is counting anything based on
>>>> cycles, and the guest TSC is scaled to run at a different rate from
>>>> the host TSC, doesn't the initial value of the underlying hardware
>>>> counter have to be adjusted as well, so that the interrupt arrives
>>>> when the guest's counter overflows rather than when the host's counter
>>>> overflows?
>>>
>>> I've thought about this issue too and at least the Intel Specification
>>> did not let me down on this detail:
>>>
>>>          "The counter changes in the VMX non-root mode will follow
>>>          VMM's use of the TSC offset or TSC scaling VMX controls"
>>

Emm, before I left Intel to play with AMD, my hardware architect gave
me a verbal yes that reported TSC values in VMX non-root mode
(including Timed LBR, PEBS records, or PT packets) follow these
adjustments, as long as we enable the relevant VM-execution control bits.

Not sure if it's true for legacy platforms.

>> Where do you see this? I see similar text regarding TSC packets in the
>> section on Intel Processor Trace, but nothing about PMU counters
>> advancing at a scaled TSC frequency.
> 
> Indeed it seems quite unlikely that PMU counters can count fractionally.
> 
> Even for tracing the SDM says "Like the value returned by RDTSC, TSC packets 
> will include these adjustments, but other timing packets (such as MTC, CYC, and 
> CBR) are not impacted".  Considering that "stand-alone TSC packets are typically 
> generated only when generation of other timing packets (MTCs and CYCs) has 
> ceased for a period of time", I'm not even sure it's a good thing that the 
> values in TSC packets are scaled and offset.

There are some discussions that cannot be made public.

We recommend (as software developers) that any PMU-enabled guest
should keep the host and guest TSC progressing together for
performance tuning, since the guest doesn't have AMPERF capability.

> 
> Back to the PMU, for non-architectural counters it's not really possible to know 
> if they count in cycles or not.  So it may not be a good idea to special case 
> the architectural counters.

Yes, captain.
Let's see if real-world challenges or bugs make it worth exploring this detail further.

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
  2021-12-09 18:49     ` Paolo Bonzini
  2021-12-09 18:53       ` Jim Mattson
@ 2021-12-10 10:20       ` Like Xu
  1 sibling, 0 replies; 45+ messages in thread
From: Like Xu @ 2021-12-10 10:20 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	kvm, linux-kernel, Like Xu, linux-perf-users

On 10/12/2021 2:49 am, Paolo Bonzini wrote:
> 
> or we modify find_fixed_event and its caller to support PERF_TYPE_RAW
> counters, and then add support for the IceLake TOPDOWN.SLOTS fixed
> counter.

It looks like the TOPDOWN approach from "Yasin Ahmad"
has recently been widely advertised among developers.

But I still need bandwidth to clean up some perf driver stuff
in order to enable the topdown counters smoothly in the guest.

cc linux-perf-users@vger.kernel.org for user voices.

> 
> What's your preference?
> 
> Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-10  9:35         ` Paolo Bonzini
  2021-12-10 10:11           ` Like Xu
@ 2021-12-10 22:55           ` Jim Mattson
  2021-12-10 22:59             ` Paolo Bonzini
  1 sibling, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-10 22:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Like Xu, Andi Kleen, Kim Phillips, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, linux-kernel,
	Like Xu

On Fri, Dec 10, 2021 at 1:35 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/10/21 01:54, Jim Mattson wrote:
> > On Thu, Dec 9, 2021 at 12:28 AM Like Xu <like.xu.linux@gmail.com> wrote:
> >>
> >> On 9/12/2021 12:25 pm, Jim Mattson wrote:
> >>>
> >>> Not your change, but if the event is counting anything based on
> >>> cycles, and the guest TSC is scaled to run at a different rate from
> >>> the host TSC, doesn't the initial value of the underlying hardware
> >>> counter have to be adjusted as well, so that the interrupt arrives
> >>> when the guest's counter overflows rather than when the host's counter
> >>> overflows?
> >>
> >> I've thought about this issue too and at least the Intel Specification
> >> did not let me down on this detail:
> >>
> >>          "The counter changes in the VMX non-root mode will follow
> >>          VMM's use of the TSC offset or TSC scaling VMX controls"
> >
> > Where do you see this? I see similar text regarding TSC packets in the
> > section on Intel Processor Trace, but nothing about PMU counters
> > advancing at a scaled TSC frequency.
>
> Indeed it seems quite unlikely that PMU counters can count fractionally.
>
> Even for tracing the SDM says "Like the value returned by RDTSC, TSC
> packets will include these adjustments, but other timing packets (such
> as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone
> TSC packets are typically generated only when generation of other timing
> packets (MTCs and CYCs) has ceased for a period of time", I'm not even
> sure it's a good thing that the values in TSC packets are scaled and offset.
>
> Back to the PMU, for non-architectural counters it's not really possible
> to know if they count in cycles or not.  So it may not be a good idea to
> special case the architectural counters.

In that case, what we're doing with the guest PMU is not
virtualization. I don't know what it is, but it's not virtualization.

Exposing non-architectural events is questionable with live migration,
and TSC scaling is unnecessary without live migration. I suppose you
could have a migration pool with different SKUs of the same generation
with 'seemingly compatible' PMU events but different TSC frequencies,
in which case it might be reasonable to expose non-architectural
events, but I would argue that any of those 'seemingly compatible'
events are actually not compatible if they count in cycles.

Unless, of course, Like is right, and the PMU counters do count fractionally.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-10 22:55           ` Jim Mattson
@ 2021-12-10 22:59             ` Paolo Bonzini
  2021-12-10 23:31               ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2021-12-10 22:59 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Like Xu, Andi Kleen, Kim Phillips, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, linux-kernel,
	Like Xu

On 12/10/21 23:55, Jim Mattson wrote:
>>
>> Even for tracing the SDM says "Like the value returned by RDTSC, TSC
>> packets will include these adjustments, but other timing packets (such
>> as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone
>> TSC packets are typically generated only when generation of other timing
>> packets (MTCs and CYCs) has ceased for a period of time", I'm not even
>> sure it's a good thing that the values in TSC packets are scaled and offset.
>>
>> Back to the PMU, for non-architectural counters it's not really possible
>> to know if they count in cycles or not.  So it may not be a good idea to
>> special case the architectural counters.
>
> In that case, what we're doing with the guest PMU is not
> virtualization. I don't know what it is, but it's not virtualization.

It is virtualization even if it is incompatible with live migration to a 
different SKU (where, as you point out below, multiple TSC frequencies 
might also count as multiple SKUs).  But yeah, it's virtualization with 
more caveats than usual.

> Exposing non-architectural events is questionable with live migration,
> and TSC scaling is unnecessary without live migration. I suppose you
> could have a migration pool with different SKUs of the same generation
> with 'seemingly compatible' PMU events but different TSC frequencies,
> in which case it might be reasonable to expose non-architectural
> events, but I would argue that any of those 'seemingly compatible'
> events are actually not compatible if they count in cycles.

I agree.  Support for marshaling/unmarshaling PMU state exists but it's 
more useful for intra-host updates than for actual live migration, since 
these days most live migration will use TSC scaling on the destination.

Paolo

> 
> Unless, of course, Like is right, and the PMU counters do count fractionally.
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-10 22:59             ` Paolo Bonzini
@ 2021-12-10 23:31               ` Jim Mattson
  2021-12-12  4:56                 ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-10 23:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Like Xu, Andi Kleen, Kim Phillips, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, linux-kernel,
	Like Xu

On Fri, Dec 10, 2021 at 2:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/10/21 23:55, Jim Mattson wrote:
> >>
> >> Even for tracing the SDM says "Like the value returned by RDTSC, TSC
> >> packets will include these adjustments, but other timing packets (such
> >> as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone
> >> TSC packets are typically generated only when generation of other timing
> >> packets (MTCs and CYCs) has ceased for a period of time", I'm not even
> >> sure it's a good thing that the values in TSC packets are scaled and offset.
> >>
> >> Back to the PMU, for non-architectural counters it's not really possible
> >> to know if they count in cycles or not.  So it may not be a good idea to
> >> special case the architectural counters.
> >
> > In that case, what we're doing with the guest PMU is not
> > virtualization. I don't know what it is, but it's not virtualization.
>
> It is virtualization even if it is incompatible with live migration to a
> different SKU (where, as you point out below, multiple TSC frequencies
> might also count as multiple SKUs).  But yeah, it's virtualization with
> more caveats than usual.

It's not virtualization if the counters don't count at the rate the
guest expects them to count.

> > Exposing non-architectural events is questionable with live migration,
> > and TSC scaling is unnecessary without live migration. I suppose you
> > could have a migration pool with different SKUs of the same generation
> > with 'seemingly compatible' PMU events but different TSC frequencies,
> > in which case it might be reasonable to expose non-architectural
> > events, but I would argue that any of those 'seemingly compatible'
> > events are actually not compatible if they count in cycles.
> I agree.  Support for marshaling/unmarshaling PMU state exists but it's
> more useful for intra-host updates than for actual live migration, since
> these days most live migration will use TSC scaling on the destination.
>
> Paolo
>
> >
> > Unless, of course, Like is right, and the PMU counters do count fractionally.
> >
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-10 23:31               ` Jim Mattson
@ 2021-12-12  4:56                 ` Jim Mattson
  2021-12-13  6:37                   ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-12  4:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Like Xu, Andi Kleen, Kim Phillips, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, linux-kernel,
	Like Xu

On Fri, Dec 10, 2021 at 3:31 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Dec 10, 2021 at 2:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 12/10/21 23:55, Jim Mattson wrote:
> > >>
> > >> Even for tracing the SDM says "Like the value returned by RDTSC, TSC
> > >> packets will include these adjustments, but other timing packets (such
> > >> as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone
> > >> TSC packets are typically generated only when generation of other timing
> > >> packets (MTCs and CYCs) has ceased for a period of time", I'm not even
> > >> sure it's a good thing that the values in TSC packets are scaled and offset.
> > >>
> > >> Back to the PMU, for non-architectural counters it's not really possible
> > >> to know if they count in cycles or not.  So it may not be a good idea to
> > >> special case the architectural counters.
> > >
> > > In that case, what we're doing with the guest PMU is not
> > > virtualization. I don't know what it is, but it's not virtualization.
> >
> > It is virtualization even if it is incompatible with live migration to a
> > different SKU (where, as you point out below, multiple TSC frequencies
> > might also count as multiple SKUs).  But yeah, it's virtualization with
> > more caveats than usual.
>
> It's not virtualization if the counters don't count at the rate the
> guest expects them to count.

Per the SDM, unhalted reference cycles count at "a fixed frequency."
If the frequency changes on migration, then the value of this event is
questionable at best. For unhalted core cycles, on the other hand, the
SDM says, "The performance counter for this event counts across
performance state transitions using different core clock frequencies."
That does seem to permit frequency changes on migration, but I suspect
that software expects the event to count at a fixed frequency if
INVARIANT_TSC is set.

I'm not sure that I buy your argument regarding consistency. In
general, I would expect the hypervisor to exclude non-architected
events from the allow-list for any VM instances running in a
heterogeneous migration pool. Certainly, those events could be allowed
in a heterogeneous migration pool consisting of multiple SKUs of the
same microarchitecture running at different clock frequencies, but
that seems like a niche case.


> > > Exposing non-architectural events is questionable with live migration,
> > > and TSC scaling is unnecessary without live migration. I suppose you
> > > could have a migration pool with different SKUs of the same generation
> > > with 'seemingly compatible' PMU events but different TSC frequencies,
> > > in which case it might be reasonable to expose non-architectural
> > > events, but I would argue that any of those 'seemingly compatible'
> > > events are actually not compatible if they count in cycles.
> > I agree.  Support for marshaling/unmarshaling PMU state exists but it's
> > more useful for intra-host updates than for actual live migration, since
> > these days most live migration will use TSC scaling on the destination.
> >
> > Paolo
> >
> > >
> > > Unless, of course, Like is right, and the PMU counters do count fractionally.
> > >
> >

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-12  4:56                 ` Jim Mattson
@ 2021-12-13  6:37                   ` Jim Mattson
  2021-12-16  9:57                     ` Like Xu
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2021-12-13  6:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Like Xu, Andi Kleen, Kim Phillips, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, linux-kernel,
	Like Xu

On Sat, Dec 11, 2021 at 8:56 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Dec 10, 2021 at 3:31 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > On Fri, Dec 10, 2021 at 2:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > >
> > > On 12/10/21 23:55, Jim Mattson wrote:
> > > >>
> > > >> Even for tracing the SDM says "Like the value returned by RDTSC, TSC
> > > >> packets will include these adjustments, but other timing packets (such
> > > >> as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone
> > > >> TSC packets are typically generated only when generation of other timing
> > > >> packets (MTCs and CYCs) has ceased for a period of time", I'm not even
> > > >> sure it's a good thing that the values in TSC packets are scaled and offset.
> > > >>
> > > >> Back to the PMU, for non-architectural counters it's not really possible
> > > >> to know if they count in cycles or not.  So it may not be a good idea to
> > > >> special case the architectural counters.
> > > >
> > > > In that case, what we're doing with the guest PMU is not
> > > > virtualization. I don't know what it is, but it's not virtualization.
> > >
> > > It is virtualization even if it is incompatible with live migration to a
> > > different SKU (where, as you point out below, multiple TSC frequencies
> > > might also count as multiple SKUs).  But yeah, it's virtualization with
> > > more caveats than usual.
> >
> > It's not virtualization if the counters don't count at the rate the
> > guest expects them to count.
>
> Per the SDM, unhalted reference cycles count at "a fixed frequency."
> If the frequency changes on migration, then the value of this event is
> questionable at best. For unhalted core cycles, on the other hand, the
> SDM says, "The performance counter for this event counts across
> performance state transitions using different core clock frequencies."
> That does seem to permit frequency changes on migration, but I suspect
> that software expects the event to count at a fixed frequency if
> INVARIANT_TSC is set.

Actually, I now realize that unhalted reference cycles is independent
of the host or guest TSC, so it is not affected by TSC scaling.
However, we still have to decide on a specific fixed frequency to
virtualize so that the frequency doesn't change on migration. As a
practical matter, it may be the case that the reference cycles
frequency is the same on all processors in a migration pool, and we
don't have to do anything.


> I'm not sure that I buy your argument regarding consistency. In
> general, I would expect the hypervisor to exclude non-architected
> events from the allow-list for any VM instances running in a
> heterogeneous migration pool. Certainly, those events could be allowed
> in a heterogeneous migration pool consisting of multiple SKUs of the
> same microarchitecture running at different clock frequencies, but
> that seems like a niche case.
>
>
> > > > Exposing non-architectural events is questionable with live migration,
> > > > and TSC scaling is unnecessary without live migration. I suppose you
> > > > could have a migration pool with different SKUs of the same generation
> > > > with 'seemingly compatible' PMU events but different TSC frequencies,
> > > > in which case it might be reasonable to expose non-architectural
> > > > events, but I would argue that any of those 'seemingly compatible'
> > > > events are actually not compatible if they count in cycles.
> > > I agree.  Support for marshaling/unmarshaling PMU state exists but it's
> > > more useful for intra-host updates than for actual live migration, since
> > > these days most live migration will use TSC scaling on the destination.
> > >
> > > Paolo
> > >
> > > >
> > > > Unless, of course, Like is right, and the PMU counters do count fractionally.
> > > >
> > >

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-13  6:37                   ` Jim Mattson
@ 2021-12-16  9:57                     ` Like Xu
  2021-12-16 17:52                       ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2021-12-16  9:57 UTC (permalink / raw)
  To: Jim Mattson, Paolo Bonzini
  Cc: Andi Kleen, Kim Phillips, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Joerg Roedel, kvm, linux-kernel, Like Xu

On 13/12/2021 2:37 pm, Jim Mattson wrote:
> On Sat, Dec 11, 2021 at 8:56 PM Jim Mattson <jmattson@google.com> wrote:
>>
>> On Fri, Dec 10, 2021 at 3:31 PM Jim Mattson <jmattson@google.com> wrote:
>>>
>>> On Fri, Dec 10, 2021 at 2:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>> On 12/10/21 23:55, Jim Mattson wrote:
>>>>>>
>>>>>> Even for tracing the SDM says "Like the value returned by RDTSC, TSC
>>>>>> packets will include these adjustments, but other timing packets (such
>>>>>> as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone
>>>>>> TSC packets are typically generated only when generation of other timing
>>>>>> packets (MTCs and CYCs) has ceased for a period of time", I'm not even
>>>>>> sure it's a good thing that the values in TSC packets are scaled and offset.
>>>>>>
>>>>>> Back to the PMU, for non-architectural counters it's not really possible
>>>>>> to know if they count in cycles or not.  So it may not be a good idea to
>>>>>> special case the architectural counters.
>>>>>
>>>>> In that case, what we're doing with the guest PMU is not
>>>>> virtualization. I don't know what it is, but it's not virtualization.

It's a case of profiling the guest from the host side, like "perf kvm", and in
that case we need to convert the guest's TSC values to the host view, taking
the guest TSC scaling into account.
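
For reference, the mapping in question is just the scale-then-offset that
KVM already applies for guest RDTSC; a sketch only (48 is the Intel value
of kvm_tsc_scaling_ratio_frac_bits, and the profiler side would need the
inverse of this when folding guest samples back into the host timeline):

	/* guest_tsc = ((host_tsc * ratio) >> frac_bits) + offset */
	static u64 host_to_guest_tsc(u64 host_tsc, u64 ratio, u64 offset)
	{
		return mul_u64_u64_shr(host_tsc, ratio, 48) + offset;
	}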

>>>>
>>>> It is virtualization even if it is incompatible with live migration to a
>>>> different SKU (where, as you point out below, multiple TSC frequencies
>>>> might also count as multiple SKUs).  But yeah, it's virtualization with
>>>> more caveats than usual.
>>>
>>> It's not virtualization if the counters don't count at the rate the
>>> guest expects them to count.

We do have a "use TSC scaling" bit in the "Secondary Processor-Based
VM-Execution Controls".

>>
>> Per the SDM, unhalted reference cycles count at "a fixed frequency."
>> If the frequency changes on migration, then the value of this event is
>> questionable at best. For unhalted core cycles, on the other hand, the
>> SDM says, "The performance counter for this event counts across
>> performance state transitions using different core clock frequencies."
>> That does seem to permit frequency changes on migration, but I suspect
>> that software expects the event to count at a fixed frequency if
>> INVARIANT_TSC is set.

Yes, I may propose that the vPMU be used only in conjunction with INVARIANT_TSC.
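
Something like the sketch below is what I have in mind; "enable_pmu" is a
hypothetical knob here, just to show the gating, and the two feature bits
are roughly how the kernel reflects an invariant TSC:

	/* Sketch only: refuse the vPMU when the host TSC is not invariant. */
	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
	    !boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
		enable_pmu = false;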

> 
> Actually, I now realize that unhalted reference cycles is independent
> of the host or guest TSC, so it is not affected by TSC scaling.

I doubt it.

> However, we still have to decide on a specific fixed frequency to
> virtualize so that the frequency doesn't change on migration. As a
> practical matter, it may be the case that the reference cycles
> frequency is the same on all processors in a migration pool, and we
> don't have to do anything.

Yes, someone is already doing this in a production environment.

> 
> 
>> I'm not sure that I buy your argument regarding consistency. In
>> general, I would expect the hypervisor to exclude non-architected
>> events from the allow-list for any VM instances running in a
>> heterogeneous migration pool. Certainly, those events could be allowed
>> in a heterogeneous migration pool consisting of multiple SKUs of the
>> same microarchitecture running at different clock frequencies, but
>> that seems like a niche case.

IMO, if there are users who want to use the guest PMU, they definitely
want non-architectural events, even without live migration support.

Another input is that we actually have no problem reporting erratic
performance data across live migration transitions or host power
transitions, and there are situations where users want to know that
these kinds of things are happening under the hood.

Software performance tuners would not trust the perf data from a
single trial anyway, relying more on statistical conclusions.

>>
>>
>>>>> Exposing non-architectural events is questionable with live migration,
>>>>> and TSC scaling is unnecessary without live migration. I suppose you
>>>>> could have a migration pool with different SKUs of the same generation
>>>>> with 'seemingly compatible' PMU events but different TSC frequencies,
>>>>> in which case it might be reasonable to expose non-architectural
>>>>> events, but I would argue that any of those 'seemingly compatible'
>>>>> events are actually not compatible if they count in cycles.
>>>> I agree.  Support for marshaling/unmarshaling PMU state exists but it's
>>>> more useful for intra-host updates than for actual live migration, since
>>>> these days most live migration will use TSC scaling on the destination.
>>>>
>>>> Paolo
>>>>
>>>>>
>>>>> Unless, of course, Like is right, and the PMU counters do count fractionally.
>>>>>
>>>>
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions
  2021-12-09 19:30 ` [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Paolo Bonzini
@ 2021-12-16 10:14   ` Like Xu
  0 siblings, 0 replies; 45+ messages in thread
From: Like Xu @ 2021-12-16 10:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On 10/12/2021 3:30 am, Paolo Bonzini wrote:
> On 11/30/21 08:42, Like Xu wrote:
>> Hi,
>>
>> [ Jim is on holiday, so I'm here to continue this work. ]
>>
>> Some cloud customers need accurate virtualization of two
>> basic PMU events on x86 hardware: "instructions retired" and
>> "branch instructions retired". The existing PMU virtualization code
>> fails to account for instructions (e.g, CPUID) that are emulated by KVM.
>>
>> Accurately virtualizing all PMU events for all microarchitectures is a
>> herculean task, let's just stick to the two events covered by this set.
>>
>> Eric Hankland wrote this code originally, but his plate is full, so Jim
>> and I volunteered to shepherd the changes through upstream acceptance.
>>
>> Thanks,
>>
>> v1 -> v2 Changelog:
>> - Include the patch set [1] and drop the intel_find_fixed_event(); [Paolo]
>>    (we will fix the misleading Intel CPUID events in another patch set)
>> - Drop checks for pmc->perf_event or event state or event type;
>> - Increase a counter once its umask bits and the first 8 select bits are matched;
>> - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
>> - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
>> - Add counter enable check for kvm_pmu_trigger_event();
>> - Add vcpu CPL check for kvm_pmu_trigger_event(); [Jim]
>>
>> Previous:
>> https://lore.kernel.org/kvm/20211112235235.1125060-2-jmattson@google.com/
>>
>> [1] https://lore.kernel.org/kvm/20211119064856.77948-1-likexu@tencent.com/
>>
>> Jim Mattson (1):
>>    KVM: x86: Update vPMCs when retiring branch instructions
>>
>> Like Xu (5):
>>    KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
>>    KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
>>    KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()
>>    KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
>>    KVM: x86: Update vPMCs when retiring instructions
>>
>>   arch/x86/include/asm/kvm_host.h |   1 +
>>   arch/x86/kvm/emulate.c          |  55 ++++++++------
>>   arch/x86/kvm/kvm_emulate.h      |   1 +
>>   arch/x86/kvm/pmu.c              | 128 ++++++++++++++++++++++----------
>>   arch/x86/kvm/pmu.h              |   5 +-
>>   arch/x86/kvm/svm/pmu.c          |  19 ++---
>>   arch/x86/kvm/vmx/nested.c       |   7 +-
>>   arch/x86/kvm/vmx/pmu_intel.c    |  44 ++++++-----
>>   arch/x86/kvm/x86.c              |   5 ++
>>   9 files changed, 167 insertions(+), 98 deletions(-)
>>
> 
> Queued patches 1-4, thanks.
> 
> Paolo
> 

Hi Paolo,

Is the fourth patch missing from the kvm/queue tree, or are there
any new ideas or comments that we haven't taken on board?

Actually, the motivation is that the v11 PEBS series is rebased on top of
the first four patches.

Thanks,
Like Xu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
  2021-12-16  9:57                     ` Like Xu
@ 2021-12-16 17:52                       ` Jim Mattson
  0 siblings, 0 replies; 45+ messages in thread
From: Jim Mattson @ 2021-12-16 17:52 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Andi Kleen, Kim Phillips, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm, linux-kernel,
	Like Xu

On Thu, Dec 16, 2021 at 1:57 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 13/12/2021 2:37 pm, Jim Mattson wrote:
> > On Sat, Dec 11, 2021 at 8:56 PM Jim Mattson <jmattson@google.com> wrote:
> >>
> >> On Fri, Dec 10, 2021 at 3:31 PM Jim Mattson <jmattson@google.com> wrote:
> >>>
> >>> On Fri, Dec 10, 2021 at 2:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>>>
> >>>> On 12/10/21 23:55, Jim Mattson wrote:
> >>>>>>
> >>>>>> Even for tracing the SDM says "Like the value returned by RDTSC, TSC
> >>>>>> packets will include these adjustments, but other timing packets (such
> >>>>>> as MTC, CYC, and CBR) are not impacted".  Considering that "stand-alone
> >>>>>> TSC packets are typically generated only when generation of other timing
> >>>>>> packets (MTCs and CYCs) has ceased for a period of time", I'm not even
> >>>>>> sure it's a good thing that the values in TSC packets are scaled and offset.
> >>>>>>
> >>>>>> Back to the PMU, for non-architectural counters it's not really possible
> >>>>>> to know if they count in cycles or not.  So it may not be a good idea to
> >>>>>> special case the architectural counters.
> >>>>>
> >>>>> In that case, what we're doing with the guest PMU is not
> >>>>> virtualization. I don't know what it is, but it's not virtualization.
>
> It's a case of profiling the guest from the host side, like "perf kvm"; in
> that case, we need to convert the guest's TSC values to the host's view,
> taking the guest TSC scaling into account.

I'm not sure if you are agreeing with me or disagreeing. Basically, my
argument is that the guest should observe a PMU counter programmed
with the "unhalted core cycles" event to be in sync with the guest's
time stamp counter. (If FREEZE_WHILE_SMM or Freeze_PerfMon_On_PMI is
set, the PMU counter may lag behind the time stamp counter, but it
should never get ahead of it.)

> >>>>
> >>>> It is virtualization even if it is incompatible with live migration to a
> >>>> different SKU (where, as you point out below, multiple TSC frequencies
> >>>> might also count as multiple SKUs).  But yeah, it's virtualization with
> >>>> more caveats than usual.
> >>>
> >>> It's not virtualization if the counters don't count at the rate the
> >>> guest expects them to count.
>
> We do have the "Use TSC scaling" bit in the "Secondary Processor-Based
> VM-Execution Controls".

Yes, we do. That's what this discussion has been about. That
VM-execution control is documented as follows:

This control determines whether executions of RDTSC, executions of
RDTSCP, and executions of RDMSR that read from the
IA32_TIME_STAMP_COUNTER MSR return a value modified by the TSC
multiplier field (see Section 23.6.5 and Section 24.3).

The SDM is quite specific about what this VM-execution control bit
does, and it makes no mention of PMU events.

> >>
> >> Per the SDM, unhalted reference cycles count at "a fixed frequency."
> >> If the frequency changes on migration, then the value of this event is
> >> questionable at best. For unhalted core cycles, on the other hand, the
> >> SDM says, "The performance counter for this event counts across
> >> performance state transitions using different core clock frequencies."
> >> That does seem to permit frequency changes on migration, but I suspect
> >> that software expects the event to count at a fixed frequency if
> >> INVARIANT_TSC is set.
>
> Yes, I may propose that the PMU be used in conjunction with INVARIANT_TSC.
>
> >
> > Actually, I now realize that unhalted reference cycles is independent
> > of the host or guest TSC, so it is not affected by TSC scaling.
>
> I doubt it.

Well, it should be easy to prove, one way or the other. :-)
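
For instance, a rough guest-side measurement could settle it (a hypothetical
snippet, not an existing kvm-unit-test; it assumes the chosen counter has
already been programmed with the "unhalted reference cycles" event via the
MSR interface, and that rdpmc is usable at the current privilege level,
i.e. CPL0 or CR4.PCE=1):

#include <stdint.h>
#include <stdio.h>

static inline uint64_t read_tsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return lo | ((uint64_t)hi << 32);
}

static inline uint64_t read_pmc(uint32_t idx)
{
	uint32_t lo, hi;

	asm volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(idx));
	return lo | ((uint64_t)hi << 32);
}

/*
 * If the printed ratio changes when the same guest is run with a scaled
 * TSC (i.e. a different advertised TSC frequency), the event is affected
 * by TSC scaling; if it stays put, it is independent of it.
 */
void measure_ref_cycles_vs_tsc(uint32_t idx)
{
	uint64_t tsc0 = read_tsc(), pmc0 = read_pmc(idx);
	volatile unsigned long i;

	for (i = 0; i < 100000000UL; i++)
		;	/* spin so both clocks advance */

	uint64_t pmc1 = read_pmc(idx), tsc1 = read_tsc();

	printf("ref-cycles / TSC ratio: %f\n",
	       (double)(pmc1 - pmc0) / (double)(tsc1 - tsc0));
}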

> > However, we still have to decide on a specific fixed frequency to
> > virtualize so that the frequency doesn't change on migration. As a
> > practical matter, it may be the case that the reference cycles
> > frequency is the same on all processors in a migration pool, and we
> > don't have to do anything.
>
> Yes, someone is already doing this in a production environment.

I'm sure they are. That doesn't mean PMU virtualization is bug-free.

> >
> >
> >> I'm not sure that I buy your argument regarding consistency. In
> >> general, I would expect the hypervisor to exclude non-architected
> >> events from the allow-list for any VM instances running in a
> >> heterogeneous migration pool. Certainly, those events could be allowed
> >> in a heterogeneous migration pool consisting of multiple SKUs of the
> >> same microarchitecture running at different clock frequencies, but
> >> that seems like a niche case.
>
> IMO, if there are users who want to use the guest PMU, they definitely
> want non-architectural events, even without live migration support.
>
There are two scenarios to support: (1) VMs that run on the same
microarchitecture as reported in the guest CPUID. (2) VMs that don't.

Paolo has argued against scaling the architected "unhalted core
cycles" event, because it is infeasible for KVM to recognize and scale
non-architected events that are also TSC based, and the inconsistency
is ugly.
However, in case (2), it is infeasible for KVM to offer any
non-architected events.

To clarify my earlier position, I am arguing that in case (1), TSC
scaling is not likely to be in use, so consistency is not an issue. In
case (2), I don't want to see the inconsistency that would arise every
time the TSC scaling factor changes.

I believe that KVM should be made capable of correctly virtualizing
the "unhalted core cycles" event in the presence of TSC scaling. I'm
happy to put this under a KVM_CAP if there are those who would prefer
that it not be unconditional.
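
To make that concrete, here is a minimal sketch of the scaling step (purely
illustrative, not a patch; it assumes the Intel-style fixed-point TSC
multiplier with 48 fractional bits, and the helper name is made up):

#include <linux/math64.h>

/*
 * Scale a host-measured core-cycles delta into the guest's view using
 * the guest's TSC multiplier.  A real change would apply this wherever
 * KVM reads counts back from pmc->perf_event for an "unhalted core
 * cycles" counter, and would use the proper per-vendor fractional width.
 */
static u64 scale_core_cycles_to_guest(u64 host_cycles, u64 tsc_multiplier)
{
	return mul_u64_u64_shr(host_cycles, tsc_multiplier, 48);
}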

> Another input is that we actually have no problem reporting erratic
> performance data during live migration or host power state transitions,
> and there are situations where users want to know that these kinds of
> things are happening under the hood.

I have no idea what you are saying.

> Software performance tuners would not trust perf data from a single
> trial; they rely more on statistical conclusions.

Software performance tuning is not the only use of the PMU.

> >>
> >>
> >>>>> Exposing non-architectural events is questionable with live migration,
> >>>>> and TSC scaling is unnecessary without live migration. I suppose you
> >>>>> could have a migration pool with different SKUs of the same generation
> >>>>> with 'seemingly compatible' PMU events but different TSC frequencies,
> >>>>> in which case it might be reasonable to expose non-architectural
> >>>>> events, but I would argue that any of those 'seemingly compatible'
> >>>>> events are actually not compatible if they count in cycles.
> >>>> I agree.  Support for marshaling/unmarshaling PMU state exists but it's
> >>>> more useful for intra-host updates than for actual live migration, since
> >>>> these days most live migration will use TSC scaling on the destination.
> >>>>
> >>>> Paolo
> >>>>
> >>>>>
> >>>>> Unless, of course, Like is right, and the PMU counters do count fractionally.
> >>>>>
> >>>>
> >

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2021-11-30  7:42 ` [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() Like Xu
  2021-12-09  3:52   ` Jim Mattson
@ 2022-02-05  1:55   ` Jim Mattson
  2022-02-09  9:00     ` Like Xu
  1 sibling, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2022-02-05  1:55 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On Mon, Nov 29, 2021 at 11:42 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> find_arch_event() returns an "unsigned int" value, which is used by
> pmc_reprogram_counter() to program a PERF_TYPE_HARDWARE type
> perf_event.
>
> The returned value is actually the kernel-defined generic perf_hw_id;
> let's rename it to pmc_perf_hw_id() with simpler incoming parameters
> for better self-explanation.
>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/kvm/pmu.c           | 8 +-------
>  arch/x86/kvm/pmu.h           | 3 +--
>  arch/x86/kvm/svm/pmu.c       | 8 ++++----
>  arch/x86/kvm/vmx/pmu_intel.c | 9 +++++----
>  4 files changed, 11 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 09873f6488f7..3b3ccf5b1106 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -174,7 +174,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
>  void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>  {
>         unsigned config, type = PERF_TYPE_RAW;
> -       u8 event_select, unit_mask;
>         struct kvm *kvm = pmc->vcpu->kvm;
>         struct kvm_pmu_event_filter *filter;
>         int i;
> @@ -206,17 +205,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>         if (!allow_event)
>                 return;
>
> -       event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> -       unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
> -
>         if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
>                           ARCH_PERFMON_EVENTSEL_INV |
>                           ARCH_PERFMON_EVENTSEL_CMASK |
>                           HSW_IN_TX |
>                           HSW_IN_TX_CHECKPOINTED))) {
> -               config = kvm_x86_ops.pmu_ops->find_arch_event(pmc_to_pmu(pmc),
> -                                                     event_select,
> -                                                     unit_mask);
> +               config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
>                 if (config != PERF_COUNT_HW_MAX)
>                         type = PERF_TYPE_HARDWARE;
>         }
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 59d6b76203d5..dd7dbb1c5048 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -24,8 +24,7 @@ struct kvm_event_hw_type_mapping {
>  };
>
>  struct kvm_pmu_ops {
> -       unsigned (*find_arch_event)(struct kvm_pmu *pmu, u8 event_select,
> -                                   u8 unit_mask);
> +       unsigned int (*pmc_perf_hw_id)(struct kvm_pmc *pmc);
>         unsigned (*find_fixed_event)(int idx);
>         bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
>         struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index 0cf05e4caa4c..fb0ce8cda8a7 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -138,10 +138,10 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
>         return &pmu->gp_counters[msr_to_index(msr)];
>  }
>
> -static unsigned amd_find_arch_event(struct kvm_pmu *pmu,
> -                                   u8 event_select,
> -                                   u8 unit_mask)
> +static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
>  {
> +       u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
On AMD, the event select is 12 bits.
> +       u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
>         int i;
>
>         for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++)
> @@ -323,7 +323,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
>  }
>
>  struct kvm_pmu_ops amd_pmu_ops = {
> -       .find_arch_event = amd_find_arch_event,
> +       .pmc_perf_hw_id = amd_pmc_perf_hw_id,
>         .find_fixed_event = amd_find_fixed_event,
>         .pmc_is_enabled = amd_pmc_is_enabled,
>         .pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index b7ab5fd03681..67a0188ecdc5 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -68,10 +68,11 @@ static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
>                 reprogram_counter(pmu, bit);
>  }
>
> -static unsigned intel_find_arch_event(struct kvm_pmu *pmu,
> -                                     u8 event_select,
> -                                     u8 unit_mask)
> +static unsigned int intel_pmc_perf_hw_id(struct kvm_pmc *pmc)
>  {
> +       struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> +       u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> +       u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
>         int i;
>
>         for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++)
> @@ -719,7 +720,7 @@ static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
>  }
>
>  struct kvm_pmu_ops intel_pmu_ops = {
> -       .find_arch_event = intel_find_arch_event,
> +       .pmc_perf_hw_id = intel_pmc_perf_hw_id,
>         .find_fixed_event = intel_find_fixed_event,
>         .pmc_is_enabled = intel_pmc_is_enabled,
>         .pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
> --
> 2.33.1
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-05  1:55   ` Jim Mattson
@ 2022-02-09  9:00     ` Like Xu
  2022-02-09 19:30       ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2022-02-09  9:00 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu

On 5/2/2022 9:55 am, Jim Mattson wrote:
>> +static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
>>   {
>> +       u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> On AMD, the event select is 12 bits.

Thanks for your carefulness; we are already aware of this fact.

This function, which derives the perf_hw_id from the low 16 bits of the
event selector, still works because we currently do not have any event
with a 12-bit event select defined in amd_event_mapping[]. Such events
(if any) will be programmed with the PERF_TYPE_RAW type.

IMO, a minor patch that renames some AMD variables and updates the
relevant comments to avoid misunderstandings is quite acceptable.

Thanks,
Like Xu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-09  9:00     ` Like Xu
@ 2022-02-09 19:30       ` Jim Mattson
  2022-02-10 11:28         ` Like Xu
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2022-02-09 19:30 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, Like Xu, David Dunn,
	Stephane Eranian

On Wed, Feb 9, 2022 at 1:00 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 5/2/2022 9:55 am, Jim Mattson wrote:
> >> +static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
> >>   {
> >> +       u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> > On AMD, the event select is 12 bits.
>
> Out of your carefulness, we already know this fact.
>
> This function to get the perf_hw_id by the last 16 bits still works because we
> currently
> do not have a 12-bits-select event defined in the amd_event_mapping[]. The
> 12-bits-select
> events (if any) will be programed in the type of PERF_TYPE_RAW.

I beg to differ. It doesn't matter whether there are 12-bit event
selects in amd_event_mapping[] or not. The fundamental problem is that
the equality operation on event selects is broken, because it ignores
the high 4 bits. Hence, we may actually find an entry in that table
that we *think* is for the requested event, but instead it's for some
other event with 0 in the high 4 bits. For example, if the guest
requests event 0x1d0 (retired fused instructions), they will get event
0xd0 instead. According to amd_event_mapping, event 0xd0 is
"PERF_COUNT_HW_STALLED_CYCLES_FRONTEND." However, according to the
Milan PPR, event 0xd0 doesn't exist. So, I don't actually know what
we're counting.

At the very least, we need a patch like the following (which I fully
expect gmail to mangle):

--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -210,7 +210,8 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
        if (!allow_event)
                return;

-       if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
+       if (!(eventsel & ((0xFULL << 32) |
+                         ARCH_PERFMON_EVENTSEL_EDGE |
                          ARCH_PERFMON_EVENTSEL_INV |
                          ARCH_PERFMON_EVENTSEL_CMASK |
                          HSW_IN_TX |
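
Longer term, the lookup itself could also match on the full 12-bit event
select. Rough sketch only (it assumes amd_event_mapping[]'s eventsel field
is widened beyond u8; this is not what the hunk above does):

static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
{
	/* The AMD event select spans bits [35:32] and [7:0] of EVNTSEL. */
	u16 event_select = ((pmc->eventsel >> 24) & 0xF00) |
			   (pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT);
	u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
	int i;

	for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++)
		if (amd_event_mapping[i].eventsel == event_select &&
		    amd_event_mapping[i].unit_mask == unit_mask)
			return amd_event_mapping[i].event_type;

	return PERF_COUNT_HW_MAX;
}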

By the way, the following events from amd_event_mapping[] are not
listed in the Milan PPR:
{ 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
{ 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
{ 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
{ 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }

Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
for newer AMD processors?
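
For concreteness, a sketch of what such a table might look like, transcribing
the encodings of the host's amd_f17h_perfmon_event_map[] into KVM's
{ eventsel, unit_mask, event_type } format (the values below are from memory
and would need to be double-checked against that map and the PPR before use):

static struct kvm_event_hw_type_mapping amd_f17h_event_mapping[] = {
	{ 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES },
	{ 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
	{ 0x60, 0xff, PERF_COUNT_HW_CACHE_REFERENCES },
	{ 0x64, 0x09, PERF_COUNT_HW_CACHE_MISSES },
	{ 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
	{ 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
	{ 0x87, 0x02, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
	{ 0x87, 0x01, PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
};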

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-09 19:30       ` Jim Mattson
@ 2022-02-10 11:28         ` Like Xu
  2022-02-11  9:56           ` Ravi Bangoria
  0 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2022-02-10 11:28 UTC (permalink / raw)
  To: Bangoria, Ravikumar, Kim Phillips
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, David Dunn, Stephane Eranian,
	Jim Mattson

cc Kim and Ravi to help confirm more details about this change.

On 10/2/2022 3:30 am, Jim Mattson wrote:
> By the way, the following events from amd_event_mapping[] are not
> listed in the Milan PPR:
> { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
> { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
> { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
> { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }
> 
> Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
> for newer AMD processors?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-10 11:28         ` Like Xu
@ 2022-02-11  9:56           ` Ravi Bangoria
  2022-02-11 18:16             ` Jim Mattson
  0 siblings, 1 reply; 45+ messages in thread
From: Ravi Bangoria @ 2022-02-11  9:56 UTC (permalink / raw)
  To: Like Xu, Jim Mattson
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, David Dunn, Stephane Eranian,
	Ravi Bangoria, Kim Phillips



On 10-Feb-22 4:58 PM, Like Xu wrote:
> cc Kim and Ravi to help confirm more details about this change.
> 
> On 10/2/2022 3:30 am, Jim Mattson wrote:
>> By the way, the following events from amd_event_mapping[] are not
>> listed in the Milan PPR:
>> { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
>> { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
>> { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
>> { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }
>>
>> Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
>> for newer AMD processors?

I think Like's other patch series to unify event mapping across kvm
and host will fix it. No?
https://lore.kernel.org/lkml/20220117085307.93030-4-likexu@tencent.com

- Ravi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-11  9:56           ` Ravi Bangoria
@ 2022-02-11 18:16             ` Jim Mattson
  2022-02-14 10:14               ` Ravi Bangoria
  0 siblings, 1 reply; 45+ messages in thread
From: Jim Mattson @ 2022-02-11 18:16 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Like Xu, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Joerg Roedel, kvm, linux-kernel, David Dunn,
	Stephane Eranian, Kim Phillips

On Fri, Feb 11, 2022 at 1:56 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
>
>
> On 10-Feb-22 4:58 PM, Like Xu wrote:
> > cc Kim and Ravi to help confirm more details about this change.
> >
> > On 10/2/2022 3:30 am, Jim Mattson wrote:
> >> By the way, the following events from amd_event_mapping[] are not
> >> listed in the Milan PPR:
> >> { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
> >> { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
> >> { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
> >> { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }
> >>
> >> Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
> >> for newer AMD processors?
>
> I think Like's other patch series to unify event mapping across kvm
> and host will fix it. No?
> https://lore.kernel.org/lkml/20220117085307.93030-4-likexu@tencent.com

Yes, that should fix it. But why do we even bother? What is the
downside of using PERF_TYPE_RAW all of the time?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-11 18:16             ` Jim Mattson
@ 2022-02-14 10:14               ` Ravi Bangoria
  2022-02-16  7:44                 ` Like Xu
  0 siblings, 1 reply; 45+ messages in thread
From: Ravi Bangoria @ 2022-02-14 10:14 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Like Xu, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Joerg Roedel, kvm, linux-kernel, David Dunn,
	Stephane Eranian, Kim Phillips, Ravi Bangoria



On 11-Feb-22 11:46 PM, Jim Mattson wrote:
> On Fri, Feb 11, 2022 at 1:56 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>>
>>
>> On 10-Feb-22 4:58 PM, Like Xu wrote:
>>> cc Kim and Ravi to help confirm more details about this change.
>>>
>>> On 10/2/2022 3:30 am, Jim Mattson wrote:
>>>> By the way, the following events from amd_event_mapping[] are not
>>>> listed in the Milan PPR:
>>>> { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
>>>> { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
>>>> { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
>>>> { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }
>>>>
>>>> Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
>>>> for newer AMD processors?
>>
>> I think Like's other patch series to unify event mapping across kvm
>> and host will fix it. No?
>> https://lore.kernel.org/lkml/20220117085307.93030-4-likexu@tencent.com
> 
> Yes, that should fix it. But why do we even bother? What is the
> downside of using PERF_TYPE_RAW all of the time?

There are a few places where PERF_TYPE_HARDWARE and PERF_TYPE_RAW are
treated differently, e.g. x86_pmu_event_init() and perf_init_event(). So I
think it makes sense to keep using PERF_TYPE_HARDWARE for generalized
events?
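
As a rough illustration of the two request paths (host perf API usage, not
KVM code; event 0xC0 is AMD "retired instructions"):

	struct perf_event_attr hw_attr = {
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_INSTRUCTIONS,	/* generalized */
	};

	struct perf_event_attr raw_attr = {
		.type = PERF_TYPE_RAW,
		.config = 0x00C0,	/* raw: eventsel 0xC0, umask 0x00 */
	};

PERF_TYPE_HARDWARE lets the core perf code pick the right encoding for the
running CPU, whereas PERF_TYPE_RAW passes the guest's encoding straight
through to the hardware PMU.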

Ravi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-14 10:14               ` Ravi Bangoria
@ 2022-02-16  7:44                 ` Like Xu
  2022-02-16 11:24                   ` Ravi Bangoria
  0 siblings, 1 reply; 45+ messages in thread
From: Like Xu @ 2022-02-16  7:44 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, David Dunn, Stephane Eranian,
	Kim Phillips, Jim Mattson

On 14/2/2022 6:14 pm, Ravi Bangoria wrote:
> 
> 
> On 11-Feb-22 11:46 PM, Jim Mattson wrote:
>> On Fri, Feb 11, 2022 at 1:56 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>
>>>
>>>
>>> On 10-Feb-22 4:58 PM, Like Xu wrote:
>>>> cc Kim and Ravi to help confirm more details about this change.
>>>>
>>>> On 10/2/2022 3:30 am, Jim Mattson wrote:
>>>>> By the way, the following events from amd_event_mapping[] are not
>>>>> listed in the Milan PPR:
>>>>> { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
>>>>> { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
>>>>> { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
>>>>> { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }
>>>>>
>>>>> Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
>>>>> for newer AMD processors?

So do we need another amd_f19h_perfmon_event_map[] in the host perf code?

>>>
>>> I think Like's other patch series to unify event mapping across kvm
>>> and host will fix it. No?
>>> https://lore.kernel.org/lkml/20220117085307.93030-4-likexu@tencent.com
>>
>> Yes, that should fix it. But why do we even bother? What is the
>> downside of using PERF_TYPE_RAW all of the time?
> 
> There are few places where PERF_TYPE_HARDWARE and PERF_TYPE_RAW are treated
> differently. Ex, x86_pmu_event_init(), perf_init_event(). So I think it makes
> sense to keep using PERF_TYPE_HARDWARE for generalized events?
> 
> Ravi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
  2022-02-16  7:44                 ` Like Xu
@ 2022-02-16 11:24                   ` Ravi Bangoria
  0 siblings, 0 replies; 45+ messages in thread
From: Ravi Bangoria @ 2022-02-16 11:24 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, kvm, linux-kernel, David Dunn, Stephane Eranian,
	Kim Phillips, Jim Mattson, Ravi Bangoria



On 16-Feb-22 1:14 PM, Like Xu wrote:
> On 14/2/2022 6:14 pm, Ravi Bangoria wrote:
>>
>>
>> On 11-Feb-22 11:46 PM, Jim Mattson wrote:
>>> On Fri, Feb 11, 2022 at 1:56 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10-Feb-22 4:58 PM, Like Xu wrote:
>>>>> cc Kim and Ravi to help confirm more details about this change.
>>>>>
>>>>> On 10/2/2022 3:30 am, Jim Mattson wrote:
>>>>>> By the way, the following events from amd_event_mapping[] are not
>>>>>> listed in the Milan PPR:
>>>>>> { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
>>>>>> { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
>>>>>> { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
>>>>>> { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }
>>>>>>
>>>>>> Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
>>>>>> for newer AMD processors?
> 
> So do we need another amd_f19h_perfmon_event_map[] in the host perf code ?

I don't think so.

The CACHE_REFERENCES/MISSES event code and umask for Milan are the same
as on f17h. Although STALLED_CYCLES_FRONTEND/BACKEND have been removed
from the PPR event list, they will continue to work on Milan.

Ravi

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2022-02-16 11:25 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-30  7:42 [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Like Xu
2021-11-30  7:42 ` [PATCH v2 1/6] KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs Like Xu
2021-12-06 19:57   ` Jim Mattson
2021-12-09 18:49     ` Paolo Bonzini
2021-12-09 18:53       ` Jim Mattson
2021-12-09 18:57         ` Paolo Bonzini
2021-12-10 10:20       ` Like Xu
     [not found]   ` <CALMp9eTq8H_bJOVKwi_7j3Kum9RvW6o-G3zCLUFco1A1cvNrkQ@mail.gmail.com>
2021-12-07  6:07     ` Like Xu
2021-12-07 17:42       ` Jim Mattson
2021-11-30  7:42 ` [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() Like Xu
2021-12-09  3:52   ` Jim Mattson
2021-12-09  8:11     ` Like Xu
2021-12-09 19:24     ` Paolo Bonzini
2022-02-05  1:55   ` Jim Mattson
2022-02-09  9:00     ` Like Xu
2022-02-09 19:30       ` Jim Mattson
2022-02-10 11:28         ` Like Xu
2022-02-11  9:56           ` Ravi Bangoria
2022-02-11 18:16             ` Jim Mattson
2022-02-14 10:14               ` Ravi Bangoria
2022-02-16  7:44                 ` Like Xu
2022-02-16 11:24                   ` Ravi Bangoria
2021-11-30  7:42 ` [PATCH v2 3/6] KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event() Like Xu
2021-12-09  3:58   ` Jim Mattson
2021-11-30  7:42 ` [PATCH v2 4/6] KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}() Like Xu
2021-12-09  4:25   ` Jim Mattson
2021-12-09  8:28     ` Like Xu
2021-12-10  0:54       ` Jim Mattson
2021-12-10  9:35         ` Paolo Bonzini
2021-12-10 10:11           ` Like Xu
2021-12-10 22:55           ` Jim Mattson
2021-12-10 22:59             ` Paolo Bonzini
2021-12-10 23:31               ` Jim Mattson
2021-12-12  4:56                 ` Jim Mattson
2021-12-13  6:37                   ` Jim Mattson
2021-12-16  9:57                     ` Like Xu
2021-12-16 17:52                       ` Jim Mattson
2021-11-30  7:42 ` [PATCH v2 5/6] KVM: x86: Update vPMCs when retiring instructions Like Xu
2021-12-09  4:33   ` Jim Mattson
2021-12-09  8:44     ` Like Xu
2021-12-09  9:23       ` Like Xu
2021-11-30  7:42 ` [PATCH v2 6/6] KVM: x86: Update vPMCs when retiring branch instructions Like Xu
2021-12-09  4:40   ` Jim Mattson
2021-12-09 19:30 ` [PATCH v2 0/6] KVM: x86/pmu: Count two basic events for emulated instructions Paolo Bonzini
2021-12-16 10:14   ` Like Xu
