kvm.vger.kernel.org archive mirror
* [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions
@ 2022-10-24  9:11 Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 01/24] x86/pmu: Add PDCM check before accessing PERF_CAP register Like Xu
                   ` (24 more replies)
  0 siblings, 25 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:11 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

This patch set includes all of the changes on my side (SPR PEBS and AMD
PerfMonV2 are included; Arch LBR is not), which helps keep the review
focused.

There are no major changes to the test logic. A considerable number of
helpers have been added to lib/x86/pmu.[c,h], which really helps the
readability of the code while hiding some hardware-specific details.

The series is divided into three parts: the first (01 - 08) is bug fixes,
the second (09 - 18) is code refactoring, and the third (19 - 24) adds
new test cases. It may also be reasonable to split them up and merge
them in sequence. They pass on AMD Zen3/4 and Intel ICX/SPR machines.

Please feel free to test them in your own CI environment.
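
For example, with the "pmu" test group added in patch 09, the whole set
can be exercised via:

  ./run_tests.sh -g pmu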

v3: https://lore.kernel.org/kvm/20220819110939.78013-1-likexu@tencent.com/
v3 -> v4 Changelog:
- Add more helpers to the new lib/x86/pmu.h and lib/x86/pmu.c;
  (pmu_cap, perf_capabilities and other PMU-related stuff)
- Refine commit message and report_ messages;
- Code style fixes (curly braces, #define, GENMASK_ULL,
  ternary operator);
- Add more explanation for "the expected init cnt.count for fixed counter 0"
  (measure_for_overflow() is applied);
- Rename PC_VECTOR to PMI_VECTOR;
- Snapshot pebs_has_baseline() to avoid RDMSR on every touch;
- Drop the idea of is_the_count_reproducible() for AMD;
- Add X86_FEATURE_* to the KUT world;

Like Xu (24):
  x86/pmu: Add PDCM check before accessing PERF_CAP register
  x86/pmu: Test emulation instructions on full-width counters
  x86/pmu: Pop up FW prefix to avoid out-of-context propagation
  x86/pmu: Report SKIP when testing Intel LBR on AMD platforms
  x86/pmu: Fix printed messages for emulated instruction test
  x86/pmu: Introduce __start_event() to drop all of the manual zeroing
  x86/pmu: Introduce multiple_{one, many}() to improve readability
  x86/pmu: Reset the expected count of the fixed counter 0 when i386
  x86: create pmu group for quick pmu-scope testing
  x86/pmu: Refine info to clarify the current support
  x86/pmu: Update rdpmc testcase to cover #GP path
  x86/pmu: Rename PC_VECTOR to PMI_VECTOR for better readability
  x86/pmu: Add lib/x86/pmu.[c.h] and move common code to header files
  x86/pmu: Read cpuid(10) in the pmu_init() to reduce VM-Exit
  x86/pmu: Initialize PMU perf_capabilities at pmu_init()
  x86/pmu: Add GP counter related helpers
  x86/pmu: Add GP/Fixed counters reset helpers
  x86/pmu: Add a set of helpers related to global registers
  x86: Add tests for Guest Processor Event Based Sampling (PEBS)
  x86/pmu: Add global helpers to cover Intel Arch PMU Version 1
  x86/pmu: Add gp_events pointer to route different event tables
  x86/pmu: Add nr_gp_counters to limit the number of test counters
  x86/pmu: Update testcases to cover AMD PMU
  x86/pmu: Add AMD Guest PerfMonV2 testcases

 lib/x86/msr.h       |  30 +++
 lib/x86/pmu.c       |  36 ++++
 lib/x86/pmu.h       | 306 +++++++++++++++++++++++++++++++
 lib/x86/processor.h |  80 ++------
 lib/x86/smp.c       |   2 +
 x86/Makefile.common |   1 +
 x86/Makefile.x86_64 |   1 +
 x86/pmu.c           | 297 ++++++++++++++++++------------
 x86/pmu_lbr.c       |  20 +-
 x86/pmu_pebs.c      | 433 ++++++++++++++++++++++++++++++++++++++++++++
 x86/unittests.cfg   |  10 +
 x86/vmx_tests.c     |   1 +
 12 files changed, 1022 insertions(+), 195 deletions(-)
 create mode 100644 lib/x86/pmu.c
 create mode 100644 lib/x86/pmu.h
 create mode 100644 x86/pmu_pebs.c

-- 
2.38.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 01/24] x86/pmu: Add PDCM check before accessing PERF_CAP register
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 02/24] x86/pmu: Test emulation instructions on full-width counters Like Xu
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

On virtual platforms without PDCM support (e.g. AMD), reading
MSR_IA32_PERF_CAPABILITIES raises a #GP, which is completely avoidable
by checking for PDCM first.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 lib/x86/processor.h | 8 ++++++++
 x86/pmu.c           | 2 +-
 x86/pmu_lbr.c       | 2 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index 0324220..f85abe3 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -847,4 +847,12 @@ static inline bool pmu_gp_counter_is_available(int i)
 	return !(cpuid(10).b & BIT(i));
 }
 
+static inline u64 this_cpu_perf_capabilities(void)
+{
+	if (!this_cpu_has(X86_FEATURE_PDCM))
+		return 0;
+
+	return rdmsr(MSR_IA32_PERF_CAPABILITIES);
+}
+
 #endif
diff --git a/x86/pmu.c b/x86/pmu.c
index d59baf1..d278bb5 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -660,7 +660,7 @@ int main(int ac, char **av)
 
 	check_counters();
 
-	if (rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES) {
+	if (this_cpu_perf_capabilities() & PMU_CAP_FW_WRITES) {
 		gp_counter_base = MSR_IA32_PMC0;
 		report_prefix_push("full-width writes");
 		check_counters();
diff --git a/x86/pmu_lbr.c b/x86/pmu_lbr.c
index 8dad1f1..c040b14 100644
--- a/x86/pmu_lbr.c
+++ b/x86/pmu_lbr.c
@@ -72,7 +72,7 @@ int main(int ac, char **av)
 		return report_summary();
 	}
 
-	perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
+	perf_cap = this_cpu_perf_capabilities();
 
 	if (!(perf_cap & PMU_CAP_LBR_FMT)) {
 		report_skip("(Architectural) LBR is not supported.");
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 02/24] x86/pmu: Test emulation instructions on full-width counters
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 01/24] x86/pmu: Add PDCM check before accessing PERF_CAP register Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 03/24] x86/pmu: Pop up FW prefix to avoid out-of-context propagation Like Xu
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Move check_emulated_instr() into check_counters() so that full-width
counters can also be exercised by the same test case.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 x86/pmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index d278bb5..533851b 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -520,6 +520,9 @@ static void check_emulated_instr(void)
 
 static void check_counters(void)
 {
+	if (is_fep_available())
+		check_emulated_instr();
+
 	check_gp_counters();
 	check_fixed_counters();
 	check_rdpmc();
@@ -655,9 +658,6 @@ int main(int ac, char **av)
 
 	apic_write(APIC_LVTPC, PC_VECTOR);
 
-	if (is_fep_available())
-		check_emulated_instr();
-
 	check_counters();
 
 	if (this_cpu_perf_capabilities() & PMU_CAP_FW_WRITES) {
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 03/24] x86/pmu: Pop up FW prefix to avoid out-of-context propagation
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 01/24] x86/pmu: Add PDCM check before accessing PERF_CAP register Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 02/24] x86/pmu: Test emulation instructions on full-width counters Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 04/24] x86/pmu: Report SKIP when testing Intel LBR on AMD platforms Like Xu
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

The inappropriate "full-width writes" prefix may be propagated to the
messages of later test cases if it is not popped.
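
For illustration only (a sketch, not part of this patch), the leak looks
like this at the call site:

    report_prefix_push("full-width writes");
    check_counters();
    check_gp_counters_write_width();
    /*
     * Without the report_prefix_pop() added below, a later report()
     * would be printed as, e.g., "PASS: full-width writes: <unrelated
     * test>", attributing the result to the wrong context.
     */
    report_prefix_pop();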

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
---
 x86/pmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/x86/pmu.c b/x86/pmu.c
index 533851b..c8a2e91 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -665,6 +665,7 @@ int main(int ac, char **av)
 		report_prefix_push("full-width writes");
 		check_counters();
 		check_gp_counters_write_width();
+		report_prefix_pop();
 	}
 
 	return report_summary();
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 04/24] x86/pmu: Report SKIP when testing Intel LBR on AMD platforms
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (2 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 03/24] x86/pmu: Pop up FW prefix to avoid out-of-context propagation Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 05/24] x86/pmu: Fix printed messages for emulated instruction test Like Xu
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Running the Intel LBR test on AMD platforms should report SKIP,
not PASS. Fix it.

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
---
 x86/pmu_lbr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/pmu_lbr.c b/x86/pmu_lbr.c
index c040b14..a641d79 100644
--- a/x86/pmu_lbr.c
+++ b/x86/pmu_lbr.c
@@ -59,7 +59,7 @@ int main(int ac, char **av)
 
 	if (!is_intel()) {
 		report_skip("PMU_LBR test is for intel CPU's only");
-		return 0;
+		return report_summary();
 	}
 
 	if (!this_cpu_has_pmu()) {
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 05/24] x86/pmu: Fix printed messages for emulated instruction test
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (3 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 04/24] x86/pmu: Report SKIP when testing Intel LBR on AMD platforms Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 06/24] x86/pmu: Introduce __start_event() to drop all of the manual zeroing Like Xu
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm, Sandipan Das

From: Like Xu <likexu@tencent.com>

This test case uses MSR_IA32_PERFCTR0 to count branch instructions
and PERFCTR1 to count instructions. The same correspondence should be
maintained at report() time: GLOBAL_STATUS bit 0 belongs to the branch
counter (PERFCTR0) and bit 1 to the instruction counter (PERFCTR1).

Fixes: 20cf914 ("x86/pmu: Test PMU virtualization on emulated instructions")
Reported-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 x86/pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index c8a2e91..d303a36 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -512,8 +512,8 @@ static void check_emulated_instr(void)
 	       "branch count");
 	// Additionally check that those counters overflowed properly.
 	status = rdmsr(MSR_CORE_PERF_GLOBAL_STATUS);
-	report(status & 1, "instruction counter overflow");
-	report(status & 2, "branch counter overflow");
+	report(status & 1, "branch counter overflow");
+	report(status & 2, "instruction counter overflow");
 
 	report_prefix_pop();
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 06/24] x86/pmu: Introduce __start_event() to drop all of the manual zeroing
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (4 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 05/24] x86/pmu: Fix printed messages for emulated instruction test Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:41   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 07/24] x86/pmu: Introduce multiple_{one, many}() to improve readability Like Xu
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Most invocations of start_event() and measure() first set evt.count = 0.
Instead of forcing each caller to ensure the count is zeroed, zero the
count in start_event() and drop all of the manual zeroing.

Accumulating counts can be handled by reading the current count before
start_event(), and stuffing a high count to test an edge case can be
handled by an inner helper, __start_event().

For overflow, just open code measure() for that one-off case. Requiring
callers to zero out a field in the common case isn't exactly flexible.
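
A sketch of the intended usage (helper names as introduced below, the
preset value purely illustrative):

    pmu_counter_t evt = {
        .ctr = MSR_IA32_PERFCTR0,
        .config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel,
    };

    start_event(&evt);            /* common case: count from 0 */
    __start_event(&evt, 1 - 100); /* edge case: stuff a high (negative) count */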

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 x86/pmu.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index d303a36..ba67aa6 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -137,9 +137,9 @@ static void global_disable(pmu_counter_t *cnt)
 			~(1ull << cnt->idx));
 }
 
-
-static void start_event(pmu_counter_t *evt)
+static void __start_event(pmu_counter_t *evt, uint64_t count)
 {
+    evt->count = count;
     wrmsr(evt->ctr, evt->count);
     if (is_gp(evt))
 	    wrmsr(MSR_P6_EVNTSEL0 + event_to_global_idx(evt),
@@ -162,6 +162,11 @@ static void start_event(pmu_counter_t *evt)
     apic_write(APIC_LVTPC, PC_VECTOR);
 }
 
+static void start_event(pmu_counter_t *evt)
+{
+	__start_event(evt, 0);
+}
+
 static void stop_event(pmu_counter_t *evt)
 {
 	global_disable(evt);
@@ -186,6 +191,13 @@ static void measure(pmu_counter_t *evt, int count)
 		stop_event(&evt[i]);
 }
 
+static void __measure(pmu_counter_t *evt, uint64_t count)
+{
+	__start_event(evt, count);
+	loop();
+	stop_event(evt);
+}
+
 static bool verify_event(uint64_t count, struct pmu_event *e)
 {
 	// printf("%d <= %ld <= %d\n", e->min, count, e->max);
@@ -208,7 +220,6 @@ static void check_gp_counter(struct pmu_event *evt)
 	int i;
 
 	for (i = 0; i < nr_gp_counters; i++, cnt.ctr++) {
-		cnt.count = 0;
 		measure(&cnt, 1);
 		report(verify_event(cnt.count, evt), "%s-%d", evt->name, i);
 	}
@@ -235,7 +246,6 @@ static void check_fixed_counters(void)
 	int i;
 
 	for (i = 0; i < nr_fixed_counters; i++) {
-		cnt.count = 0;
 		cnt.ctr = fixed_events[i].unit_sel;
 		measure(&cnt, 1);
 		report(verify_event(cnt.count, &fixed_events[i]), "fixed-%d", i);
@@ -253,14 +263,12 @@ static void check_counters_many(void)
 		if (!pmu_gp_counter_is_available(i))
 			continue;
 
-		cnt[n].count = 0;
 		cnt[n].ctr = gp_counter_base + n;
 		cnt[n].config = EVNTSEL_OS | EVNTSEL_USR |
 			gp_events[i % ARRAY_SIZE(gp_events)].unit_sel;
 		n++;
 	}
 	for (i = 0; i < nr_fixed_counters; i++) {
-		cnt[n].count = 0;
 		cnt[n].ctr = fixed_events[i].unit_sel;
 		cnt[n].config = EVNTSEL_OS | EVNTSEL_USR;
 		n++;
@@ -283,9 +291,8 @@ static void check_counter_overflow(void)
 	pmu_counter_t cnt = {
 		.ctr = gp_counter_base,
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
-		.count = 0,
 	};
-	measure(&cnt, 1);
+	__measure(&cnt, 0);
 	count = cnt.count;
 
 	/* clear status before test */
@@ -311,7 +318,7 @@ static void check_counter_overflow(void)
 		else
 			cnt.config &= ~EVNTSEL_INT;
 		idx = event_to_global_idx(&cnt);
-		measure(&cnt, 1);
+		__measure(&cnt, cnt.count);
 		report(cnt.count == 1, "cntr-%d", i);
 		status = rdmsr(MSR_CORE_PERF_GLOBAL_STATUS);
 		report(status & (1ull << idx), "status-%d", i);
@@ -329,7 +336,6 @@ static void check_gp_counter_cmask(void)
 	pmu_counter_t cnt = {
 		.ctr = gp_counter_base,
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
-		.count = 0,
 	};
 	cnt.config |= (0x2 << EVNTSEL_CMASK_SHIFT);
 	measure(&cnt, 1);
@@ -415,7 +421,6 @@ static void check_running_counter_wrmsr(void)
 	pmu_counter_t evt = {
 		.ctr = gp_counter_base,
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel,
-		.count = 0,
 	};
 
 	report_prefix_push("running counter wrmsr");
@@ -430,7 +435,6 @@ static void check_running_counter_wrmsr(void)
 	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL,
 	      rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
 
-	evt.count = 0;
 	start_event(&evt);
 
 	count = -1;
@@ -454,13 +458,11 @@ static void check_emulated_instr(void)
 		.ctr = MSR_IA32_PERFCTR0,
 		/* branch instructions */
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[5].unit_sel,
-		.count = 0,
 	};
 	pmu_counter_t instr_cnt = {
 		.ctr = MSR_IA32_PERFCTR0 + 1,
 		/* instructions */
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel,
-		.count = 0,
 	};
 	report_prefix_push("emulated instruction");
 
@@ -592,7 +594,6 @@ static void set_ref_cycle_expectations(void)
 	pmu_counter_t cnt = {
 		.ctr = MSR_IA32_PERFCTR0,
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[2].unit_sel,
-		.count = 0,
 	};
 	uint64_t tsc_delta;
 	uint64_t t0, t1, t2, t3;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 07/24] x86/pmu: Introduce multiple_{one, many}() to improve readability
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (5 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 06/24] x86/pmu: Introduce __start_event() to drop all of the manual zeroing Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 08/24] x86/pmu: Reset the expected count of the fixed counter 0 when i386 Like Xu
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

The current measure() forces the common case to pass in unnecessary
information in order to give flexibility to a single use case. It's just
syntactic sugar, but it really does help readers, as it's not obvious
that the "1" specifies the number of events, whereas measure_many() and
measure_one() are relatively self-explanatory.
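
For illustration (a sketch of the before/after at call sites):

    measure(&cnt, 1);       /* before: is "1" a count, an index, ...? */
    measure_one(&cnt);      /* after: measure a single event */
    measure_many(cnt, n);   /* after: measure an array of n events */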

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 x86/pmu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index ba67aa6..3b1ed16 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -181,7 +181,7 @@ static void stop_event(pmu_counter_t *evt)
 	evt->count = rdmsr(evt->ctr);
 }
 
-static void measure(pmu_counter_t *evt, int count)
+static void measure_many(pmu_counter_t *evt, int count)
 {
 	int i;
 	for (i = 0; i < count; i++)
@@ -191,6 +191,11 @@ static void measure(pmu_counter_t *evt, int count)
 		stop_event(&evt[i]);
 }
 
+static void measure_one(pmu_counter_t *evt)
+{
+	measure_many(evt, 1);
+}
+
 static void __measure(pmu_counter_t *evt, uint64_t count)
 {
 	__start_event(evt, count);
@@ -220,7 +225,7 @@ static void check_gp_counter(struct pmu_event *evt)
 	int i;
 
 	for (i = 0; i < nr_gp_counters; i++, cnt.ctr++) {
-		measure(&cnt, 1);
+		measure_one(&cnt);
 		report(verify_event(cnt.count, evt), "%s-%d", evt->name, i);
 	}
 }
@@ -247,7 +252,7 @@ static void check_fixed_counters(void)
 
 	for (i = 0; i < nr_fixed_counters; i++) {
 		cnt.ctr = fixed_events[i].unit_sel;
-		measure(&cnt, 1);
+		measure_one(&cnt);
 		report(verify_event(cnt.count, &fixed_events[i]), "fixed-%d", i);
 	}
 }
@@ -274,7 +279,7 @@ static void check_counters_many(void)
 		n++;
 	}
 
-	measure(cnt, n);
+	measure_many(cnt, n);
 
 	for (i = 0; i < n; i++)
 		if (!verify_counter(&cnt[i]))
@@ -338,7 +343,7 @@ static void check_gp_counter_cmask(void)
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
 	};
 	cnt.config |= (0x2 << EVNTSEL_CMASK_SHIFT);
-	measure(&cnt, 1);
+	measure_one(&cnt);
 	report(cnt.count < gp_events[1].min, "cmask");
 }
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 08/24] x86/pmu: Reset the expected count of the fixed counter 0 when i386
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (6 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 07/24] x86/pmu: Introduce multiple_{one, many}() to improve readability Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 09/24] x86: create pmu group for quick pmu-scope testing Like Xu
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

The pmu test check_counter_overflow() always fails with 32-bit binaries.
The cnt.count obtained from the second run of measure() (based on fixed
counter 0) is not equal to the expected value (based on gp counter 0);
it is off by a positive error of 2.

The two extra instructions come from inline wrmsr() and inline rdmsr()
inside the global_disable() binary code block. Specifically, for each msr
access, the i386 code will have two assembly mov instructions before
rdmsr/wrmsr (mark it for fixed counter 0, bit 32), but only one assembly
mov is needed for x86_64 and gp counter 0 on i386.

The sequence of instructions used to count events differs between the
GP and fixed counters. Thus the fix is fairly high level: use the same
counter (and therefore the same instruction sequence) to compute the
initial value for that counter. That is, derive the expected init
cnt.count for fixed counter 0 overflow from fixed counter 0 itself,
instead of always using gp counter 0.

The difference of 1 in this count enables the interrupt to be generated
immediately after the selected event count has been reached, instead of
waiting for the overflow to propagate through the counter.

Add a helper to measure/compute the overflow preset value. It provides
a convenient location to document the weird behavior that's necessary
to ensure immediate event delivery.
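
A worked example with purely illustrative numbers: if __measure()
observes cnt->count == 100, measure_for_overflow() returns 1 - 100 = -99.
Preset to -99 (as an unsigned value), the counter wraps to zero within
the measured workload, and the extra 1 relative to a plain -100 preset
is what makes the overflow interrupt fire immediately rather than after
the overflow has propagated through the counter.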

Signed-off-by: Like Xu <likexu@tencent.com>
---
 x86/pmu.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 3b1ed16..bb6e97e 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -288,17 +288,30 @@ static void check_counters_many(void)
 	report(i == n, "all counters");
 }
 
+static uint64_t measure_for_overflow(pmu_counter_t *cnt)
+{
+	__measure(cnt, 0);
+	/*
+	 * To generate overflow, i.e. roll over to '0', the initial count just
+	 * needs to be preset to the negative expected count.  However, as per
+	 * Intel's SDM, the preset count needs to be incremented by 1 to ensure
+	 * the overflow interrupt is generated immediately instead of possibly
+	 * waiting for the overflow to propagate through the counter.
+	 */
+	assert(cnt->count > 1);
+	return 1 - cnt->count;
+}
+
 static void check_counter_overflow(void)
 {
 	int nr_gp_counters = pmu_nr_gp_counters();
-	uint64_t count;
+	uint64_t overflow_preset;
 	int i;
 	pmu_counter_t cnt = {
 		.ctr = gp_counter_base,
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
 	};
-	__measure(&cnt, 0);
-	count = cnt.count;
+	overflow_preset = measure_for_overflow(&cnt);
 
 	/* clear status before test */
 	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
@@ -309,12 +322,13 @@ static void check_counter_overflow(void)
 		uint64_t status;
 		int idx;
 
-		cnt.count = 1 - count;
+		cnt.count = overflow_preset;
 		if (gp_counter_base == MSR_IA32_PMC0)
 			cnt.count &= (1ull << pmu_gp_counter_width()) - 1;
 
 		if (i == nr_gp_counters) {
 			cnt.ctr = fixed_events[0].unit_sel;
+			cnt.count = measure_for_overflow(&cnt);
 			cnt.count &= (1ull << pmu_fixed_counter_width()) - 1;
 		}
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 09/24] x86: create pmu group for quick pmu-scope testing
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (7 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 08/24] x86/pmu: Reset the expected count of the fixed counter 0 when i386 Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 10/24] x86/pmu: Refine info to clarify the current support Like Xu
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Anyone can run "./run_tests.sh -g pmu" to run all PMU tests easily,
e.g. when verifying x86/PMU KVM changes.

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
---
 x86/unittests.cfg | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index ed65185..07d0507 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -189,6 +189,7 @@ file = pmu.flat
 extra_params = -cpu max
 check = /proc/sys/kernel/nmi_watchdog=0
 accel = kvm
+groups = pmu
 
 [pmu_lbr]
 arch = x86_64
@@ -197,6 +198,7 @@ extra_params = -cpu host,migratable=no
 check = /sys/module/kvm/parameters/ignore_msrs=N
 check = /proc/sys/kernel/nmi_watchdog=0
 accel = kvm
+groups = pmu
 
 [vmware_backdoors]
 file = vmware_backdoors.flat
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 10/24] x86/pmu: Refine info to clarify the current support
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (8 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 09/24] x86: create pmu group for quick pmu-scope testing Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 11/24] x86/pmu: Update rdpmc testcase to cover #GP path Like Xu
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

The existing unit tests cover neither the AMD PMU nor Intel PMUs that
are not architectural (on some obsolete CPUs). AMD PMU support will be
added in subsequent commits.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 x86/pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index bb6e97e..15572e3 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -658,7 +658,7 @@ int main(int ac, char **av)
 	buf = malloc(N*64);
 
 	if (!pmu_version()) {
-		report_skip("No pmu is detected!");
+		report_skip("No Intel Arch PMU is detected!");
 		return report_summary();
 	}
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 11/24] x86/pmu: Update rdpmc testcase to cover #GP path
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (9 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 10/24] x86/pmu: Refine info to clarify the current support Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:42   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 12/24] x86/pmu: Rename PC_VECTOR to PMI_VECTOR for better readability Like Xu
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Specifying an unsupported PMC encoding will cause a #GP(0).

There are multiple reasons RDPMC can #GP; the one being relied on here
to guarantee a #GP is specifically that the PMC index is invalid. The
most extensible solution is to provide a safe variant.
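
Sketch of the resulting usage (mirroring the hunks below): callers that
expect RDPMC to succeed keep using rdpmc(), which now asserts on any
unexpected vector, while the negative test checks the vector explicitly:

    uint64_t val;

    report(rdpmc_safe(64, &val) == GP_VECTOR, "Expected #GP on RDPMC(64)");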

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 lib/x86/processor.h | 21 ++++++++++++++++++---
 x86/pmu.c           | 10 ++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index f85abe3..cb396ed 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -438,11 +438,26 @@ static inline int wrmsr_safe(u32 index, u64 val)
 	return exception_vector();
 }
 
-static inline uint64_t rdpmc(uint32_t index)
+static inline int rdpmc_safe(u32 index, uint64_t *val)
 {
 	uint32_t a, d;
-	asm volatile ("rdpmc" : "=a"(a), "=d"(d) : "c"(index));
-	return a | ((uint64_t)d << 32);
+
+	asm volatile (ASM_TRY("1f")
+		      "rdpmc\n\t"
+		      "1:"
+		      : "=a"(a), "=d"(d) : "c"(index) : "memory");
+	*val = (uint64_t)a | ((uint64_t)d << 32);
+	return exception_vector();
+}
+
+static inline uint64_t rdpmc(uint32_t index)
+{
+	uint64_t val;
+	int vector = rdpmc_safe(index, &val);
+
+	assert_msg(!vector, "Unexpected %s on RDPMC(%d)",
+			exception_mnemonic(vector), index);
+	return val;
 }
 
 static inline int write_cr0_safe(ulong val)
diff --git a/x86/pmu.c b/x86/pmu.c
index 15572e3..d0de196 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -651,12 +651,22 @@ static void set_ref_cycle_expectations(void)
 	gp_events[2].max = (gp_events[2].max * cnt.count) / tsc_delta;
 }
 
+static void check_invalid_rdpmc_gp(void)
+{
+	uint64_t val;
+
+	report(rdpmc_safe(64, &val) == GP_VECTOR,
+	       "Expected #GP on RDPMC(64)");
+}
+
 int main(int ac, char **av)
 {
 	setup_vm();
 	handle_irq(PC_VECTOR, cnt_overflow);
 	buf = malloc(N*64);
 
+	check_invalid_rdpmc_gp();
+
 	if (!pmu_version()) {
 		report_skip("No Intel Arch PMU is detected!");
 		return report_summary();
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 12/24] x86/pmu: Rename PC_VECTOR to PMI_VECTOR for better readability
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (10 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 11/24] x86/pmu: Update rdpmc testcase to cover #GP path Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 13/24] x86/pmu: Add lib/x86/pmu.[c.h] and move common code to header files Like Xu
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

The original name "PC_VECTOR" comes from the LVT Performance Counter
Register. Rename it to PMI_VECTOR. That's much more familiar to KVM
developers, and it's still correct, e.g. it's the PMI vector that's
programmed into the LVT PC register.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 x86/pmu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index d0de196..3b36caa 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -11,7 +11,9 @@
 #include <stdint.h>
 
 #define FIXED_CNT_INDEX 32
-#define PC_VECTOR	32
+
+/* Performance Counter Vector for the LVT PC Register */
+#define PMI_VECTOR	32
 
 #define EVNSEL_EVENT_SHIFT	0
 #define EVNTSEL_UMASK_SHIFT	8
@@ -159,7 +161,7 @@ static void __start_event(pmu_counter_t *evt, uint64_t count)
 	    wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, ctrl);
     }
     global_enable(evt);
-    apic_write(APIC_LVTPC, PC_VECTOR);
+    apic_write(APIC_LVTPC, PMI_VECTOR);
 }
 
 static void start_event(pmu_counter_t *evt)
@@ -662,7 +664,7 @@ static void check_invalid_rdpmc_gp(void)
 int main(int ac, char **av)
 {
 	setup_vm();
-	handle_irq(PC_VECTOR, cnt_overflow);
+	handle_irq(PMI_VECTOR, cnt_overflow);
 	buf = malloc(N*64);
 
 	check_invalid_rdpmc_gp();
@@ -686,7 +688,7 @@ int main(int ac, char **av)
 	printf("Fixed counters:      %d\n", pmu_nr_fixed_counters());
 	printf("Fixed counter width: %d\n", pmu_fixed_counter_width());
 
-	apic_write(APIC_LVTPC, PC_VECTOR);
+	apic_write(APIC_LVTPC, PMI_VECTOR);
 
 	check_counters();
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 13/24] x86/pmu: Add lib/x86/pmu.[c.h] and move common code to header files
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (11 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 12/24] x86/pmu: Rename PC_VECTOR to PMI_VECTOR for better readability Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 14/24] x86/pmu: Read cpuid(10) in the pmu_init() to reduce VM-Exit Like Xu
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Given all the PMU stuff coming in, we need e.g. lib/x86/pmu.h to hold
all of the hardware-defined stuff, e.g. #defines, accessors, helpers and
structs that are dictated by hardware. This will greatly help with code
reuse and reduce unnecessary VM-exits.

Opportunistically move the LBR MSR definitions to lib/x86/msr.h.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 lib/x86/msr.h       |   7 ++++
 lib/x86/pmu.c       |   1 +
 lib/x86/pmu.h       | 100 ++++++++++++++++++++++++++++++++++++++++++++
 lib/x86/processor.h |  64 ----------------------------
 x86/Makefile.common |   1 +
 x86/pmu.c           |  25 +----------
 x86/pmu_lbr.c       |  11 +----
 x86/vmx_tests.c     |   1 +
 8 files changed, 112 insertions(+), 98 deletions(-)
 create mode 100644 lib/x86/pmu.c
 create mode 100644 lib/x86/pmu.h

diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index fa1c0c8..bbe29fd 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -86,6 +86,13 @@
 #define DEBUGCTLMSR_BTS_OFF_USR		(1UL << 10)
 #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI	(1UL << 11)
 
+#define MSR_LBR_NHM_FROM	0x00000680
+#define MSR_LBR_NHM_TO		0x000006c0
+#define MSR_LBR_CORE_FROM	0x00000040
+#define MSR_LBR_CORE_TO	0x00000060
+#define MSR_LBR_TOS		0x000001c9
+#define MSR_LBR_SELECT		0x000001c8
+
 #define MSR_IA32_MC0_CTL		0x00000400
 #define MSR_IA32_MC0_STATUS		0x00000401
 #define MSR_IA32_MC0_ADDR		0x00000402
diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
new file mode 100644
index 0000000..9d048ab
--- /dev/null
+++ b/lib/x86/pmu.c
@@ -0,0 +1 @@
+#include "pmu.h"
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
new file mode 100644
index 0000000..078a974
--- /dev/null
+++ b/lib/x86/pmu.h
@@ -0,0 +1,100 @@
+#ifndef _X86_PMU_H_
+#define _X86_PMU_H_
+
+#include "processor.h"
+#include "libcflat.h"
+
+#define FIXED_CNT_INDEX 32
+#define MAX_NUM_LBR_ENTRY	  32
+
+/* Performance Counter Vector for the LVT PC Register */
+#define PMI_VECTOR	32
+
+#define DEBUGCTLMSR_LBR	  (1UL <<  0)
+
+#define PMU_CAP_LBR_FMT	  0x3f
+#define PMU_CAP_FW_WRITES	(1ULL << 13)
+
+#define EVNSEL_EVENT_SHIFT	0
+#define EVNTSEL_UMASK_SHIFT	8
+#define EVNTSEL_USR_SHIFT	16
+#define EVNTSEL_OS_SHIFT	17
+#define EVNTSEL_EDGE_SHIFT	18
+#define EVNTSEL_PC_SHIFT	19
+#define EVNTSEL_INT_SHIFT	20
+#define EVNTSEL_EN_SHIF		22
+#define EVNTSEL_INV_SHIF	23
+#define EVNTSEL_CMASK_SHIFT	24
+
+#define EVNTSEL_EN	(1 << EVNTSEL_EN_SHIF)
+#define EVNTSEL_USR	(1 << EVNTSEL_USR_SHIFT)
+#define EVNTSEL_OS	(1 << EVNTSEL_OS_SHIFT)
+#define EVNTSEL_PC	(1 << EVNTSEL_PC_SHIFT)
+#define EVNTSEL_INT	(1 << EVNTSEL_INT_SHIFT)
+#define EVNTSEL_INV	(1 << EVNTSEL_INV_SHIF)
+
+static inline u8 pmu_version(void)
+{
+	return cpuid(10).a & 0xff;
+}
+
+static inline bool this_cpu_has_pmu(void)
+{
+	return !!pmu_version();
+}
+
+static inline bool this_cpu_has_perf_global_ctrl(void)
+{
+	return pmu_version() > 1;
+}
+
+static inline u8 pmu_nr_gp_counters(void)
+{
+	return (cpuid(10).a >> 8) & 0xff;
+}
+
+static inline u8 pmu_gp_counter_width(void)
+{
+	return (cpuid(10).a >> 16) & 0xff;
+}
+
+static inline u8 pmu_gp_counter_mask_length(void)
+{
+	return (cpuid(10).a >> 24) & 0xff;
+}
+
+static inline u8 pmu_nr_fixed_counters(void)
+{
+	struct cpuid id = cpuid(10);
+
+	if ((id.a & 0xff) > 1)
+		return id.d & 0x1f;
+	else
+		return 0;
+}
+
+static inline u8 pmu_fixed_counter_width(void)
+{
+	struct cpuid id = cpuid(10);
+
+	if ((id.a & 0xff) > 1)
+		return (id.d >> 5) & 0xff;
+	else
+		return 0;
+}
+
+static inline bool pmu_gp_counter_is_available(int i)
+{
+	/* CPUID.0xA.EBX bit is '1 if they counter is NOT available. */
+	return !(cpuid(10).b & BIT(i));
+}
+
+static inline u64 this_cpu_perf_capabilities(void)
+{
+	if (!this_cpu_has(X86_FEATURE_PDCM))
+		return 0;
+
+	return rdmsr(MSR_IA32_PERF_CAPABILITIES);
+}
+
+#endif /* _X86_PMU_H_ */
diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index cb396ed..ee2b5a2 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -806,68 +806,4 @@ static inline void flush_tlb(void)
 	write_cr4(cr4);
 }
 
-static inline u8 pmu_version(void)
-{
-	return cpuid(10).a & 0xff;
-}
-
-static inline bool this_cpu_has_pmu(void)
-{
-	return !!pmu_version();
-}
-
-static inline bool this_cpu_has_perf_global_ctrl(void)
-{
-	return pmu_version() > 1;
-}
-
-static inline u8 pmu_nr_gp_counters(void)
-{
-	return (cpuid(10).a >> 8) & 0xff;
-}
-
-static inline u8 pmu_gp_counter_width(void)
-{
-	return (cpuid(10).a >> 16) & 0xff;
-}
-
-static inline u8 pmu_gp_counter_mask_length(void)
-{
-	return (cpuid(10).a >> 24) & 0xff;
-}
-
-static inline u8 pmu_nr_fixed_counters(void)
-{
-	struct cpuid id = cpuid(10);
-
-	if ((id.a & 0xff) > 1)
-		return id.d & 0x1f;
-	else
-		return 0;
-}
-
-static inline u8 pmu_fixed_counter_width(void)
-{
-	struct cpuid id = cpuid(10);
-
-	if ((id.a & 0xff) > 1)
-		return (id.d >> 5) & 0xff;
-	else
-		return 0;
-}
-
-static inline bool pmu_gp_counter_is_available(int i)
-{
-	/* CPUID.0xA.EBX bit is '1 if they counter is NOT available. */
-	return !(cpuid(10).b & BIT(i));
-}
-
-static inline u64 this_cpu_perf_capabilities(void)
-{
-	if (!this_cpu_has(X86_FEATURE_PDCM))
-		return 0;
-
-	return rdmsr(MSR_IA32_PERF_CAPABILITIES);
-}
-
 #endif
diff --git a/x86/Makefile.common b/x86/Makefile.common
index b7010e2..8cbdd2a 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -22,6 +22,7 @@ cflatobjs += lib/x86/acpi.o
 cflatobjs += lib/x86/stack.o
 cflatobjs += lib/x86/fault_test.o
 cflatobjs += lib/x86/delay.o
+cflatobjs += lib/x86/pmu.o
 ifeq ($(CONFIG_EFI),y)
 cflatobjs += lib/x86/amd_sev.o
 cflatobjs += lib/efi.o
diff --git a/x86/pmu.c b/x86/pmu.c
index 3b36caa..46e9fca 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -1,6 +1,7 @@
 
 #include "x86/msr.h"
 #include "x86/processor.h"
+#include "x86/pmu.h"
 #include "x86/apic-defs.h"
 #include "x86/apic.h"
 #include "x86/desc.h"
@@ -10,29 +11,6 @@
 #include "libcflat.h"
 #include <stdint.h>
 
-#define FIXED_CNT_INDEX 32
-
-/* Performance Counter Vector for the LVT PC Register */
-#define PMI_VECTOR	32
-
-#define EVNSEL_EVENT_SHIFT	0
-#define EVNTSEL_UMASK_SHIFT	8
-#define EVNTSEL_USR_SHIFT	16
-#define EVNTSEL_OS_SHIFT	17
-#define EVNTSEL_EDGE_SHIFT	18
-#define EVNTSEL_PC_SHIFT	19
-#define EVNTSEL_INT_SHIFT	20
-#define EVNTSEL_EN_SHIF		22
-#define EVNTSEL_INV_SHIF	23
-#define EVNTSEL_CMASK_SHIFT	24
-
-#define EVNTSEL_EN	(1 << EVNTSEL_EN_SHIF)
-#define EVNTSEL_USR	(1 << EVNTSEL_USR_SHIFT)
-#define EVNTSEL_OS	(1 << EVNTSEL_OS_SHIFT)
-#define EVNTSEL_PC	(1 << EVNTSEL_PC_SHIFT)
-#define EVNTSEL_INT	(1 << EVNTSEL_INT_SHIFT)
-#define EVNTSEL_INV	(1 << EVNTSEL_INV_SHIF)
-
 #define N 1000000
 
 // These values match the number of instructions and branches in the
@@ -66,7 +44,6 @@ struct pmu_event {
 	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
 };
 
-#define PMU_CAP_FW_WRITES	(1ULL << 13)
 static u64 gp_counter_base = MSR_IA32_PERFCTR0;
 
 char *buf;
diff --git a/x86/pmu_lbr.c b/x86/pmu_lbr.c
index a641d79..e6d9823 100644
--- a/x86/pmu_lbr.c
+++ b/x86/pmu_lbr.c
@@ -1,18 +1,9 @@
 #include "x86/msr.h"
 #include "x86/processor.h"
+#include "x86/pmu.h"
 #include "x86/desc.h"
 
 #define N 1000000
-#define MAX_NUM_LBR_ENTRY	  32
-#define DEBUGCTLMSR_LBR	  (1UL <<  0)
-#define PMU_CAP_LBR_FMT	  0x3f
-
-#define MSR_LBR_NHM_FROM	0x00000680
-#define MSR_LBR_NHM_TO		0x000006c0
-#define MSR_LBR_CORE_FROM	0x00000040
-#define MSR_LBR_CORE_TO	0x00000060
-#define MSR_LBR_TOS		0x000001c9
-#define MSR_LBR_SELECT		0x000001c8
 
 volatile int count;
 u32 lbr_from, lbr_to;
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index aa2ecbb..fd36e43 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9,6 +9,7 @@
 #include "vmx.h"
 #include "msr.h"
 #include "processor.h"
+#include "pmu.h"
 #include "vm.h"
 #include "pci.h"
 #include "fwcfg.h"
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 14/24] x86/pmu: Read cpuid(10) in the pmu_init() to reduce VM-Exit
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (12 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 13/24] x86/pmu: Add lib/x86/pmu.[c.h] and move common code to header files Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:45   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 15/24] x86/pmu: Initialize PMU perf_capabilities at pmu_init() Like Xu
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

This type of CPUID accessor can also go in the common PMU code.
Re-reading cpuid(10) each time it is needed adds the overhead of
emulating CPUID, which is not meaningless in the grand scheme of the
test.

A common "PMU init" routine also allows the library to provide helpers
with access to more common PMU information.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 lib/x86/pmu.c |  7 +++++++
 lib/x86/pmu.h | 26 +++++++++++++-------------
 lib/x86/smp.c |  2 ++
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 9d048ab..e8b9ae9 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -1 +1,8 @@
 #include "pmu.h"
+
+struct cpuid cpuid_10;
+
+void pmu_init(void)
+{
+    cpuid_10 = cpuid(10);
+}
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index 078a974..7f4e797 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -33,9 +33,13 @@
 #define EVNTSEL_INT	(1 << EVNTSEL_INT_SHIFT)
 #define EVNTSEL_INV	(1 << EVNTSEL_INV_SHIF)
 
+extern struct cpuid cpuid_10;
+
+void pmu_init(void);
+
 static inline u8 pmu_version(void)
 {
-	return cpuid(10).a & 0xff;
+	return cpuid_10.a & 0xff;
 }
 
 static inline bool this_cpu_has_pmu(void)
@@ -50,35 +54,31 @@ static inline bool this_cpu_has_perf_global_ctrl(void)
 
 static inline u8 pmu_nr_gp_counters(void)
 {
-	return (cpuid(10).a >> 8) & 0xff;
+	return (cpuid_10.a >> 8) & 0xff;
 }
 
 static inline u8 pmu_gp_counter_width(void)
 {
-	return (cpuid(10).a >> 16) & 0xff;
+	return (cpuid_10.a >> 16) & 0xff;
 }
 
 static inline u8 pmu_gp_counter_mask_length(void)
 {
-	return (cpuid(10).a >> 24) & 0xff;
+	return (cpuid_10.a >> 24) & 0xff;
 }
 
 static inline u8 pmu_nr_fixed_counters(void)
 {
-	struct cpuid id = cpuid(10);
-
-	if ((id.a & 0xff) > 1)
-		return id.d & 0x1f;
+	if ((cpuid_10.a & 0xff) > 1)
+		return cpuid_10.d & 0x1f;
 	else
 		return 0;
 }
 
 static inline u8 pmu_fixed_counter_width(void)
 {
-	struct cpuid id = cpuid(10);
-
-	if ((id.a & 0xff) > 1)
-		return (id.d >> 5) & 0xff;
+	if ((cpuid_10.a & 0xff) > 1)
+		return (cpuid_10.d >> 5) & 0xff;
 	else
 		return 0;
 }
@@ -86,7 +86,7 @@ static inline u8 pmu_fixed_counter_width(void)
 static inline bool pmu_gp_counter_is_available(int i)
 {
 	/* CPUID.0xA.EBX bit is '1 if they counter is NOT available. */
-	return !(cpuid(10).b & BIT(i));
+	return !(cpuid_10.b & BIT(i));
 }
 
 static inline u64 this_cpu_perf_capabilities(void)
diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index b9b91c7..29197fc 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -4,6 +4,7 @@
 #include <asm/barrier.h>
 
 #include "processor.h"
+#include "pmu.h"
 #include "atomic.h"
 #include "smp.h"
 #include "apic.h"
@@ -155,6 +156,7 @@ void smp_init(void)
 		on_cpu(i, setup_smp_id, 0);
 
 	atomic_inc(&active_cpus);
+	pmu_init();
 }
 
 static void do_reset_apic(void *data)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 15/24] x86/pmu: Initialize PMU perf_capabilities at pmu_init()
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (13 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 14/24] x86/pmu: Read cpuid(10) in the pmu_init() to reduce VM-Exit Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:45   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 16/24] x86/pmu: Add GP counter related helpers Like Xu
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Likewise, re-reading PERF_CAPABILITIES each time it is needed adds the
overhead of emulating RDMSR, which is also not meaningless in the grand
scheme of the test.

Based on this, more helpers for full_writes and lbr_fmt can be added to
increase the readability of the test cases.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 lib/x86/pmu.c |  3 +++
 lib/x86/pmu.h | 18 +++++++++++++++---
 x86/pmu.c     |  2 +-
 x86/pmu_lbr.c |  7 ++-----
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index e8b9ae9..35b7efb 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -1,8 +1,11 @@
 #include "pmu.h"
 
 struct cpuid cpuid_10;
+struct pmu_caps pmu;
 
 void pmu_init(void)
 {
     cpuid_10 = cpuid(10);
+    if (this_cpu_has(X86_FEATURE_PDCM))
+        pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
 }
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index 7f4e797..95b17da 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -33,7 +33,12 @@
 #define EVNTSEL_INT	(1 << EVNTSEL_INT_SHIFT)
 #define EVNTSEL_INV	(1 << EVNTSEL_INV_SHIF)
 
+struct pmu_caps {
+    u64 perf_cap;
+};
+
 extern struct cpuid cpuid_10;
+extern struct pmu_caps pmu;
 
 void pmu_init(void);
 
@@ -91,10 +96,17 @@ static inline bool pmu_gp_counter_is_available(int i)
 
 static inline u64 this_cpu_perf_capabilities(void)
 {
-	if (!this_cpu_has(X86_FEATURE_PDCM))
-		return 0;
+	return pmu.perf_cap;
+}
 
-	return rdmsr(MSR_IA32_PERF_CAPABILITIES);
+static inline u64 pmu_lbr_version(void)
+{
+	return this_cpu_perf_capabilities() & PMU_CAP_LBR_FMT;
+}
+
+static inline bool pmu_has_full_writes(void)
+{
+	return this_cpu_perf_capabilities() & PMU_CAP_FW_WRITES;
 }
 
 #endif /* _X86_PMU_H_ */
diff --git a/x86/pmu.c b/x86/pmu.c
index 46e9fca..a6329cd 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -669,7 +669,7 @@ int main(int ac, char **av)
 
 	check_counters();
 
-	if (this_cpu_perf_capabilities() & PMU_CAP_FW_WRITES) {
+	if (pmu_has_full_writes()) {
 		gp_counter_base = MSR_IA32_PMC0;
 		report_prefix_push("full-width writes");
 		check_counters();
diff --git a/x86/pmu_lbr.c b/x86/pmu_lbr.c
index e6d9823..d013552 100644
--- a/x86/pmu_lbr.c
+++ b/x86/pmu_lbr.c
@@ -43,7 +43,6 @@ static bool test_init_lbr_from_exception(u64 index)
 
 int main(int ac, char **av)
 {
-	u64 perf_cap;
 	int max, i;
 
 	setup_vm();
@@ -63,15 +62,13 @@ int main(int ac, char **av)
 		return report_summary();
 	}
 
-	perf_cap = this_cpu_perf_capabilities();
-
-	if (!(perf_cap & PMU_CAP_LBR_FMT)) {
+	if (!pmu_lbr_version()) {
 		report_skip("(Architectural) LBR is not supported.");
 		return report_summary();
 	}
 
 	printf("PMU version:		 %d\n", pmu_version());
-	printf("LBR version:		 %ld\n", perf_cap & PMU_CAP_LBR_FMT);
+	printf("LBR version:		 %ld\n", pmu_lbr_version());
 
 	/* Look for LBR from and to MSRs */
 	lbr_from = MSR_LBR_CORE_FROM;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 16/24] x86/pmu: Add GP counter related helpers
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (14 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 15/24] x86/pmu: Initialize PMU perf_capabilities at pmu_init() Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:54   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 17/24] x86/pmu: Add GP/Fixed counters reset helpers Like Xu
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Continuing the theme of code reuse, tests shouldn't need to manually
compute gp_counter_base and gp_event_select_base.

They can be accessed directly after initialization and changed via
setters in the cases that need a different base, e.g. full-width writes.
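
A sketch of the intended usage (helper names as introduced below):

    /* default base, i.e. MSR_IA32_PERFCTR0, set up by pmu_init() */
    wrmsr(gp_counter_msr(0), 0);

    /* switch to full-width counters for the second pass */
    set_gp_counter_base(MSR_IA32_PMC0);
    write_gp_counter_value(0, 0);    /* now writes MSR_IA32_PMC0 */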

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 lib/x86/pmu.c |  2 ++
 lib/x86/pmu.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 x86/pmu.c     | 50 ++++++++++++++++++++++++--------------------------
 3 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 35b7efb..c0d100d 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -8,4 +8,6 @@ void pmu_init(void)
     cpuid_10 = cpuid(10);
     if (this_cpu_has(X86_FEATURE_PDCM))
         pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
+    pmu.msr_gp_counter_base = MSR_IA32_PERFCTR0;
+    pmu.msr_gp_event_select_base = MSR_P6_EVNTSEL0;
 }
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index 95b17da..7487a30 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -35,6 +35,8 @@
 
 struct pmu_caps {
     u64 perf_cap;
+    u32 msr_gp_counter_base;
+    u32 msr_gp_event_select_base;
 };
 
 extern struct cpuid cpuid_10;
@@ -42,6 +44,46 @@ extern struct pmu_caps pmu;
 
 void pmu_init(void);
 
+static inline u32 gp_counter_base(void)
+{
+	return pmu.msr_gp_counter_base;
+}
+
+static inline void set_gp_counter_base(u32 new_base)
+{
+	pmu.msr_gp_counter_base = new_base;
+}
+
+static inline u32 gp_event_select_base(void)
+{
+	return pmu.msr_gp_event_select_base;
+}
+
+static inline void set_gp_event_select_base(u32 new_base)
+{
+	pmu.msr_gp_event_select_base = new_base;
+}
+
+static inline u32 gp_counter_msr(unsigned int i)
+{
+	return gp_counter_base() + i;
+}
+
+static inline u32 gp_event_select_msr(unsigned int i)
+{
+	return gp_event_select_base() + i;
+}
+
+static inline void write_gp_counter_value(unsigned int i, u64 value)
+{
+	wrmsr(gp_counter_msr(i), value);
+}
+
+static inline void write_gp_event_select(unsigned int i, u64 value)
+{
+	wrmsr(gp_event_select_msr(i), value);
+}
+
 static inline u8 pmu_version(void)
 {
 	return cpuid_10.a & 0xff;
@@ -109,4 +151,9 @@ static inline bool pmu_has_full_writes(void)
 	return this_cpu_perf_capabilities() & PMU_CAP_FW_WRITES;
 }
 
+static inline bool pmu_use_full_writes(void)
+{
+	return gp_counter_base() == MSR_IA32_PMC0;
+}
+
 #endif /* _X86_PMU_H_ */
diff --git a/x86/pmu.c b/x86/pmu.c
index a6329cd..589c7cb 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -44,8 +44,6 @@ struct pmu_event {
 	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
 };
 
-static u64 gp_counter_base = MSR_IA32_PERFCTR0;
-
 char *buf;
 
 static inline void loop(void)
@@ -84,7 +82,7 @@ static bool is_gp(pmu_counter_t *evt)
 
 static int event_to_global_idx(pmu_counter_t *cnt)
 {
-	return cnt->ctr - (is_gp(cnt) ? gp_counter_base :
+	return cnt->ctr - (is_gp(cnt) ? gp_counter_base() :
 		(MSR_CORE_PERF_FIXED_CTR0 - FIXED_CNT_INDEX));
 }
 
@@ -121,8 +119,7 @@ static void __start_event(pmu_counter_t *evt, uint64_t count)
     evt->count = count;
     wrmsr(evt->ctr, evt->count);
     if (is_gp(evt))
-	    wrmsr(MSR_P6_EVNTSEL0 + event_to_global_idx(evt),
-			    evt->config | EVNTSEL_EN);
+	    write_gp_event_select(event_to_global_idx(evt), evt->config | EVNTSEL_EN);
     else {
 	    uint32_t ctrl = rdmsr(MSR_CORE_PERF_FIXED_CTR_CTRL);
 	    int shift = (evt->ctr - MSR_CORE_PERF_FIXED_CTR0) * 4;
@@ -150,8 +147,7 @@ static void stop_event(pmu_counter_t *evt)
 {
 	global_disable(evt);
 	if (is_gp(evt))
-		wrmsr(MSR_P6_EVNTSEL0 + event_to_global_idx(evt),
-				evt->config & ~EVNTSEL_EN);
+		write_gp_event_select(event_to_global_idx(evt), evt->config & ~EVNTSEL_EN);
 	else {
 		uint32_t ctrl = rdmsr(MSR_CORE_PERF_FIXED_CTR_CTRL);
 		int shift = (evt->ctr - MSR_CORE_PERF_FIXED_CTR0) * 4;
@@ -198,12 +194,12 @@ static void check_gp_counter(struct pmu_event *evt)
 {
 	int nr_gp_counters = pmu_nr_gp_counters();
 	pmu_counter_t cnt = {
-		.ctr = gp_counter_base,
 		.config = EVNTSEL_OS | EVNTSEL_USR | evt->unit_sel,
 	};
 	int i;
 
-	for (i = 0; i < nr_gp_counters; i++, cnt.ctr++) {
+	for (i = 0; i < nr_gp_counters; i++) {
+		cnt.ctr = gp_counter_msr(i);
 		measure_one(&cnt);
 		report(verify_event(cnt.count, evt), "%s-%d", evt->name, i);
 	}
@@ -247,7 +243,7 @@ static void check_counters_many(void)
 		if (!pmu_gp_counter_is_available(i))
 			continue;
 
-		cnt[n].ctr = gp_counter_base + n;
+		cnt[n].ctr = gp_counter_msr(n);
 		cnt[n].config = EVNTSEL_OS | EVNTSEL_USR |
 			gp_events[i % ARRAY_SIZE(gp_events)].unit_sel;
 		n++;
@@ -287,7 +283,7 @@ static void check_counter_overflow(void)
 	uint64_t overflow_preset;
 	int i;
 	pmu_counter_t cnt = {
-		.ctr = gp_counter_base,
+		.ctr = gp_counter_msr(0),
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
 	};
 	overflow_preset = measure_for_overflow(&cnt);
@@ -297,18 +293,20 @@ static void check_counter_overflow(void)
 
 	report_prefix_push("overflow");
 
-	for (i = 0; i < nr_gp_counters + 1; i++, cnt.ctr++) {
+	for (i = 0; i < nr_gp_counters + 1; i++) {
 		uint64_t status;
 		int idx;
 
 		cnt.count = overflow_preset;
-		if (gp_counter_base == MSR_IA32_PMC0)
+		if (pmu_use_full_writes())
 			cnt.count &= (1ull << pmu_gp_counter_width()) - 1;
 
 		if (i == nr_gp_counters) {
 			cnt.ctr = fixed_events[0].unit_sel;
 			cnt.count = measure_for_overflow(&cnt);
 			cnt.count &= (1ull << pmu_fixed_counter_width()) - 1;
+		} else {
+			cnt.ctr = gp_counter_msr(i);
 		}
 
 		if (i % 2)
@@ -332,7 +330,7 @@ static void check_counter_overflow(void)
 static void check_gp_counter_cmask(void)
 {
 	pmu_counter_t cnt = {
-		.ctr = gp_counter_base,
+		.ctr = gp_counter_msr(0),
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
 	};
 	cnt.config |= (0x2 << EVNTSEL_CMASK_SHIFT);
@@ -367,7 +365,7 @@ static void check_rdpmc(void)
 	for (i = 0; i < nr_gp_counters; i++) {
 		uint64_t x;
 		pmu_counter_t cnt = {
-			.ctr = gp_counter_base + i,
+			.ctr = gp_counter_msr(i),
 			.idx = i
 		};
 
@@ -375,7 +373,7 @@ static void check_rdpmc(void)
 	         * Without full-width writes, only the low 32 bits are writable,
 	         * and the value is sign-extended.
 	         */
-		if (gp_counter_base == MSR_IA32_PERFCTR0)
+		if (gp_counter_base() == MSR_IA32_PERFCTR0)
 			x = (uint64_t)(int64_t)(int32_t)val;
 		else
 			x = (uint64_t)(int64_t)val;
@@ -383,7 +381,7 @@ static void check_rdpmc(void)
 		/* Mask according to the number of supported bits */
 		x &= (1ull << gp_counter_width) - 1;
 
-		wrmsr(gp_counter_base + i, val);
+		write_gp_counter_value(i, val);
 		report(rdpmc(i) == x, "cntr-%d", i);
 
 		exc = test_for_exception(GP_VECTOR, do_rdpmc_fast, &cnt);
@@ -417,7 +415,7 @@ static void check_running_counter_wrmsr(void)
 	uint64_t status;
 	uint64_t count;
 	pmu_counter_t evt = {
-		.ctr = gp_counter_base,
+		.ctr = gp_counter_msr(0),
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel,
 	};
 
@@ -425,7 +423,7 @@ static void check_running_counter_wrmsr(void)
 
 	start_event(&evt);
 	loop();
-	wrmsr(gp_counter_base, 0);
+	write_gp_counter_value(0, 0);
 	stop_event(&evt);
 	report(evt.count < gp_events[1].min, "cntr");
 
@@ -436,10 +434,10 @@ static void check_running_counter_wrmsr(void)
 	start_event(&evt);
 
 	count = -1;
-	if (gp_counter_base == MSR_IA32_PMC0)
+	if (pmu_use_full_writes())
 		count &= (1ull << pmu_gp_counter_width()) - 1;
 
-	wrmsr(gp_counter_base, count);
+	write_gp_counter_value(0, count);
 
 	loop();
 	stop_event(&evt);
@@ -453,12 +451,12 @@ static void check_emulated_instr(void)
 {
 	uint64_t status, instr_start, brnch_start;
 	pmu_counter_t brnch_cnt = {
-		.ctr = MSR_IA32_PERFCTR0,
+		.ctr = gp_counter_msr(0),
 		/* branch instructions */
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[5].unit_sel,
 	};
 	pmu_counter_t instr_cnt = {
-		.ctr = MSR_IA32_PERFCTR0 + 1,
+		.ctr = gp_counter_msr(1),
 		/* instructions */
 		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel,
 	};
@@ -472,8 +470,8 @@ static void check_emulated_instr(void)
 
 	brnch_start = -EXPECTED_BRNCH;
 	instr_start = -EXPECTED_INSTR;
-	wrmsr(MSR_IA32_PERFCTR0, brnch_start);
-	wrmsr(MSR_IA32_PERFCTR0 + 1, instr_start);
+	write_gp_counter_value(0, brnch_start);
+	write_gp_counter_value(1, instr_start);
 	// KVM_FEP is a magic prefix that forces emulation so
 	// 'KVM_FEP "jne label\n"' just counts as a single instruction.
 	asm volatile(
@@ -670,7 +668,7 @@ int main(int ac, char **av)
 	check_counters();
 
 	if (pmu_has_full_writes()) {
-		gp_counter_base = MSR_IA32_PMC0;
+		set_gp_counter_base(MSR_IA32_PMC0);
 		report_prefix_push("full-width writes");
 		check_counters();
 		check_gp_counters_write_width();
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 17/24] x86/pmu: Add GP/Fixed counters reset helpers
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (15 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 16/24] x86/pmu: Add GP counter related helpers Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:55   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 18/24] x86/pmu: Add a set of helpers related to global registers Like Xu
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

In generic PMU testing, it is very common to initialize the test
environment by resetting the counter registers. Add these helpers for
code reuse.

Signed-off-by: Like Xu <likexu@tencent.com>
---
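
For reference, a minimal usage sketch of the new helpers (the wrapper
name below is hypothetical and not part of this patch): a test can
bracket each sub-test with reset_all_counters() so that stale counts or
event selects never leak into the next measurement.

static void run_isolated_subtest(void (*subtest)(void))
{
	/* Zero all GP/fixed counters and their event select/control MSRs. */
	reset_all_counters();

	subtest();

	/* Leave a clean PMU state for whatever runs next. */
	reset_all_counters();
}
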
 lib/x86/pmu.c |  1 +
 lib/x86/pmu.h | 38 ++++++++++++++++++++++++++++++++++++++
 x86/pmu.c     |  2 +-
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index c0d100d..0ce1691 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -10,4 +10,5 @@ void pmu_init(void)
         pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
     pmu.msr_gp_counter_base = MSR_IA32_PERFCTR0;
     pmu.msr_gp_event_select_base = MSR_P6_EVNTSEL0;
+    reset_all_counters();
 }
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index 7487a30..564b672 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -156,4 +156,42 @@ static inline bool pmu_use_full_writes(void)
 	return gp_counter_base() == MSR_IA32_PMC0;
 }
 
+static inline u32 fixed_counter_msr(unsigned int i)
+{
+	return MSR_CORE_PERF_FIXED_CTR0 + i;
+}
+
+static inline void write_fixed_counter_value(unsigned int i, u64 value)
+{
+	wrmsr(fixed_counter_msr(i), value);
+}
+
+static inline void reset_all_gp_counters(void)
+{
+	unsigned int idx;
+
+	for (idx = 0; idx < pmu_nr_gp_counters(); idx++) {
+		write_gp_event_select(idx, 0);
+		write_gp_counter_value(idx, 0);
+	}
+}
+
+static inline void reset_all_fixed_counters(void)
+{
+    unsigned int idx;
+
+	if (!pmu_nr_fixed_counters())
+		return;
+
+	wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+	for (idx = 0; idx < pmu_nr_fixed_counters(); idx++)
+		write_fixed_counter_value(idx, 0);
+}
+
+static inline void reset_all_counters(void)
+{
+    reset_all_gp_counters();
+    reset_all_fixed_counters();
+}
+
 #endif /* _X86_PMU_H_ */
diff --git a/x86/pmu.c b/x86/pmu.c
index 589c7cb..7786b49 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -397,7 +397,7 @@ static void check_rdpmc(void)
 			.idx = i
 		};
 
-		wrmsr(MSR_CORE_PERF_FIXED_CTR0 + i, x);
+		write_fixed_counter_value(i, x);
 		report(rdpmc(i | (1 << 30)) == x, "fixed cntr-%d", i);
 
 		exc = test_for_exception(GP_VECTOR, do_rdpmc_fast, &cnt);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 18/24] x86/pmu: Add a set of helpers related to global registers
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (16 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 17/24] x86/pmu: Add GP/Fixed counters reset helpers Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:56   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 19/24] x86: Add tests for Guest Processor Event Based Sampling (PEBS) Like Xu
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Although AMD and Intel PMUs have the same semantics for the global
control features (including ctl and status), their MSR indexes are not
the same. Adding helpers lets the tests be fully reused across both.

Signed-off-by: Like Xu <likexu@tencent.com>
---
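
As a usage sketch (assuming pmu_init() has populated the pmu.msr_global_*
fields as this patch does, and with a made-up helper name), the common
overflow-handling pattern becomes vendor-agnostic:

static void ack_counter_overflow(int idx)
{
	u64 status = pmu_get_global_status();

	/* The global status MSR sets bit 'idx' when counter 'idx' overflows. */
	if (status & BIT_ULL(idx))
		pmu_ack_global_status(BIT_ULL(idx));	/* write-1-to-clear */
}
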
 lib/x86/pmu.c |  3 +++
 lib/x86/pmu.h | 33 +++++++++++++++++++++++++++++++++
 x86/pmu.c     | 31 +++++++++++++------------------
 3 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 0ce1691..3b6be37 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -10,5 +10,8 @@ void pmu_init(void)
         pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
     pmu.msr_gp_counter_base = MSR_IA32_PERFCTR0;
     pmu.msr_gp_event_select_base = MSR_P6_EVNTSEL0;
+    pmu.msr_global_status = MSR_CORE_PERF_GLOBAL_STATUS;
+    pmu.msr_global_ctl = MSR_CORE_PERF_GLOBAL_CTRL;
+    pmu.msr_global_status_clr = MSR_CORE_PERF_GLOBAL_OVF_CTRL;
     reset_all_counters();
 }
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index 564b672..ef83934 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -37,6 +37,9 @@ struct pmu_caps {
     u64 perf_cap;
     u32 msr_gp_counter_base;
     u32 msr_gp_event_select_base;
+    u32 msr_global_status;
+    u32 msr_global_ctl;
+    u32 msr_global_status_clr;
 };
 
 extern struct cpuid cpuid_10;
@@ -194,4 +197,34 @@ static inline void reset_all_counters(void)
     reset_all_fixed_counters();
 }
 
+static inline void pmu_clear_global_status(void)
+{
+	wrmsr(pmu.msr_global_status_clr, rdmsr(pmu.msr_global_status));
+}
+
+static inline u64 pmu_get_global_status(void)
+{
+	return rdmsr(pmu.msr_global_status);
+}
+
+static inline u64 pmu_get_global_enable(void)
+{
+	return rdmsr(pmu.msr_global_ctl);
+}
+
+static inline void pmu_set_global_enable(u64 bitmask)
+{
+	wrmsr(pmu.msr_global_ctl, bitmask);
+}
+
+static inline void pmu_reset_global_enable(void)
+{
+	wrmsr(pmu.msr_global_ctl, 0);
+}
+
+static inline void pmu_ack_global_status(u64 value)
+{
+	wrmsr(pmu.msr_global_status_clr, value);
+}
+
 #endif /* _X86_PMU_H_ */
diff --git a/x86/pmu.c b/x86/pmu.c
index 7786b49..015591f 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -103,15 +103,12 @@ static struct pmu_event* get_counter_event(pmu_counter_t *cnt)
 static void global_enable(pmu_counter_t *cnt)
 {
 	cnt->idx = event_to_global_idx(cnt);
-
-	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_CTRL) |
-			(1ull << cnt->idx));
+	pmu_set_global_enable(pmu_get_global_enable() | BIT_ULL(cnt->idx));
 }
 
 static void global_disable(pmu_counter_t *cnt)
 {
-	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_CTRL) &
-			~(1ull << cnt->idx));
+	pmu_set_global_enable(pmu_get_global_enable() & ~BIT_ULL(cnt->idx));
 }
 
 static void __start_event(pmu_counter_t *evt, uint64_t count)
@@ -289,7 +286,7 @@ static void check_counter_overflow(void)
 	overflow_preset = measure_for_overflow(&cnt);
 
 	/* clear status before test */
-	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
+	pmu_clear_global_status();
 
 	report_prefix_push("overflow");
 
@@ -316,10 +313,10 @@ static void check_counter_overflow(void)
 		idx = event_to_global_idx(&cnt);
 		__measure(&cnt, cnt.count);
 		report(cnt.count == 1, "cntr-%d", i);
-		status = rdmsr(MSR_CORE_PERF_GLOBAL_STATUS);
+		status = pmu_get_global_status();
 		report(status & (1ull << idx), "status-%d", i);
-		wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL, status);
-		status = rdmsr(MSR_CORE_PERF_GLOBAL_STATUS);
+		pmu_ack_global_status(status);
+		status = pmu_get_global_status();
 		report(!(status & (1ull << idx)), "status clear-%d", i);
 		report(check_irq() == (i % 2), "irq-%d", i);
 	}
@@ -428,8 +425,7 @@ static void check_running_counter_wrmsr(void)
 	report(evt.count < gp_events[1].min, "cntr");
 
 	/* clear status before overflow test */
-	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL,
-	      rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
+	pmu_clear_global_status();
 
 	start_event(&evt);
 
@@ -441,8 +437,8 @@ static void check_running_counter_wrmsr(void)
 
 	loop();
 	stop_event(&evt);
-	status = rdmsr(MSR_CORE_PERF_GLOBAL_STATUS);
-	report(status & 1, "status");
+	status = pmu_get_global_status();
+	report(status & 1, "status msr bit");
 
 	report_prefix_pop();
 }
@@ -462,8 +458,7 @@ static void check_emulated_instr(void)
 	};
 	report_prefix_push("emulated instruction");
 
-	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL,
-	      rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
+	pmu_clear_global_status();
 
 	start_event(&brnch_cnt);
 	start_event(&instr_cnt);
@@ -497,7 +492,7 @@ static void check_emulated_instr(void)
 		:
 		: "eax", "ebx", "ecx", "edx");
 
-	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+	pmu_reset_global_enable();
 
 	stop_event(&brnch_cnt);
 	stop_event(&instr_cnt);
@@ -509,7 +504,7 @@ static void check_emulated_instr(void)
 	report(brnch_cnt.count - brnch_start >= EXPECTED_BRNCH,
 	       "branch count");
 	// Additionally check that those counters overflowed properly.
-	status = rdmsr(MSR_CORE_PERF_GLOBAL_STATUS);
+	status = pmu_get_global_status();
 	report(status & 1, "branch counter overflow");
 	report(status & 2, "instruction counter overflow");
 
@@ -598,7 +593,7 @@ static void set_ref_cycle_expectations(void)
 	if (!pmu_nr_gp_counters() || !pmu_gp_counter_is_available(2))
 		return;
 
-	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+	pmu_reset_global_enable();
 
 	t0 = fenced_rdtsc();
 	start_event(&cnt);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 19/24] x86: Add tests for Guest Processor Event Based Sampling (PEBS)
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (17 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 18/24] x86/pmu: Add a set of helpers related to global registers Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 20/24] x86/pmu: Add global helpers to cover Intel Arch PMU Version 1 Like Xu
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

This unit test exercises KVM's support for Processor Event Based
Sampling (PEBS), another PMU feature on Intel processors (starting
with Ice Lake Server).

If a bit in PEBS_ENABLE is set to 1, its corresponding counter writes
at least one PEBS record (including partial vCPU state at the time of
the hardware event) to guest memory on counter overflow, and triggers
an interrupt once the DS interrupt threshold is reached. The format of
a PEBS record can be configured via another register.

These tests cover most usage scenarios, including some specially
constructed ones that are not typical of the Linux PEBS driver. This
lowers the barrier for others to understand the feature and opens up
further exploration of the KVM implementation and of the hardware
feature itself.

Signed-off-by: Like Xu <likexu@tencent.com>
---
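
The core of the DS/PEBS wiring used by the test below, condensed into a
sketch (one GP counter, no adaptive config, error handling omitted; it
reuses the buffers and structures defined in pmu_pebs.c, and the
function name is made up for illustration):

static void pebs_enable_gp0_sketch(void)
{
	struct debug_store *ds = (struct debug_store *)ds_bufer;

	ds->pebs_buffer_base = (unsigned long)pebs_buffer;
	ds->pebs_index = ds->pebs_buffer_base;
	ds->pebs_absolute_maximum = ds->pebs_buffer_base + PAGE_SIZE;
	/* Interrupt once the first basic record has been written. */
	ds->pebs_interrupt_threshold = ds->pebs_buffer_base + sizeof(struct pebs_basic);

	wrmsr(MSR_IA32_DS_AREA, (unsigned long)ds);
	wrmsr(MSR_IA32_PEBS_ENABLE, BIT_ULL(0));	/* PEBS on GP counter 0 */
	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(0));	/* enable GP counter 0 */
}
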
 lib/x86/msr.h       |   1 +
 lib/x86/pmu.h       |  29 +++
 x86/Makefile.x86_64 |   1 +
 x86/pmu_pebs.c      | 433 ++++++++++++++++++++++++++++++++++++++++++++
 x86/unittests.cfg   |   8 +
 5 files changed, 472 insertions(+)
 create mode 100644 x86/pmu_pebs.c

diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index bbe29fd..68d8837 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -52,6 +52,7 @@
 #define MSR_IA32_MCG_CTL		0x0000017b
 
 #define MSR_IA32_PEBS_ENABLE		0x000003f1
+#define MSR_PEBS_DATA_CFG		0x000003f2
 #define MSR_IA32_DS_AREA		0x00000600
 #define MSR_IA32_PERF_CAPABILITIES	0x00000345
 
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index ef83934..9ba2419 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -14,6 +14,8 @@
 
 #define PMU_CAP_LBR_FMT	  0x3f
 #define PMU_CAP_FW_WRITES	(1ULL << 13)
+#define PMU_CAP_PEBS_BASELINE	(1ULL << 14)
+#define PERF_CAP_PEBS_FORMAT           0xf00
 
 #define EVNSEL_EVENT_SHIFT	0
 #define EVNTSEL_UMASK_SHIFT	8
@@ -33,6 +35,18 @@
 #define EVNTSEL_INT	(1 << EVNTSEL_INT_SHIFT)
 #define EVNTSEL_INV	(1 << EVNTSEL_INV_SHIF)
 
+#define GLOBAL_STATUS_BUFFER_OVF_BIT		62
+#define GLOBAL_STATUS_BUFFER_OVF	BIT_ULL(GLOBAL_STATUS_BUFFER_OVF_BIT)
+
+#define PEBS_DATACFG_MEMINFO	BIT_ULL(0)
+#define PEBS_DATACFG_GP	BIT_ULL(1)
+#define PEBS_DATACFG_XMMS	BIT_ULL(2)
+#define PEBS_DATACFG_LBRS	BIT_ULL(3)
+
+#define ICL_EVENTSEL_ADAPTIVE				(1ULL << 34)
+#define PEBS_DATACFG_LBR_SHIFT	24
+#define MAX_NUM_LBR_ENTRY	32
+
 struct pmu_caps {
     u64 perf_cap;
     u32 msr_gp_counter_base;
@@ -227,4 +241,19 @@ static inline void pmu_ack_global_status(u64 value)
 	wrmsr(pmu.msr_global_status_clr, value);
 }
 
+static inline bool pmu_version_support_pebs(void)
+{
+	return pmu_version() > 1;
+}
+
+static inline u8 pmu_pebs_format(void)
+{
+	return (pmu.perf_cap & PERF_CAP_PEBS_FORMAT) >> 8;
+}
+
+static inline bool pebs_has_baseline(void)
+{
+	return pmu.perf_cap & PMU_CAP_PEBS_BASELINE;
+}
+
 #endif /* _X86_PMU_H_ */
diff --git a/x86/Makefile.x86_64 b/x86/Makefile.x86_64
index 8f9463c..bd827fe 100644
--- a/x86/Makefile.x86_64
+++ b/x86/Makefile.x86_64
@@ -33,6 +33,7 @@ tests += $(TEST_DIR)/vmware_backdoors.$(exe)
 tests += $(TEST_DIR)/rdpru.$(exe)
 tests += $(TEST_DIR)/pks.$(exe)
 tests += $(TEST_DIR)/pmu_lbr.$(exe)
+tests += $(TEST_DIR)/pmu_pebs.$(exe)
 
 ifeq ($(CONFIG_EFI),y)
 tests += $(TEST_DIR)/amd_sev.$(exe)
diff --git a/x86/pmu_pebs.c b/x86/pmu_pebs.c
new file mode 100644
index 0000000..b318a2c
--- /dev/null
+++ b/x86/pmu_pebs.c
@@ -0,0 +1,433 @@
+#include "x86/msr.h"
+#include "x86/processor.h"
+#include "x86/pmu.h"
+#include "x86/isr.h"
+#include "x86/apic.h"
+#include "x86/apic-defs.h"
+#include "x86/desc.h"
+#include "alloc.h"
+
+#include "vm.h"
+#include "types.h"
+#include "processor.h"
+#include "vmalloc.h"
+#include "alloc_page.h"
+
+/* Bits [63:48] provide the size of the current record in bytes. */
+#define	RECORD_SIZE_OFFSET	48
+
+static unsigned int max_nr_gp_events;
+static unsigned long *ds_bufer;
+static unsigned long *pebs_buffer;
+static u64 ctr_start_val;
+static bool has_baseline;
+
+struct debug_store {
+	u64	bts_buffer_base;
+	u64	bts_index;
+	u64	bts_absolute_maximum;
+	u64	bts_interrupt_threshold;
+	u64	pebs_buffer_base;
+	u64	pebs_index;
+	u64	pebs_absolute_maximum;
+	u64	pebs_interrupt_threshold;
+	u64	pebs_event_reset[64];
+};
+
+struct pebs_basic {
+	u64 format_size;
+	u64 ip;
+	u64 applicable_counters;
+	u64 tsc;
+};
+
+struct pebs_meminfo {
+	u64 address;
+	u64 aux;
+	u64 latency;
+	u64 tsx_tuning;
+};
+
+struct pebs_gprs {
+	u64 flags, ip, ax, cx, dx, bx, sp, bp, si, di;
+	u64 r8, r9, r10, r11, r12, r13, r14, r15;
+};
+
+struct pebs_xmm {
+	u64 xmm[16*2];	/* two entries for each register */
+};
+
+struct lbr_entry {
+	u64 from;
+	u64 to;
+	u64 info;
+};
+
+enum pmc_type {
+	GP = 0,
+	FIXED,
+};
+
+static uint32_t intel_arch_events[] = {
+	0x00c4, /* PERF_COUNT_HW_BRANCH_INSTRUCTIONS */
+	0x00c5, /* PERF_COUNT_HW_BRANCH_MISSES */
+	0x0300, /* PERF_COUNT_HW_REF_CPU_CYCLES */
+	0x003c, /* PERF_COUNT_HW_CPU_CYCLES */
+	0x00c0, /* PERF_COUNT_HW_INSTRUCTIONS */
+	0x013c, /* PERF_COUNT_HW_BUS_CYCLES */
+	0x4f2e, /* PERF_COUNT_HW_CACHE_REFERENCES */
+	0x412e, /* PERF_COUNT_HW_CACHE_MISSES */
+};
+
+static u64 pebs_data_cfgs[] = {
+	PEBS_DATACFG_MEMINFO,
+	PEBS_DATACFG_GP,
+	PEBS_DATACFG_XMMS,
+	PEBS_DATACFG_LBRS | ((MAX_NUM_LBR_ENTRY -1) << PEBS_DATACFG_LBR_SHIFT),
+};
+
+/* Iterating over every counter value is a waste of time; pick a few typical values. */
+static u64 counter_start_values[] = {
+	/* if PEBS counter doesn't overflow at all */
+	0,
+	0xfffffffffff0,
+	/* normal counter overflow to have PEBS records */
+	0xfffffffffffe,
+	/* test whether emulated instructions should trigger PEBS */
+	0xffffffffffff,
+};
+
+static unsigned int get_adaptive_pebs_record_size(u64 pebs_data_cfg)
+{
+	unsigned int sz = sizeof(struct pebs_basic);
+
+	if (!has_baseline)
+		return sz;
+
+	if (pebs_data_cfg & PEBS_DATACFG_MEMINFO)
+		sz += sizeof(struct pebs_meminfo);
+	if (pebs_data_cfg & PEBS_DATACFG_GP)
+		sz += sizeof(struct pebs_gprs);
+	if (pebs_data_cfg & PEBS_DATACFG_XMMS)
+		sz += sizeof(struct pebs_xmm);
+	if (pebs_data_cfg & PEBS_DATACFG_LBRS)
+		sz += MAX_NUM_LBR_ENTRY * sizeof(struct lbr_entry);
+
+	return sz;
+}
+
+static void cnt_overflow(isr_regs_t *regs)
+{
+	apic_write(APIC_EOI, 0);
+}
+
+static inline void workload(void)
+{
+	asm volatile(
+		"mov $0x0, %%eax\n"
+		"cmp $0x0, %%eax\n"
+		"jne label2\n"
+		"jne label2\n"
+		"jne label2\n"
+		"jne label2\n"
+		"mov $0x0, %%eax\n"
+		"cmp $0x0, %%eax\n"
+		"jne label2\n"
+		"jne label2\n"
+		"jne label2\n"
+		"jne label2\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"label2:\n"
+		:
+		:
+		: "eax", "ebx", "ecx", "edx");
+}
+
+static inline void workload2(void)
+{
+	asm volatile(
+		"mov $0x0, %%eax\n"
+		"cmp $0x0, %%eax\n"
+		"jne label3\n"
+		"jne label3\n"
+		"jne label3\n"
+		"jne label3\n"
+		"mov $0x0, %%eax\n"
+		"cmp $0x0, %%eax\n"
+		"jne label3\n"
+		"jne label3\n"
+		"jne label3\n"
+		"jne label3\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"mov $0xa, %%eax\n"
+		"cpuid\n"
+		"label3:\n"
+		:
+		:
+		: "eax", "ebx", "ecx", "edx");
+}
+
+static void alloc_buffers(void)
+{
+	ds_bufer = alloc_page();
+	force_4k_page(ds_bufer);
+	memset(ds_bufer, 0x0, PAGE_SIZE);
+
+	pebs_buffer = alloc_page();
+	force_4k_page(pebs_buffer);
+	memset(pebs_buffer, 0x0, PAGE_SIZE);
+}
+
+static void free_buffers(void)
+{
+	if (ds_bufer)
+		free_page(ds_bufer);
+
+	if (pebs_buffer)
+		free_page(pebs_buffer);
+}
+
+static void pebs_enable(u64 bitmask, u64 pebs_data_cfg)
+{
+	static struct debug_store *ds;
+	u64 baseline_extra_ctrl = 0, fixed_ctr_ctrl = 0;
+	unsigned int idx;
+
+	if (has_baseline)
+		wrmsr(MSR_PEBS_DATA_CFG, pebs_data_cfg);
+
+	ds = (struct debug_store *)ds_bufer;
+	ds->pebs_index = ds->pebs_buffer_base = (unsigned long)pebs_buffer;
+	ds->pebs_absolute_maximum = (unsigned long)pebs_buffer + PAGE_SIZE;
+	ds->pebs_interrupt_threshold = ds->pebs_buffer_base +
+		get_adaptive_pebs_record_size(pebs_data_cfg);
+
+	for (idx = 0; idx < pmu_nr_fixed_counters(); idx++) {
+		if (!(BIT_ULL(FIXED_CNT_INDEX + idx) & bitmask))
+			continue;
+		if (has_baseline)
+			baseline_extra_ctrl = BIT(FIXED_CNT_INDEX + idx * 4);
+		write_fixed_counter_value(idx, ctr_start_val);
+		fixed_ctr_ctrl |= (0xbULL << (idx * 4) | baseline_extra_ctrl);
+	}
+	if (fixed_ctr_ctrl)
+		wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, fixed_ctr_ctrl);
+
+	for (idx = 0; idx < max_nr_gp_events; idx++) {
+		if (!(BIT_ULL(idx) & bitmask))
+			continue;
+		if (has_baseline)
+			baseline_extra_ctrl = ICL_EVENTSEL_ADAPTIVE;
+		write_gp_event_select(idx, EVNTSEL_EN | EVNTSEL_OS | EVNTSEL_USR |
+		      intel_arch_events[idx] | baseline_extra_ctrl);
+		write_gp_counter_value(idx, ctr_start_val);
+	}
+
+	wrmsr(MSR_IA32_DS_AREA,  (unsigned long)ds_bufer);
+	wrmsr(MSR_IA32_PEBS_ENABLE, bitmask);
+	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, bitmask);
+}
+
+static void reset_pebs(void)
+{
+	memset(ds_bufer, 0x0, PAGE_SIZE);
+	memset(pebs_buffer, 0x0, PAGE_SIZE);
+	wrmsr(MSR_IA32_PEBS_ENABLE, 0);
+	wrmsr(MSR_IA32_DS_AREA,  0);
+	if (has_baseline)
+		wrmsr(MSR_PEBS_DATA_CFG, 0);
+
+	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL, rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
+
+	reset_all_counters();
+}
+
+static void pebs_disable(unsigned int idx)
+{
+	/*
+	* If we only clear the PEBS_ENABLE bit, the counter keeps counting. In
+	* that tiny window, a counter overflow generates a normal counter irq
+	* instead of a PEBS record. Exercise both disable orderings here.
+	*/
+	if (idx % 2)
+		wrmsr(MSR_IA32_PEBS_ENABLE, 0);
+
+	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+}
+
+static void check_pebs_records(u64 bitmask, u64 pebs_data_cfg)
+{
+	struct pebs_basic *pebs_rec = (struct pebs_basic *)pebs_buffer;
+	struct debug_store *ds = (struct debug_store *)ds_bufer;
+	unsigned int pebs_record_size = get_adaptive_pebs_record_size(pebs_data_cfg);
+	unsigned int count = 0;
+	bool expected, pebs_idx_match, pebs_size_match, data_cfg_match;
+	void *cur_record;
+
+	expected = (ds->pebs_index == ds->pebs_buffer_base) && !pebs_rec->format_size;
+	if (!(rdmsr(MSR_CORE_PERF_GLOBAL_STATUS) & GLOBAL_STATUS_BUFFER_OVF)) {
+		report(expected, "No OVF irq, no PEBS records.");
+		return;
+	}
+
+	if (expected) {
+		report(!expected, "An OVF irq, but no PEBS records.");
+		return;
+	}
+
+	expected = ds->pebs_index >= ds->pebs_interrupt_threshold;
+	cur_record = (void *)pebs_buffer;
+	do {
+		pebs_rec = (struct pebs_basic *)cur_record;
+		pebs_record_size = pebs_rec->format_size >> RECORD_SIZE_OFFSET;
+		pebs_idx_match =
+			pebs_rec->applicable_counters & bitmask;
+		pebs_size_match =
+			pebs_record_size == get_adaptive_pebs_record_size(pebs_data_cfg);
+		data_cfg_match =
+			(pebs_rec->format_size & GENMASK_ULL(47, 0)) == pebs_data_cfg;
+		expected = pebs_idx_match && pebs_size_match && data_cfg_match;
+		report(expected,
+		       "PEBS record (written seq %d) is verified (including size, counters and cfg).", count);
+		cur_record = cur_record + pebs_record_size;
+		count++;
+	} while (expected && (void *)cur_record < (void *)ds->pebs_index);
+
+	if (!expected) {
+		if (!pebs_idx_match)
+			printf("FAIL: The applicable_counters (0x%lx) doesn't match with pmc_bitmask (0x%lx).\n",
+			       pebs_rec->applicable_counters, bitmask);
+		if (!pebs_size_match)
+			printf("FAIL: The pebs_record_size (%d) doesn't match with MSR_PEBS_DATA_CFG (%d).\n",
+			       pebs_record_size, get_adaptive_pebs_record_size(pebs_data_cfg));
+		if (!data_cfg_match)
+			printf("FAIL: The pebs_data_cfg (0x%lx) doesn't match with MSR_PEBS_DATA_CFG (0x%lx).\n",
+			       pebs_rec->format_size & 0xffffffffffff, pebs_data_cfg);
+	}
+}
+
+static void check_one_counter(enum pmc_type type,
+			      unsigned int idx, u64 pebs_data_cfg)
+{
+	int pebs_bit = BIT_ULL(type == FIXED ? FIXED_CNT_INDEX + idx : idx);
+
+	report_prefix_pushf("%s counter %d (0x%lx)",
+			    type == FIXED ? "Extended Fixed" : "GP", idx, ctr_start_val);
+	reset_pebs();
+	pebs_enable(pebs_bit, pebs_data_cfg);
+	workload();
+	pebs_disable(idx);
+	check_pebs_records(pebs_bit, pebs_data_cfg);
+	report_prefix_pop();
+}
+
+/* More than one PEBS record will be generated. */
+static void check_multiple_counters(u64 bitmask, u64 pebs_data_cfg)
+{
+	reset_pebs();
+	pebs_enable(bitmask, pebs_data_cfg);
+	workload2();
+	pebs_disable(0);
+	check_pebs_records(bitmask, pebs_data_cfg);
+}
+
+static void check_pebs_counters(u64 pebs_data_cfg)
+{
+	unsigned int idx;
+	u64 bitmask = 0;
+
+	for (idx = 0; idx < pmu_nr_fixed_counters(); idx++)
+		check_one_counter(FIXED, idx, pebs_data_cfg);
+
+	for (idx = 0; idx < max_nr_gp_events; idx++)
+		check_one_counter(GP, idx, pebs_data_cfg);
+
+	for (idx = 0; idx < pmu_nr_fixed_counters(); idx++)
+		bitmask |= BIT_ULL(FIXED_CNT_INDEX + idx);
+	for (idx = 0; idx < max_nr_gp_events; idx += 2)
+		bitmask |= BIT_ULL(idx);
+	report_prefix_pushf("Multiple (0x%lx)", bitmask);
+	check_multiple_counters(bitmask, pebs_data_cfg);
+	report_prefix_pop();
+}
+
+/*
+ * Known reasons for no PEBS records:
+ *	1. The selected event does not support PEBS;
+ *	2. From a core PMU perspective, the vCPU and pCPU models are not the same;
+ *	3. The guest counter has not yet overflowed or has been cross-mapped by the host;
+ */
+int main(int ac, char **av)
+{
+	unsigned int i, j;
+
+	setup_vm();
+
+	max_nr_gp_events = MIN(pmu_nr_gp_counters(), ARRAY_SIZE(intel_arch_events));
+
+	printf("PMU version: %d\n", pmu_version());
+
+	has_baseline = pebs_has_baseline();
+	if (pmu_has_full_writes())
+		set_gp_counter_base(MSR_IA32_PMC0);
+
+	if (!is_intel()) {
+		report_skip("PEBS requires Intel ICX or later, non-Intel detected");
+		return report_summary();
+	} else if (!pmu_version_support_pebs()) {
+		report_skip("PEBS requires PMU version 2+, reported version is %d", pmu_version());
+		return report_summary();
+	} else if (!pmu_pebs_format()) {
+		report_skip("PEBS not enumerated in PERF_CAPABILITIES");
+		return report_summary();
+	} else if (rdmsr(MSR_IA32_MISC_ENABLE) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL) {
+		report_skip("PEBS unavailable according to MISC_ENABLE");
+		return report_summary();
+	}
+
+	printf("PEBS format: %d\n", pmu_pebs_format());
+	printf("PEBS GP counters: %d\n", pmu_nr_gp_counters());
+	printf("PEBS Fixed counters: %d\n", pmu_nr_fixed_counters());
+	printf("PEBS baseline (Adaptive PEBS): %d\n", has_baseline);
+
+	handle_irq(PMI_VECTOR, cnt_overflow);
+	alloc_buffers();
+
+	for (i = 0; i < ARRAY_SIZE(counter_start_values); i++) {
+		ctr_start_val = counter_start_values[i];
+		check_pebs_counters(0);
+		if (!has_baseline)
+			continue;
+
+		for (j = 0; j < ARRAY_SIZE(pebs_data_cfgs); j++) {
+			report_prefix_pushf("Adaptive (0x%lx)", pebs_data_cfgs[j]);
+			check_pebs_counters(pebs_data_cfgs[j]);
+			report_prefix_pop();
+		}
+	}
+
+	free_buffers();
+
+	return report_summary();
+}
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 07d0507..54f0437 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -200,6 +200,14 @@ check = /proc/sys/kernel/nmi_watchdog=0
 accel = kvm
 groups = pmu
 
+[pmu_pebs]
+arch = x86_64
+file = pmu_pebs.flat
+extra_params = -cpu host,migratable=no
+check = /proc/sys/kernel/nmi_watchdog=0
+accel = kvm
+groups = pmu
+
 [vmware_backdoors]
 file = vmware_backdoors.flat
 extra_params = -machine vmport=on -cpu max
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 20/24] x86/pmu: Add global helpers to cover Intel Arch PMU Version 1
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (18 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 19/24] x86: Add tests for Guest Processor Event Based Sampling (PEBS) Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 21/24] x86/pmu: Add gp_events pointer to route different event tables Like Xu
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

To test Intel arch PMU version 1, most of the basic framework and the
test cases that exercise PMU counters do not require any changes, except
that registers introduced only in PMU version 2 must not be accessed.

Adding a few guard checks seamlessly supports version 1, while opening
the door for normal AMD PMU tests.

Signed-off-by: Like Xu <likexu@tencent.com>
---
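
The guard pattern, shown standalone (the wrapper name is made up; the
checks themselves are exactly what this patch sprinkles through pmu.c):

static void clear_global_state_if_supported(void)
{
	/* GLOBAL_STATUS/OVF_CTRL only exist from PMU version 2 onwards. */
	if (this_cpu_support_perf_status())
		pmu_clear_global_status();

	/* Likewise, only touch GLOBAL_CTRL when it is architectural. */
	if (this_cpu_has_perf_global_ctrl())
		pmu_reset_global_enable();
}
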
 lib/x86/pmu.c |  8 +++++---
 lib/x86/pmu.h |  5 +++++
 x86/pmu.c     | 47 +++++++++++++++++++++++++++++++----------------
 3 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 3b6be37..43e6a43 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -10,8 +10,10 @@ void pmu_init(void)
         pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
     pmu.msr_gp_counter_base = MSR_IA32_PERFCTR0;
     pmu.msr_gp_event_select_base = MSR_P6_EVNTSEL0;
-    pmu.msr_global_status = MSR_CORE_PERF_GLOBAL_STATUS;
-    pmu.msr_global_ctl = MSR_CORE_PERF_GLOBAL_CTRL;
-    pmu.msr_global_status_clr = MSR_CORE_PERF_GLOBAL_OVF_CTRL;
+    if (this_cpu_support_perf_status()) {
+        pmu.msr_global_status = MSR_CORE_PERF_GLOBAL_STATUS;
+        pmu.msr_global_ctl = MSR_CORE_PERF_GLOBAL_CTRL;
+        pmu.msr_global_status_clr = MSR_CORE_PERF_GLOBAL_OVF_CTRL;
+    }
     reset_all_counters();
 }
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index 9ba2419..fa49a8f 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -116,6 +116,11 @@ static inline bool this_cpu_has_perf_global_ctrl(void)
 	return pmu_version() > 1;
 }
 
+static inline bool this_cpu_support_perf_status(void)
+{
+	return pmu_version() > 1;
+}
+
 static inline u8 pmu_nr_gp_counters(void)
 {
 	return (cpuid_10.a >> 8) & 0xff;
diff --git a/x86/pmu.c b/x86/pmu.c
index 015591f..daeb7a2 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -102,12 +102,18 @@ static struct pmu_event* get_counter_event(pmu_counter_t *cnt)
 
 static void global_enable(pmu_counter_t *cnt)
 {
+	if (!this_cpu_has_perf_global_ctrl())
+		return;
+
 	cnt->idx = event_to_global_idx(cnt);
 	pmu_set_global_enable(pmu_get_global_enable() | BIT_ULL(cnt->idx));
 }
 
 static void global_disable(pmu_counter_t *cnt)
 {
+	if (!this_cpu_has_perf_global_ctrl())
+		return;
+
 	pmu_set_global_enable(pmu_get_global_enable() & ~BIT_ULL(cnt->idx));
 }
 
@@ -286,7 +292,8 @@ static void check_counter_overflow(void)
 	overflow_preset = measure_for_overflow(&cnt);
 
 	/* clear status before test */
-	pmu_clear_global_status();
+	if (this_cpu_support_perf_status())
+		pmu_clear_global_status();
 
 	report_prefix_push("overflow");
 
@@ -313,6 +320,10 @@ static void check_counter_overflow(void)
 		idx = event_to_global_idx(&cnt);
 		__measure(&cnt, cnt.count);
 		report(cnt.count == 1, "cntr-%d", i);
+
+		if (!this_cpu_support_perf_status())
+			continue;
+
 		status = pmu_get_global_status();
 		report(status & (1ull << idx), "status-%d", i);
 		pmu_ack_global_status(status);
@@ -425,7 +436,8 @@ static void check_running_counter_wrmsr(void)
 	report(evt.count < gp_events[1].min, "cntr");
 
 	/* clear status before overflow test */
-	pmu_clear_global_status();
+	if (this_cpu_support_perf_status())
+		pmu_clear_global_status();
 
 	start_event(&evt);
 
@@ -437,8 +449,11 @@ static void check_running_counter_wrmsr(void)
 
 	loop();
 	stop_event(&evt);
-	status = pmu_get_global_status();
-	report(status & 1, "status msr bit");
+
+	if (this_cpu_support_perf_status()) {
+		status = pmu_get_global_status();
+		report(status & 1, "status msr bit");
+	}
 
 	report_prefix_pop();
 }
@@ -458,7 +473,8 @@ static void check_emulated_instr(void)
 	};
 	report_prefix_push("emulated instruction");
 
-	pmu_clear_global_status();
+	if (this_cpu_support_perf_status())
+		pmu_clear_global_status();
 
 	start_event(&brnch_cnt);
 	start_event(&instr_cnt);
@@ -492,7 +508,8 @@ static void check_emulated_instr(void)
 		:
 		: "eax", "ebx", "ecx", "edx");
 
-	pmu_reset_global_enable();
+	if (this_cpu_has_perf_global_ctrl())
+		pmu_reset_global_enable();
 
 	stop_event(&brnch_cnt);
 	stop_event(&instr_cnt);
@@ -503,10 +520,12 @@ static void check_emulated_instr(void)
 	       "instruction count");
 	report(brnch_cnt.count - brnch_start >= EXPECTED_BRNCH,
 	       "branch count");
-	// Additionally check that those counters overflowed properly.
-	status = pmu_get_global_status();
-	report(status & 1, "branch counter overflow");
-	report(status & 2, "instruction counter overflow");
+	if (this_cpu_support_perf_status()) {
+		// Additionally check that those counters overflowed properly.
+		status = pmu_get_global_status();
+		report(status & 1, "branch counter overflow");
+		report(status & 2, "instruction counter overflow");
+	}
 
 	report_prefix_pop();
 }
@@ -593,7 +612,8 @@ static void set_ref_cycle_expectations(void)
 	if (!pmu_nr_gp_counters() || !pmu_gp_counter_is_available(2))
 		return;
 
-	pmu_reset_global_enable();
+	if (this_cpu_has_perf_global_ctrl())
+		pmu_reset_global_enable();
 
 	t0 = fenced_rdtsc();
 	start_event(&cnt);
@@ -644,11 +664,6 @@ int main(int ac, char **av)
 		return report_summary();
 	}
 
-	if (pmu_version() == 1) {
-		report_skip("PMU version 1 is not supported.");
-		return report_summary();
-	}
-
 	set_ref_cycle_expectations();
 
 	printf("PMU version:         %d\n", pmu_version());
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 21/24] x86/pmu: Add gp_events pointer to route different event tables
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (19 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 20/24] x86/pmu: Add global helpers to cover Intel Arch PMU Version 1 Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 22/24] x86/pmu: Add nr_gp_counters to limit the number of test counters Like Xu
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

AMD and Intel do not share the same event encodings for performance
events. Code that tests the same logical event can be reused by pointing
to a different encoding table, noting that the table size also needs to
be updated.

Signed-off-by: Like Xu <likexu@tencent.com>
---
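
A sketch of the vendor routing done in main(); the Intel half is what
this patch adds, while amd_gp_events[] only appears in a later patch of
this series and is shown here just to illustrate the indirection:

	if (is_intel()) {
		gp_events = (struct pmu_event *)intel_gp_events;
		gp_events_size = ARRAY_SIZE(intel_gp_events);
	} else {
		gp_events = (struct pmu_event *)amd_gp_events;
		gp_events_size = ARRAY_SIZE(amd_gp_events);
	}
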
 x86/pmu.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index daeb7a2..24d015e 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -30,7 +30,7 @@ struct pmu_event {
 	uint32_t unit_sel;
 	int min;
 	int max;
-} gp_events[] = {
+} intel_gp_events[] = {
 	{"core cycles", 0x003c, 1*N, 50*N},
 	{"instructions", 0x00c0, 10*N, 10.2*N},
 	{"ref cycles", 0x013c, 1*N, 30*N},
@@ -46,6 +46,9 @@ struct pmu_event {
 
 char *buf;
 
+static struct pmu_event *gp_events;
+static unsigned int gp_events_size;
+
 static inline void loop(void)
 {
 	unsigned long tmp, tmp2, tmp3;
@@ -91,7 +94,7 @@ static struct pmu_event* get_counter_event(pmu_counter_t *cnt)
 	if (is_gp(cnt)) {
 		int i;
 
-		for (i = 0; i < sizeof(gp_events)/sizeof(gp_events[0]); i++)
+		for (i = 0; i < gp_events_size; i++)
 			if (gp_events[i].unit_sel == (cnt->config & 0xffff))
 				return &gp_events[i];
 	} else
@@ -212,7 +215,7 @@ static void check_gp_counters(void)
 {
 	int i;
 
-	for (i = 0; i < sizeof(gp_events)/sizeof(gp_events[0]); i++)
+	for (i = 0; i < gp_events_size; i++)
 		if (pmu_gp_counter_is_available(i))
 			check_gp_counter(&gp_events[i]);
 		else
@@ -248,7 +251,7 @@ static void check_counters_many(void)
 
 		cnt[n].ctr = gp_counter_msr(n);
 		cnt[n].config = EVNTSEL_OS | EVNTSEL_USR |
-			gp_events[i % ARRAY_SIZE(gp_events)].unit_sel;
+			gp_events[i % gp_events_size].unit_sel;
 		n++;
 	}
 	for (i = 0; i < nr_fixed_counters; i++) {
@@ -603,7 +606,7 @@ static void set_ref_cycle_expectations(void)
 {
 	pmu_counter_t cnt = {
 		.ctr = MSR_IA32_PERFCTR0,
-		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[2].unit_sel,
+		.config = EVNTSEL_OS | EVNTSEL_USR | intel_gp_events[2].unit_sel,
 	};
 	uint64_t tsc_delta;
 	uint64_t t0, t1, t2, t3;
@@ -639,8 +642,8 @@ static void set_ref_cycle_expectations(void)
 	if (!tsc_delta)
 		return;
 
-	gp_events[2].min = (gp_events[2].min * cnt.count) / tsc_delta;
-	gp_events[2].max = (gp_events[2].max * cnt.count) / tsc_delta;
+	intel_gp_events[2].min = (intel_gp_events[2].min * cnt.count) / tsc_delta;
+	intel_gp_events[2].max = (intel_gp_events[2].max * cnt.count) / tsc_delta;
 }
 
 static void check_invalid_rdpmc_gp(void)
@@ -664,6 +667,8 @@ int main(int ac, char **av)
 		return report_summary();
 	}
 
+	gp_events = (struct pmu_event *)intel_gp_events;
+	gp_events_size = sizeof(intel_gp_events)/sizeof(intel_gp_events[0]);
 	set_ref_cycle_expectations();
 
 	printf("PMU version:         %d\n", pmu_version());
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 22/24] x86/pmu: Add nr_gp_counters to limit the number of test counters
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (20 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 21/24] x86/pmu: Add gp_events pointer to route different event tables Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 23/24] x86/pmu: Update testcases to cover AMD PMU Like Xu
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

The number of counters on AMD is fixed (4 or 6), and the test code can
be reused by dynamically switching the maximum number of counters (and
the register base addresses), with no change on the Intel side.

Signed-off-by: Like Xu <likexu@tencent.com>
---
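
Usage sketch, taken from how a later patch in this series re-runs the
suite against the legacy K7 register bank (AMD64_NUM_COUNTERS and the
K7 MSR bases are introduced there):

	report_prefix_push("K7");
	/* Only ever shrinks: set_nr_gp_counters() refuses to grow the count. */
	set_nr_gp_counters(AMD64_NUM_COUNTERS);
	set_gp_counter_base(MSR_K7_PERFCTR0);
	set_gp_event_select_base(MSR_K7_EVNTSEL0);
	check_counters();
	report_prefix_pop();
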
 lib/x86/pmu.c | 1 +
 lib/x86/pmu.h | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 43e6a43..25e21e5 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -10,6 +10,7 @@ void pmu_init(void)
         pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
     pmu.msr_gp_counter_base = MSR_IA32_PERFCTR0;
     pmu.msr_gp_event_select_base = MSR_P6_EVNTSEL0;
+    pmu.nr_gp_counters = (cpuid_10.a >> 8) & 0xff;
     if (this_cpu_support_perf_status()) {
         pmu.msr_global_status = MSR_CORE_PERF_GLOBAL_STATUS;
         pmu.msr_global_ctl = MSR_CORE_PERF_GLOBAL_CTRL;
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index fa49a8f..4312b6e 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -54,6 +54,7 @@ struct pmu_caps {
     u32 msr_global_status;
     u32 msr_global_ctl;
     u32 msr_global_status_clr;
+    unsigned int nr_gp_counters;
 };
 
 extern struct cpuid cpuid_10;
@@ -123,7 +124,13 @@ static inline bool this_cpu_support_perf_status(void)
 
 static inline u8 pmu_nr_gp_counters(void)
 {
-	return (cpuid_10.a >> 8) & 0xff;
+	return pmu.nr_gp_counters;
+}
+
+static inline void set_nr_gp_counters(u8 new_num)
+{
+	if (new_num < pmu_nr_gp_counters())
+		pmu.nr_gp_counters = new_num;
 }
 
 static inline u8 pmu_gp_counter_width(void)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 23/24] x86/pmu: Update testcases to cover AMD PMU
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (21 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 22/24] x86/pmu: Add nr_gp_counters to limit the number of test counters Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 17:58   ` Sean Christopherson
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 24/24] x86/pmu: Add AMD Guest PerfMonV2 testcases Like Xu
  2022-11-02 18:35 ` [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Sean Christopherson
  24 siblings, 1 reply; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm, Sandipan Das

From: Like Xu <likexu@tencent.com>

The AMD core PMU before Zen4 has no version number and no fixed
counters; it has a hard-coded number of generic counters and a fixed
bit-width, and only hardware events common across AMD generations
(starting with K7) are added to the amd_gp_events[] table.

All of the above differences are instantiated at the detection step,
which also covers the K7 PMU registers, consistent with bare metal.

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
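
One detail worth calling out: on Fam15h and later the event-select and
counter MSRs are interleaved as CTL/CTR pairs, hence the stride of 2 in
gp_counter_msr()/gp_event_select_msr(), while the legacy K7 bank uses a
stride of 1. A sketch of the addressing (helper names are hypothetical):

static u32 amd_f15h_ctl_msr(unsigned int i)
{
	return MSR_F15H_PERF_CTL0 + 2 * i;	/* 0xc0010200, 0xc0010202, ... */
}

static u32 amd_f15h_ctr_msr(unsigned int i)
{
	return MSR_F15H_PERF_CTR0 + 2 * i;	/* 0xc0010201, 0xc0010203, ... */
}
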
 lib/x86/msr.h       | 17 +++++++++++++
 lib/x86/pmu.c       | 29 +++++++++++++++--------
 lib/x86/pmu.h       | 35 +++++++++++++++++++++++++--
 lib/x86/processor.h |  1 +
 x86/pmu.c           | 58 ++++++++++++++++++++++++++++++++++++---------
 5 files changed, 117 insertions(+), 23 deletions(-)

diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 68d8837..6cf8f33 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -146,6 +146,23 @@
 #define FAM10H_MMIO_CONF_BASE_SHIFT	20
 #define MSR_FAM10H_NODE_ID		0xc001100c
 
+/* Fam 15h MSRs */
+#define MSR_F15H_PERF_CTL              0xc0010200
+#define MSR_F15H_PERF_CTL0             MSR_F15H_PERF_CTL
+#define MSR_F15H_PERF_CTL1             (MSR_F15H_PERF_CTL + 2)
+#define MSR_F15H_PERF_CTL2             (MSR_F15H_PERF_CTL + 4)
+#define MSR_F15H_PERF_CTL3             (MSR_F15H_PERF_CTL + 6)
+#define MSR_F15H_PERF_CTL4             (MSR_F15H_PERF_CTL + 8)
+#define MSR_F15H_PERF_CTL5             (MSR_F15H_PERF_CTL + 10)
+
+#define MSR_F15H_PERF_CTR              0xc0010201
+#define MSR_F15H_PERF_CTR0             MSR_F15H_PERF_CTR
+#define MSR_F15H_PERF_CTR1             (MSR_F15H_PERF_CTR + 2)
+#define MSR_F15H_PERF_CTR2             (MSR_F15H_PERF_CTR + 4)
+#define MSR_F15H_PERF_CTR3             (MSR_F15H_PERF_CTR + 6)
+#define MSR_F15H_PERF_CTR4             (MSR_F15H_PERF_CTR + 8)
+#define MSR_F15H_PERF_CTR5             (MSR_F15H_PERF_CTR + 10)
+
 /* K8 MSRs */
 #define MSR_K8_TOP_MEM1			0xc001001a
 #define MSR_K8_TOP_MEM2			0xc001001d
diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 25e21e5..7fd2279 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -5,16 +5,25 @@ struct pmu_caps pmu;
 
 void pmu_init(void)
 {
-    cpuid_10 = cpuid(10);
-    if (this_cpu_has(X86_FEATURE_PDCM))
-        pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
-    pmu.msr_gp_counter_base = MSR_IA32_PERFCTR0;
-    pmu.msr_gp_event_select_base = MSR_P6_EVNTSEL0;
-    pmu.nr_gp_counters = (cpuid_10.a >> 8) & 0xff;
-    if (this_cpu_support_perf_status()) {
-        pmu.msr_global_status = MSR_CORE_PERF_GLOBAL_STATUS;
-        pmu.msr_global_ctl = MSR_CORE_PERF_GLOBAL_CTRL;
-        pmu.msr_global_status_clr = MSR_CORE_PERF_GLOBAL_OVF_CTRL;
+    if (is_intel()) {
+        cpuid_10 = cpuid(10);
+        if (this_cpu_has(X86_FEATURE_PDCM))
+            pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
+        pmu.msr_gp_counter_base = MSR_IA32_PERFCTR0;
+        pmu.msr_gp_event_select_base = MSR_P6_EVNTSEL0;
+        pmu.nr_gp_counters = (cpuid_10.a >> 8) & 0xff;
+        if (this_cpu_support_perf_status()) {
+            pmu.msr_global_status = MSR_CORE_PERF_GLOBAL_STATUS;
+            pmu.msr_global_ctl = MSR_CORE_PERF_GLOBAL_CTRL;
+            pmu.msr_global_status_clr = MSR_CORE_PERF_GLOBAL_OVF_CTRL;
+        }
+    } else {
+        pmu.msr_gp_counter_base = MSR_F15H_PERF_CTR0;
+        pmu.msr_gp_event_select_base = MSR_F15H_PERF_CTL0;
+        if (!has_amd_perfctr_core())
+            pmu.nr_gp_counters = AMD64_NUM_COUNTERS;
+        else
+            pmu.nr_gp_counters = AMD64_NUM_COUNTERS_CORE;
     }
     reset_all_counters();
 }
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index 4312b6e..a4e00c5 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -10,6 +10,11 @@
 /* Performance Counter Vector for the LVT PC Register */
 #define PMI_VECTOR	32
 
+#define AMD64_NUM_COUNTERS	4
+#define AMD64_NUM_COUNTERS_CORE	6
+
+#define PMC_DEFAULT_WIDTH	48
+
 #define DEBUGCTLMSR_LBR	  (1UL <<  0)
 
 #define PMU_CAP_LBR_FMT	  0x3f
@@ -84,11 +89,17 @@ static inline void set_gp_event_select_base(u32 new_base)
 
 static inline u32 gp_counter_msr(unsigned int i)
 {
+	if (gp_counter_base() == MSR_F15H_PERF_CTR0)
+		return gp_counter_base() + 2 * i;
+
 	return gp_counter_base() + i;
 }
 
 static inline u32 gp_event_select_msr(unsigned int i)
 {
+	if (gp_event_select_base() == MSR_F15H_PERF_CTL0)
+		return gp_event_select_base() + 2 * i;
+
 	return gp_event_select_base() + i;
 }
 
@@ -104,11 +115,17 @@ static inline void write_gp_event_select(unsigned int i, u64 value)
 
 static inline u8 pmu_version(void)
 {
+	if (!is_intel())
+		return 0;
+
 	return cpuid_10.a & 0xff;
 }
 
 static inline bool this_cpu_has_pmu(void)
 {
+	if (!is_intel())
+		return true;
+
 	return !!pmu_version();
 }
 
@@ -135,12 +152,18 @@ static inline void set_nr_gp_counters(u8 new_num)
 
 static inline u8 pmu_gp_counter_width(void)
 {
-	return (cpuid_10.a >> 16) & 0xff;
+	if (is_intel())
+		return (cpuid_10.a >> 16) & 0xff;
+	else
+		return PMC_DEFAULT_WIDTH;
 }
 
 static inline u8 pmu_gp_counter_mask_length(void)
 {
-	return (cpuid_10.a >> 24) & 0xff;
+	if (is_intel())
+		return (cpuid_10.a >> 24) & 0xff;
+	else
+		return pmu_nr_gp_counters();
 }
 
 static inline u8 pmu_nr_fixed_counters(void)
@@ -161,6 +184,9 @@ static inline u8 pmu_fixed_counter_width(void)
 
 static inline bool pmu_gp_counter_is_available(int i)
 {
+	if (!is_intel())
+		return i < pmu_nr_gp_counters();
+
 	/* CPUID.0xA.EBX bit is '1 if they counter is NOT available. */
 	return !(cpuid_10.b & BIT(i));
 }
@@ -268,4 +294,9 @@ static inline bool pebs_has_baseline(void)
 	return pmu.perf_cap & PMU_CAP_PEBS_BASELINE;
 }
 
+static inline bool has_amd_perfctr_core(void)
+{
+	return this_cpu_has(X86_FEATURE_PERFCTR_CORE);
+}
+
 #endif /* _X86_PMU_H_ */
diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index ee2b5a2..64b36cf 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -252,6 +252,7 @@ static inline bool is_intel(void)
  * Extended Leafs, a.k.a. AMD defined
  */
 #define	X86_FEATURE_SVM			(CPUID(0x80000001, 0, ECX, 2))
+#define	X86_FEATURE_PERFCTR_CORE	(CPUID(0x80000001, 0, ECX, 23))
 #define	X86_FEATURE_NX			(CPUID(0x80000001, 0, EDX, 20))
 #define	X86_FEATURE_GBPAGES		(CPUID(0x80000001, 0, EDX, 26))
 #define	X86_FEATURE_RDTSCP		(CPUID(0x80000001, 0, EDX, 27))
diff --git a/x86/pmu.c b/x86/pmu.c
index 24d015e..d4ef685 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -38,6 +38,11 @@ struct pmu_event {
 	{"llc misses", 0x412e, 1, 1*N},
 	{"branches", 0x00c4, 1*N, 1.1*N},
 	{"branch misses", 0x00c5, 0, 0.1*N},
+}, amd_gp_events[] = {
+	{"core cycles", 0x0076, 1*N, 50*N},
+	{"instructions", 0x00c0, 10*N, 10.2*N},
+	{"branches", 0x00c2, 1*N, 1.1*N},
+	{"branch misses", 0x00c3, 0, 0.1*N},
 }, fixed_events[] = {
 	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
 	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
@@ -79,14 +84,23 @@ static bool check_irq(void)
 
 static bool is_gp(pmu_counter_t *evt)
 {
+	if (!is_intel())
+		return true;
+
 	return evt->ctr < MSR_CORE_PERF_FIXED_CTR0 ||
 		evt->ctr >= MSR_IA32_PMC0;
 }
 
 static int event_to_global_idx(pmu_counter_t *cnt)
 {
-	return cnt->ctr - (is_gp(cnt) ? gp_counter_base() :
-		(MSR_CORE_PERF_FIXED_CTR0 - FIXED_CNT_INDEX));
+	if (is_intel())
+		return cnt->ctr - (is_gp(cnt) ? gp_counter_base() :
+			(MSR_CORE_PERF_FIXED_CTR0 - FIXED_CNT_INDEX));
+
+	if (gp_counter_base() == MSR_F15H_PERF_CTR0)
+		return (cnt->ctr - gp_counter_base()) / 2;
+	else
+		return cnt->ctr - gp_counter_base();
 }
 
 static struct pmu_event* get_counter_event(pmu_counter_t *cnt)
@@ -309,6 +323,9 @@ static void check_counter_overflow(void)
 			cnt.count &= (1ull << pmu_gp_counter_width()) - 1;
 
 		if (i == nr_gp_counters) {
+			if (!is_intel())
+				break;
+
 			cnt.ctr = fixed_events[0].unit_sel;
 			cnt.count = measure_for_overflow(&cnt);
 			cnt.count &= (1ull << pmu_fixed_counter_width()) - 1;
@@ -322,7 +339,10 @@ static void check_counter_overflow(void)
 			cnt.config &= ~EVNTSEL_INT;
 		idx = event_to_global_idx(&cnt);
 		__measure(&cnt, cnt.count);
-		report(cnt.count == 1, "cntr-%d", i);
+		if (is_intel())
+			report(cnt.count == 1, "cntr-%d", i);
+		else
+			report(cnt.count == 0xffffffffffff || cnt.count < 7, "cntr-%d", i);
 
 		if (!this_cpu_support_perf_status())
 			continue;
@@ -464,10 +484,11 @@ static void check_running_counter_wrmsr(void)
 static void check_emulated_instr(void)
 {
 	uint64_t status, instr_start, brnch_start;
+	unsigned int branch_idx = is_intel() ? 5 : 2;
 	pmu_counter_t brnch_cnt = {
 		.ctr = gp_counter_msr(0),
 		/* branch instructions */
-		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[5].unit_sel,
+		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[branch_idx].unit_sel,
 	};
 	pmu_counter_t instr_cnt = {
 		.ctr = gp_counter_msr(1),
@@ -662,15 +683,21 @@ int main(int ac, char **av)
 
 	check_invalid_rdpmc_gp();
 
-	if (!pmu_version()) {
-		report_skip("No Intel Arch PMU is detected!");
-		return report_summary();
+	if (is_intel()) {
+		if (!pmu_version()) {
+			report_skip("No Intel Arch PMU is detected!");
+			return report_summary();
+		}
+		gp_events = (struct pmu_event *)intel_gp_events;
+		gp_events_size = sizeof(intel_gp_events)/sizeof(intel_gp_events[0]);
+		report_prefix_push("Intel");
+		set_ref_cycle_expectations();
+	} else {
+		gp_events_size = sizeof(amd_gp_events)/sizeof(amd_gp_events[0]);
+		gp_events = (struct pmu_event *)amd_gp_events;
+		report_prefix_push("AMD");
 	}
 
-	gp_events = (struct pmu_event *)intel_gp_events;
-	gp_events_size = sizeof(intel_gp_events)/sizeof(intel_gp_events[0]);
-	set_ref_cycle_expectations();
-
 	printf("PMU version:         %d\n", pmu_version());
 	printf("GP counters:         %d\n", pmu_nr_gp_counters());
 	printf("GP counter width:    %d\n", pmu_gp_counter_width());
@@ -690,5 +717,14 @@ int main(int ac, char **av)
 		report_prefix_pop();
 	}
 
+	if (!is_intel()) {
+		report_prefix_push("K7");
+		set_nr_gp_counters(AMD64_NUM_COUNTERS);
+		set_gp_counter_base(MSR_K7_PERFCTR0);
+		set_gp_event_select_base(MSR_K7_EVNTSEL0);
+		check_counters();
+		report_prefix_pop();
+	}
+
 	return report_summary();
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [kvm-unit-tests PATCH v4 24/24] x86/pmu: Add AMD Guest PerfMonV2 testcases
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (22 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 23/24] x86/pmu: Update testcases to cover AMD PMU Like Xu
@ 2022-10-24  9:12 ` Like Xu
  2022-11-02 18:35 ` [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Sean Christopherson
  24 siblings, 0 replies; 35+ messages in thread
From: Like Xu @ 2022-10-24  9:12 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Jim Mattson, kvm

From: Like Xu <likexu@tencent.com>

Update the test cases to cover KVM's enabling code for AMD Guest PerfMonV2.

The previously Intel-specific PMU helpers now check for the AMD CPUID
bits, and MSRs with the same semantics are assigned during the
initialization phase. The vast majority of PMU test cases are reused
seamlessly.

On some x86 machines (AMD only), even with retired events, repeated
measurements of the same workload collect an erratic number of events.
This essentially reflects details of the hardware implementation; from a
software perspective the event is imprecise, which motivates a tolerance
check in the counter overflow test cases.

Signed-off-by: Like Xu <likexu@tencent.com>
---
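
Detection sketch, shown in isolation (this mirrors what pmu_init() does
below): PerfMonV2 is enumerated via CPUID.0x80000022.EAX[0], and the
number of core counters comes from EBX[3:0] of the same leaf.

	if (this_cpu_has(X86_FEATURE_AMD_PMU_V2)) {
		/* Number of core PMCs: CPUID.0x80000022.EBX[3:0]. */
		pmu.nr_gp_counters = cpuid(0x80000022).b & 0xf;

		pmu.msr_global_status = MSR_AMD64_PERF_CNTR_GLOBAL_STATUS;
		pmu.msr_global_ctl = MSR_AMD64_PERF_CNTR_GLOBAL_CTL;
		pmu.msr_global_status_clr = MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR;
	}
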
 lib/x86/msr.h       | 5 +++++
 lib/x86/pmu.c       | 9 ++++++++-
 lib/x86/pmu.h       | 6 +++++-
 lib/x86/processor.h | 2 +-
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 6cf8f33..c9869be 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -426,6 +426,11 @@
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL	0x00000390
 
+/* AMD Performance Counter Global Status and Control MSRs */
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL		0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR	0xc0000302
+
 /* Geode defined MSRs */
 #define MSR_GEODE_BUSCONT_CONF0		0x00001900
 
diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 7fd2279..d4034cb 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -20,10 +20,17 @@ void pmu_init(void)
     } else {
         pmu.msr_gp_counter_base = MSR_F15H_PERF_CTR0;
         pmu.msr_gp_event_select_base = MSR_F15H_PERF_CTL0;
-        if (!has_amd_perfctr_core())
+        if (this_cpu_has(X86_FEATURE_AMD_PMU_V2))
+            pmu.nr_gp_counters = cpuid(0x80000022).b & 0xf;
+        else if (!has_amd_perfctr_core())
             pmu.nr_gp_counters = AMD64_NUM_COUNTERS;
         else
             pmu.nr_gp_counters = AMD64_NUM_COUNTERS_CORE;
+        if (this_cpu_support_perf_status()) {
+            pmu.msr_global_status = MSR_AMD64_PERF_CNTR_GLOBAL_STATUS;
+            pmu.msr_global_ctl = MSR_AMD64_PERF_CNTR_GLOBAL_CTL;
+            pmu.msr_global_status_clr = MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR;
+        }
     }
     reset_all_counters();
 }
\ No newline at end of file
diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
index a4e00c5..8f5b5ac 100644
--- a/lib/x86/pmu.h
+++ b/lib/x86/pmu.h
@@ -115,8 +115,12 @@ static inline void write_gp_event_select(unsigned int i, u64 value)
 
 static inline u8 pmu_version(void)
 {
-	if (!is_intel())
+	if (!is_intel()) {
+		/* Performance Monitoring Version 2 Supported */
+		if (this_cpu_has(X86_FEATURE_AMD_PMU_V2))
+			return 2;
 		return 0;
+	}
 
 	return cpuid_10.a & 0xff;
 }
diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index 64b36cf..7f884f7 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -266,7 +266,7 @@ static inline bool is_intel(void)
 #define X86_FEATURE_PAUSEFILTER		(CPUID(0x8000000A, 0, EDX, 10))
 #define X86_FEATURE_PFTHRESHOLD		(CPUID(0x8000000A, 0, EDX, 12))
 #define	X86_FEATURE_VGIF		(CPUID(0x8000000A, 0, EDX, 16))
-
+#define	X86_FEATURE_AMD_PMU_V2		(CPUID(0x80000022, 0, EAX, 0))
 
 static inline bool this_cpu_has(u64 feature)
 {
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [kvm-unit-tests PATCH v4 06/24] x86/pmu: Introduce __start_event() to drop all of the manual zeroing
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 06/24] x86/pmu: Introduce __start_event() to drop all of the manual zeroing Like Xu
@ 2022-11-02 17:41   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:41 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> +static void __measure(pmu_counter_t *evt, uint64_t count)

Note, this will silently conflict with another in-flight patch[*] to mark
measure() as noinline.

[*] https://lore.kernel.org/all/20220601163012.3404212-1-morbo@google.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [kvm-unit-tests PATCH v4 11/24] x86/pmu: Update rdpmc testcase to cover #GP path
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 11/24] x86/pmu: Update rdpmc testcase to cover #GP path Like Xu
@ 2022-11-02 17:42   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:42 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> +	assert_msg(!vector, "Unexpected %s on RDPMC(%d)",
> +			exception_mnemonic(vector), index);

Alignment is slightly off.
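
Presumably something like the below, assuming the usual convention of aligning
continuation lines under the first argument:

	assert_msg(!vector, "Unexpected %s on RDPMC(%d)",
		   exception_mnemonic(vector), index);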


* Re: [kvm-unit-tests PATCH v4 14/24] x86/pmu: Read cpuid(10) in the pmu_init() to reduce VM-Exit
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 14/24] x86/pmu: Read cpuid(10) in the pmu_init() to reduce VM-Exit Like Xu
@ 2022-11-02 17:45   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:45 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> From: Like Xu <likexu@tencent.com>
> 
> The type of CPUID accessors can also go in the common pmu code. Re-reading
> cpuid(10) each time it is needed adds the overhead of emulating CPUID,
> which is not negligible in the grand scheme of the test.
> 
> A common "PMU init" routine would allow the library to provide helpers
> that access more common PMU information.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  lib/x86/pmu.c |  7 +++++++
>  lib/x86/pmu.h | 26 +++++++++++++-------------
>  lib/x86/smp.c |  2 ++
>  3 files changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
> index 9d048ab..e8b9ae9 100644
> --- a/lib/x86/pmu.c
> +++ b/lib/x86/pmu.c
> @@ -1 +1,8 @@
>  #include "pmu.h"
> +
> +struct cpuid cpuid_10;
> +
> +void pmu_init(void)
> +{
> +    cpuid_10 = cpuid(10);

Tabs, not spaces.

> +}
> \ No newline at end of file
> diff --git a/lib/x86/pmu.h b/lib/x86/pmu.h
> index 078a974..7f4e797 100644
> --- a/lib/x86/pmu.h
> +++ b/lib/x86/pmu.h
> @@ -33,9 +33,13 @@
>  #define EVNTSEL_INT	(1 << EVNTSEL_INT_SHIFT)
>  #define EVNTSEL_INV	(1 << EVNTSEL_INV_SHIF)
>  
> +extern struct cpuid cpuid_10;

Instead of taking a raw snapshot of CPUID.0xA, process the CPUID info during
pmu_init() and fill "struct pmu_cap pmu" directly.
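
Roughly something like this (a sketch only; the field names are assumed, not
the final API):

	void pmu_init(void)
	{
		struct cpuid c = cpuid(10);

		pmu.version = c.a & 0xff;
		pmu.nr_gp_counters = (c.a >> 8) & 0xff;
		pmu.gp_counter_width = (c.a >> 16) & 0xff;
		pmu.gp_counter_mask_length = (c.a >> 24) & 0xff;
		pmu.gp_event_mask = c.b;	/* bit set => event NOT available */
		pmu.nr_fixed_counters = c.d & 0x1f;
		pmu.fixed_counter_width = (c.d >> 5) & 0xff;
	}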

> diff --git a/lib/x86/smp.c b/lib/x86/smp.c
> index b9b91c7..29197fc 100644
> --- a/lib/x86/smp.c
> +++ b/lib/x86/smp.c
> @@ -4,6 +4,7 @@
>  #include <asm/barrier.h>
>  
>  #include "processor.h"
> +#include "pmu.h"
>  #include "atomic.h"
>  #include "smp.h"
>  #include "apic.h"
> @@ -155,6 +156,7 @@ void smp_init(void)
>  		on_cpu(i, setup_smp_id, 0);
>  
>  	atomic_inc(&active_cpus);
> +	pmu_init();

Initializing the PMU has nothing to do with SMP initialization.  There's also an
opportunity for more cleanup: all paths call bringup_aps() => enable_x2apic() =>
smp_init(); providing a kitchen-sink helper can consolidate that code and provide
a convenient location for PMU initialization.

void bsp_rest_init(void)
{
	bringup_aps();
	enable_x2apic();
	smp_init();
	pmu_init();
}


* Re: [kvm-unit-tests PATCH v4 15/24] x86/pmu: Initialize PMU perf_capabilities at pmu_init()
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 15/24] x86/pmu: Initialize PMU perf_capabilities at pmu_init() Like Xu
@ 2022-11-02 17:45   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:45 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> From: Like Xu <likexu@tencent.com>
> 
> Re-reading PERF_CAPABILITIES each time it is needed adds the overhead
> of emulating RDMSR, which likewise is not negligible in the grand
> scheme of the test.
> 
> Based on this, more helpers for full_writes and lbr_fmt can also
> be added to increase the readability of the test cases.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  lib/x86/pmu.c |  3 +++
>  lib/x86/pmu.h | 18 +++++++++++++++---
>  x86/pmu.c     |  2 +-
>  x86/pmu_lbr.c |  7 ++-----
>  4 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
> index e8b9ae9..35b7efb 100644
> --- a/lib/x86/pmu.c
> +++ b/lib/x86/pmu.c
> @@ -1,8 +1,11 @@
>  #include "pmu.h"
>  
>  struct cpuid cpuid_10;
> +struct pmu_caps pmu;
>  
>  void pmu_init(void)
>  {
>      cpuid_10 = cpuid(10);
> +    if (this_cpu_has(X86_FEATURE_PDCM))
> +        pmu.perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);

Tabs...


* Re: [kvm-unit-tests PATCH v4 16/24] x86/pmu: Add GP counter related helpers
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 16/24] x86/pmu: Add GP counter related helpers Like Xu
@ 2022-11-02 17:54   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:54 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> +static inline u32 gp_counter_base(void)
> +{
> +	return pmu.msr_gp_counter_base;
> +}
> +
> +static inline void set_gp_counter_base(u32 new_base)
> +{
> +	pmu.msr_gp_counter_base = new_base;
> +}
> +
> +static inline u32 gp_event_select_base(void)
> +{
> +	return pmu.msr_gp_event_select_base;
> +}
> +
> +static inline void set_gp_event_select_base(u32 new_base)
> +{
> +	pmu.msr_gp_event_select_base = new_base;
> +}
> +
> +static inline u32 gp_counter_msr(unsigned int i)
> +{
> +	return gp_counter_base() + i;
> +}
> +
> +static inline u32 gp_event_select_msr(unsigned int i)

As proposed in the previous version, I think it makes sense to make these look
like macros so that it's more obvious that the callers are computing an MSR index
and not accessing the MSR itself, e.g.

	MSR_GP_EVENT_SELECTx(i)

> +{
> +	return gp_event_select_base() + i;
> +}
> +
> +static inline void write_gp_counter_value(unsigned int i, u64 value)
> +{
> +	wrmsr(gp_counter_msr(i), value);
> +}
> +
> +static inline void write_gp_event_select(unsigned int i, u64 value)
> +{
> +	wrmsr(gp_event_select_msr(i), value);
> +}

Almost all of these one-line wrappers are unnecessary.  "struct pmu_caps pmu" is
already exposed; just reference "pmu" directly.  And for the rdmsr/wrmsr wrappers,
the code I wanted to dedup was the calculation of the MSR index; hiding the actual
WRMSR and RDMSR operations is a net negative IMO.
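
Something along these lines (names assumed), keeping the RDMSR/WRMSR visible at
the call sites:

	#define MSR_GP_COUNTERx(i)		(pmu.msr_gp_counter_base + (i))
	#define MSR_GP_EVENT_SELECTx(i)		(pmu.msr_gp_event_select_base + (i))

	/* e.g. at a call site: */
	wrmsr(MSR_GP_COUNTERx(i), 0);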


* Re: [kvm-unit-tests PATCH v4 17/24] x86/pmu: Add GP/Fixed counters reset helpers
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 17/24] x86/pmu: Add GP/Fixed counters reset helpers Like Xu
@ 2022-11-02 17:55   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:55 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> +static inline u32 fixed_counter_msr(unsigned int i)
> +{
> +	return MSR_CORE_PERF_FIXED_CTR0 + i;

This should be added in a separate patch.
> +}
> +
> +static inline void write_fixed_counter_value(unsigned int i, u64 value)
> +{
> +	wrmsr(fixed_counter_msr(i), value);
> +}
> +
> +static inline void reset_all_gp_counters(void)
> +{
> +	unsigned int idx;
> +
> +	for (idx = 0; idx < pmu_nr_gp_counters(); idx++) {
> +		write_gp_event_select(idx, 0);
> +		write_gp_counter_value(idx, 0);
> +	}
> +}
> +
> +static inline void reset_all_fixed_counters(void)
> +{
> +    unsigned int idx;
> +
> +	if (!pmu_nr_fixed_counters())
> +		return;
> +
> +	wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> +	for (idx = 0; idx < pmu_nr_fixed_counters(); idx++)
> +		write_fixed_counter_value(idx, 0);
> +}
> +
> +static inline void reset_all_counters(void)

Prefix these with "pmu_" so that it's obvious what counters are being reset.

> +{
> +    reset_all_gp_counters();
> +    reset_all_fixed_counters();
> +}
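
E.g. a sketch of the suggested naming:

	static inline void pmu_reset_all_counters(void)
	{
		pmu_reset_all_gp_counters();
		pmu_reset_all_fixed_counters();
	}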


* Re: [kvm-unit-tests PATCH v4 18/24] x86/pmu: Add a set of helpers related to global registers
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 18/24] x86/pmu: Add a set of helpers related to global registers Like Xu
@ 2022-11-02 17:56   ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:56 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> @@ -194,4 +197,34 @@ static inline void reset_all_counters(void)
>      reset_all_fixed_counters();
>  }
>  
> +static inline void pmu_clear_global_status(void)
> +{
> +	wrmsr(pmu.msr_global_status_clr, rdmsr(pmu.msr_global_status));
> +}
> +
> +static inline u64 pmu_get_global_status(void)
> +{
> +	return rdmsr(pmu.msr_global_status);
> +}
> +
> +static inline u64 pmu_get_global_enable(void)
> +{
> +	return rdmsr(pmu.msr_global_ctl);
> +}
> +
> +static inline void pmu_set_global_enable(u64 bitmask)
> +{
> +	wrmsr(pmu.msr_global_ctl, bitmask);
> +}
> +
> +static inline void pmu_reset_global_enable(void)
> +{
> +	wrmsr(pmu.msr_global_ctl, 0);
> +}
> +
> +static inline void pmu_ack_global_status(u64 value)
> +{
> +	wrmsr(pmu.msr_global_status_clr, value);
> +}

Other than pmu_clear_global_status(), which provides novel functionality, the
rest of these wrappers are superfluous.
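
I.e. open code the MSR accesses at the call sites, e.g. (sketch):

	wrmsr(pmu.msr_global_ctl, 0);
	u64 status = rdmsr(pmu.msr_global_status);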


* Re: [kvm-unit-tests PATCH v4 23/24] x86/pmu: Update testcases to cover AMD PMU
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 23/24] x86/pmu: Update testcases to cover AMD PMU Like Xu
@ 2022-11-02 17:58   ` Sean Christopherson
  2022-11-02 20:10     ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 17:58 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm, Sandipan Das

On Mon, Oct 24, 2022, Like Xu wrote:
> @@ -104,11 +115,17 @@ static inline void write_gp_event_select(unsigned int i, u64 value)
>  
>  static inline u8 pmu_version(void)
>  {
> +	if (!is_intel())
> +		return 0;

This can be handled by adding pmu_caps.version.
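
E.g. (sketch, assuming a "version" field is added to pmu_caps and filled during
pmu_init()):

	if (is_intel())
		pmu.version = cpuid(10).a & 0xff;
	else if (this_cpu_has(X86_FEATURE_AMD_PMU_V2))
		pmu.version = 2;

so that pmu_version() simply returns pmu.version for both vendors.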

> +
>  	return cpuid_10.a & 0xff;
>  }
>  
>  static inline bool this_cpu_has_pmu(void)
>  {
> +	if (!is_intel())
> +		return true;

I think it makes sense to kill off this_cpu_has_pmu(); its only usage is after
an explicit is_intel() check, and practically speaking that will likely hold true
since differentiating between Intel and AMD PMUs seems inevitable.

> +
>  	return !!pmu_version();
>  }
>  
> @@ -135,12 +152,18 @@ static inline void set_nr_gp_counters(u8 new_num)
>  
>  static inline u8 pmu_gp_counter_width(void)
>  {
> -	return (cpuid_10.a >> 16) & 0xff;
> +	if (is_intel())

Again, can be handled by utilizing pmu_caps.

> +		return (cpuid_10.a >> 16) & 0xff;
> +	else
> +		return PMC_DEFAULT_WIDTH;
>  }
>  
>  static inline u8 pmu_gp_counter_mask_length(void)
>  {
> -	return (cpuid_10.a >> 24) & 0xff;
> +	if (is_intel())
> +		return (cpuid_10.a >> 24) & 0xff;
> +	else
> +		return pmu_nr_gp_counters();
>  }
>  
>  static inline u8 pmu_nr_fixed_counters(void)
> @@ -161,6 +184,9 @@ static inline u8 pmu_fixed_counter_width(void)
>  
>  static inline bool pmu_gp_counter_is_available(int i)
>  {
> +	if (!is_intel())
> +		return i < pmu_nr_gp_counters();
> +
>  	/* CPUID.0xA.EBX bit is '1 if they counter is NOT available. */
>  	return !(cpuid_10.b & BIT(i));
>  }
> @@ -268,4 +294,9 @@ static inline bool pebs_has_baseline(void)
>  	return pmu.perf_cap & PMU_CAP_PEBS_BASELINE;
>  }
>  
> +static inline bool has_amd_perfctr_core(void)

Unnecessary wrappers; just use this_cpu_has() directly.

> +{
> +	return this_cpu_has(X86_FEATURE_PERFCTR_CORE);
> +}
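
E.g. call sites can simply do (sketch):

	if (this_cpu_has(X86_FEATURE_PERFCTR_CORE))
		pmu.nr_gp_counters = AMD64_NUM_COUNTERS_CORE;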


* Re: [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions
  2022-10-24  9:11 [kvm-unit-tests PATCH v4 00/24] x86/pmu: Test case optimization, fixes and additions Like Xu
                   ` (23 preceding siblings ...)
  2022-10-24  9:12 ` [kvm-unit-tests PATCH v4 24/24] x86/pmu: Add AMD Guest PerfMonV2 testcases Like Xu
@ 2022-11-02 18:35 ` Sean Christopherson
  24 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 18:35 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm

On Mon, Oct 24, 2022, Like Xu wrote:
> The patch set includes all the changes on my side (SPR PEBS and AMD
> PerfMonV2 are included, except for Arch lbr), which helps to keep the
> review time focused. 
> 
> There are no major changes in the test logic. A considerable number of
> helpers have been added to lib/x86/pmu.[c,h], which really helps the
> readability of the code, while hiding some hardware differentiation details.
> 
> These are divided into three parts, the first part (01 - 08) is bug fixing,
> the second part (09 - 18) is code refactoring, and the third part is the
> addition of new test cases. It may also be good to split up and merge
> in sequence. They get passed on AMD Zen3/4, Intel ICX/SPR machines.

Quite a few comments, some of which result in a fair bit of fallout, e.g. avoiding
the global cpuid_10, tabs vs. spaces, etc.  I've made all the changes locally
and have a few additional cleanup patches.  I'll post a v5 once testing looks ok,
hopefully this week.


* Re: [kvm-unit-tests PATCH v4 23/24] x86/pmu: Update testcases to cover AMD PMU
  2022-11-02 17:58   ` Sean Christopherson
@ 2022-11-02 20:10     ` Sean Christopherson
  0 siblings, 0 replies; 35+ messages in thread
From: Sean Christopherson @ 2022-11-02 20:10 UTC (permalink / raw)
  To: Like Xu; +Cc: Paolo Bonzini, Jim Mattson, kvm, Sandipan Das

On Wed, Nov 02, 2022, Sean Christopherson wrote:
> On Mon, Oct 24, 2022, Like Xu wrote:
> >  static inline bool this_cpu_has_pmu(void)
> >  {
> > +	if (!is_intel())
> > +		return true;
> 
> I think it makes sense to kill off this_cpu_has_pmu(); its only usage is after
> an explicit is_intel() check, and practically speaking that will likely hold true
> since differentiating between Intel and AMD PMUs seems inevitable.

Rats, this won't work as vmx_tests.c uses the wrapper.  That's obviously Intel-only
too, but funneling that code through pmu_version() or whatever is rather gross.

