* [kvm-unit-tests Patch v4 00/17] pmu test bugs fix and improvements
@ 2024-04-19  3:52 Dapeng Mi
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Changes:
v3 -> v4:
  * Fix the newly found issue that pmu_counter_t.config crosses cache
    lines (patch 04)
  * Fix the cycles event false positive by introducing a warm-up phase
    instead of switching the cycles event order as done in v3 (patch 07)
  * Use macros to replace hard-coded event indexes (Mingwei) (patches 08~10)
  * Simplify the asm code that enables/disables the GLOBAL_CTRL MSR in
    the whole asm blob (patch 11)
  * Handle legacy CPUs without clflush/clflushopt/IBPB support
    (patches 14, 16)
  * Optimize emulated instruction validation (patch 17)

All changes pass validation on Intel Sapphire Rapids and Emerald Rapids
platforms against the latest kvm-x86/next code (2d181d84af38). The
series has not been tested on AMD platforms since no AMD hardware is on
hand; any testing on AMD platforms would be appreciated.

History:
  v3: https://lore.kernel.org/lkml/20240103031409.2504051-1-dapeng1.mi@linux.intel.com/ 
  v2: https://lore.kernel.org/lkml/20231031092921.2885109-1-dapeng1.mi@linux.intel.com/
  v1: https://lore.kernel.org/lkml/20231024075748.1675382-1-dapeng1.mi@linux.intel.com/


Dapeng Mi (16):
  x86: pmu: Remove blank line and redundant space
  x86: pmu: Refine fixed_events[] names
  x86: pmu: Fix the issue that pmu_counter_t.config crosses cache line
  x86: pmu: Enlarge cnt[] length to 48 in check_counters_many()
  x86: pmu: Add asserts to warn inconsistent fixed events and counters
  x86: pmu: Fix cycles event validation failure
  x86: pmu: Use macro to replace hard-coded branches event index
  x86: pmu: Use macro to replace hard-coded ref-cycles event index
  x86: pmu: Use macro to replace hard-coded instructions event index
  x86: pmu: Enable and disable PMCs in loop() asm blob
  x86: pmu: Improve instruction and branches events verification
  x86: pmu: Improve LLC misses event verification
  x86: pmu: Adjust lower boundary of llc-misses event to 0 for legacy
    CPUs
  x86: pmu: Add IBPB indirect jump asm blob
  x86: pmu: Adjust lower boundary of branch-misses event
  x86: pmu: Optimize emulated instruction validation

Xiong Zhang (1):
  x86: pmu: Remove duplicate code in pmu_init()

 lib/x86/pmu.c |   5 -
 x86/pmu.c     | 386 ++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 308 insertions(+), 83 deletions(-)


base-commit: e96011b32944b1ecca5967674ae243067588d1e7
-- 
2.34.1



* [kvm-unit-tests Patch v4 01/17] x86: pmu: Remove duplicate code in pmu_init()
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

From: Xiong Zhang <xiong.y.zhang@intel.com>

There is duplicated code in the pmu_init() helper; remove the duplicate
code.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 lib/x86/pmu.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 0f2afd650bc9..d06e94553024 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -16,11 +16,6 @@ void pmu_init(void)
 			pmu.fixed_counter_width = (cpuid_10.d >> 5) & 0xff;
 		}
 
-		if (pmu.version > 1) {
-			pmu.nr_fixed_counters = cpuid_10.d & 0x1f;
-			pmu.fixed_counter_width = (cpuid_10.d >> 5) & 0xff;
-		}
-
 		pmu.nr_gp_counters = (cpuid_10.a >> 8) & 0xff;
 		pmu.gp_counter_width = (cpuid_10.a >> 16) & 0xff;
 		pmu.gp_counter_mask_length = (cpuid_10.a >> 24) & 0xff;
-- 
2.34.1



* [kvm-unit-tests Patch v4 02/17] x86: pmu: Remove blank line and redundant space
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Code style changes: remove a blank line and a redundant space.

Reviewed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 47a1a602ade2..4847a424f572 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -203,8 +203,7 @@ static noinline void __measure(pmu_counter_t *evt, uint64_t count)
 static bool verify_event(uint64_t count, struct pmu_event *e)
 {
 	// printf("%d <= %ld <= %d\n", e->min, count, e->max);
-	return count >= e->min  && count <= e->max;
-
+	return count >= e->min && count <= e->max;
 }
 
 static bool verify_counter(pmu_counter_t *cnt)
-- 
2.34.1



* [kvm-unit-tests Patch v4 03/17] x86: pmu: Refine fixed_events[] names
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

In the SDM the fixed counters are numbered from 0, but currently the
fixed_events[] names are numbered from 1, which can confuse users. So
change the fixed_events[] names to be numbered from 0 as well, keeping
them identical to the SDM.

Reviewed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 4847a424f572..c971386db4e6 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -44,9 +44,9 @@ struct pmu_event {
 	{"branches", 0x00c2, 1*N, 1.1*N},
 	{"branch misses", 0x00c3, 0, 0.1*N},
 }, fixed_events[] = {
-	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
-	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
-	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
+	{"fixed 0", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
+	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
+	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
 };
 
 char *buf;
-- 
2.34.1



* [kvm-unit-tests Patch v4 04/17] x86: pmu: Fix the issue that pmu_counter_t.config crosses cache line
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

When running the pmu test on SPR, the following #GP fault is reported.

Unhandled exception 13 #GP at ip 000000000040771f
error_code=0000      rflags=00010046      cs=00000008
rax=00000000004031ad rcx=0000000000000186 rdx=0000000000000000 rbx=00000000005142f0
rbp=0000000000514260 rsi=0000000000000020 rdi=0000000000000340
 r8=0000000000513a65  r9=00000000000003f8 r10=000000000000000d r11=00000000ffffffff
r12=000000000043003c r13=0000000000514450 r14=000000000000000b r15=0000000000000001
cr0=0000000080010011 cr2=0000000000000000 cr3=0000000001007000 cr4=0000000000000020
cr8=0000000000000000
        STACK: @40771f 40040e 400976 400aef 40148d 401da9 4001ad
FAIL pmu

It looks like the EVENTSEL0 MSR (0x186) is written with an invalid value
(0x4031ad), which causes a #GP.

Further investigation shows the #GP is caused by below code in
__start_event().

wrmsr(MSR_GP_EVENT_SELECTx(event_to_global_idx(evt)),
		  evt->config | EVNTSEL_EN);

The evt->config is correctly initialized but seems to be corrupted
before it is written to the MSR.

The original pmu_counter_t layout looks as below.

typedef struct {
	uint32_t ctr;
	uint64_t config;
	uint64_t count;
	int idx;
} pmu_counter_t;

Obviously the config field crosses two cache lines. When the two cache
lines are not updated simultaneously, the config value is corrupted.

Adjust the order of the pmu_counter_t fields and ensure the config field
is cache-line aligned.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index c971386db4e6..5fd7439a0eba 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -20,9 +20,9 @@
 
 typedef struct {
 	uint32_t ctr;
+	uint32_t idx;
 	uint64_t config;
 	uint64_t count;
-	int idx;
 } pmu_counter_t;
 
 struct pmu_event {
-- 
2.34.1



* [kvm-unit-tests Patch v4 05/17] x86: pmu: Enlarge cnt[] length to 48 in check_counters_many()
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Considering there are already 8 GP counters and 4 fixed counters on the
latest Intel processors, such as Sapphire Rapids, the original cnt[]
array length of 10 is definitely not enough to cover all supported PMU
counters on these new processors, even though KVM currently supports at
most 3 fixed counters. This would cause out-of-bounds memory access and
may trigger false alarms in PMU counter validation.

More and more GP and fixed counters will probably be introduced in the
future, so directly extend the cnt[] array length to 48 once and for
all. Based on the layout of IA32_PERF_GLOBAL_CTRL and
IA32_PERF_GLOBAL_STATUS, 48 looks sufficient for the near future.
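For reference, one way to arrive at 48 from that layout (assuming the
usual reading that GP counter bits occupy the low 32 bits and fixed
counter bits start at bit 32):

  32 (GP counter bits 0-31) + 16 (fixed counter bits 32-47) = 48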

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 5fd7439a0eba..494af4012e84 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -253,7 +253,7 @@ static void check_fixed_counters(void)
 
 static void check_counters_many(void)
 {
-	pmu_counter_t cnt[10];
+	pmu_counter_t cnt[48];
 	int i, n;
 
 	for (i = 0, n = 0; n < pmu.nr_gp_counters; i++) {
-- 
2.34.1



* [kvm-unit-tests Patch v4 06/17] x86: pmu: Add asserts to warn inconsistent fixed events and counters
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

The current PMU code doesn't check whether the PMU fixed counter number
is larger than the number of pre-defined fixed events. If it is, it
causes out-of-range memory access.

So add asserts to warn about this invalid case.

Reviewed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 494af4012e84..461a4090d475 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -111,8 +111,12 @@ static struct pmu_event* get_counter_event(pmu_counter_t *cnt)
 		for (i = 0; i < gp_events_size; i++)
 			if (gp_events[i].unit_sel == (cnt->config & 0xffff))
 				return &gp_events[i];
-	} else
-		return &fixed_events[cnt->ctr - MSR_CORE_PERF_FIXED_CTR0];
+	} else {
+		unsigned int idx = cnt->ctr - MSR_CORE_PERF_FIXED_CTR0;
+
+		assert(idx < ARRAY_SIZE(fixed_events));
+		return &fixed_events[idx];
+	}
 
 	return (void*)0;
 }
@@ -738,6 +742,8 @@ int main(int ac, char **av)
 	printf("Fixed counters:      %d\n", pmu.nr_fixed_counters);
 	printf("Fixed counter width: %d\n", pmu.fixed_counter_width);
 
+	assert(pmu.nr_fixed_counters <= ARRAY_SIZE(fixed_events));
+
 	apic_write(APIC_LVTPC, PMI_VECTOR);
 
 	check_counters();
-- 
2.34.1



* [kvm-unit-tests Patch v4 07/17] x86: pmu: Fix cycles event validation failure
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

When running the pmu test on SPR, the following failure is sometimes
reported.

PMU version:         2
GP counters:         8
GP counter width:    48
Mask length:         8
Fixed counters:      3
Fixed counter width: 48
1000000 <= 55109398 <= 50000000
FAIL: Intel: core cycles-0
1000000 <= 18279571 <= 50000000
PASS: Intel: core cycles-1
1000000 <= 12238092 <= 50000000
PASS: Intel: core cycles-2
1000000 <= 7981727 <= 50000000
PASS: Intel: core cycles-3
1000000 <= 6984711 <= 50000000
PASS: Intel: core cycles-4
1000000 <= 6773673 <= 50000000
PASS: Intel: core cycles-5
1000000 <= 6697842 <= 50000000
PASS: Intel: core cycles-6
1000000 <= 6747947 <= 50000000
PASS: Intel: core cycles-7

The "core cycles" count on the first counter exceeds the upper boundary
and leads to a failure, and then the "core cycles" count drops gradually
and reaches a stable state.

That looks reasonable. The "core cycles" event is defined as the 1st
event in the xxx_gp_events[] array and is always verified first. When
the program loop() is executed for the first time, it needs to warm up
the pipeline and caches, e.g. it has to wait for the cache to be filled.
All this warm-up work leads to a quite large core cycles count which may
exceed the verification range.

To avoid the false positive of the cycles event caused by warm-up,
explicitly introduce a warm-up phase before really starting
verification.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/x86/pmu.c b/x86/pmu.c
index 461a4090d475..43ef7ecbcaea 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -591,11 +591,27 @@ static void check_tsx_cycles(void)
 	report_prefix_pop();
 }
 
+static void warm_up(void)
+{
+	int i = 8;
+
+	/*
+	 * Since cycles event is always run as the first event, there would be
+	 * a warm-up state to warm up the cache, it leads to the measured cycles
+	 * value may exceed the pre-defined cycles upper boundary and cause
+	 * false positive. To avoid this, introduce an warm-up state before
+	 * the real verification.
+	 */
+	while (i--)
+		loop();
+}
+
 static void check_counters(void)
 {
 	if (is_fep_available())
 		check_emulated_instr();
 
+	warm_up();
 	check_gp_counters();
 	check_fixed_counters();
 	check_rdpmc();
-- 
2.34.1



* [kvm-unit-tests Patch v4 08/17] x86: pmu: Use macro to replace hard-coded branches event index
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Currently the branches event index is a hard-coded number. A user could
add new events in the future and change the branches event index, but
not notice the hard-coded index and forget to update it accordingly,
and then the issue comes.

Thus, replace the hard-coded index with a macro.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 43ef7ecbcaea..fd1b22104fc4 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -49,6 +49,22 @@ struct pmu_event {
 	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
 };
 
+/*
+ * Events index in intel_gp_events[], ensure consistent with
+ * intel_gp_events[].
+ */
+enum {
+	INTEL_BRANCHES_IDX	= 5,
+};
+
+/*
+ * Events index in amd_gp_events[], ensure consistent with
+ * amd_gp_events[].
+ */
+enum {
+	AMD_BRANCHES_IDX	= 2,
+};
+
 char *buf;
 
 static struct pmu_event *gp_events;
@@ -481,7 +497,8 @@ static void check_emulated_instr(void)
 {
 	uint64_t status, instr_start, brnch_start;
 	uint64_t gp_counter_width = (1ull << pmu.gp_counter_width) - 1;
-	unsigned int branch_idx = pmu.is_intel ? 5 : 2;
+	unsigned int branch_idx = pmu.is_intel ?
+				  INTEL_BRANCHES_IDX : AMD_BRANCHES_IDX;
 	pmu_counter_t brnch_cnt = {
 		.ctr = MSR_GP_COUNTERx(0),
 		/* branch instructions */
-- 
2.34.1



* [kvm-unit-tests Patch v4 09/17] x86: pmu: Use macro to replace hard-coded ref-cycles event index
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Replace the hard-coded ref-cycles event index with a macro to avoid a
possible mismatch if a new event is added in the future and changes the
ref-cycles event index, but someone forgets to update the hard-coded
ref-cycles event index.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index fd1b22104fc4..6ae46398d84b 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -54,6 +54,7 @@ struct pmu_event {
  * intel_gp_events[].
  */
 enum {
+	INTEL_REF_CYCLES_IDX	= 2,
 	INTEL_BRANCHES_IDX	= 5,
 };
 
@@ -697,7 +698,8 @@ static void set_ref_cycle_expectations(void)
 {
 	pmu_counter_t cnt = {
 		.ctr = MSR_IA32_PERFCTR0,
-		.config = EVNTSEL_OS | EVNTSEL_USR | intel_gp_events[2].unit_sel,
+		.config = EVNTSEL_OS | EVNTSEL_USR |
+			  intel_gp_events[INTEL_REF_CYCLES_IDX].unit_sel,
 	};
 	uint64_t tsc_delta;
 	uint64_t t0, t1, t2, t3;
@@ -733,8 +735,10 @@ static void set_ref_cycle_expectations(void)
 	if (!tsc_delta)
 		return;
 
-	intel_gp_events[2].min = (intel_gp_events[2].min * cnt.count) / tsc_delta;
-	intel_gp_events[2].max = (intel_gp_events[2].max * cnt.count) / tsc_delta;
+	intel_gp_events[INTEL_REF_CYCLES_IDX].min =
+		(intel_gp_events[INTEL_REF_CYCLES_IDX].min * cnt.count) / tsc_delta;
+	intel_gp_events[INTEL_REF_CYCLES_IDX].max =
+		(intel_gp_events[INTEL_REF_CYCLES_IDX].max * cnt.count) / tsc_delta;
 }
 
 static void check_invalid_rdpmc_gp(void)
-- 
2.34.1



* [kvm-unit-tests Patch v4 10/17] x86: pmu: Use macro to replace hard-coded instructions event index
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Replace the hard-coded instructions event index with a macro to avoid a
possible mismatch if a new event is added in the future and changes the
instructions event index, but someone forgets to update the hard-coded
event index.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 6ae46398d84b..20bc6de9c936 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -54,6 +54,7 @@ struct pmu_event {
  * intel_gp_events[].
  */
 enum {
+	INTEL_INSTRUCTIONS_IDX  = 1,
 	INTEL_REF_CYCLES_IDX	= 2,
 	INTEL_BRANCHES_IDX	= 5,
 };
@@ -63,6 +64,7 @@ enum {
  * amd_gp_events[].
  */
 enum {
+	AMD_INSTRUCTIONS_IDX    = 1,
 	AMD_BRANCHES_IDX	= 2,
 };
 
@@ -317,11 +319,16 @@ static uint64_t measure_for_overflow(pmu_counter_t *cnt)
 
 static void check_counter_overflow(void)
 {
-	uint64_t overflow_preset;
 	int i;
+	uint64_t overflow_preset;
+	int instruction_idx = pmu.is_intel ?
+			      INTEL_INSTRUCTIONS_IDX :
+			      AMD_INSTRUCTIONS_IDX;
+
 	pmu_counter_t cnt = {
 		.ctr = MSR_GP_COUNTERx(0),
-		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
+		.config = EVNTSEL_OS | EVNTSEL_USR |
+			  gp_events[instruction_idx].unit_sel /* instructions */,
 	};
 	overflow_preset = measure_for_overflow(&cnt);
 
@@ -377,13 +384,18 @@ static void check_counter_overflow(void)
 
 static void check_gp_counter_cmask(void)
 {
+	int instruction_idx = pmu.is_intel ?
+			      INTEL_INSTRUCTIONS_IDX :
+			      AMD_INSTRUCTIONS_IDX;
+
 	pmu_counter_t cnt = {
 		.ctr = MSR_GP_COUNTERx(0),
-		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel /* instructions */,
+		.config = EVNTSEL_OS | EVNTSEL_USR |
+			  gp_events[instruction_idx].unit_sel /* instructions */,
 	};
 	cnt.config |= (0x2 << EVNTSEL_CMASK_SHIFT);
 	measure_one(&cnt);
-	report(cnt.count < gp_events[1].min, "cmask");
+	report(cnt.count < gp_events[instruction_idx].min, "cmask");
 }
 
 static void do_rdpmc_fast(void *ptr)
@@ -458,9 +470,14 @@ static void check_running_counter_wrmsr(void)
 {
 	uint64_t status;
 	uint64_t count;
+	unsigned int instruction_idx = pmu.is_intel ?
+				       INTEL_INSTRUCTIONS_IDX :
+				       AMD_INSTRUCTIONS_IDX;
+
 	pmu_counter_t evt = {
 		.ctr = MSR_GP_COUNTERx(0),
-		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel,
+		.config = EVNTSEL_OS | EVNTSEL_USR |
+			  gp_events[instruction_idx].unit_sel,
 	};
 
 	report_prefix_push("running counter wrmsr");
@@ -469,7 +486,7 @@ static void check_running_counter_wrmsr(void)
 	loop();
 	wrmsr(MSR_GP_COUNTERx(0), 0);
 	stop_event(&evt);
-	report(evt.count < gp_events[1].min, "cntr");
+	report(evt.count < gp_events[instruction_idx].min, "cntr");
 
 	/* clear status before overflow test */
 	if (this_cpu_has_perf_global_status())
@@ -500,6 +517,9 @@ static void check_emulated_instr(void)
 	uint64_t gp_counter_width = (1ull << pmu.gp_counter_width) - 1;
 	unsigned int branch_idx = pmu.is_intel ?
 				  INTEL_BRANCHES_IDX : AMD_BRANCHES_IDX;
+	unsigned int instruction_idx = pmu.is_intel ?
+				       INTEL_INSTRUCTIONS_IDX :
+				       AMD_INSTRUCTIONS_IDX;
 	pmu_counter_t brnch_cnt = {
 		.ctr = MSR_GP_COUNTERx(0),
 		/* branch instructions */
@@ -508,7 +528,7 @@ static void check_emulated_instr(void)
 	pmu_counter_t instr_cnt = {
 		.ctr = MSR_GP_COUNTERx(1),
 		/* instructions */
-		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[1].unit_sel,
+		.config = EVNTSEL_OS | EVNTSEL_USR | gp_events[instruction_idx].unit_sel,
 	};
 	report_prefix_push("emulated instruction");
 
-- 
2.34.1



* [kvm-unit-tests Patch v4 11/17] x86: pmu: Enable and disable PMCs in loop() asm blob
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Currently enabling PMCs, executing loop() and disabling PMCs are divided
into 3 separate functions. So there could be other instructions executed
between enabling the PMCs and running loop(), or between running loop()
and disabling the PMCs, e.g. if multiple counters are enabled in the
measure_many() function, the instructions which enable the 2nd and later
counters are counted in by the 1st counter.

So the current implementation can only verify the correctness of the
count against a rough range rather than a precise count, even for the
instructions and branches events. Strictly speaking, this verification
is meaningless as the test could still pass even though the KVM vPMU has
something wrong and reports an incorrect instructions or branches count
which happens to fall in the rough range.

Thus, move the PMC enabling and disabling into the loop() asm blob and
ensure only the loop asm instructions are counted; then the instructions
and branches events can be verified against a precise count instead of a
rough range.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 80 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 20bc6de9c936..d97309d7b8a3 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -18,6 +18,15 @@
 #define EXPECTED_INSTR 17
 #define EXPECTED_BRNCH 5
 
+#define LOOP_ASM(_wrmsr)						\
+	_wrmsr "\n\t"							\
+	"mov %%ecx, %%edi; mov %%ebx, %%ecx;\n\t"			\
+	"1: mov (%1), %2; add $64, %1;\n\t"				\
+	"nop; nop; nop; nop; nop; nop; nop;\n\t"			\
+	"loop 1b;\n\t"							\
+	"mov %%edi, %%ecx; xor %%eax, %%eax; xor %%edx, %%edx;\n\t"	\
+	_wrmsr "\n\t"
+
 typedef struct {
 	uint32_t ctr;
 	uint32_t idx;
@@ -73,13 +82,43 @@ char *buf;
 static struct pmu_event *gp_events;
 static unsigned int gp_events_size;
 
-static inline void loop(void)
+
+static inline void __loop(void)
+{
+	unsigned long tmp, tmp2, tmp3;
+
+	asm volatile(LOOP_ASM("nop")
+		     : "=c"(tmp), "=r"(tmp2), "=r"(tmp3)
+		     : "0"(N), "1"(buf));
+}
+
+/*
+ * Enable and disable counters in a whole asm blob to ensure
+ * no other instructions are counted in the window between
+ * counters enabling and really LOOP_ASM code executing.
+ * Thus counters can verify instructions and branches events
+ * against precise counts instead of a rough valid count range.
+ */
+static inline void __precise_loop(u64 cntrs)
 {
 	unsigned long tmp, tmp2, tmp3;
+	unsigned int global_ctl = pmu.msr_global_ctl;
+	u32 eax = cntrs & (BIT_ULL(32) - 1);
+	u32 edx = cntrs >> 32;
 
-	asm volatile("1: mov (%1), %2; add $64, %1; nop; nop; nop; nop; nop; nop; nop; loop 1b"
-			: "=c"(tmp), "=r"(tmp2), "=r"(tmp3): "0"(N), "1"(buf));
+	asm volatile(LOOP_ASM("wrmsr")
+		     : "=b"(tmp), "=r"(tmp2), "=r"(tmp3)
+		     : "a"(eax), "d"(edx), "c"(global_ctl),
+		       "0"(N), "1"(buf)
+		     : "edi");
+}
 
+static inline void loop(u64 cntrs)
+{
+	if (!this_cpu_has_perf_global_ctrl())
+		__loop();
+	else
+		__precise_loop(cntrs);
 }
 
 volatile uint64_t irq_received;
@@ -178,18 +217,17 @@ static void __start_event(pmu_counter_t *evt, uint64_t count)
 	    ctrl = (ctrl & ~(0xf << shift)) | (usrospmi << shift);
 	    wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, ctrl);
     }
-    global_enable(evt);
     apic_write(APIC_LVTPC, PMI_VECTOR);
 }
 
 static void start_event(pmu_counter_t *evt)
 {
 	__start_event(evt, 0);
+	global_enable(evt);
 }
 
-static void stop_event(pmu_counter_t *evt)
+static void __stop_event(pmu_counter_t *evt)
 {
-	global_disable(evt);
 	if (is_gp(evt)) {
 		wrmsr(MSR_GP_EVENT_SELECTx(event_to_global_idx(evt)),
 		      evt->config & ~EVNTSEL_EN);
@@ -201,14 +239,24 @@ static void stop_event(pmu_counter_t *evt)
 	evt->count = rdmsr(evt->ctr);
 }
 
+static void stop_event(pmu_counter_t *evt)
+{
+	global_disable(evt);
+	__stop_event(evt);
+}
+
 static noinline void measure_many(pmu_counter_t *evt, int count)
 {
 	int i;
+	u64 cntrs = 0;
+
+	for (i = 0; i < count; i++) {
+		__start_event(&evt[i], 0);
+		cntrs |= BIT_ULL(event_to_global_idx(&evt[i]));
+	}
+	loop(cntrs);
 	for (i = 0; i < count; i++)
-		start_event(&evt[i]);
-	loop();
-	for (i = 0; i < count; i++)
-		stop_event(&evt[i]);
+		__stop_event(&evt[i]);
 }
 
 static void measure_one(pmu_counter_t *evt)
@@ -218,9 +266,11 @@ static void measure_one(pmu_counter_t *evt)
 
 static noinline void __measure(pmu_counter_t *evt, uint64_t count)
 {
+	u64 cntrs = BIT_ULL(event_to_global_idx(evt));
+
 	__start_event(evt, count);
-	loop();
-	stop_event(evt);
+	loop(cntrs);
+	__stop_event(evt);
 }
 
 static bool verify_event(uint64_t count, struct pmu_event *e)
@@ -483,7 +533,7 @@ static void check_running_counter_wrmsr(void)
 	report_prefix_push("running counter wrmsr");
 
 	start_event(&evt);
-	loop();
+	__loop();
 	wrmsr(MSR_GP_COUNTERx(0), 0);
 	stop_event(&evt);
 	report(evt.count < gp_events[instruction_idx].min, "cntr");
@@ -500,7 +550,7 @@ static void check_running_counter_wrmsr(void)
 
 	wrmsr(MSR_GP_COUNTERx(0), count);
 
-	loop();
+	__loop();
 	stop_event(&evt);
 
 	if (this_cpu_has_perf_global_status()) {
@@ -641,7 +691,7 @@ static void warm_up(void)
 	 * the real verification.
 	 */
 	while (i--)
-		loop();
+		loop(0);
 }
 
 static void check_counters(void)
-- 
2.34.1



* [kvm-unit-tests Patch v4 12/17] x86: pmu: Improve instruction and branches events verification
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

If the HW supports the GLOBAL_CTRL MSR, enabling and disabling PMCs are
moved into __precise_loop(). Thus, the instructions and branches events
can be verified against a precise count instead of a rough range.
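For example, with the macros introduced below and N defined as 1000000
in pmu.c, the precise expectations become:

  LOOP_INSTRNS = N * 10 + EXTRA_INSTRNS = 1000000 * 10 + 6 = 10000006
  LOOP_BRANCHES = N = 1000000

i.e. each loop iteration contributes 10 instructions (mov, add, 7 nops
and the loop instruction) and 1 branch, and the GLOBAL_CTRL
enable/disable sequences account for the 6 extra instructions.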

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/x86/pmu.c b/x86/pmu.c
index d97309d7b8a3..1f81d96030e4 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -18,6 +18,11 @@
 #define EXPECTED_INSTR 17
 #define EXPECTED_BRNCH 5
 
+
+/* Enable GLOBAL_CTRL + disable GLOBAL_CTRL instructions */
+#define EXTRA_INSTRNS  (3 + 3)
+#define LOOP_INSTRNS   (N * 10 + EXTRA_INSTRNS)
+#define LOOP_BRANCHES  (N)
 #define LOOP_ASM(_wrmsr)						\
 	_wrmsr "\n\t"							\
 	"mov %%ecx, %%edi; mov %%ebx, %%ecx;\n\t"			\
@@ -121,6 +126,24 @@ static inline void loop(u64 cntrs)
 		__precise_loop(cntrs);
 }
 
+static void adjust_events_range(struct pmu_event *gp_events,
+				int instruction_idx, int branch_idx)
+{
+	/*
+	 * If HW supports GLOBAL_CTRL MSR, enabling and disabling PMCs are
+	 * moved in __precise_loop(). Thus, instructions and branches events
+	 * can be verified against a precise count instead of a rough range.
+	 */
+	if (this_cpu_has_perf_global_ctrl()) {
+		/* instructions event */
+		gp_events[instruction_idx].min = LOOP_INSTRNS;
+		gp_events[instruction_idx].max = LOOP_INSTRNS;
+		/* branches event */
+		gp_events[branch_idx].min = LOOP_BRANCHES;
+		gp_events[branch_idx].max = LOOP_BRANCHES;
+	}
+}
+
 volatile uint64_t irq_received;
 
 static void cnt_overflow(isr_regs_t *regs)
@@ -821,6 +844,9 @@ static void check_invalid_rdpmc_gp(void)
 
 int main(int ac, char **av)
 {
+	int instruction_idx;
+	int branch_idx;
+
 	setup_vm();
 	handle_irq(PMI_VECTOR, cnt_overflow);
 	buf = malloc(N*64);
@@ -834,13 +860,18 @@ int main(int ac, char **av)
 		}
 		gp_events = (struct pmu_event *)intel_gp_events;
 		gp_events_size = sizeof(intel_gp_events)/sizeof(intel_gp_events[0]);
+		instruction_idx = INTEL_INSTRUCTIONS_IDX;
+		branch_idx = INTEL_BRANCHES_IDX;
 		report_prefix_push("Intel");
 		set_ref_cycle_expectations();
 	} else {
 		gp_events_size = sizeof(amd_gp_events)/sizeof(amd_gp_events[0]);
 		gp_events = (struct pmu_event *)amd_gp_events;
+		instruction_idx = AMD_INSTRUCTIONS_IDX;
+		branch_idx = AMD_BRANCHES_IDX;
 		report_prefix_push("AMD");
 	}
+	adjust_events_range(gp_events, instruction_idx, branch_idx);
 
 	printf("PMU version:         %d\n", pmu.version);
 	printf("GP counters:         %d\n", pmu.nr_gp_counters);
-- 
2.34.1



* [kvm-unit-tests Patch v4 13/17] x86: pmu: Improve LLC misses event verification
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

When running the pmu test on SPR, the following failure is sometimes
reported.

1 <= 0 <= 1000000
FAIL: Intel: llc misses-4

Currently, whether an LLC miss occurs depends only on probability. It is
possible that no LLC miss happens in the whole loop(), especially as
processors get larger and larger cache sizes, just like what we observed
on SPR.

Thus, add a clflush instruction into the loop() asm blob and ensure that
at least one LLC miss is triggered.
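Roughly, the intent of the new instructions in the asm blob below is
(assuming hardware prefetch doesn't pull the line back in first):

  clflush (%1)   ; evict the first cache line of buf from all cache levels
  mfence         ; order the flush before the loads that follow
  mov (%1), %2   ; the first loop load now has to miss the LLC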

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 1f81d96030e4..fcae60d33966 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -19,19 +19,30 @@
 #define EXPECTED_BRNCH 5
 
 
-/* Enable GLOBAL_CTRL + disable GLOBAL_CTRL instructions */
-#define EXTRA_INSTRNS  (3 + 3)
+/* Enable GLOBAL_CTRL + disable GLOBAL_CTRL + clflush/mfence instructions */
+#define EXTRA_INSTRNS  (3 + 3 + 2)
 #define LOOP_INSTRNS   (N * 10 + EXTRA_INSTRNS)
 #define LOOP_BRANCHES  (N)
-#define LOOP_ASM(_wrmsr)						\
+#define LOOP_ASM(_wrmsr, _clflush)					\
 	_wrmsr "\n\t"							\
 	"mov %%ecx, %%edi; mov %%ebx, %%ecx;\n\t"			\
+	_clflush "\n\t"                                 		\
+	"mfence;\n\t"                                   		\
 	"1: mov (%1), %2; add $64, %1;\n\t"				\
 	"nop; nop; nop; nop; nop; nop; nop;\n\t"			\
 	"loop 1b;\n\t"							\
 	"mov %%edi, %%ecx; xor %%eax, %%eax; xor %%edx, %%edx;\n\t"	\
 	_wrmsr "\n\t"
 
+#define _loop_asm(_wrmsr, _clflush)				\
+do {								\
+	asm volatile(LOOP_ASM(_wrmsr, _clflush)			\
+		     : "=b"(tmp), "=r"(tmp2), "=r"(tmp3)	\
+		     : "a"(eax), "d"(edx), "c"(global_ctl),	\
+		       "0"(N), "1"(buf)				\
+		     : "edi");					\
+} while (0)
+
 typedef struct {
 	uint32_t ctr;
 	uint32_t idx;
@@ -87,14 +98,17 @@ char *buf;
 static struct pmu_event *gp_events;
 static unsigned int gp_events_size;
 
-
 static inline void __loop(void)
 {
 	unsigned long tmp, tmp2, tmp3;
+	u32 global_ctl = 0;
+	u32 eax = 0;
+	u32 edx = 0;
 
-	asm volatile(LOOP_ASM("nop")
-		     : "=c"(tmp), "=r"(tmp2), "=r"(tmp3)
-		     : "0"(N), "1"(buf));
+	if (this_cpu_has(X86_FEATURE_CLFLUSH))
+		_loop_asm("nop", "clflush (%1)");
+	else
+		_loop_asm("nop", "nop");
 }
 
 /*
@@ -107,15 +121,14 @@ static inline void __loop(void)
 static inline void __precise_loop(u64 cntrs)
 {
 	unsigned long tmp, tmp2, tmp3;
-	unsigned int global_ctl = pmu.msr_global_ctl;
+	u32 global_ctl = pmu.msr_global_ctl;
 	u32 eax = cntrs & (BIT_ULL(32) - 1);
 	u32 edx = cntrs >> 32;
 
-	asm volatile(LOOP_ASM("wrmsr")
-		     : "=b"(tmp), "=r"(tmp2), "=r"(tmp3)
-		     : "a"(eax), "d"(edx), "c"(global_ctl),
-		       "0"(N), "1"(buf)
-		     : "edi");
+	if (this_cpu_has(X86_FEATURE_CLFLUSH))
+		_loop_asm("wrmsr", "clflush (%1)");
+	else
+		_loop_asm("wrmsr", "nop");
 }
 
 static inline void loop(u64 cntrs)
-- 
2.34.1



* [kvm-unit-tests Patch v4 14/17] x86: pmu: Adjust lower boundary of llc-misses event to 0 for legacy CPUs
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

For legacy Intel CPUs without clflush/clflushopt support, there is no
way to force an LLC miss and the measured llc misses count may be 0.
Thus adjust the lower boundary of the llc-misses event to 0 to avoid
possible false positives.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/x86/pmu.c b/x86/pmu.c
index fcae60d33966..adc7e6c640c1 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -81,6 +81,7 @@ struct pmu_event {
 enum {
 	INTEL_INSTRUCTIONS_IDX  = 1,
 	INTEL_REF_CYCLES_IDX	= 2,
+	INTEL_LLC_MISSES_IDX	= 4,
 	INTEL_BRANCHES_IDX	= 5,
 };
 
@@ -875,6 +876,15 @@ int main(int ac, char **av)
 		gp_events_size = sizeof(intel_gp_events)/sizeof(intel_gp_events[0]);
 		instruction_idx = INTEL_INSTRUCTIONS_IDX;
 		branch_idx = INTEL_BRANCHES_IDX;
+
+		/*
+		 * For legacy Intel CPUS without clflush/clflushopt support,
+		 * there is no way to force to trigger a LLC miss, thus set
+		 * the minimum value to 0 to avoid false positives.
+		 */
+		if (!this_cpu_has(X86_FEATURE_CLFLUSH))
+			gp_events[INTEL_LLC_MISSES_IDX].min = 0;
+
 		report_prefix_push("Intel");
 		set_ref_cycle_expectations();
 	} else {
-- 
2.34.1



* [kvm-unit-tests Patch v4 15/17] x86: pmu: Add IBPB indirect jump asm blob
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Currently the lower boundary of the branch misses event is set to 0.
Strictly speaking, 0 shouldn't be a valid count since it can't tell us
whether the branch misses event counter works correctly or is even
disabled. On the other hand, it's also possible and reasonable that the
branch misses count is 0, especially for such a simple loop() program
running with an advanced branch predictor.

To eliminate this ambiguity and make branch misses event verification
more accurate, an extra IBPB indirect jump asm blob is appended and the
IBPB command is leveraged to clear the branch target buffer and force a
branch miss for the indirect jump.
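For reference, decoding the constants in the IBPB_JMP_ASM() blob below
(this is just the architectural IBPB interface, not new code):

  mov $1,  %eax   ; PRED_CMD_IBPB (bit 0)
  xor %edx, %edx
  mov $73, %ecx   ; 73 = 0x49 = IA32_PRED_CMD
  wrmsr           ; issue IBPB, invalidating indirect branch predictions
  lea 2f, %rax
  jmp *%rax       ; this indirect jump can then no longer be predicted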

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 55 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index adc7e6c640c1..0b3dd1ba1766 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -18,25 +18,36 @@
 #define EXPECTED_INSTR 17
 #define EXPECTED_BRNCH 5
 
-
-/* Enable GLOBAL_CTRL + disable GLOBAL_CTRL + clflush/mfence instructions */
-#define EXTRA_INSTRNS  (3 + 3 + 2)
+#define IBPB_JMP_INSTRNS      7
+#define IBPB_JMP_BRANCHES     1
+#define IBPB_JMP_ASM(_wrmsr)				\
+	"mov $1, %%eax; xor %%edx, %%edx;\n\t"		\
+	"mov $73, %%ecx;\n\t"				\
+	_wrmsr "\n\t"					\
+	"lea 2f, %%rax;\n\t"				\
+	"jmp *%%rax;\n\t"				\
+	"nop;\n\t"					\
+	"2: nop;\n\t"
+
+/* GLOBAL_CTRL enable + disable + clflush/mfence + IBPB_JMP */
+#define EXTRA_INSTRNS  (3 + 3 + 2 + IBPB_JMP_INSTRNS)
 #define LOOP_INSTRNS   (N * 10 + EXTRA_INSTRNS)
-#define LOOP_BRANCHES  (N)
-#define LOOP_ASM(_wrmsr, _clflush)					\
-	_wrmsr "\n\t"							\
+#define LOOP_BRANCHES  (N + IBPB_JMP_BRANCHES)
+#define LOOP_ASM(_wrmsr1, _clflush, _wrmsr2)				\
+	_wrmsr1 "\n\t"							\
 	"mov %%ecx, %%edi; mov %%ebx, %%ecx;\n\t"			\
 	_clflush "\n\t"                                 		\
 	"mfence;\n\t"                                   		\
 	"1: mov (%1), %2; add $64, %1;\n\t"				\
 	"nop; nop; nop; nop; nop; nop; nop;\n\t"			\
 	"loop 1b;\n\t"							\
+	IBPB_JMP_ASM(_wrmsr2) 						\
 	"mov %%edi, %%ecx; xor %%eax, %%eax; xor %%edx, %%edx;\n\t"	\
-	_wrmsr "\n\t"
+	_wrmsr1 "\n\t"
 
-#define _loop_asm(_wrmsr, _clflush)				\
+#define _loop_asm(_wrmsr1, _clflush, _wrmsr2)			\
 do {								\
-	asm volatile(LOOP_ASM(_wrmsr, _clflush)			\
+	asm volatile(LOOP_ASM(_wrmsr1, _clflush, _wrmsr2)	\
 		     : "=b"(tmp), "=r"(tmp2), "=r"(tmp3)	\
 		     : "a"(eax), "d"(edx), "c"(global_ctl),	\
 		       "0"(N), "1"(buf)				\
@@ -99,6 +110,12 @@ char *buf;
 static struct pmu_event *gp_events;
 static unsigned int gp_events_size;
 
+static int has_ibpb(void)
+{
+	return this_cpu_has(X86_FEATURE_SPEC_CTRL) ||
+	       this_cpu_has(X86_FEATURE_AMD_IBPB);
+}
+
 static inline void __loop(void)
 {
 	unsigned long tmp, tmp2, tmp3;
@@ -106,10 +123,14 @@ static inline void __loop(void)
 	u32 eax = 0;
 	u32 edx = 0;
 
-	if (this_cpu_has(X86_FEATURE_CLFLUSH))
-		_loop_asm("nop", "clflush (%1)");
+	if (this_cpu_has(X86_FEATURE_CLFLUSH) && has_ibpb())
+		_loop_asm("nop", "clflush (%1)", "wrmsr");
+	else if (this_cpu_has(X86_FEATURE_CLFLUSH))
+		_loop_asm("nop", "clflush (%1)", "nop");
+	else if (has_ibpb())
+		_loop_asm("nop", "nop", "wrmsr");
 	else
-		_loop_asm("nop", "nop");
+		_loop_asm("nop", "nop", "nop");
 }
 
 /*
@@ -126,10 +147,14 @@ static inline void __precise_loop(u64 cntrs)
 	u32 eax = cntrs & (BIT_ULL(32) - 1);
 	u32 edx = cntrs >> 32;
 
-	if (this_cpu_has(X86_FEATURE_CLFLUSH))
-		_loop_asm("wrmsr", "clflush (%1)");
+	if (this_cpu_has(X86_FEATURE_CLFLUSH) && has_ibpb())
+		_loop_asm("wrmsr", "clflush (%1)", "wrmsr");
+	else if (this_cpu_has(X86_FEATURE_CLFLUSH))
+		_loop_asm("wrmsr", "clflush (%1)", "nop");
+	else if (has_ibpb())
+		_loop_asm("wrmsr", "nop", "wrmsr");
 	else
-		_loop_asm("wrmsr", "nop");
+		_loop_asm("wrmsr", "nop", "nop");
 }
 
 static inline void loop(u64 cntrs)
-- 
2.34.1



* [kvm-unit-tests Patch v4 16/17] x86: pmu: Adjust lower boundary of branch-misses event
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

Since the IBPB command is added to force at least one branch miss, the
lower boundary of the branch misses event is raised to 1 by default. For
CPUs without IBPB support, dynamically adjust the lower boundary to 0 to
avoid false positives.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 0b3dd1ba1766..e0da522c004b 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -73,12 +73,12 @@ struct pmu_event {
 	{"llc references", 0x4f2e, 1, 2*N},
 	{"llc misses", 0x412e, 1, 1*N},
 	{"branches", 0x00c4, 1*N, 1.1*N},
-	{"branch misses", 0x00c5, 0, 0.1*N},
+	{"branch misses", 0x00c5, 1, 0.1*N},
 }, amd_gp_events[] = {
 	{"core cycles", 0x0076, 1*N, 50*N},
 	{"instructions", 0x00c0, 10*N, 10.2*N},
 	{"branches", 0x00c2, 1*N, 1.1*N},
-	{"branch misses", 0x00c3, 0, 0.1*N},
+	{"branch misses", 0x00c3, 1, 0.1*N},
 }, fixed_events[] = {
 	{"fixed 0", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
 	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
@@ -94,6 +94,7 @@ enum {
 	INTEL_REF_CYCLES_IDX	= 2,
 	INTEL_LLC_MISSES_IDX	= 4,
 	INTEL_BRANCHES_IDX	= 5,
+	INTEL_BRANCH_MISS_IDX	= 6,
 };
 
 /*
@@ -103,6 +104,7 @@ enum {
 enum {
 	AMD_INSTRUCTIONS_IDX    = 1,
 	AMD_BRANCHES_IDX	= 2,
+	AMD_BRANCH_MISS_IDX	= 3,
 };
 
 char *buf;
@@ -166,7 +168,8 @@ static inline void loop(u64 cntrs)
 }
 
 static void adjust_events_range(struct pmu_event *gp_events,
-				int instruction_idx, int branch_idx)
+				int instruction_idx, int branch_idx,
+				int branch_miss_idx)
 {
 	/*
 	 * If HW supports GLOBAL_CTRL MSR, enabling and disabling PMCs are
@@ -181,6 +184,17 @@ static void adjust_events_range(struct pmu_event *gp_events,
 		gp_events[branch_idx].min = LOOP_BRANCHES;
 		gp_events[branch_idx].max = LOOP_BRANCHES;
 	}
+
+	/*
+	 * For CPUs without IBPB support, no way to force to trigger a
+	 * branch miss and the measured branch misses is possible to be
+	 * 0. Thus overwrite the lower boundary of branch misses event
+	 * to 0 to avoid false positive.
+	 */
+	if (!has_ibpb()) {
+		/* branch misses event */
+		gp_events[branch_miss_idx].min = 0;
+	}
 }
 
 volatile uint64_t irq_received;
@@ -885,6 +899,7 @@ int main(int ac, char **av)
 {
 	int instruction_idx;
 	int branch_idx;
+	int branch_miss_idx;
 
 	setup_vm();
 	handle_irq(PMI_VECTOR, cnt_overflow);
@@ -901,6 +916,7 @@ int main(int ac, char **av)
 		gp_events_size = sizeof(intel_gp_events)/sizeof(intel_gp_events[0]);
 		instruction_idx = INTEL_INSTRUCTIONS_IDX;
 		branch_idx = INTEL_BRANCHES_IDX;
+		branch_miss_idx = INTEL_BRANCH_MISS_IDX;
 
 		/*
 		 * For legacy Intel CPUS without clflush/clflushopt support,
@@ -917,9 +933,10 @@ int main(int ac, char **av)
 		gp_events = (struct pmu_event *)amd_gp_events;
 		instruction_idx = AMD_INSTRUCTIONS_IDX;
 		branch_idx = AMD_BRANCHES_IDX;
+		branch_miss_idx = AMD_BRANCH_MISS_IDX;
 		report_prefix_push("AMD");
 	}
-	adjust_events_range(gp_events, instruction_idx, branch_idx);
+	adjust_events_range(gp_events, instruction_idx, branch_idx, branch_miss_idx);
 
 	printf("PMU version:         %d\n", pmu.version);
 	printf("GP counters:         %d\n", pmu.nr_gp_counters);
-- 
2.34.1



* [kvm-unit-tests Patch v4 17/17] x86: pmu: Optimize emulated instruction validation
From: Dapeng Mi @ 2024-04-19  3:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Jim Mattson, Mingwei Zhang
  Cc: kvm, linux-kernel, Xiong Zhang, Zhenyu Wang, Like Xu,
	Jinrong Liang, Dapeng Mi, Dapeng Mi

For CPUs supporting the PERF_GLOBAL_CTRL MSR, the validation of emulated
instructions can be improved to check against precise counts for the
instructions and branches events instead of a rough range.

Move enabling and disabling of the PERF_GLOBAL_CTRL MSR into the
kvm_fep_asm blob, so that the instructions and branches events can be
verified against precise counts.
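For reference, KVM_FEP_INSTR = 22 and KVM_FEP_BRNCH = 5 below can be
derived by counting what retires between the enabling and disabling
wrmsr of the kvm_fep_asm() blob (assuming the trailing wrmsr itself is
still counted):

  1 (save ecx) + 2 (mov/cmp) + 5 (emulated jne) + 10 (5 x mov/cpuid)
  + 3 (restore ecx, two xor) + 1 (wrmsr) = 22 instructions

and the 5 emulated jne instructions are the 5 expected branches.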

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 108 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 65 insertions(+), 43 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index e0da522c004b..dd83f157b35c 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -13,11 +13,6 @@
 
 #define N 1000000
 
-// These values match the number of instructions and branches in the
-// assembly block in check_emulated_instr().
-#define EXPECTED_INSTR 17
-#define EXPECTED_BRNCH 5
-
 #define IBPB_JMP_INSTRNS      7
 #define IBPB_JMP_BRANCHES     1
 #define IBPB_JMP_ASM(_wrmsr)				\
@@ -54,6 +49,40 @@ do {								\
 		     : "edi");					\
 } while (0)
 
+/* the number of instructions and branches of the kvm_fep_asm() blob */
+#define KVM_FEP_INSTR	22
+#define KVM_FEP_BRNCH	5
+
+/*
+ * KVM_FEP is a magic prefix that forces emulation so
+ * 'KVM_FEP "jne label\n"' just counts as a single instruction.
+ */
+#define kvm_fep_asm(_wrmsr)			\
+do {						\
+	asm volatile(				\
+		_wrmsr "\n\t"			\
+		"mov %%ecx, %%edi;\n\t"		\
+		"mov $0x0, %%eax;\n\t"		\
+		"cmp $0x0, %%eax;\n\t"		\
+		KVM_FEP "jne 1f\n\t"		\
+		KVM_FEP "jne 1f\n\t"		\
+		KVM_FEP "jne 1f\n\t"		\
+		KVM_FEP "jne 1f\n\t"		\
+		KVM_FEP "jne 1f\n\t"		\
+		"mov $0xa, %%eax; cpuid;\n\t"	\
+		"mov $0xa, %%eax; cpuid;\n\t"	\
+		"mov $0xa, %%eax; cpuid;\n\t"	\
+		"mov $0xa, %%eax; cpuid;\n\t"	\
+		"mov $0xa, %%eax; cpuid;\n\t"	\
+		"1: mov %%edi, %%ecx; \n\t"	\
+		"xor %%eax, %%eax; \n\t"	\
+		"xor %%edx, %%edx;\n\t"		\
+		_wrmsr "\n\t"			\
+		:				\
+		: "a"(eax), "d"(edx), "c"(ecx)	\
+		: "ebx", "edi");		\
+} while (0)
+
 typedef struct {
 	uint32_t ctr;
 	uint32_t idx;
@@ -639,6 +668,7 @@ static void check_running_counter_wrmsr(void)
 
 static void check_emulated_instr(void)
 {
+	u32 eax, edx, ecx;
 	uint64_t status, instr_start, brnch_start;
 	uint64_t gp_counter_width = (1ull << pmu.gp_counter_width) - 1;
 	unsigned int branch_idx = pmu.is_intel ?
@@ -646,6 +676,7 @@ static void check_emulated_instr(void)
 	unsigned int instruction_idx = pmu.is_intel ?
 				       INTEL_INSTRUCTIONS_IDX :
 				       AMD_INSTRUCTIONS_IDX;
+
 	pmu_counter_t brnch_cnt = {
 		.ctr = MSR_GP_COUNTERx(0),
 		/* branch instructions */
@@ -661,55 +692,46 @@ static void check_emulated_instr(void)
 	if (this_cpu_has_perf_global_status())
 		pmu_clear_global_status();
 
-	start_event(&brnch_cnt);
-	start_event(&instr_cnt);
+	__start_event(&brnch_cnt, 0);
+	__start_event(&instr_cnt, 0);
 
-	brnch_start = -EXPECTED_BRNCH;
-	instr_start = -EXPECTED_INSTR;
+	brnch_start = -KVM_FEP_BRNCH;
+	instr_start = -KVM_FEP_INSTR;
 	wrmsr(MSR_GP_COUNTERx(0), brnch_start & gp_counter_width);
 	wrmsr(MSR_GP_COUNTERx(1), instr_start & gp_counter_width);
-	// KVM_FEP is a magic prefix that forces emulation so
-	// 'KVM_FEP "jne label\n"' just counts as a single instruction.
-	asm volatile(
-		"mov $0x0, %%eax\n"
-		"cmp $0x0, %%eax\n"
-		KVM_FEP "jne label\n"
-		KVM_FEP "jne label\n"
-		KVM_FEP "jne label\n"
-		KVM_FEP "jne label\n"
-		KVM_FEP "jne label\n"
-		"mov $0xa, %%eax\n"
-		"cpuid\n"
-		"mov $0xa, %%eax\n"
-		"cpuid\n"
-		"mov $0xa, %%eax\n"
-		"cpuid\n"
-		"mov $0xa, %%eax\n"
-		"cpuid\n"
-		"mov $0xa, %%eax\n"
-		"cpuid\n"
-		"label:\n"
-		:
-		:
-		: "eax", "ebx", "ecx", "edx");
 
-	if (this_cpu_has_perf_global_ctrl())
-		wrmsr(pmu.msr_global_ctl, 0);
+	if (this_cpu_has_perf_global_ctrl()) {
+		eax = BIT(0) | BIT(1);
+		ecx = pmu.msr_global_ctl;
+		edx = 0;
+		kvm_fep_asm("wrmsr");
+	} else {
+		eax = ecx = edx = 0;
+		kvm_fep_asm("nop");
+	}
 
-	stop_event(&brnch_cnt);
-	stop_event(&instr_cnt);
+	__stop_event(&brnch_cnt);
+	__stop_event(&instr_cnt);
 
 	// Check that the end count - start count is at least the expected
 	// number of instructions and branches.
-	report(instr_cnt.count - instr_start >= EXPECTED_INSTR,
-	       "instruction count");
-	report(brnch_cnt.count - brnch_start >= EXPECTED_BRNCH,
-	       "branch count");
+	if (this_cpu_has_perf_global_ctrl()) {
+		report(instr_cnt.count - instr_start == KVM_FEP_INSTR,
+		       "instruction count");
+		report(brnch_cnt.count - brnch_start == KVM_FEP_BRNCH,
+		       "branch count");
+	} else {
+		report(instr_cnt.count - instr_start >= KVM_FEP_INSTR,
+		       "instruction count");
+		report(brnch_cnt.count - brnch_start >= KVM_FEP_BRNCH,
+		       "branch count");
+	}
+
 	if (this_cpu_has_perf_global_status()) {
 		// Additionally check that those counters overflowed properly.
 		status = rdmsr(pmu.msr_global_status);
-		report(status & 1, "branch counter overflow");
-		report(status & 2, "instruction counter overflow");
+		report(status & BIT_ULL(0), "branch counter overflow");
+		report(status & BIT_ULL(1), "instruction counter overflow");
 	}
 
 	report_prefix_pop();
-- 
2.34.1


