* [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
@ 2022-08-05  0:41 Ricardo Koller
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing Ricardo Koller
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Ricardo Koller @ 2022-08-05  0:41 UTC (permalink / raw)
  To: kvm, kvmarm, andrew.jones
  Cc: maz, alexandru.elisei, eric.auger, oliver.upton, reijiw, Ricardo Koller

There are some tests that fail when running on bare metal (including a
passthrough prototype).  There are three issues with the tests.  The
first one is that some isb()'s are missing between enabling event
counting and the actual counting. This wasn't an issue on KVM because
trapping on the registers served as a context synchronization event. The
second issue is that some tests assume that registers reset to 0.  And
finally, the third issue is that overflowing the low counter of a
chained event sets the overflow flag in PMOVS, and some tests fail
because they check that it is not set.
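
For context, the pattern that the first patch enforces throughout
arm/pmu.c looks roughly like the sketch below (illustrative only, using
helpers the tests already have; it is not part of the diffs):

	set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);	/* enable counting */
	isb();		/* synchronize: the enable takes effect here */

	/* ... workload whose events should be counted ... */

	isb();		/* make the counted events visible to the reads below */
	report_info("counter #0 = %ld", read_regn_el0(pmevcntr, 0));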

Addressed all comments from the previous version:
https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
- added missing isb()'s and fixed the commit message (Alexandru).
- fixed wording of a report() check (Andrew).

Thanks!
Ricardo

Ricardo Koller (3):
  arm: pmu: Add missing isb()'s after sys register writing
  arm: pmu: Reset the pmu registers before starting some tests
  arm: pmu: Check for overflow in the low counter in chained counters
    tests

 arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 17 deletions(-)

-- 
2.37.1.559.g78731f0fdb-goog


* [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing
  2022-08-05  0:41 [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Ricardo Koller
@ 2022-08-05  0:41 ` Ricardo Koller
  2022-08-09 15:21   ` Alexandru Elisei
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests Ricardo Koller
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Ricardo Koller @ 2022-08-05  0:41 UTC (permalink / raw)
  To: kvm, kvmarm, andrew.jones
  Cc: maz, alexandru.elisei, eric.auger, oliver.upton, reijiw, Ricardo Koller

There are various pmu tests that require an isb() between enabling
counting and the actual counting. Without it, the count registers can
report fewer events than expected, because the enabling only takes
effect after some events have already happened.  For example, some
missing isb()'s in the pmu-sw-incr test lead to the following errors
on bare metal:

	INFO: pmu: pmu-sw-incr: SW_INCR counter #0 has value 4294967280
	PASS: pmu: pmu-sw-incr: PWSYNC does not increment if PMCR.E is unset
	FAIL: pmu: pmu-sw-incr: counter #1 after + 100 SW_INCR
	FAIL: pmu: pmu-sw-incr: counter #0 after + 100 SW_INCR
	INFO: pmu: pmu-sw-incr: counter values after 100 SW_INCR #0=82 #1=98
	PASS: pmu: pmu-sw-incr: overflow on counter #0 after 100 SW_INCR
	SUMMARY: 4 tests, 2 unexpected failures

Add the missing isb()'s in all failing tests, plus some others that seem
required:
- after clearing the overflow signal in the IRQ handler, to make spurious
  interrupts less likely.
- after direct writes to PMSWINC_EL0, so that software reads the correct
  value from PMEVCNTR0_EL0 (from ARM DDI 0487H.a, page D13-5237).

Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arm/pmu.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 15c542a2..4c601b05 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -307,6 +307,7 @@ static void irq_handler(struct pt_regs *regs)
 			}
 		}
 		write_sysreg(ALL_SET, pmovsclr_el0);
+		isb();
 	} else {
 		pmu_stats.unexpected = true;
 	}
@@ -534,10 +535,12 @@ static void test_sw_incr(void)
 	write_sysreg_s(0x3, PMCNTENSET_EL0);
 
 	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
+	isb();
 
 	for (i = 0; i < 100; i++)
 		write_sysreg(0x1, pmswinc_el0);
 
+	isb();
 	report_info("SW_INCR counter #0 has value %ld", read_regn_el0(pmevcntr, 0));
 	report(read_regn_el0(pmevcntr, 0) == PRE_OVERFLOW,
 		"PWSYNC does not increment if PMCR.E is unset");
@@ -547,10 +550,12 @@ static void test_sw_incr(void)
 	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
 	write_sysreg_s(0x3, PMCNTENSET_EL0);
 	set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
+	isb();
 
 	for (i = 0; i < 100; i++)
 		write_sysreg(0x3, pmswinc_el0);
 
+	isb();
 	report(read_regn_el0(pmevcntr, 0)  == 84, "counter #1 after + 100 SW_INCR");
 	report(read_regn_el0(pmevcntr, 1)  == 100,
 		"counter #0 after + 100 SW_INCR");
@@ -618,9 +623,12 @@ static void test_chained_sw_incr(void)
 
 	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
 	set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
+	isb();
+
 	for (i = 0; i < 100; i++)
 		write_sysreg(0x1, pmswinc_el0);
 
+	isb();
 	report(!read_sysreg(pmovsclr_el0) && (read_regn_el0(pmevcntr, 1) == 1),
 		"no overflow and chain counter incremented after 100 SW_INCR/CHAIN");
 	report_info("overflow=0x%lx, #0=%ld #1=%ld", read_sysreg(pmovsclr_el0),
@@ -634,9 +642,12 @@ static void test_chained_sw_incr(void)
 	write_regn_el0(pmevcntr, 1, ALL_SET);
 	write_sysreg_s(0x3, PMCNTENSET_EL0);
 	set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
+	isb();
+
 	for (i = 0; i < 100; i++)
 		write_sysreg(0x1, pmswinc_el0);
 
+	isb();
 	report((read_sysreg(pmovsclr_el0) == 0x2) &&
 		(read_regn_el0(pmevcntr, 1) == 0) &&
 		(read_regn_el0(pmevcntr, 0) == 84),
@@ -821,10 +832,14 @@ static void test_overflow_interrupt(void)
 	report(expect_interrupts(0), "no overflow interrupt after preset");
 
 	set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
+	isb();
+
 	for (i = 0; i < 100; i++)
 		write_sysreg(0x2, pmswinc_el0);
 
+	isb();
 	set_pmcr(pmu.pmcr_ro);
+	isb();
 	report(expect_interrupts(0), "no overflow interrupt after counting");
 
 	/* enable interrupts */
@@ -879,6 +894,7 @@ static bool check_cycles_increase(void)
 	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
 
 	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
+	isb();
 
 	for (int i = 0; i < NR_SAMPLES; i++) {
 		uint64_t a, b;
@@ -894,6 +910,7 @@ static bool check_cycles_increase(void)
 	}
 
 	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
+	isb();
 
 	return success;
 }
-- 
2.37.1.559.g78731f0fdb-goog


* [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests
  2022-08-05  0:41 [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Ricardo Koller
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing Ricardo Koller
@ 2022-08-05  0:41 ` Ricardo Koller
  2022-08-10 19:02   ` Andrew Jones
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests Ricardo Koller
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Ricardo Koller @ 2022-08-05  0:41 UTC (permalink / raw)
  To: kvm, kvmarm, andrew.jones
  Cc: maz, alexandru.elisei, eric.auger, oliver.upton, reijiw, Ricardo Koller

Some registers, like PMOVS, reset to an architecturally UNKNOWN value.
Most tests expect them to have been put into a known (mostly zeroed)
state by pmu_reset().  Add a pmu_reset() to all the tests that need one.

As a bonus, fix a couple of comments related to the register state
before a sub-test.
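
For reference, "reset" here means putting the PMU into the known state
the tests assume; a minimal sketch of what that entails (not the literal
pmu_reset() implementation, just the state it establishes) is:

	set_pmcr(pmu.pmcr_ro);			/* stop counting (PMCR.E clear) */
	write_sysreg_s(ALL_SET, PMCNTENCLR_EL0);	/* disable all counters */
	write_sysreg(ALL_SET, pmovsclr_el0);	/* clear all overflow flags */
	write_sysreg(ALL_SET, pmintenclr_el1);	/* mask all overflow interrupts */
	isb();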

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arm/pmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 4c601b05..12e7d84e 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -826,7 +826,7 @@ static void test_overflow_interrupt(void)
 	write_regn_el0(pmevcntr, 1, PRE_OVERFLOW);
 	isb();
 
-	/* interrupts are disabled */
+	/* interrupts are disabled (PMINTENSET_EL1 == 0) */
 
 	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
 	report(expect_interrupts(0), "no overflow interrupt after preset");
@@ -842,7 +842,7 @@ static void test_overflow_interrupt(void)
 	isb();
 	report(expect_interrupts(0), "no overflow interrupt after counting");
 
-	/* enable interrupts */
+	/* enable interrupts (PMINTENSET_EL1 <= ALL_SET) */
 
 	pmu_reset_stats();
 
@@ -890,6 +890,7 @@ static bool check_cycles_increase(void)
 	bool success = true;
 
 	/* init before event access, this test only cares about cycle count */
+	pmu_reset();
 	set_pmcntenset(1 << PMU_CYCLE_IDX);
 	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
 
@@ -944,6 +945,7 @@ static bool check_cpi(int cpi)
 	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
 
 	/* init before event access, this test only cares about cycle count */
+	pmu_reset();
 	set_pmcntenset(1 << PMU_CYCLE_IDX);
 	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
 
-- 
2.37.1.559.g78731f0fdb-goog


* [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests
  2022-08-05  0:41 [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Ricardo Koller
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing Ricardo Koller
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests Ricardo Koller
@ 2022-08-05  0:41 ` Ricardo Koller
  2022-08-10 17:30   ` Oliver Upton
  2022-08-10 17:33 ` [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Oliver Upton
  2022-10-04 16:20 ` Eric Auger
  4 siblings, 1 reply; 21+ messages in thread
From: Ricardo Koller @ 2022-08-05  0:41 UTC (permalink / raw)
  To: kvm, kvmarm, andrew.jones
  Cc: maz, alexandru.elisei, eric.auger, oliver.upton, reijiw, Ricardo Koller

Overflowing the low counter of a chained event can set that counter's
overflow flag in PMOVS.  KVM does not set it, but real HW and the fast
model do. Moreover, the AArch64.IncrementEventCounter() pseudocode in the
ARM ARM (DDI 0487H.a, J1.1.1 "aarch64/debug") also sets the PMOVS bit on
overflow.

The pmu chain tests fail on bare metal because they check that the
overflow flag of the low counter is _not_ set on overflow.  Fix them by
checking that it is set. Note that these tests fail on KVM without the
respective KVM fix.
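
The relevant behavior, reduced to a sketch (hypothetical helper and
variable names, only meant to illustrate the pseudocode referenced
above, not kvm-unit-tests or KVM code):

	static uint32_t counters[32];	/* low 32 bits of each event counter */
	static uint32_t pmovs;		/* overflow status flags */

	static void increment_event_counter(unsigned int n, bool chained)
	{
		if (++counters[n] == 0) {	/* the low 32 bits wrapped */
			pmovs |= 1U << n;	/* overflow flag set even when chained */
			if (chained)		/* CHAIN event programmed on n + 1 */
				counters[n + 1]++;
		}
	}

KVM, without the corresponding KVM-side fix, did not set the low
counter's overflow flag when the counter was chained, which is what the
old checks in these tests relied on.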

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arm/pmu.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 12e7d84e..0a7e12f8 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -583,7 +583,7 @@ static void test_chained_counters(void)
 	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
 
 	report(read_regn_el0(pmevcntr, 1) == 1, "CHAIN counter #1 incremented");
-	report(!read_sysreg(pmovsclr_el0), "no overflow recorded for chained incr #1");
+	report(read_sysreg(pmovsclr_el0) == 0x1, "overflow recorded for chained incr #1");
 
 	/* test 64b overflow */
 
@@ -595,7 +595,7 @@ static void test_chained_counters(void)
 	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
 	report_info("overflow reg = 0x%lx", read_sysreg(pmovsclr_el0));
 	report(read_regn_el0(pmevcntr, 1) == 2, "CHAIN counter #1 set to 2");
-	report(!read_sysreg(pmovsclr_el0), "no overflow recorded for chained incr #2");
+	report(read_sysreg(pmovsclr_el0) == 0x1, "overflow recorded for chained incr #2");
 
 	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
 	write_regn_el0(pmevcntr, 1, ALL_SET);
@@ -603,7 +603,7 @@ static void test_chained_counters(void)
 	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
 	report_info("overflow reg = 0x%lx", read_sysreg(pmovsclr_el0));
 	report(!read_regn_el0(pmevcntr, 1), "CHAIN counter #1 wrapped");
-	report(read_sysreg(pmovsclr_el0) == 0x2, "overflow on chain counter");
+	report(read_sysreg(pmovsclr_el0) == 0x3, "overflow on even and odd counters");
 }
 
 static void test_chained_sw_incr(void)
@@ -629,8 +629,9 @@ static void test_chained_sw_incr(void)
 		write_sysreg(0x1, pmswinc_el0);
 
 	isb();
-	report(!read_sysreg(pmovsclr_el0) && (read_regn_el0(pmevcntr, 1) == 1),
-		"no overflow and chain counter incremented after 100 SW_INCR/CHAIN");
+	report((read_sysreg(pmovsclr_el0) == 0x1) &&
+		(read_regn_el0(pmevcntr, 1) == 1),
+		"overflow and chain counter incremented after 100 SW_INCR/CHAIN");
 	report_info("overflow=0x%lx, #0=%ld #1=%ld", read_sysreg(pmovsclr_el0),
 		    read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
 
@@ -648,10 +649,10 @@ static void test_chained_sw_incr(void)
 		write_sysreg(0x1, pmswinc_el0);
 
 	isb();
-	report((read_sysreg(pmovsclr_el0) == 0x2) &&
+	report((read_sysreg(pmovsclr_el0) == 0x3) &&
 		(read_regn_el0(pmevcntr, 1) == 0) &&
 		(read_regn_el0(pmevcntr, 0) == 84),
-		"overflow on chain counter and expected values after 100 SW_INCR/CHAIN");
+		"expected overflows and values after 100 SW_INCR/CHAIN");
 	report_info("overflow=0x%lx, #0=%ld #1=%ld", read_sysreg(pmovsclr_el0),
 		    read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
 }
@@ -731,8 +732,9 @@ static void test_chain_promotion(void)
 	report_info("MEM_ACCESS counter #0 has value 0x%lx",
 		    read_regn_el0(pmevcntr, 0));
 
-	report((read_regn_el0(pmevcntr, 1) == 1) && !read_sysreg(pmovsclr_el0),
-		"CHAIN counter enabled: CHAIN counter was incremented and no overflow");
+	report((read_regn_el0(pmevcntr, 1) == 1) &&
+		(read_sysreg(pmovsclr_el0) == 0x1),
+		"CHAIN counter enabled: CHAIN counter was incremented and overflow");
 
 	report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
 		read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
@@ -759,8 +761,9 @@ static void test_chain_promotion(void)
 	report_info("MEM_ACCESS counter #0 has value 0x%lx",
 		    read_regn_el0(pmevcntr, 0));
 
-	report((read_regn_el0(pmevcntr, 1) == 1) && !read_sysreg(pmovsclr_el0),
-		"32b->64b: CHAIN counter incremented and no overflow");
+	report((read_regn_el0(pmevcntr, 1) == 1) &&
+		(read_sysreg(pmovsclr_el0) == 0x1),
+		"32b->64b: CHAIN counter incremented and overflow");
 
 	report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
 		read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
@@ -868,8 +871,8 @@ static void test_overflow_interrupt(void)
 	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
 	isb();
 	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
-	report(expect_interrupts(0),
-		"no overflow interrupt expected on 32b boundary");
+	report(expect_interrupts(0x1),
+		"expect overflow interrupt on 32b boundary");
 
 	/* overflow on odd counter */
 	pmu_reset_stats();
@@ -877,8 +880,8 @@ static void test_overflow_interrupt(void)
 	write_regn_el0(pmevcntr, 1, ALL_SET);
 	isb();
 	mem_access_loop(addr, 400, pmu.pmcr_ro | PMU_PMCR_E);
-	report(expect_interrupts(0x2),
-		"expect overflow interrupt on odd counter");
+	report(expect_interrupts(0x3),
+		"expect overflow interrupt on even and odd counter");
 }
 #endif
 
-- 
2.37.1.559.g78731f0fdb-goog


* Re: [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing Ricardo Koller
@ 2022-08-09 15:21   ` Alexandru Elisei
  0 siblings, 0 replies; 21+ messages in thread
From: Alexandru Elisei @ 2022-08-09 15:21 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: kvm, kvmarm, andrew.jones, maz, eric.auger, oliver.upton, reijiw

Hi,

On Thu, Aug 04, 2022 at 05:41:37PM -0700, Ricardo Koller wrote:
> There are various pmu tests that require an isb() between enabling
> counting and the actual counting. This can lead to count registers
> reporting less events than expected; the actual enabling happens after
> some events have happened.  For example, some missing isb()'s in the
> pmu-sw-incr test lead to the following errors on bare-metal:
> 
> 	INFO: pmu: pmu-sw-incr: SW_INCR counter #0 has value 4294967280
> 	PASS: pmu: pmu-sw-incr: PWSYNC does not increment if PMCR.E is unset
> 	FAIL: pmu: pmu-sw-incr: counter #1 after + 100 SW_INCR
> 	FAIL: pmu: pmu-sw-incr: counter #0 after + 100 SW_INCR
> 	INFO: pmu: pmu-sw-incr: counter values after 100 SW_INCR #0=82 #1=98
> 	PASS: pmu: pmu-sw-incr: overflow on counter #0 after 100 SW_INCR
> 	SUMMARY: 4 tests, 2 unexpected failures
> 
> Add the missing isb()'s on all failing tests, plus some others that seem
> required:
> - after clearing the overflow signal in the IRQ handler to make spurious
>   interrupts less likely.
> - after direct writes to PMSWINC_EL0 for software to read the correct
> value for PMEVCNTR0_EL0 (from ARM DDI 0487H.a, page D13-5237).
> 
> Signed-off-by: Ricardo Koller <ricarkol@google.com>

Looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

* Re: [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests Ricardo Koller
@ 2022-08-10 17:30   ` Oliver Upton
  2022-08-10 18:28     ` Andrew Jones
  0 siblings, 1 reply; 21+ messages in thread
From: Oliver Upton @ 2022-08-10 17:30 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: kvm, kvmarm, andrew.jones, maz, alexandru.elisei, eric.auger, reijiw

On Thu, Aug 04, 2022 at 05:41:39PM -0700, Ricardo Koller wrote:
> A chained event overflowing on the low counter can set the overflow flag
> in PMOVS.  KVM does not set it, but real HW and the fast-model seem to.
> Moreover, the AArch64.IncrementEventCounter() pseudocode in the ARM ARM
> (DDI 0487H.a, J1.1.1 "aarch64/debug") also sets the PMOVS bit on
> overflow.
> 
> The pmu chain tests fail on bare metal when checking the overflow flag
> of the low counter _not_ being set on overflow.  Fix by checking for
> overflow. Note that this test fails in KVM without the respective fix.
> 

It'd be good to link out to the respective KVM fix, either by commit or
lore link if this patch lands before the kernel patches:

Link: http://lore.kernel.org/r/20220805135813.2102034-1-maz@kernel.org

--
Thanks,
Oliver

> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Ricardo Koller <ricarkol@google.com>
> ---
>  arm/pmu.c | 33 ++++++++++++++++++---------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 12e7d84e..0a7e12f8 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -583,7 +583,7 @@ static void test_chained_counters(void)
>  	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
>  
>  	report(read_regn_el0(pmevcntr, 1) == 1, "CHAIN counter #1 incremented");
> -	report(!read_sysreg(pmovsclr_el0), "no overflow recorded for chained incr #1");
> +	report(read_sysreg(pmovsclr_el0) == 0x1, "overflow recorded for chained incr #1");
>  
>  	/* test 64b overflow */
>  
> @@ -595,7 +595,7 @@ static void test_chained_counters(void)
>  	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
>  	report_info("overflow reg = 0x%lx", read_sysreg(pmovsclr_el0));
>  	report(read_regn_el0(pmevcntr, 1) == 2, "CHAIN counter #1 set to 2");
> -	report(!read_sysreg(pmovsclr_el0), "no overflow recorded for chained incr #2");
> +	report(read_sysreg(pmovsclr_el0) == 0x1, "overflow recorded for chained incr #2");
>  
>  	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
>  	write_regn_el0(pmevcntr, 1, ALL_SET);
> @@ -603,7 +603,7 @@ static void test_chained_counters(void)
>  	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
>  	report_info("overflow reg = 0x%lx", read_sysreg(pmovsclr_el0));
>  	report(!read_regn_el0(pmevcntr, 1), "CHAIN counter #1 wrapped");
> -	report(read_sysreg(pmovsclr_el0) == 0x2, "overflow on chain counter");
> +	report(read_sysreg(pmovsclr_el0) == 0x3, "overflow on even and odd counters");
>  }
>  
>  static void test_chained_sw_incr(void)
> @@ -629,8 +629,9 @@ static void test_chained_sw_incr(void)
>  		write_sysreg(0x1, pmswinc_el0);
>  
>  	isb();
> -	report(!read_sysreg(pmovsclr_el0) && (read_regn_el0(pmevcntr, 1) == 1),
> -		"no overflow and chain counter incremented after 100 SW_INCR/CHAIN");
> +	report((read_sysreg(pmovsclr_el0) == 0x1) &&
> +		(read_regn_el0(pmevcntr, 1) == 1),
> +		"overflow and chain counter incremented after 100 SW_INCR/CHAIN");
>  	report_info("overflow=0x%lx, #0=%ld #1=%ld", read_sysreg(pmovsclr_el0),
>  		    read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
>  
> @@ -648,10 +649,10 @@ static void test_chained_sw_incr(void)
>  		write_sysreg(0x1, pmswinc_el0);
>  
>  	isb();
> -	report((read_sysreg(pmovsclr_el0) == 0x2) &&
> +	report((read_sysreg(pmovsclr_el0) == 0x3) &&
>  		(read_regn_el0(pmevcntr, 1) == 0) &&
>  		(read_regn_el0(pmevcntr, 0) == 84),
> -		"overflow on chain counter and expected values after 100 SW_INCR/CHAIN");
> +		"expected overflows and values after 100 SW_INCR/CHAIN");
>  	report_info("overflow=0x%lx, #0=%ld #1=%ld", read_sysreg(pmovsclr_el0),
>  		    read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
>  }
> @@ -731,8 +732,9 @@ static void test_chain_promotion(void)
>  	report_info("MEM_ACCESS counter #0 has value 0x%lx",
>  		    read_regn_el0(pmevcntr, 0));
>  
> -	report((read_regn_el0(pmevcntr, 1) == 1) && !read_sysreg(pmovsclr_el0),
> -		"CHAIN counter enabled: CHAIN counter was incremented and no overflow");
> +	report((read_regn_el0(pmevcntr, 1) == 1) &&
> +		(read_sysreg(pmovsclr_el0) == 0x1),
> +		"CHAIN counter enabled: CHAIN counter was incremented and overflow");
>  
>  	report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
>  		read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
> @@ -759,8 +761,9 @@ static void test_chain_promotion(void)
>  	report_info("MEM_ACCESS counter #0 has value 0x%lx",
>  		    read_regn_el0(pmevcntr, 0));
>  
> -	report((read_regn_el0(pmevcntr, 1) == 1) && !read_sysreg(pmovsclr_el0),
> -		"32b->64b: CHAIN counter incremented and no overflow");
> +	report((read_regn_el0(pmevcntr, 1) == 1) &&
> +		(read_sysreg(pmovsclr_el0) == 0x1),
> +		"32b->64b: CHAIN counter incremented and overflow");
>  
>  	report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
>  		read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
> @@ -868,8 +871,8 @@ static void test_overflow_interrupt(void)
>  	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
>  	isb();
>  	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
> -	report(expect_interrupts(0),
> -		"no overflow interrupt expected on 32b boundary");
> +	report(expect_interrupts(0x1),
> +		"expect overflow interrupt on 32b boundary");
>  
>  	/* overflow on odd counter */
>  	pmu_reset_stats();
> @@ -877,8 +880,8 @@ static void test_overflow_interrupt(void)
>  	write_regn_el0(pmevcntr, 1, ALL_SET);
>  	isb();
>  	mem_access_loop(addr, 400, pmu.pmcr_ro | PMU_PMCR_E);
> -	report(expect_interrupts(0x2),
> -		"expect overflow interrupt on odd counter");
> +	report(expect_interrupts(0x3),
> +		"expect overflow interrupt on even and odd counter");
>  }
>  #endif
>  
> -- 
> 2.37.1.559.g78731f0fdb-goog
> 

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-08-05  0:41 [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Ricardo Koller
                   ` (2 preceding siblings ...)
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests Ricardo Koller
@ 2022-08-10 17:33 ` Oliver Upton
  2022-10-04 16:20 ` Eric Auger
  4 siblings, 0 replies; 21+ messages in thread
From: Oliver Upton @ 2022-08-10 17:33 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: kvm, kvmarm, andrew.jones, maz, alexandru.elisei, eric.auger, reijiw

On Thu, Aug 04, 2022 at 05:41:36PM -0700, Ricardo Koller wrote:
> There are some tests that fail when running on bare metal (including a
> passthrough prototype).  There are three issues with the tests.  The
> first one is that there are some missing isb()'s between enabling event
> counting and the actual counting. This wasn't an issue on KVM as
> trapping on registers served as context synchronization events. The
> second issue is that some tests assume that registers reset to 0.  And
> finally, the third issue is that overflowing the low counter of a
> chained event sets the overflow flag in PMOVS and some tests fail by
> checking for it not being set.
> 
> Addressed all comments from the previous version:
> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
> - adding missing isb() and fixed the commit message (Alexandru).
> - fixed wording of a report() check (Andrew).
> 
> Thanks!
> Ricardo
> 
> Ricardo Koller (3):
>   arm: pmu: Add missing isb()'s after sys register writing
>   arm: pmu: Reset the pmu registers before starting some tests
>   arm: pmu: Check for overflow in the low counter in chained counters
>     tests

For the series:

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

--
Thanks,
Oliver

* Re: [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests
  2022-08-10 17:30   ` Oliver Upton
@ 2022-08-10 18:28     ` Andrew Jones
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Jones @ 2022-08-10 18:28 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Ricardo Koller, kvm, kvmarm, maz, alexandru.elisei, eric.auger, reijiw

On Wed, Aug 10, 2022 at 12:30:26PM -0500, Oliver Upton wrote:
> On Thu, Aug 04, 2022 at 05:41:39PM -0700, Ricardo Koller wrote:
> > A chained event overflowing on the low counter can set the overflow flag
> > in PMOVS.  KVM does not set it, but real HW and the fast-model seem to.
> > Moreover, the AArch64.IncrementEventCounter() pseudocode in the ARM ARM
> > (DDI 0487H.a, J1.1.1 "aarch64/debug") also sets the PMOVS bit on
> > overflow.
> > 
> > The pmu chain tests fail on bare metal when checking the overflow flag
> > of the low counter _not_ being set on overflow.  Fix by checking for
> > overflow. Note that this test fails in KVM without the respective fix.
> > 
> 
> It'd be good to link out to the respective KVM fix, either by commit or
> lore link if this patch lands before the kernel patches:
> 
> Link: http://lore.kernel.org/r/20220805135813.2102034-1-maz@kernel.org

I'll pick that up with the tags when preparing to push.

Thanks,
drew

> 
> --
> Thanks,
> Oliver
> 
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Ricardo Koller <ricarkol@google.com>
> > ---
> >  arm/pmu.c | 33 ++++++++++++++++++---------------
> >  1 file changed, 18 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arm/pmu.c b/arm/pmu.c
> > index 12e7d84e..0a7e12f8 100644
> > --- a/arm/pmu.c
> > +++ b/arm/pmu.c
> > @@ -583,7 +583,7 @@ static void test_chained_counters(void)
> >  	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
> >  
> >  	report(read_regn_el0(pmevcntr, 1) == 1, "CHAIN counter #1 incremented");
> > -	report(!read_sysreg(pmovsclr_el0), "no overflow recorded for chained incr #1");
> > +	report(read_sysreg(pmovsclr_el0) == 0x1, "overflow recorded for chained incr #1");
> >  
> >  	/* test 64b overflow */
> >  
> > @@ -595,7 +595,7 @@ static void test_chained_counters(void)
> >  	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
> >  	report_info("overflow reg = 0x%lx", read_sysreg(pmovsclr_el0));
> >  	report(read_regn_el0(pmevcntr, 1) == 2, "CHAIN counter #1 set to 2");
> > -	report(!read_sysreg(pmovsclr_el0), "no overflow recorded for chained incr #2");
> > +	report(read_sysreg(pmovsclr_el0) == 0x1, "overflow recorded for chained incr #2");
> >  
> >  	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
> >  	write_regn_el0(pmevcntr, 1, ALL_SET);
> > @@ -603,7 +603,7 @@ static void test_chained_counters(void)
> >  	precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
> >  	report_info("overflow reg = 0x%lx", read_sysreg(pmovsclr_el0));
> >  	report(!read_regn_el0(pmevcntr, 1), "CHAIN counter #1 wrapped");
> > -	report(read_sysreg(pmovsclr_el0) == 0x2, "overflow on chain counter");
> > +	report(read_sysreg(pmovsclr_el0) == 0x3, "overflow on even and odd counters");
> >  }
> >  
> >  static void test_chained_sw_incr(void)
> > @@ -629,8 +629,9 @@ static void test_chained_sw_incr(void)
> >  		write_sysreg(0x1, pmswinc_el0);
> >  
> >  	isb();
> > -	report(!read_sysreg(pmovsclr_el0) && (read_regn_el0(pmevcntr, 1) == 1),
> > -		"no overflow and chain counter incremented after 100 SW_INCR/CHAIN");
> > +	report((read_sysreg(pmovsclr_el0) == 0x1) &&
> > +		(read_regn_el0(pmevcntr, 1) == 1),
> > +		"overflow and chain counter incremented after 100 SW_INCR/CHAIN");
> >  	report_info("overflow=0x%lx, #0=%ld #1=%ld", read_sysreg(pmovsclr_el0),
> >  		    read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
> >  
> > @@ -648,10 +649,10 @@ static void test_chained_sw_incr(void)
> >  		write_sysreg(0x1, pmswinc_el0);
> >  
> >  	isb();
> > -	report((read_sysreg(pmovsclr_el0) == 0x2) &&
> > +	report((read_sysreg(pmovsclr_el0) == 0x3) &&
> >  		(read_regn_el0(pmevcntr, 1) == 0) &&
> >  		(read_regn_el0(pmevcntr, 0) == 84),
> > -		"overflow on chain counter and expected values after 100 SW_INCR/CHAIN");
> > +		"expected overflows and values after 100 SW_INCR/CHAIN");
> >  	report_info("overflow=0x%lx, #0=%ld #1=%ld", read_sysreg(pmovsclr_el0),
> >  		    read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
> >  }
> > @@ -731,8 +732,9 @@ static void test_chain_promotion(void)
> >  	report_info("MEM_ACCESS counter #0 has value 0x%lx",
> >  		    read_regn_el0(pmevcntr, 0));
> >  
> > -	report((read_regn_el0(pmevcntr, 1) == 1) && !read_sysreg(pmovsclr_el0),
> > -		"CHAIN counter enabled: CHAIN counter was incremented and no overflow");
> > +	report((read_regn_el0(pmevcntr, 1) == 1) &&
> > +		(read_sysreg(pmovsclr_el0) == 0x1),
> > +		"CHAIN counter enabled: CHAIN counter was incremented and overflow");
> >  
> >  	report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
> >  		read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
> > @@ -759,8 +761,9 @@ static void test_chain_promotion(void)
> >  	report_info("MEM_ACCESS counter #0 has value 0x%lx",
> >  		    read_regn_el0(pmevcntr, 0));
> >  
> > -	report((read_regn_el0(pmevcntr, 1) == 1) && !read_sysreg(pmovsclr_el0),
> > -		"32b->64b: CHAIN counter incremented and no overflow");
> > +	report((read_regn_el0(pmevcntr, 1) == 1) &&
> > +		(read_sysreg(pmovsclr_el0) == 0x1),
> > +		"32b->64b: CHAIN counter incremented and overflow");
> >  
> >  	report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
> >  		read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
> > @@ -868,8 +871,8 @@ static void test_overflow_interrupt(void)
> >  	write_regn_el0(pmevcntr, 0, PRE_OVERFLOW);
> >  	isb();
> >  	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
> > -	report(expect_interrupts(0),
> > -		"no overflow interrupt expected on 32b boundary");
> > +	report(expect_interrupts(0x1),
> > +		"expect overflow interrupt on 32b boundary");
> >  
> >  	/* overflow on odd counter */
> >  	pmu_reset_stats();
> > @@ -877,8 +880,8 @@ static void test_overflow_interrupt(void)
> >  	write_regn_el0(pmevcntr, 1, ALL_SET);
> >  	isb();
> >  	mem_access_loop(addr, 400, pmu.pmcr_ro | PMU_PMCR_E);
> > -	report(expect_interrupts(0x2),
> > -		"expect overflow interrupt on odd counter");
> > +	report(expect_interrupts(0x3),
> > +		"expect overflow interrupt on even and odd counter");
> >  }
> >  #endif
> >  
> > -- 
> > 2.37.1.559.g78731f0fdb-goog
> > 

* Re: [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests
  2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests Ricardo Koller
@ 2022-08-10 19:02   ` Andrew Jones
  2022-08-10 23:33     ` Ricardo Koller
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Jones @ 2022-08-10 19:02 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: kvm, kvmarm, maz, alexandru.elisei, eric.auger, oliver.upton, reijiw

On Thu, Aug 04, 2022 at 05:41:38PM -0700, Ricardo Koller wrote:
> Some registers like the PMOVS reset to an architecturally UNKNOWN value.
> Most tests expect them to be reset (mostly zeroed) using pmu_reset().
> Add a pmu_reset() on all the tests that need one.
> 
> As a bonus, fix a couple of comments related to the register state
> before a sub-test.
> 
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Ricardo Koller <ricarkol@google.com>
> ---
>  arm/pmu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 4c601b05..12e7d84e 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -826,7 +826,7 @@ static void test_overflow_interrupt(void)
>  	write_regn_el0(pmevcntr, 1, PRE_OVERFLOW);
>  	isb();
>  
> -	/* interrupts are disabled */
> +	/* interrupts are disabled (PMINTENSET_EL1 == 0) */
>  
>  	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
>  	report(expect_interrupts(0), "no overflow interrupt after preset");
> @@ -842,7 +842,7 @@ static void test_overflow_interrupt(void)
>  	isb();
>  	report(expect_interrupts(0), "no overflow interrupt after counting");
>  
> -	/* enable interrupts */
> +	/* enable interrupts (PMINTENSET_EL1 <= ALL_SET) */
>  
>  	pmu_reset_stats();
>  
> @@ -890,6 +890,7 @@ static bool check_cycles_increase(void)
>  	bool success = true;
>  
>  	/* init before event access, this test only cares about cycle count */
> +	pmu_reset();

This and the other pmu_reset() call below break compilation on 32-bit arm,
because there's no pmu_reset() defined for it. It'd probably be best if
we actually implemented some sort of reset for arm, considering it's being
called in common tests.

Thanks,
drew

>  	set_pmcntenset(1 << PMU_CYCLE_IDX);
>  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>  
> @@ -944,6 +945,7 @@ static bool check_cpi(int cpi)
>  	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
>  
>  	/* init before event access, this test only cares about cycle count */
> +	pmu_reset();
>  	set_pmcntenset(1 << PMU_CYCLE_IDX);
>  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>  
> -- 
> 2.37.1.559.g78731f0fdb-goog
> 

* Re: [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests
  2022-08-10 19:02   ` Andrew Jones
@ 2022-08-10 23:33     ` Ricardo Koller
  2022-08-11  7:04       ` Andrew Jones
  0 siblings, 1 reply; 21+ messages in thread
From: Ricardo Koller @ 2022-08-10 23:33 UTC (permalink / raw)
  To: Andrew Jones
  Cc: kvm, kvmarm, maz, alexandru.elisei, eric.auger, oliver.upton, reijiw

On Wed, Aug 10, 2022 at 09:02:16PM +0200, Andrew Jones wrote:
> On Thu, Aug 04, 2022 at 05:41:38PM -0700, Ricardo Koller wrote:
> > Some registers like the PMOVS reset to an architecturally UNKNOWN value.
> > Most tests expect them to be reset (mostly zeroed) using pmu_reset().
> > Add a pmu_reset() on all the tests that need one.
> > 
> > As a bonus, fix a couple of comments related to the register state
> > before a sub-test.
> > 
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Ricardo Koller <ricarkol@google.com>
> > ---
> >  arm/pmu.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arm/pmu.c b/arm/pmu.c
> > index 4c601b05..12e7d84e 100644
> > --- a/arm/pmu.c
> > +++ b/arm/pmu.c
> > @@ -826,7 +826,7 @@ static void test_overflow_interrupt(void)
> >  	write_regn_el0(pmevcntr, 1, PRE_OVERFLOW);
> >  	isb();
> >  
> > -	/* interrupts are disabled */
> > +	/* interrupts are disabled (PMINTENSET_EL1 == 0) */
> >  
> >  	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
> >  	report(expect_interrupts(0), "no overflow interrupt after preset");
> > @@ -842,7 +842,7 @@ static void test_overflow_interrupt(void)
> >  	isb();
> >  	report(expect_interrupts(0), "no overflow interrupt after counting");
> >  
> > -	/* enable interrupts */
> > +	/* enable interrupts (PMINTENSET_EL1 <= ALL_SET) */
> >  
> >  	pmu_reset_stats();
> >  
> > @@ -890,6 +890,7 @@ static bool check_cycles_increase(void)
> >  	bool success = true;
> >  
> >  	/* init before event access, this test only cares about cycle count */
> > +	pmu_reset();
> 
> This and the other pmu_reset() call below break compilation on 32-bit arm,
> because there's no pmu_reset() defined for it.
I completely missed the 32-bit arm case. Thanks!

> It'd probably be best if
> we actually implemented some sort of reset for arm, considering it's being
> called in common tests.

Mind if I start by creating a pmu_reset() for 32-bit arm, which can
later be used by a general arm_reset()?

> 
> Thanks,
> drew
> 
> >  	set_pmcntenset(1 << PMU_CYCLE_IDX);
> >  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> >  
> > @@ -944,6 +945,7 @@ static bool check_cpi(int cpi)
> >  	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
> >  
> >  	/* init before event access, this test only cares about cycle count */
> > +	pmu_reset();
> >  	set_pmcntenset(1 << PMU_CYCLE_IDX);
> >  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> >  
> > -- 
> > 2.37.1.559.g78731f0fdb-goog
> > 

* Re: [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests
  2022-08-10 23:33     ` Ricardo Koller
@ 2022-08-11  7:04       ` Andrew Jones
  2022-08-11 18:51         ` Ricardo Koller
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Jones @ 2022-08-11  7:04 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: kvm, kvmarm, maz, alexandru.elisei, eric.auger, oliver.upton, reijiw

On Wed, Aug 10, 2022 at 04:33:26PM -0700, Ricardo Koller wrote:
> On Wed, Aug 10, 2022 at 09:02:16PM +0200, Andrew Jones wrote:
> > On Thu, Aug 04, 2022 at 05:41:38PM -0700, Ricardo Koller wrote:
> > > Some registers like the PMOVS reset to an architecturally UNKNOWN value.
> > > Most tests expect them to be reset (mostly zeroed) using pmu_reset().
> > > Add a pmu_reset() on all the tests that need one.
> > > 
> > > As a bonus, fix a couple of comments related to the register state
> > > before a sub-test.
> > > 
> > > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > > Signed-off-by: Ricardo Koller <ricarkol@google.com>
> > > ---
> > >  arm/pmu.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arm/pmu.c b/arm/pmu.c
> > > index 4c601b05..12e7d84e 100644
> > > --- a/arm/pmu.c
> > > +++ b/arm/pmu.c
> > > @@ -826,7 +826,7 @@ static void test_overflow_interrupt(void)
> > >  	write_regn_el0(pmevcntr, 1, PRE_OVERFLOW);
> > >  	isb();
> > >  
> > > -	/* interrupts are disabled */
> > > +	/* interrupts are disabled (PMINTENSET_EL1 == 0) */
> > >  
> > >  	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
> > >  	report(expect_interrupts(0), "no overflow interrupt after preset");
> > > @@ -842,7 +842,7 @@ static void test_overflow_interrupt(void)
> > >  	isb();
> > >  	report(expect_interrupts(0), "no overflow interrupt after counting");
> > >  
> > > -	/* enable interrupts */
> > > +	/* enable interrupts (PMINTENSET_EL1 <= ALL_SET) */
> > >  
> > >  	pmu_reset_stats();
> > >  
> > > @@ -890,6 +890,7 @@ static bool check_cycles_increase(void)
> > >  	bool success = true;
> > >  
> > >  	/* init before event access, this test only cares about cycle count */
> > > +	pmu_reset();
> > 
> > This and the other pmu_reset() call below break compilation on 32-bit arm,
> > because there's no pmu_reset() defined for it.
> I completely missed the 32-bit arm case. Thanks!
> 
> > It'd probably be best if
> > we actually implemented some sort of reset for arm, considering it's being
> > called in common tests.
> 
> Mind if I start by creating a pmu_reset() for 32-bit arm, which can
> later be used by a general arm_reset()?

No need to worry about a general one. We just need a pmu_reset implemented
for 32-bit arm up in its #ifdef __arm__ section.
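
Something along these lines should do (untested sketch; the cp15
encodings below are the architectural ARMv7 ones for PMCNTENCLR, PMOVSR
and PMINTENCLR, so please double-check them against the existing
accessors in that section):

	static inline void pmu_reset(void)
	{
		uint32_t all_set = ~0U;

		set_pmcr(get_pmcr() & ~PMU_PMCR_E);	/* stop counting */
		/* PMCNTENCLR: disable all counters */
		asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (all_set));
		/* PMOVSR: clear all overflow flags */
		asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (all_set));
		/* PMINTENCLR: mask all overflow interrupts */
		asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (all_set));
		isb();
	}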

Thanks,
drew

> 
> > 
> > Thanks,
> > drew
> > 
> > >  	set_pmcntenset(1 << PMU_CYCLE_IDX);
> > >  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> > >  
> > > @@ -944,6 +945,7 @@ static bool check_cpi(int cpi)
> > >  	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
> > >  
> > >  	/* init before event access, this test only cares about cycle count */
> > > +	pmu_reset();
> > >  	set_pmcntenset(1 << PMU_CYCLE_IDX);
> > >  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> > >  
> > > -- 
> > > 2.37.1.559.g78731f0fdb-goog
> > > 

* Re: [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests
  2022-08-11  7:04       ` Andrew Jones
@ 2022-08-11 18:51         ` Ricardo Koller
  0 siblings, 0 replies; 21+ messages in thread
From: Ricardo Koller @ 2022-08-11 18:51 UTC (permalink / raw)
  To: Andrew Jones
  Cc: kvm, kvmarm, maz, alexandru.elisei, eric.auger, oliver.upton, reijiw

On Thu, Aug 11, 2022 at 09:04:05AM +0200, Andrew Jones wrote:
> On Wed, Aug 10, 2022 at 04:33:26PM -0700, Ricardo Koller wrote:
> > On Wed, Aug 10, 2022 at 09:02:16PM +0200, Andrew Jones wrote:
> > > On Thu, Aug 04, 2022 at 05:41:38PM -0700, Ricardo Koller wrote:
> > > > Some registers like the PMOVS reset to an architecturally UNKNOWN value.
> > > > Most tests expect them to be reset (mostly zeroed) using pmu_reset().
> > > > Add a pmu_reset() on all the tests that need one.
> > > > 
> > > > As a bonus, fix a couple of comments related to the register state
> > > > before a sub-test.
> > > > 
> > > > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > > > Signed-off-by: Ricardo Koller <ricarkol@google.com>
> > > > ---
> > > >  arm/pmu.c | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arm/pmu.c b/arm/pmu.c
> > > > index 4c601b05..12e7d84e 100644
> > > > --- a/arm/pmu.c
> > > > +++ b/arm/pmu.c
> > > > @@ -826,7 +826,7 @@ static void test_overflow_interrupt(void)
> > > >  	write_regn_el0(pmevcntr, 1, PRE_OVERFLOW);
> > > >  	isb();
> > > >  
> > > > -	/* interrupts are disabled */
> > > > +	/* interrupts are disabled (PMINTENSET_EL1 == 0) */
> > > >  
> > > >  	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E);
> > > >  	report(expect_interrupts(0), "no overflow interrupt after preset");
> > > > @@ -842,7 +842,7 @@ static void test_overflow_interrupt(void)
> > > >  	isb();
> > > >  	report(expect_interrupts(0), "no overflow interrupt after counting");
> > > >  
> > > > -	/* enable interrupts */
> > > > +	/* enable interrupts (PMINTENSET_EL1 <= ALL_SET) */
> > > >  
> > > >  	pmu_reset_stats();
> > > >  
> > > > @@ -890,6 +890,7 @@ static bool check_cycles_increase(void)
> > > >  	bool success = true;
> > > >  
> > > >  	/* init before event access, this test only cares about cycle count */
> > > > +	pmu_reset();
> > > 
> > > This and the other pmu_reset() call below break compilation on 32-bit arm,
> > > because there's no pmu_reset() defined for it.
> > I completely missed the 32-bit arm case. Thanks!
> > 
> > > It'd probably be best if
> > > we actually implemented some sort of reset for arm, considering it's being
> > > called in common tests.
> > 
> > Mind if I start by creating a pmu_reset() for 32-bit arm, which can
> > later be used by a general arm_reset()?
> 
> No need to worry about a general one. We just need a pmu_reset implemented
> for 32-bit arm up in its #ifdef __arm__ section.

Ahh, OK, for some reason I thought you meant a general one. Anyway,
sending v4.

Thanks,
Ricardo

> 
> Thanks,
> drew
> 
> > 
> > > 
> > > Thanks,
> > > drew
> > > 
> > > >  	set_pmcntenset(1 << PMU_CYCLE_IDX);
> > > >  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> > > >  
> > > > @@ -944,6 +945,7 @@ static bool check_cpi(int cpi)
> > > >  	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
> > > >  
> > > >  	/* init before event access, this test only cares about cycle count */
> > > > +	pmu_reset();
> > > >  	set_pmcntenset(1 << PMU_CYCLE_IDX);
> > > >  	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> > > >  
> > > > -- 
> > > > 2.37.1.559.g78731f0fdb-goog
> > > > 

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-08-05  0:41 [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Ricardo Koller
                   ` (3 preceding siblings ...)
  2022-08-10 17:33 ` [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Oliver Upton
@ 2022-10-04 16:20 ` Eric Auger
  2022-10-04 16:58   ` Alexandru Elisei
  4 siblings, 1 reply; 21+ messages in thread
From: Eric Auger @ 2022-10-04 16:20 UTC (permalink / raw)
  To: Ricardo Koller, kvm, kvmarm, andrew.jones
  Cc: maz, alexandru.elisei, oliver.upton, reijiw

Hi Ricardo, Marc,

On 8/5/22 02:41, Ricardo Koller wrote:
> There are some tests that fail when running on bare metal (including a
> passthrough prototype).  There are three issues with the tests.  The
> first one is that there are some missing isb()'s between enabling event
> counting and the actual counting. This wasn't an issue on KVM as
> trapping on registers served as context synchronization events. The
> second issue is that some tests assume that registers reset to 0.  And
> finally, the third issue is that overflowing the low counter of a
> chained event sets the overflow flag in PMOVS and some tests fail by
> checking for it not being set.
>
> Addressed all comments from the previous version:
> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
> - adding missing isb() and fixed the commit message (Alexandru).
> - fixed wording of a report() check (Andrew).
>
> Thanks!
> Ricardo
>
> Ricardo Koller (3):
>   arm: pmu: Add missing isb()'s after sys register writing
>   arm: pmu: Reset the pmu registers before starting some tests
>   arm: pmu: Check for overflow in the low counter in chained counters
>     tests
>
>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 39 insertions(+), 17 deletions(-)
>
While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
Fixing chained events, and PMUv3p5 support' I noticed kvm-unit-tests
failures on some machines. This does not seem related to those series
though, since I can reproduce the failures without them. The failures
happen on an Amberwing machine, for instance with the pmu-chain-promotion
test.

While investigating further I noticed there is a lot of variability in
the kvm-unit-tests mem_access_loop() count. For instance, I can get the
counter = 0x1F on the first iteration and 0x96 on the subsequent ones.
When running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
expecting the counter to be close to 20. It is on some HW.
        for (int i = 0; i < 20; i++) {
                write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
                write_sysreg_s(0x1, PMCNTENSET_EL0);
                write_regn_el0(pmevcntr, 0, 0);
                isb();
                mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
                isb();
                report_info("iter %d, MEM_ACCESS counter #0 has value 0x%lx",
                            i, read_regn_el0(pmevcntr, 0));
                write_sysreg_s(0x0, PMCNTENCLR_EL0);
        }

[I know there are some useless isb's by the way but that's just debug code.]

gives

INFO: PMU version: 0x1
INFO: PMU implementer/ID code: 0x51("Q")/0
INFO: Implements 8 event counters
INFO: pmu: pmu-chain-promotion: iter 0, MEM_ACCESS counter #0 has value 0x1f
INFO: pmu: pmu-chain-promotion: iter 1, MEM_ACCESS counter #0 has value 0x96 <--- ?
INFO: pmu: pmu-chain-promotion: iter 2, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 3, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 4, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 5, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 6, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 7, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 8, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 9, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 10, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 11, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 12, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 13, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 14, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 15, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 16, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 17, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 18, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 19, MEM_ACCESS counter #0 has value 0x96

If I run the following sequence before the previous one:
        for (int i = 0; i < 20; i++) {
                write_regn_el0(pmevtyper, 0, SW_INCR | PMEVTYPER_EXCLUDE_EL0);
                write_sysreg_s(0x1, PMCNTENSET_EL0);
                write_regn_el0(pmevcntr, 0, 0);

                set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
                for (int j = 0; j < 20; j++)
                        write_sysreg(0x1, pmswinc_el0);
                set_pmcr(pmu.pmcr_ro);

                report_info("iter %d, 20 x SW_INCRs counter #0 has value 0x%lx",
                            i, read_regn_el0(pmevcntr, 0));
                write_sysreg_s(0x0, PMCNTENCLR_EL0);
        }

I get

INFO: pmu: pmu-chain-promotion: iter 0, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 1, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 2, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 3, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 4, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 5, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 6, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 7, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 8, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 9, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 10, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 11, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 12, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 13, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 14, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 15, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 16, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 17, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 18, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 19, 20 x SW_INCRs counter #0 has value 0x14
INFO: pmu: pmu-chain-promotion: iter 0, MEM_ACCESS counter #0 has value 0x96 <---
INFO: pmu: pmu-chain-promotion: iter 1, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 2, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 3, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 4, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 5, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 6, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 7, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 8, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 9, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 10, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 11, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 12, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 13, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 14, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 15, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 16, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 17, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 18, MEM_ACCESS counter #0 has value 0x96
INFO: pmu: pmu-chain-promotion: iter 19, MEM_ACCESS counter #0 has value 0x96

So I come to the actual question. Can we make any assumptions about the
(virtual) PMU quality/precision? If not, the tests I originally wrote
are bound to fail on some HW (on other machines they always pass) and I
need to decide whether to rewrite part of them, especially those which
expect an overflow after a given number of ops. Otherwise, there is
either something wrong in the test (asm?) or in the KVM PMU emulation.

I tried to bisect because I did observe the same behavior on some older
kernels, but the bisect was not successful as the issue does not always
happen.

Thoughts?

Eric









* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-04 16:20 ` Eric Auger
@ 2022-10-04 16:58   ` Alexandru Elisei
  2022-10-04 17:31     ` Eric Auger
  0 siblings, 1 reply; 21+ messages in thread
From: Alexandru Elisei @ 2022-10-04 16:58 UTC (permalink / raw)
  To: Eric Auger
  Cc: Ricardo Koller, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw

Hi Eric,

On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
> Hi Ricardo, Marc,
> 
> On 8/5/22 02:41, Ricardo Koller wrote:
> > There are some tests that fail when running on bare metal (including a
> > passthrough prototype).  There are three issues with the tests.  The
> > first one is that there are some missing isb()'s between enabling event
> > counting and the actual counting. This wasn't an issue on KVM as
> > trapping on registers served as context synchronization events. The
> > second issue is that some tests assume that registers reset to 0.  And
> > finally, the third issue is that overflowing the low counter of a
> > chained event sets the overflow flag in PMOVS and some tests fail by
> > checking for it not being set.
> >
> > Addressed all comments from the previous version:
> > https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
> > - adding missing isb() and fixed the commit message (Alexandru).
> > - fixed wording of a report() check (Andrew).
> >
> > Thanks!
> > Ricardo
> >
> > Ricardo Koller (3):
> >   arm: pmu: Add missing isb()'s after sys register writing
> >   arm: pmu: Reset the pmu registers before starting some tests
> >   arm: pmu: Check for overflow in the low counter in chained counters
> >     tests
> >
> >  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 39 insertions(+), 17 deletions(-)
> >
> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
> test failures on some machines. This does not seem related to those
> series though since I was able to get them without. The failures happen
> on Amberwing machine for instance with the pmu-chain-promotion.
> 
> While further investigating I noticed there is a lot of variability on
> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
> on the first iteration and 0x96 on the subsequent ones for instance.
> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
> expecting the counter to be close to 20. It is on some HW.
> 
> [..]
> 
> So I come to the actual question. Can we do any assumption on the
> (virtual) PMU quality/precision? If not, the tests I originally wrote
> are damned to fail on some HW (on some other they always pass) and I
> need to make a decision wrt re-writing part of them, expecially those
> which expect overflow after a given amount of ops. Otherwise, there is
> either something wrong in the test (asm?) or in KVM PMU emulation.
> 
> I tried to bisect because I did observe the same behavior on some older
> kernels but the bisect was not successful as the issue does not happen
> always.
> 
> Thoughts?

Looking at mem_access_loop(), the first thing that jumps out is the fact
that it is missing a DSB barrier. ISB affects only instructions, not memory
accesses, and without a DSB the PE can reorder memory accesses however it
sees fit.

I also believe precise_instrs_loop() to be in the same situation, as the
architecture doesn't guarantee that the cycle counter increments after
every CPU cycle (ARM DDI 0487I.a, page D11-5246):

"Although the architecture requires that direct reads of PMCCNTR_EL0 or
PMCCNTR occur in program order, there is no requirement that the count
increments between two such reads. Even when the counter is incrementing on
every clock cycle, software might need check that the difference between
two reads of the counter is nonzero."
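
A test that only needs to know the cycle counter is running could
therefore settle for a nonzero delta rather than an exact cycle count.
Something like this rough, untested sketch (it only relies on the
read_sysreg() helper; the dummy loop just gives the counter time to
tick):

	static bool cycle_counter_is_running(void)
	{
		uint64_t a, b;
		volatile int i;

		a = read_sysreg(pmccntr_el0);
		for (i = 0; i < 1000; i++)
			;	/* burn a few cycles between the two reads */
		b = read_sysreg(pmccntr_el0);

		/* Only require forward progress, not an exact count. */
		return b != a;
	}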

There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
"A reasonable degree of inaccuracy" (page D11-5248). I'll post some
snippets that I found interesting, but there are more examples and
explanations to be found in that chapter.

"In exceptional circumstances, such as a change in Security state or other
boundary condition, it is acceptable for the count to be inaccurate."

PMCR writes are trapped by KVM. Is a change in exception level an
"exceptional circumstance"? Could be, but I couldn't find anything definitive.
For example, the architecture allows an implementation to drop an event in
the case of an interrupt:

"However, dropping a single branch count as the result of a rare
interaction with an interrupt is acceptable."

So events could definitely be dropped because of an interrupt for the host.

And there's also this:

"The imprecision means that the counter might have counted an event around
the time the counter was disabled, but does not allow the event to be
observed as counted after the counter was disabled."

If you want my opinion, if it is really necessary for a test to count the
number of events, I would define a margin of error on the number of
events counted. Or the test could be changed to check that at least one
such event was observed.
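
As a rough, untested sketch of the margin idea (the helper name and the
margin value below are made up, they are not existing arm/pmu.c code):

	#define MEM_ACCESS_MARGIN	20

	static void check_mem_access_count(uint64_t counted, uint64_t expected)
	{
		uint64_t lo = expected > MEM_ACCESS_MARGIN ?
			      expected - MEM_ACCESS_MARGIN : 0;
		uint64_t hi = expected + MEM_ACCESS_MARGIN;

		/* Accept anything within the margin instead of an exact value. */
		report(counted >= lo && counted <= hi,
		       "%" PRIu64 " MEM_ACCESS events, expected %" PRIu64 " +/- %d",
		       counted, expected, MEM_ACCESS_MARGIN);
	}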

Thanks,
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-04 16:58   ` Alexandru Elisei
@ 2022-10-04 17:31     ` Eric Auger
  2022-10-05  9:21       ` Alexandru Elisei
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Auger @ 2022-10-04 17:31 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Ricardo Koller, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw

Hi Alexandru,

On 10/4/22 18:58, Alexandru Elisei wrote:
> Hi Eric,
>
> On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
>> Hi Ricardo, Marc,
>>
>> On 8/5/22 02:41, Ricardo Koller wrote:
>>> There are some tests that fail when running on bare metal (including a
>>> passthrough prototype).  There are three issues with the tests.  The
>>> first one is that there are some missing isb()'s between enabling event
>>> counting and the actual counting. This wasn't an issue on KVM as
>>> trapping on registers served as context synchronization events. The
>>> second issue is that some tests assume that registers reset to 0.  And
>>> finally, the third issue is that overflowing the low counter of a
>>> chained event sets the overflow flag in PMVOS and some tests fail by
>>> checking for it not being set.
>>>
>>> Addressed all comments from the previous version:
>>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
>>> - adding missing isb() and fixed the commit message (Alexandru).
>>> - fixed wording of a report() check (Andrew).
>>>
>>> Thanks!
>>> Ricardo
>>>
>>> Ricardo Koller (3):
>>>   arm: pmu: Add missing isb()'s after sys register writing
>>>   arm: pmu: Reset the pmu registers before starting some tests
>>>   arm: pmu: Check for overflow in the low counter in chained counters
>>>     tests
>>>
>>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
>>>  1 file changed, 39 insertions(+), 17 deletions(-)
>>>
>> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
>> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
>> test failures on some machines. This does not seem related to those
>> series though since I was able to get them without. The failures happen
>> on Amberwing machine for instance with the pmu-chain-promotion.
>>
>> While further investigating I noticed there is a lot of variability on
>> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
>> on the first iteration and 0x96 on the subsequent ones for instance.
>> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
>> expecting the counter to be close to 20. It is on some HW.
>>
>> [..]
>>
>> So I come to the actual question. Can we do any assumption on the
>> (virtual) PMU quality/precision? If not, the tests I originally wrote
>> are damned to fail on some HW (on some other they always pass) and I
>> need to make a decision wrt re-writing part of them, expecially those
>> which expect overflow after a given amount of ops. Otherwise, there is
>> either something wrong in the test (asm?) or in KVM PMU emulation.
>>
>> I tried to bisect because I did observe the same behavior on some older
>> kernels but the bisect was not successful as the issue does not happen
>> always.
>>
>> Thoughts?
> Looking at mem_access_loop(), the first thing that jumps out is the fact
> that is missing a DSB barrier. ISB affects only instructions, not memory
> accesses and without a DSB, the PE can reorder memory accesses however it
> sees fit.
Following your suggestion I added a dsb ish at the end of the loop and
before disabling pmcr_el0 (I hope this is the place you were thinking
of), but unfortunately it does not seem to fix my issue.
>
> I also believe precise_instrs_loop() to be in the same situation, as the
> architecture doesn't guarantee that the cycle counter increments after
> every CPU cycle (ARM DDI 0487I.a, page D11-5246):
>
> "Although the architecture requires that direct reads of PMCCNTR_EL0 or
> PMCCNTR occur in program order, there is no requirement that the count
> increments between two such reads. Even when the counter is incrementing on
> every clock cycle, software might need check that the difference between
> two reads of the counter is nonzero."
OK
>
> There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
> "A reasonable degree of inaccuracy" (page D11-5248). I'll post some
> snippets that I found interesting, but there are more examples and
> explanations to be found in that chapter.

yeah I saw that, hence my question about the reasonable disparity we can
expect from the HW/SW stack.
>
> "In exceptional circumstances, such as a change in Security state or other
> boundary condition, it is acceptable for the count to be inaccurate."
>
> PMCR writes are trapped by KVM. Is a change in exception level an
> "exception circumstance"? Could be, but couldn't find anything definitive.
> For example, the architecture allows an implementation to drop an event in
> the case of an interrupt:
>
> "However, dropping a single branch count as the result of a rare
> interaction with an interrupt is acceptable."
>
> So events could definitely be dropped because of an interrupt for the host.
>
> And there's also this:
>
> "The imprecision means that the counter might have counted an event around
> the time the counter was disabled, but does not allow the event to be
> observed as counted after the counter was disabled."
In our case there seems to be a huge discrepancy.
>
> If you want my opinion, if it is necessary to count the number of events
> for a test instead, I would define a margin of error on the number of
> events counted. Or the test could be changed to check that at least one
> such event was observed.
I agree with you that a reasonable margin must be allowed and the tests
may need to be rewritten to account for the observed disparity if it is
considered "normal". Another way to proceed is to compute the disparity
before launching the main tests and, if it is too big, skip the main
tests. Again, on some HW the counts are really 'as expected' and constant.
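
Just to illustrate that second option, something like this sketch (the
calibration helper and the threshold are made up for illustration):

	long overcount;

	/* Hypothetical calibration step run before the exact-count tests. */
	overcount = measure_mem_access_overcount();	/* made-up helper */
	if (overcount > 10) {				/* arbitrary threshold */
		report_skip("MEM_ACCESS overcounts by %ld, skipping exact-count tests",
			    overcount);
		return;
	}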

Thanks!

Eric
>
> Thanks,
> Alex
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-04 17:31     ` Eric Auger
@ 2022-10-05  9:21       ` Alexandru Elisei
  2022-10-05  9:41         ` Alexandru Elisei
  2022-10-05  9:50         ` Eric Auger
  0 siblings, 2 replies; 21+ messages in thread
From: Alexandru Elisei @ 2022-10-05  9:21 UTC (permalink / raw)
  To: Eric Auger
  Cc: Ricardo Koller, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw

Hi Eric,

On Tue, Oct 04, 2022 at 07:31:25PM +0200, Eric Auger wrote:
> Hi Alexandru,
> 
> On 10/4/22 18:58, Alexandru Elisei wrote:
> > Hi Eric,
> >
> > On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
> >> Hi Ricardo, Marc,
> >>
> >> On 8/5/22 02:41, Ricardo Koller wrote:
> >>> There are some tests that fail when running on bare metal (including a
> >>> passthrough prototype).  There are three issues with the tests.  The
> >>> first one is that there are some missing isb()'s between enabling event
> >>> counting and the actual counting. This wasn't an issue on KVM as
> >>> trapping on registers served as context synchronization events. The
> >>> second issue is that some tests assume that registers reset to 0.  And
> >>> finally, the third issue is that overflowing the low counter of a
> >>> chained event sets the overflow flag in PMVOS and some tests fail by
> >>> checking for it not being set.
> >>>
> >>> Addressed all comments from the previous version:
> >>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
> >>> - adding missing isb() and fixed the commit message (Alexandru).
> >>> - fixed wording of a report() check (Andrew).
> >>>
> >>> Thanks!
> >>> Ricardo
> >>>
> >>> Ricardo Koller (3):
> >>>   arm: pmu: Add missing isb()'s after sys register writing
> >>>   arm: pmu: Reset the pmu registers before starting some tests
> >>>   arm: pmu: Check for overflow in the low counter in chained counters
> >>>     tests
> >>>
> >>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
> >>>  1 file changed, 39 insertions(+), 17 deletions(-)
> >>>
> >> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
> >> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
> >> test failures on some machines. This does not seem related to those
> >> series though since I was able to get them without. The failures happen
> >> on Amberwing machine for instance with the pmu-chain-promotion.
> >>
> >> While further investigating I noticed there is a lot of variability on
> >> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
> >> on the first iteration and 0x96 on the subsequent ones for instance.
> >> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
> >> expecting the counter to be close to 20. It is on some HW.
> >>
> >> [..]
> >>
> >> So I come to the actual question. Can we do any assumption on the
> >> (virtual) PMU quality/precision? If not, the tests I originally wrote
> >> are damned to fail on some HW (on some other they always pass) and I
> >> need to make a decision wrt re-writing part of them, expecially those
> >> which expect overflow after a given amount of ops. Otherwise, there is
> >> either something wrong in the test (asm?) or in KVM PMU emulation.
> >>
> >> I tried to bisect because I did observe the same behavior on some older
> >> kernels but the bisect was not successful as the issue does not happen
> >> always.
> >>
> >> Thoughts?
> > Looking at mem_access_loop(), the first thing that jumps out is the fact
> > that is missing a DSB barrier. ISB affects only instructions, not memory
> > accesses and without a DSB, the PE can reorder memory accesses however it
> > sees fit.
> Following your suggestion I added a dsh ish at the end of loop and
> before disabling pmcr_el0 (I hope this is the place you were thinking
> of) but unfortunately it does not seem to fix my issue.

Yes, DSB ISH after "b.gt 1b\n" and before the write to PMCR_EL0 that
disables the PMU.

I think you also need a DSB ISH before the write to PMCR_EL0 that enables
the PMU in the first instruction of the asm block. In your example, the
MEM_ACCESS event count is higher than expected, and one explanation for the
large disparity that I can think of is that previous memory accesses are
reordered past the instruction that enables the PMU, which makes the PMU
add these events to the total event count.
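
To make the placement concrete, this is roughly what I am suggesting for
the asm block in mem_access_loop() (written from memory, so treat the
instructions around the barriers as approximate, not as the exact
arm/pmu.c code):

	asm volatile(
	"	dsb	ish\n"		/* order prior accesses before enabling the PMU */
	"	msr	pmcr_el0, %[pmcr]\n"
	"	isb\n"
	"	mov	x10, %[loop]\n"
	"1:	sub	x10, x10, #1\n"
	"	ldr	x9, [%[addr]]\n"
	"	cmp	x10, #0x0\n"
	"	b.gt	1b\n"
	"	dsb	ish\n"		/* complete the loads before disabling the PMU */
	"	msr	pmcr_el0, xzr\n"
	"	isb\n"
	:
	: [addr] "r" (addr), [pmcr] "r" (pmcr), [loop] "r" (loop)
	: "x9", "x10", "cc");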

> >
> > I also believe precise_instrs_loop() to be in the same situation, as the
> > architecture doesn't guarantee that the cycle counter increments after
> > every CPU cycle (ARM DDI 0487I.a, page D11-5246):
> >
> > "Although the architecture requires that direct reads of PMCCNTR_EL0 or
> > PMCCNTR occur in program order, there is no requirement that the count
> > increments between two such reads. Even when the counter is incrementing on
> > every clock cycle, software might need check that the difference between
> > two reads of the counter is nonzero."
> OK
> >
> > There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
> > "A reasonable degree of inaccuracy" (page D11-5248). I'll post some
> > snippets that I found interesting, but there are more examples and
> > explanations to be found in that chapter.
> 
> yeah I saw that, hence my question about the reasonable disparity we can
> expect from the HW/SW stack.
> >
> > "In exceptional circumstances, such as a change in Security state or other
> > boundary condition, it is acceptable for the count to be inaccurate."
> >
> > PMCR writes are trapped by KVM. Is a change in exception level an
> > "exception circumstance"? Could be, but couldn't find anything definitive.
> > For example, the architecture allows an implementation to drop an event in
> > the case of an interrupt:
> >
> > "However, dropping a single branch count as the result of a rare
> > interaction with an interrupt is acceptable."
> >
> > So events could definitely be dropped because of an interrupt for the host.
> >
> > And there's also this:
> >
> > "The imprecision means that the counter might have counted an event around
> > the time the counter was disabled, but does not allow the event to be
> > observed as counted after the counter was disabled."
> In our case there seems to be a huge discrepancy.

I agree. There is this about the MEM_ACCESS event in the Arm ARM:

"The counter counts each Memory-read operation or Memory-write operation
that the PE makes."

As for what a Memory-read operation is (emphasis added by me):

"A memory-read operation might be due to:
The result of an architecturally executed memory-reading instructions.
The result of a Speculatively executed memory-reading instructions <- this
is why the DSB ISH is needed before enabling the PMU.
**A translation table walk**."

Those extra memory accesses might be caused by the table walker deciding to
walk the tables, speculatively or not. Software has no control over the
table walker (as long as it is enabled).

Thanks,
Alex

> >
> > If you want my opinion, if it is necessary to count the number of events
> > for a test instead, I would define a margin of error on the number of
> > events counted. Or the test could be changed to check that at least one
> > such event was observed.
> I agree with you on the fact a reasonable margin must be observed and
> the tests may need to be rewritten to account for the observed disparity
> if considered "normal". Another way to proceed is to compute the
> disparity before launching the main tests and if too big, skip the main
> tests. Again on some HW, the counts are really 'as expected' and constant.
> 
> Thanks!
> 
> Eric
> >
> > Thanks,
> > Alex
> >
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-05  9:21       ` Alexandru Elisei
@ 2022-10-05  9:41         ` Alexandru Elisei
  2022-10-05  9:51           ` Eric Auger
  2022-10-05  9:50         ` Eric Auger
  1 sibling, 1 reply; 21+ messages in thread
From: Alexandru Elisei @ 2022-10-05  9:41 UTC (permalink / raw)
  To: Eric Auger
  Cc: Ricardo Koller, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw

Hi,

On Wed, Oct 05, 2022 at 10:21:12AM +0100, Alexandru Elisei wrote:
> Hi Eric,
> 
> On Tue, Oct 04, 2022 at 07:31:25PM +0200, Eric Auger wrote:
> > Hi Alexandru,
> > 
> > On 10/4/22 18:58, Alexandru Elisei wrote:
> > > Hi Eric,
> > >
> > > On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
> > >> Hi Ricardo, Marc,
> > >>
> > >> On 8/5/22 02:41, Ricardo Koller wrote:
> > >>> There are some tests that fail when running on bare metal (including a
> > >>> passthrough prototype).  There are three issues with the tests.  The
> > >>> first one is that there are some missing isb()'s between enabling event
> > >>> counting and the actual counting. This wasn't an issue on KVM as
> > >>> trapping on registers served as context synchronization events. The
> > >>> second issue is that some tests assume that registers reset to 0.  And
> > >>> finally, the third issue is that overflowing the low counter of a
> > >>> chained event sets the overflow flag in PMVOS and some tests fail by
> > >>> checking for it not being set.
> > >>>
> > >>> Addressed all comments from the previous version:
> > >>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
> > >>> - adding missing isb() and fixed the commit message (Alexandru).
> > >>> - fixed wording of a report() check (Andrew).
> > >>>
> > >>> Thanks!
> > >>> Ricardo
> > >>>
> > >>> Ricardo Koller (3):
> > >>>   arm: pmu: Add missing isb()'s after sys register writing
> > >>>   arm: pmu: Reset the pmu registers before starting some tests
> > >>>   arm: pmu: Check for overflow in the low counter in chained counters
> > >>>     tests
> > >>>
> > >>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
> > >>>  1 file changed, 39 insertions(+), 17 deletions(-)
> > >>>
> > >> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
> > >> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
> > >> test failures on some machines. This does not seem related to those
> > >> series though since I was able to get them without. The failures happen
> > >> on Amberwing machine for instance with the pmu-chain-promotion.
> > >>
> > >> While further investigating I noticed there is a lot of variability on
> > >> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
> > >> on the first iteration and 0x96 on the subsequent ones for instance.
> > >> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
> > >> expecting the counter to be close to 20. It is on some HW.
> > >>
> > >> [..]
> > >>
> > >> So I come to the actual question. Can we do any assumption on the
> > >> (virtual) PMU quality/precision? If not, the tests I originally wrote
> > >> are damned to fail on some HW (on some other they always pass) and I
> > >> need to make a decision wrt re-writing part of them, expecially those
> > >> which expect overflow after a given amount of ops. Otherwise, there is
> > >> either something wrong in the test (asm?) or in KVM PMU emulation.
> > >>
> > >> I tried to bisect because I did observe the same behavior on some older
> > >> kernels but the bisect was not successful as the issue does not happen
> > >> always.
> > >>
> > >> Thoughts?
> > > Looking at mem_access_loop(), the first thing that jumps out is the fact
> > > that is missing a DSB barrier. ISB affects only instructions, not memory
> > > accesses and without a DSB, the PE can reorder memory accesses however it
> > > sees fit.
> > Following your suggestion I added a dsh ish at the end of loop and
> > before disabling pmcr_el0 (I hope this is the place you were thinking
> > of) but unfortunately it does not seem to fix my issue.
> 
> Yes, DSB ISH after "b.gt 1b\n" and before the write to PMCR_EL0 that
> disables the PMU.
> 
> I think you also need a DSB ISH before the write to PMCR_EL0 that enables
> the PMU in the first instruction of the asm block. In your example, the
> MEM_ACCESS event count is higher than expected, and one explanation for the
> large disparity that I can think of is that previous memory accesses are
> reordered past the instruction that enables the PMU, which makes the PMU
> add these events to the total event count.
> 
> > >
> > > I also believe precise_instrs_loop() to be in the same situation, as the
> > > architecture doesn't guarantee that the cycle counter increments after
> > > every CPU cycle (ARM DDI 0487I.a, page D11-5246):
> > >
> > > "Although the architecture requires that direct reads of PMCCNTR_EL0 or
> > > PMCCNTR occur in program order, there is no requirement that the count
> > > increments between two such reads. Even when the counter is incrementing on
> > > every clock cycle, software might need check that the difference between
> > > two reads of the counter is nonzero."
> > OK
> > >
> > > There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
> > > "A reasonable degree of inaccuracy" (page D11-5248). I'll post some
> > > snippets that I found interesting, but there are more examples and
> > > explanations to be found in that chapter.
> > 
> > yeah I saw that, hence my question about the reasonable disparity we can
> > expect from the HW/SW stack.
> > >
> > > "In exceptional circumstances, such as a change in Security state or other
> > > boundary condition, it is acceptable for the count to be inaccurate."
> > >
> > > PMCR writes are trapped by KVM. Is a change in exception level an
> > > "exception circumstance"? Could be, but couldn't find anything definitive.
> > > For example, the architecture allows an implementation to drop an event in
> > > the case of an interrupt:
> > >
> > > "However, dropping a single branch count as the result of a rare
> > > interaction with an interrupt is acceptable."
> > >
> > > So events could definitely be dropped because of an interrupt for the host.
> > >
> > > And there's also this:
> > >
> > > "The imprecision means that the counter might have counted an event around
> > > the time the counter was disabled, but does not allow the event to be
> > > observed as counted after the counter was disabled."
> > In our case there seems to be a huge discrepancy.
> 
> I agree. There is this about the MEM_ACCESS event in the Arm ARM:
> 
> "The counter counts each Memory-read operation or Memory-write operation
> that the PE makes."
> 
> As for what a Memory-read operation is (emphasis added by me):
> 
> "A memory-read operation might be due to:
> The result of an architecturally executed memory-reading instructions.
> The result of a Speculatively executed memory-reading instructions <- this
> is why the DSB ISH is needed before enabling the PMU.
> **A translation table walk**."
> 
> Those extra memory accesses might be caused by the table walker deciding to
> walk the tables, speculatively or not. Software has no control over the
> table walker (as long as it is enabled).

Please ignore this part, I just noticed in the MEM_ACCESS event definition
that translation table walks are not counted.

Thanks,
Alex

> 
> Thanks,
> Alex
> 
> > >
> > > If you want my opinion, if it is necessary to count the number of events
> > > for a test instead, I would define a margin of error on the number of
> > > events counted. Or the test could be changed to check that at least one
> > > such event was observed.
> > I agree with you on the fact a reasonable margin must be observed and
> > the tests may need to be rewritten to account for the observed disparity
> > if considered "normal". Another way to proceed is to compute the
> > disparity before launching the main tests and if too big, skip the main
> > tests. Again on some HW, the counts are really 'as expected' and constant.
> > 
> > Thanks!
> > 
> > Eric
> > >
> > > Thanks,
> > > Alex
> > >
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-05  9:21       ` Alexandru Elisei
  2022-10-05  9:41         ` Alexandru Elisei
@ 2022-10-05  9:50         ` Eric Auger
  2022-10-06  9:25           ` Alexandru Elisei
  2022-10-11  3:50           ` Ricardo Koller
  1 sibling, 2 replies; 21+ messages in thread
From: Eric Auger @ 2022-10-05  9:50 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Ricardo Koller, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw

Hi Alexandru,

On 10/5/22 11:21, Alexandru Elisei wrote:
> Hi Eric,
>
> On Tue, Oct 04, 2022 at 07:31:25PM +0200, Eric Auger wrote:
>> Hi Alexandru,
>>
>> On 10/4/22 18:58, Alexandru Elisei wrote:
>>> Hi Eric,
>>>
>>> On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
>>>> Hi Ricardo, Marc,
>>>>
>>>> On 8/5/22 02:41, Ricardo Koller wrote:
>>>>> There are some tests that fail when running on bare metal (including a
>>>>> passthrough prototype).  There are three issues with the tests.  The
>>>>> first one is that there are some missing isb()'s between enabling event
>>>>> counting and the actual counting. This wasn't an issue on KVM as
>>>>> trapping on registers served as context synchronization events. The
>>>>> second issue is that some tests assume that registers reset to 0.  And
>>>>> finally, the third issue is that overflowing the low counter of a
>>>>> chained event sets the overflow flag in PMVOS and some tests fail by
>>>>> checking for it not being set.
>>>>>
>>>>> Addressed all comments from the previous version:
>>>>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
>>>>> - adding missing isb() and fixed the commit message (Alexandru).
>>>>> - fixed wording of a report() check (Andrew).
>>>>>
>>>>> Thanks!
>>>>> Ricardo
>>>>>
>>>>> Ricardo Koller (3):
>>>>>   arm: pmu: Add missing isb()'s after sys register writing
>>>>>   arm: pmu: Reset the pmu registers before starting some tests
>>>>>   arm: pmu: Check for overflow in the low counter in chained counters
>>>>>     tests
>>>>>
>>>>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
>>>>>  1 file changed, 39 insertions(+), 17 deletions(-)
>>>>>
>>>> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
>>>> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
>>>> test failures on some machines. This does not seem related to those
>>>> series though since I was able to get them without. The failures happen
>>>> on Amberwing machine for instance with the pmu-chain-promotion.
>>>>
>>>> While further investigating I noticed there is a lot of variability on
>>>> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
>>>> on the first iteration and 0x96 on the subsequent ones for instance.
>>>> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
>>>> expecting the counter to be close to 20. It is on some HW.
>>>>
>>>> [..]
>>>>
>>>> So I come to the actual question. Can we do any assumption on the
>>>> (virtual) PMU quality/precision? If not, the tests I originally wrote
>>>> are damned to fail on some HW (on some other they always pass) and I
>>>> need to make a decision wrt re-writing part of them, expecially those
>>>> which expect overflow after a given amount of ops. Otherwise, there is
>>>> either something wrong in the test (asm?) or in KVM PMU emulation.
>>>>
>>>> I tried to bisect because I did observe the same behavior on some older
>>>> kernels but the bisect was not successful as the issue does not happen
>>>> always.
>>>>
>>>> Thoughts?
>>> Looking at mem_access_loop(), the first thing that jumps out is the fact
>>> that is missing a DSB barrier. ISB affects only instructions, not memory
>>> accesses and without a DSB, the PE can reorder memory accesses however it
>>> sees fit.
>> Following your suggestion I added a dsh ish at the end of loop and
>> before disabling pmcr_el0 (I hope this is the place you were thinking
>> of) but unfortunately it does not seem to fix my issue.
> Yes, DSB ISH after "b.gt 1b\n" and before the write to PMCR_EL0 that
> disables the PMU.
>
> I think you also need a DSB ISH before the write to PMCR_EL0 that enables
> the PMU in the first instruction of the asm block. In your example, the
> MEM_ACCESS event count is higher than expected, and one explanation for the
> large disparity that I can think of is that previous memory accesses are
> reordered past the instruction that enables the PMU, which makes the PMU
> add these events to the total event count.

Makes sense. I added those at the 2 locations but unfortunately it does
not change the result for me.
>
>>> I also believe precise_instrs_loop() to be in the same situation, as the
>>> architecture doesn't guarantee that the cycle counter increments after
>>> every CPU cycle (ARM DDI 0487I.a, page D11-5246):
>>>
>>> "Although the architecture requires that direct reads of PMCCNTR_EL0 or
>>> PMCCNTR occur in program order, there is no requirement that the count
>>> increments between two such reads. Even when the counter is incrementing on
>>> every clock cycle, software might need check that the difference between
>>> two reads of the counter is nonzero."
>> OK
>>> There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
>>> "A reasonable degree of inaccuracy" (page D11-5248). I'll post some
>>> snippets that I found interesting, but there are more examples and
>>> explanations to be found in that chapter.
>> yeah I saw that, hence my question about the reasonable disparity we can
>> expect from the HW/SW stack.
>>> "In exceptional circumstances, such as a change in Security state or other
>>> boundary condition, it is acceptable for the count to be inaccurate."
>>>
>>> PMCR writes are trapped by KVM. Is a change in exception level an
>>> "exception circumstance"? Could be, but couldn't find anything definitive.
>>> For example, the architecture allows an implementation to drop an event in
>>> the case of an interrupt:
>>>
>>> "However, dropping a single branch count as the result of a rare
>>> interaction with an interrupt is acceptable."
>>>
>>> So events could definitely be dropped because of an interrupt for the host.
>>>
>>> And there's also this:
>>>
>>> "The imprecision means that the counter might have counted an event around
>>> the time the counter was disabled, but does not allow the event to be
>>> observed as counted after the counter was disabled."
>> In our case there seems to be a huge discrepancy.
> I agree. There is this about the MEM_ACCESS event in the Arm ARM:
>
> "The counter counts each Memory-read operation or Memory-write operation
> that the PE makes."
>
> As for what a Memory-read operation is (emphasis added by me):
>
> "A memory-read operation might be due to:
> The result of an architecturally executed memory-reading instructions.
> The result of a Speculatively executed memory-reading instructions <- this
> is why the DSB ISH is needed before enabling the PMU.
> **A translation table walk**."
>
> Those extra memory accesses might be caused by the table walker deciding to
> walk the tables, speculatively or not. Software has no control over the
> table walker (as long as it is enabled).
That's indeed an interesting lead. But can it really be that for 20
expected load instructions we end up with ~150 actual memory accesses?
I can't help thinking this is quite a surprising amount.  Also the
pattern is surprising: the first iteration gives a low count (~30)
while subsequent ones give higher and constant ones (~150). I would
have expected the opposite, no? I will try to run the same experiment on
the various HW I have access to.

Anyway there is a problem when interpreting the results of the tests.
Either this can happen on some HW (it is valid behavior according to the
ARM spec) and the test is simply not runnable there, or it is a bug
somewhere in the SW stack.

It would be interesting to run the same tests at bare-metal level on
Amberwing and see what the results are. Ricardo/Drew, could you give
some pointers about the setup?

Thanks

Eric
>
> Thanks,
> Alex
>
>>> If you want my opinion, if it is necessary to count the number of events
>>> for a test instead, I would define a margin of error on the number of
>>> events counted. Or the test could be changed to check that at least one
>>> such event was observed.
>> I agree with you on the fact a reasonable margin must be observed and
>> the tests may need to be rewritten to account for the observed disparity
>> if considered "normal". Another way to proceed is to compute the
>> disparity before launching the main tests and if too big, skip the main
>> tests. Again on some HW, the counts are really 'as expected' and constant.
>>
>> Thanks!
>>
>> Eric
>>> Thanks,
>>> Alex
>>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-05  9:41         ` Alexandru Elisei
@ 2022-10-05  9:51           ` Eric Auger
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Auger @ 2022-10-05  9:51 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Ricardo Koller, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw



On 10/5/22 11:41, Alexandru Elisei wrote:
> Hi,
>
> On Wed, Oct 05, 2022 at 10:21:12AM +0100, Alexandru Elisei wrote:
>> Hi Eric,
>>
>> On Tue, Oct 04, 2022 at 07:31:25PM +0200, Eric Auger wrote:
>>> Hi Alexandru,
>>>
>>> On 10/4/22 18:58, Alexandru Elisei wrote:
>>>> Hi Eric,
>>>>
>>>> On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
>>>>> Hi Ricardo, Marc,
>>>>>
>>>>> On 8/5/22 02:41, Ricardo Koller wrote:
>>>>>> There are some tests that fail when running on bare metal (including a
>>>>>> passthrough prototype).  There are three issues with the tests.  The
>>>>>> first one is that there are some missing isb()'s between enabling event
>>>>>> counting and the actual counting. This wasn't an issue on KVM as
>>>>>> trapping on registers served as context synchronization events. The
>>>>>> second issue is that some tests assume that registers reset to 0.  And
>>>>>> finally, the third issue is that overflowing the low counter of a
>>>>>> chained event sets the overflow flag in PMVOS and some tests fail by
>>>>>> checking for it not being set.
>>>>>>
>>>>>> Addressed all comments from the previous version:
>>>>>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
>>>>>> - adding missing isb() and fixed the commit message (Alexandru).
>>>>>> - fixed wording of a report() check (Andrew).
>>>>>>
>>>>>> Thanks!
>>>>>> Ricardo
>>>>>>
>>>>>> Ricardo Koller (3):
>>>>>>   arm: pmu: Add missing isb()'s after sys register writing
>>>>>>   arm: pmu: Reset the pmu registers before starting some tests
>>>>>>   arm: pmu: Check for overflow in the low counter in chained counters
>>>>>>     tests
>>>>>>
>>>>>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
>>>>>>  1 file changed, 39 insertions(+), 17 deletions(-)
>>>>>>
>>>>> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
>>>>> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
>>>>> test failures on some machines. This does not seem related to those
>>>>> series though since I was able to get them without. The failures happen
>>>>> on Amberwing machine for instance with the pmu-chain-promotion.
>>>>>
>>>>> While further investigating I noticed there is a lot of variability on
>>>>> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
>>>>> on the first iteration and 0x96 on the subsequent ones for instance.
>>>>> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
>>>>> expecting the counter to be close to 20. It is on some HW.
>>>>>
>>>>> [..]
>>>>>
>>>>> So I come to the actual question. Can we do any assumption on the
>>>>> (virtual) PMU quality/precision? If not, the tests I originally wrote
>>>>> are damned to fail on some HW (on some other they always pass) and I
>>>>> need to make a decision wrt re-writing part of them, expecially those
>>>>> which expect overflow after a given amount of ops. Otherwise, there is
>>>>> either something wrong in the test (asm?) or in KVM PMU emulation.
>>>>>
>>>>> I tried to bisect because I did observe the same behavior on some older
>>>>> kernels but the bisect was not successful as the issue does not happen
>>>>> always.
>>>>>
>>>>> Thoughts?
>>>> Looking at mem_access_loop(), the first thing that jumps out is the fact
>>>> that is missing a DSB barrier. ISB affects only instructions, not memory
>>>> accesses and without a DSB, the PE can reorder memory accesses however it
>>>> sees fit.
>>> Following your suggestion I added a dsh ish at the end of loop and
>>> before disabling pmcr_el0 (I hope this is the place you were thinking
>>> of) but unfortunately it does not seem to fix my issue.
>> Yes, DSB ISH after "b.gt 1b\n" and before the write to PMCR_EL0 that
>> disables the PMU.
>>
>> I think you also need a DSB ISH before the write to PMCR_EL0 that enables
>> the PMU in the first instruction of the asm block. In your example, the
>> MEM_ACCESS event count is higher than expected, and one explanation for the
>> large disparity that I can think of is that previous memory accesses are
>> reordered past the instruction that enables the PMU, which makes the PMU
>> add these events to the total event count.
>>
>>>> I also believe precise_instrs_loop() to be in the same situation, as the
>>>> architecture doesn't guarantee that the cycle counter increments after
>>>> every CPU cycle (ARM DDI 0487I.a, page D11-5246):
>>>>
>>>> "Although the architecture requires that direct reads of PMCCNTR_EL0 or
>>>> PMCCNTR occur in program order, there is no requirement that the count
>>>> increments between two such reads. Even when the counter is incrementing on
>>>> every clock cycle, software might need check that the difference between
>>>> two reads of the counter is nonzero."
>>> OK
>>>> There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
>>>> "A reasonable degree of inaccuracy" (page D11-5248). I'll post some
>>>> snippets that I found interesting, but there are more examples and
>>>> explanations to be found in that chapter.
>>> yeah I saw that, hence my question about the reasonable disparity we can
>>> expect from the HW/SW stack.
>>>> "In exceptional circumstances, such as a change in Security state or other
>>>> boundary condition, it is acceptable for the count to be inaccurate."
>>>>
>>>> PMCR writes are trapped by KVM. Is a change in exception level an
>>>> "exception circumstance"? Could be, but couldn't find anything definitive.
>>>> For example, the architecture allows an implementation to drop an event in
>>>> the case of an interrupt:
>>>>
>>>> "However, dropping a single branch count as the result of a rare
>>>> interaction with an interrupt is acceptable."
>>>>
>>>> So events could definitely be dropped because of an interrupt for the host.
>>>>
>>>> And there's also this:
>>>>
>>>> "The imprecision means that the counter might have counted an event around
>>>> the time the counter was disabled, but does not allow the event to be
>>>> observed as counted after the counter was disabled."
>>> In our case there seems to be a huge discrepancy.
>> I agree. There is this about the MEM_ACCESS event in the Arm ARM:
>>
>> "The counter counts each Memory-read operation or Memory-write operation
>> that the PE makes."
>>
>> As for what a Memory-read operation is (emphasis added by me):
>>
>> "A memory-read operation might be due to:
>> The result of an architecturally executed memory-reading instructions.
>> The result of a Speculatively executed memory-reading instructions <- this
>> is why the DSB ISH is needed before enabling the PMU.
>> **A translation table walk**."
>>
>> Those extra memory accesses might be caused by the table walker deciding to
>> walk the tables, speculatively or not. Software has no control over the
>> table walker (as long as it is enabled).
> Please ignore this part, just noticed in the MEM_ACCESS event definition
> that translation table walks are not counted.

Ah OK. So that rules out this hypothesis.

Eric
>
> Thanks,
> Alex
>
>> Thanks,
>> Alex
>>
>>>> If you want my opinion, if it is necessary to count the number of events
>>>> for a test instead, I would define a margin of error on the number of
>>>> events counted. Or the test could be changed to check that at least one
>>>> such event was observed.
>>> I agree with you on the fact a reasonable margin must be observed and
>>> the tests may need to be rewritten to account for the observed disparity
>>> if considered "normal". Another way to proceed is to compute the
>>> disparity before launching the main tests and if too big, skip the main
>>> tests. Again on some HW, the counts are really 'as expected' and constant.
>>>
>>> Thanks!
>>>
>>> Eric
>>>> Thanks,
>>>> Alex
>>>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-05  9:50         ` Eric Auger
@ 2022-10-06  9:25           ` Alexandru Elisei
  2022-10-11  3:50           ` Ricardo Koller
  1 sibling, 0 replies; 21+ messages in thread
From: Alexandru Elisei @ 2022-10-06  9:25 UTC (permalink / raw)
  To: Eric Auger
  Cc: Ricardo Koller, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw

Hi Eric,

On Wed, Oct 05, 2022 at 11:50:09AM +0200, Eric Auger wrote:
> Hi Alexandru,
> 
> On 10/5/22 11:21, Alexandru Elisei wrote:
> > Hi Eric,
> >
> > On Tue, Oct 04, 2022 at 07:31:25PM +0200, Eric Auger wrote:
> >> Hi Alexandru,
> >>
> >> On 10/4/22 18:58, Alexandru Elisei wrote:
> >>> Hi Eric,
> >>>
> >>> On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
> >>>> Hi Ricardo, Marc,
> >>>>
> >>>> On 8/5/22 02:41, Ricardo Koller wrote:
> >>>>> There are some tests that fail when running on bare metal (including a
> >>>>> passthrough prototype).  There are three issues with the tests.  The
> >>>>> first one is that there are some missing isb()'s between enabling event
> >>>>> counting and the actual counting. This wasn't an issue on KVM as
> >>>>> trapping on registers served as context synchronization events. The
> >>>>> second issue is that some tests assume that registers reset to 0.  And
> >>>>> finally, the third issue is that overflowing the low counter of a
> >>>>> chained event sets the overflow flag in PMVOS and some tests fail by
> >>>>> checking for it not being set.
> >>>>>
> >>>>> Addressed all comments from the previous version:
> >>>>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
> >>>>> - adding missing isb() and fixed the commit message (Alexandru).
> >>>>> - fixed wording of a report() check (Andrew).
> >>>>>
> >>>>> Thanks!
> >>>>> Ricardo
> >>>>>
> >>>>> Ricardo Koller (3):
> >>>>>   arm: pmu: Add missing isb()'s after sys register writing
> >>>>>   arm: pmu: Reset the pmu registers before starting some tests
> >>>>>   arm: pmu: Check for overflow in the low counter in chained counters
> >>>>>     tests
> >>>>>
> >>>>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
> >>>>>  1 file changed, 39 insertions(+), 17 deletions(-)
> >>>>>
> >>>> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
> >>>> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
> >>>> test failures on some machines. This does not seem related to those
> >>>> series though since I was able to get them without. The failures happen
> >>>> on Amberwing machine for instance with the pmu-chain-promotion.
> >>>>
> >>>> While further investigating I noticed there is a lot of variability on
> >>>> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
> >>>> on the first iteration and 0x96 on the subsequent ones for instance.
> >>>> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
> >>>> expecting the counter to be close to 20. It is on some HW.
> >>>>
> >>>> [..]
> >>>>
> >>>> So I come to the actual question. Can we do any assumption on the
> >>>> (virtual) PMU quality/precision? If not, the tests I originally wrote
> >>>> are damned to fail on some HW (on some other they always pass) and I
> >>>> need to make a decision wrt re-writing part of them, expecially those
> >>>> which expect overflow after a given amount of ops. Otherwise, there is
> >>>> either something wrong in the test (asm?) or in KVM PMU emulation.
> >>>>
> >>>> I tried to bisect because I did observe the same behavior on some older
> >>>> kernels but the bisect was not successful as the issue does not happen
> >>>> always.
> >>>>
> >>>> Thoughts?
> >>> Looking at mem_access_loop(), the first thing that jumps out is the fact
> >>> that is missing a DSB barrier. ISB affects only instructions, not memory
> >>> accesses and without a DSB, the PE can reorder memory accesses however it
> >>> sees fit.
> >> Following your suggestion I added a dsh ish at the end of loop and
> >> before disabling pmcr_el0 (I hope this is the place you were thinking
> >> of) but unfortunately it does not seem to fix my issue.
> > Yes, DSB ISH after "b.gt 1b\n" and before the write to PMCR_EL0 that
> > disables the PMU.
> >
> > I think you also need a DSB ISH before the write to PMCR_EL0 that enables
> > the PMU in the first instruction of the asm block. In your example, the
> > MEM_ACCESS event count is higher than expected, and one explanation for the
> > large disparity that I can think of is that previous memory accesses are
> > reordered past the instruction that enables the PMU, which makes the PMU
> > add these events to the total event count.
> 
> Makes sense. I added those at the 2 locations but unfortunately it does
> not change the result for me.

Have you tried increasing the number of iterations for mem_access_loop() to
something very large? If the number of unexpected accesses remains about
the same (~80) instead of growing proportionally with the number of
iterations, that might point to some extra accesses that are performed in
the setup phase instead of something fundamentally wrong with the test.
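
In case it helps, the experiment I have in mind looks roughly like this
(iteration counts are arbitrary; it assumes counter #0 is already
configured and enabled for MEM_ACCESS and that addr is a scratch buffer,
as in test_mem_access()):

	static const long iters[] = { 20, 200, 2000, 20000 };
	int i;

	for (i = 0; i < ARRAY_SIZE(iters); i++) {
		write_regn_el0(pmevcntr, 0, 0);
		isb();
		mem_access_loop(addr, iters[i], pmu.pmcr_ro | PMU_PMCR_E);
		/* A roughly constant delta would point at setup-phase accesses. */
		report_info("iterations=%ld counted=%ld delta=%ld",
			    iters[i], (long)read_regn_el0(pmevcntr, 0),
			    (long)read_regn_el0(pmevcntr, 0) - iters[i]);
	}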

Thanks,
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
  2022-10-05  9:50         ` Eric Auger
  2022-10-06  9:25           ` Alexandru Elisei
@ 2022-10-11  3:50           ` Ricardo Koller
  1 sibling, 0 replies; 21+ messages in thread
From: Ricardo Koller @ 2022-10-11  3:50 UTC (permalink / raw)
  To: Eric Auger
  Cc: Alexandru Elisei, kvm, kvmarm, andrew.jones, maz, oliver.upton, reijiw

On Wed, Oct 05, 2022 at 11:50:09AM +0200, Eric Auger wrote:
> Hi Alexandru,
> 
> On 10/5/22 11:21, Alexandru Elisei wrote:
> > Hi Eric,
> >
> > On Tue, Oct 04, 2022 at 07:31:25PM +0200, Eric Auger wrote:
> >> Hi Alexandru,
> >>
> >> On 10/4/22 18:58, Alexandru Elisei wrote:
> >>> Hi Eric,
> >>>
> >>> On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
> >>>> Hi Ricardo, Marc,
> >>>>
> >>>> On 8/5/22 02:41, Ricardo Koller wrote:
> >>>>> There are some tests that fail when running on bare metal (including a
> >>>>> passthrough prototype).  There are three issues with the tests.  The
> >>>>> first one is that there are some missing isb()'s between enabling event
> >>>>> counting and the actual counting. This wasn't an issue on KVM as
> >>>>> trapping on registers served as context synchronization events. The
> >>>>> second issue is that some tests assume that registers reset to 0.  And
> >>>>> finally, the third issue is that overflowing the low counter of a
> >>>>> chained event sets the overflow flag in PMVOS and some tests fail by
> >>>>> checking for it not being set.
> >>>>>
> >>>>> Addressed all comments from the previous version:
> >>>>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
> >>>>> - adding missing isb() and fixed the commit message (Alexandru).
> >>>>> - fixed wording of a report() check (Andrew).
> >>>>>
> >>>>> Thanks!
> >>>>> Ricardo
> >>>>>
> >>>>> Ricardo Koller (3):
> >>>>>   arm: pmu: Add missing isb()'s after sys register writing
> >>>>>   arm: pmu: Reset the pmu registers before starting some tests
> >>>>>   arm: pmu: Check for overflow in the low counter in chained counters
> >>>>>     tests
> >>>>>
> >>>>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
> >>>>>  1 file changed, 39 insertions(+), 17 deletions(-)
> >>>>>
> >>>> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
> >>>> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
> >>>> test failures on some machines. This does not seem related to those
> >>>> series though since I was able to get them without. The failures happen
> >>>> on Amberwing machine for instance with the pmu-chain-promotion.
> >>>>
> >>>> While further investigating I noticed there is a lot of variability on
> >>>> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
> >>>> on the first iteration and 0x96 on the subsequent ones for instance.
> >>>> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
> >>>> expecting the counter to be close to 20. It is on some HW.
> >>>>
> >>>> [..]
> >>>>
> >>>> So I come to the actual question. Can we do any assumption on the
> >>>> (virtual) PMU quality/precision? If not, the tests I originally wrote
> >>>> are damned to fail on some HW (on some other they always pass) and I
> >>>> need to make a decision wrt re-writing part of them, expecially those
> >>>> which expect overflow after a given amount of ops. Otherwise, there is
> >>>> either something wrong in the test (asm?) or in KVM PMU emulation.

I don't think it's the asm because in that case the counter value should
be the same every time (even if wrong).

> >>>>
> >>>> I tried to bisect because I did observe the same behavior on some older
> >>>> kernels but the bisect was not successful as the issue does not happen
> >>>> always.
> >>>>
> >>>> Thoughts?
> >>> Looking at mem_access_loop(), the first thing that jumps out is the fact
> >>> that is missing a DSB barrier. ISB affects only instructions, not memory
> >>> accesses and without a DSB, the PE can reorder memory accesses however it
> >>> sees fit.
> >> Following your suggestion I added a dsh ish at the end of loop and
> >> before disabling pmcr_el0 (I hope this is the place you were thinking
> >> of) but unfortunately it does not seem to fix my issue.
> > Yes, DSB ISH after "b.gt 1b\n" and before the write to PMCR_EL0 that
> > disables the PMU.
> >
> > I think you also need a DSB ISH before the write to PMCR_EL0 that enables
> > the PMU in the first instruction of the asm block. In your example, the
> > MEM_ACCESS event count is higher than expected, and one explanation for the
> > large disparity that I can think of is that previous memory accesses are
> > reordered past the instruction that enables the PMU, which makes the PMU
> > add these events to the total event count.
> 
> Makes sense. I added those at the 2 locations but unfortunately it does
> not change the result for me.
> >
> >>> I also believe precise_instrs_loop() to be in the same situation, as the
> >>> architecture doesn't guarantee that the cycle counter increments after
> >>> every CPU cycle (ARM DDI 0487I.a, page D11-5246):
> >>>
> >>> "Although the architecture requires that direct reads of PMCCNTR_EL0 or
> >>> PMCCNTR occur in program order, there is no requirement that the count
> >>> increments between two such reads. Even when the counter is incrementing on
> >>> every clock cycle, software might need check that the difference between
> >>> two reads of the counter is nonzero."
> >> OK
> >>> There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
> >>> "A reasonable degree of inaccuracy" (page D11-5248). I'll post some
> >>> snippets that I found interesting, but there are more examples and
> >>> explanations to be found in that chapter.
> >> yeah I saw that, hence my question about the reasonable disparity we can
> >> expect from the HW/SW stack.
> >>> "In exceptional circumstances, such as a change in Security state or other
> >>> boundary condition, it is acceptable for the count to be inaccurate."
> >>>
> >>> PMCR writes are trapped by KVM. Is a change in exception level an
> >>> "exception circumstance"? Could be, but couldn't find anything definitive.
> >>> For example, the architecture allows an implementation to drop an event in
> >>> the case of an interrupt:
> >>>
> >>> "However, dropping a single branch count as the result of a rare
> >>> interaction with an interrupt is acceptable."
> >>>
> >>> So events could definitely be dropped because of an interrupt for the host.
> >>>
> >>> And there's also this:
> >>>
> >>> "The imprecision means that the counter might have counted an event around
> >>> the time the counter was disabled, but does not allow the event to be
> >>> observed as counted after the counter was disabled."
> >> In our case there seems to be a huge discrepancy.
> > I agree. There is this about the MEM_ACCESS event in the Arm ARM:
> >
> > "The counter counts each Memory-read operation or Memory-write operation
> > that the PE makes."
> >
> > As for what a Memory-read operation is (emphasis added by me):
> >
> > "A memory-read operation might be due to:
> > The result of an architecturally executed memory-reading instructions.
> > The result of a Speculatively executed memory-reading instructions <- this
> > is why the DSB ISH is needed before enabling the PMU.
> > **A translation table walk**."
> >
> > Those extra memory accesses might be caused by the table walker deciding to
> > walk the tables, speculatively or not. Software has no control over the
> > table walker (as long as it is enabled).
> That's indeed an interesting track. But can it be possible that for 20
> expected load instructions we end up with ~150 actual memory accesses.
> I can't help thinking this is a quite surprising amount.  Also the
> pattern is surprising: the first iteration gives low counter count (~30)
> while subsequent ones bring higher and constant ones (~150). I would
> have expected the opposite, no? I will try to run the same experience on
> various HW I have access to.
> 
> Anyway there is a problem while interpreting the result of the tests.
> Either it can happen on some HW (it is a valid behavior according to the
> ARM spec) and the test is simply not runnable or it is a bug somewhere
> in the SW stack. 
> 
> It would be interesting to run the same tests at baremetal level on
> Amberwing and see what are the results. Ricardo/Drew, could you give
> some links about the setup?

Actually, the "bare metal" tests I performed were on a prototype
passthrough implementation:
https://github.com/ricarkol/linux/commit/c2b009e813e18e89d6945915bd3ae5787bbe3164
Let me know how it goes.

Thanks,
Ricardo

> 
> Thanks
> 
> Eric
> >
> > Thanks,
> > Alex
> >
> >>> If you want my opinion, if it is necessary to count the number of events
> >>> for a test instead, I would define a margin of error on the number of
> >>> events counted. Or the test could be changed to check that at least one
> >>> such event was observed.
> >> I agree with you on the fact a reasonable margin must be observed and
> >> the tests may need to be rewritten to account for the observed disparity
> >> if considered "normal". Another way to proceed is to compute the
> >> disparity before launching the main tests and if too big, skip the main
> >> tests. Again on some HW, the counts are really 'as expected' and constant.
> >>
> >> Thanks!
> >>
> >> Eric
> >>> Thanks,
> >>> Alex
> >>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-10-11  3:50 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-05  0:41 [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Ricardo Koller
2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing Ricardo Koller
2022-08-09 15:21   ` Alexandru Elisei
2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests Ricardo Koller
2022-08-10 19:02   ` Andrew Jones
2022-08-10 23:33     ` Ricardo Koller
2022-08-11  7:04       ` Andrew Jones
2022-08-11 18:51         ` Ricardo Koller
2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests Ricardo Koller
2022-08-10 17:30   ` Oliver Upton
2022-08-10 18:28     ` Andrew Jones
2022-08-10 17:33 ` [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Oliver Upton
2022-10-04 16:20 ` Eric Auger
2022-10-04 16:58   ` Alexandru Elisei
2022-10-04 17:31     ` Eric Auger
2022-10-05  9:21       ` Alexandru Elisei
2022-10-05  9:41         ` Alexandru Elisei
2022-10-05  9:51           ` Eric Auger
2022-10-05  9:50         ` Eric Auger
2022-10-06  9:25           ` Alexandru Elisei
2022-10-11  3:50           ` Ricardo Koller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).