* [kvm-unit-tests PATCH v8 0/3] ARM PMU tests
@ 2016-11-08 18:17 ` Wei Huang
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw)
  To: cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

Changes from v7:
* Standardize PMU register accessor names and remove unused ones
* Use bit defines instead of bit fields
* Change the test configuration for pmu.flat
* Update the commit messages

Note:
1) The current KVM code has bugs in its handling of PMCCFILTR writes. The
fix below is required for this unit test code to work correctly in KVM
mode:
https://lists.cs.columbia.edu/pipermail/kvmarm/2016-November/022134.html
2) Because the code has changed, Drew's original Reviewed-by needs to be
re-confirmed by him.

-Wei

Wei Huang (3):
  arm: Add PMU test
  arm: pmu: Check cycle count increases
  arm: pmu: Add CPI checking

 arm/Makefile.common |   3 +-
 arm/pmu.c           | 270 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  19 ++++
 3 files changed, 291 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test
  2016-11-08 18:17 ` [Qemu-devel] " Wei Huang
@ 2016-11-08 18:17   ` Wei Huang
  -1 siblings, 0 replies; 23+ messages in thread
From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw)
  To: cov
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

From: Christopher Covington <cov@codeaurora.org>

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
---
 arm/Makefile.common |  3 ++-
 arm/pmu.c           | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  5 ++++
 3 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index ccb554d..f98f422 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -11,7 +11,8 @@ endif
 
 tests-common = \
 	$(TEST_DIR)/selftest.flat \
-	$(TEST_DIR)/spinlock-test.flat
+	$(TEST_DIR)/spinlock-test.flat \
+	$(TEST_DIR)/pmu.flat
 
 all: test_cases
 
diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 0000000..0b29088
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,73 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+#define PMU_PMCR_N_SHIFT   11
+#define PMU_PMCR_N_MASK    0x1f
+#define PMU_PMCR_ID_SHIFT  16
+#define PMU_PMCR_ID_MASK   0xff
+#define PMU_PMCR_IMP_SHIFT 24
+#define PMU_PMCR_IMP_MASK  0xff
+
+#if defined(__arm__)
+static inline uint32_t pmcr_read(void)
+{
+	uint32_t ret;
+
+	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+	return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t pmcr_read(void)
+{
+	uint32_t ret;
+
+	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+	return ret;
+}
+#endif
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+	uint32_t pmcr;
+
+	pmcr = pmcr_read();
+
+	printf("PMU implementer:     %c\n",
+	       (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK);
+	printf("Identification code: 0x%x\n",
+	       (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK);
+	printf("Event counters:      %d\n",
+	       (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
+
+	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
+}
+
+int main(void)
+{
+	report_prefix_push("pmu");
+
+	report("Control register", check_pmcr());
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 3f6fa45..7645180 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -54,3 +54,8 @@ file = selftest.flat
 smp = $MAX_SMP
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support
+[pmu]
+file = pmu.flat
+groups = pmu
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-08 18:17 ` [Qemu-devel] " Wei Huang
@ 2016-11-08 18:17   ` Wei Huang
  -1 siblings, 0 replies; 23+ messages in thread
From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw)
  To: cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

From: Christopher Covington <cov@codeaurora.org>

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
---
 arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 0b29088..d5e3ac3 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -14,6 +14,7 @@
  */
 #include "libcflat.h"
 
+#define PMU_PMCR_E         (1 << 0)
 #define PMU_PMCR_N_SHIFT   11
 #define PMU_PMCR_N_MASK    0x1f
 #define PMU_PMCR_ID_SHIFT  16
@@ -21,6 +22,10 @@
 #define PMU_PMCR_IMP_SHIFT 24
 #define PMU_PMCR_IMP_MASK  0xff
 
+#define PMU_CYCLE_IDX      31
+
+#define NR_SAMPLES 10
+
 #if defined(__arm__)
 static inline uint32_t pmcr_read(void)
 {
@@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
 	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
 	return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
+}
+
+static inline void pmselr_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
+}
+
+static inline void pmxevtyper_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
+}
+
+/*
+ * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
+ * bits doesn't seem worth the trouble when differential usage of the result is
+ * expected (with differences that can easily fit in 32 bits). So just return
+ * the lower 32 bits of the cycle count in AArch32.
+ */
+static inline uint32_t pmccntr_read(void)
+{
+	uint32_t cycles;
+
+	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
+	return cycles;
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
+}
+
+/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
+static inline void pmccfiltr_write(uint32_t value)
+{
+	pmselr_write(PMU_CYCLE_IDX);
+	pmxevtyper_write(value);
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void)
 	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
 	return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+	asm volatile("msr pmcr_el0, %0" : : "r" (value));
+}
+
+static inline uint32_t pmccntr_read(void)
+{
+	uint32_t cycles;
+
+	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+	return cycles;
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+	asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
+}
+
+static inline void pmccfiltr_write(uint32_t value)
+{
+	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
+}
 #endif
 
 /*
@@ -63,11 +132,40 @@ static bool check_pmcr(void)
 	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+	pmcr_write(pmcr_read() | PMU_PMCR_E);
+
+	for (int i = 0; i < NR_SAMPLES; i++) {
+		unsigned long a, b;
+
+		a = pmccntr_read();
+		b = pmccntr_read();
+
+		if (a >= b) {
+			printf("Read %ld then %ld.\n", a, b);
+			return false;
+		}
+	}
+
+	pmcr_write(pmcr_read() & ~PMU_PMCR_E);
+
+	return true;
+}
+
 int main(void)
 {
 	report_prefix_push("pmu");
 
+	/* init for PMU event access, right now only care about cycle count */
+	pmcntenset_write(1 << PMU_CYCLE_IDX);
+	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
+
 	report("Control register", check_pmcr());
+	report("Monotonically increasing cycle count", check_cycles_increase());
 
 	return report_summary();
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking
  2016-11-08 18:17 ` [Qemu-devel] " Wei Huang
@ 2016-11-08 18:17   ` Wei Huang
  -1 siblings, 0 replies; 23+ messages in thread
From: Wei Huang @ 2016-11-08 18:17 UTC (permalink / raw)
  To: cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

From: Christopher Covington <cov@codeaurora.org>

Calculate the number of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for TCG mode with the -icount option set in the configuration file.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
---
 arm/pmu.c         | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 arm/unittests.cfg |  14 ++++++++
 2 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index d5e3ac3..09aff89 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -15,6 +15,7 @@
 #include "libcflat.h"
 
 #define PMU_PMCR_E         (1 << 0)
+#define PMU_PMCR_C         (1 << 2)
 #define PMU_PMCR_N_SHIFT   11
 #define PMU_PMCR_N_MASK    0x1f
 #define PMU_PMCR_ID_SHIFT  16
@@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value)
 	pmselr_write(PMU_CYCLE_IDX);
 	pmxevtyper_write(value);
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+	asm volatile(
+	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
+	"1:	subs	%[i], %[i], #1\n"
+	"	bgt	1b\n"
+	"	mcr	p15, 0, %[z], c9, c12, 0\n"
+	: [i] "+r" (i)
+	: [pmcr] "r" (pmcr), [z] "r" (0)
+	: "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value)
 {
 	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting.
+ */
+static inline void loop(int i, uint32_t pmcr)
+{
+	asm volatile(
+	"	msr	pmcr_el0, %[pmcr]\n"
+	"1:	subs	%[i], %[i], #1\n"
+	"	b.gt	1b\n"
+	"	msr	pmcr_el0, xzr\n"
+	: [i] "+r" (i)
+	: [pmcr] "r" (pmcr)
+	: "cc");
+}
 #endif
 
 /*
@@ -156,8 +191,71 @@ static bool check_cycles_increase(void)
 	return true;
 }
 
-int main(void)
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+	int i = (num - 1) / 2;
+
+	assert(num >= 3 && ((num - 1) % 2 == 0));
+	loop(i, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+	uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E;
+	
+	if (cpi > 0)
+		printf("Checking for CPI=%d.\n", cpi);
+	printf("instrs : cycles0 cycles1 ...\n");
+
+	for (int i = 3; i < 300; i += 32) {
+		int avg, sum = 0;
+
+		printf("%d :", i);
+		for (int j = 0; j < NR_SAMPLES; j++) {
+			int cycles;
+
+			measure_instrs(i, pmcr);
+			cycles =pmccntr_read();
+			printf(" %d", cycles);
+
+			if (!cycles || (cpi > 0 && cycles != i * cpi)) {
+				printf("\n");
+				return false;
+			}
+
+			sum += cycles;
+		}
+		avg = sum / NR_SAMPLES;
+		printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
+			sum, avg, i / avg, avg / i);
+	}
+
+	return true;
+}
+
+int main(int argc, char *argv[])
 {
+	int cpi = 0;
+
+	if (argc >= 1)
+		cpi = atol(argv[0]);
+
 	report_prefix_push("pmu");
 
 	/* init for PMU event access, right now only care about cycle count */
@@ -166,6 +264,7 @@ int main(void)
 
 	report("Control register", check_pmcr());
 	report("Monotonically increasing cycle count", check_cycles_increase());
+	report("Cycle/instruction ratio", check_cpi(cpi));
 
 	return report_summary();
 }
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 7645180..2050dc8 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -59,3 +59,17 @@ groups = selftest
 [pmu]
 file = pmu.flat
 groups = pmu
+
+# Test PMU support (TCG) with -icount IPC=1
+[pmu-tcg-icount-1]
+file = pmu.flat
+extra_params = -icount 0 -append '1'
+groups = pmu
+accel = tcg
+
+# Test PMU support (TCG) with -icount IPC=256
+[pmu-tcg-icount-256]
+file = pmu.flat
+extra_params = -icount 8 -append '256'
+groups = pmu
+accel = tcg
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test
  2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
  (?)
@ 2016-11-11  7:37   ` Andrew Jones
  -1 siblings, 0 replies; 23+ messages in thread
From: Andrew Jones @ 2016-11-11  7:37 UTC (permalink / raw)
  To: Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

On Tue, Nov 08, 2016 at 12:17:13PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Beginning with a simple sanity check of the control register, add
> a unit test for the ARM Performance Monitors Unit (PMU).
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> ---
>  arm/Makefile.common |  3 ++-
>  arm/pmu.c           | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg   |  5 ++++
>  3 files changed, 80 insertions(+), 1 deletion(-)
>  create mode 100644 arm/pmu.c

Reviewed-by: Andrew Jones <drjones@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
  (?)
@ 2016-11-11  7:43   ` Andrew Jones
  2016-11-11 19:55     ` Wei Huang
  -1 siblings, 1 reply; 23+ messages in thread
From: Andrew Jones @ 2016-11-11  7:43 UTC (permalink / raw)
  To: Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> ---
>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 98 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 0b29088..d5e3ac3 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -14,6 +14,7 @@
>   */
>  #include "libcflat.h"
>  
> +#define PMU_PMCR_E         (1 << 0)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -21,6 +22,10 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define PMU_CYCLE_IDX      31
> +
> +#define NR_SAMPLES 10
> +
>  #if defined(__arm__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>  	return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> +}
> +
> +static inline void pmselr_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> +}
> +
> +static inline void pmxevtyper_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> +}
> +
> +/*
> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
> + * bits doesn't seem worth the trouble when differential usage of the result is
> + * expected (with differences that can easily fit in 32 bits). So just return
> + * the lower 32 bits of the cycle count in AArch32.

Like I said in the last review, I'd rather we not do this. We should
return the full value and then the test case should confirm the upper
32 bits are zero.
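
Roughly what I have in mind is something like this (untested sketch; the
check_cycles_upper_bits name is just a placeholder, not existing
kvm-unit-tests API):

static inline uint64_t pmccntr_read(void)
{
	uint64_t cycles;

	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
	return cycles;
}

/* untested sketch of a test case: upper 32 bits should still be zero here */
static bool check_cycles_upper_bits(void)
{
	return (pmccntr_read() >> 32) == 0;
}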

> + */
> +static inline uint32_t pmccntr_read(void)
> +{
> +	uint32_t cycles;
> +
> +	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> +	return cycles;
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
> +}
> +
> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> +	pmselr_write(PMU_CYCLE_IDX);
> +	pmxevtyper_write(value);
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void)
>  	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
>  	return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> +	asm volatile("msr pmcr_el0, %0" : : "r" (value));
> +}
> +
> +static inline uint32_t pmccntr_read(void)
> +{
> +	uint32_t cycles;
> +
> +	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> +	return cycles;
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> +	asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
> +}
> +
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> +	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
> +}
>  #endif
>  
>  /*
> @@ -63,11 +132,40 @@ static bool check_pmcr(void)
>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>  }
>  
> +/*
> + * Ensure that the cycle counter progresses between back-to-back reads.
> + */
> +static bool check_cycles_increase(void)
> +{
> +	pmcr_write(pmcr_read() | PMU_PMCR_E);
> +
> +	for (int i = 0; i < NR_SAMPLES; i++) {
> +		unsigned long a, b;
> +
> +		a = pmccntr_read();
> +		b = pmccntr_read();
> +
> +		if (a >= b) {
> +			printf("Read %ld then %ld.\n", a, b);
> +			return false;
> +		}
> +	}
> +
> +	pmcr_write(pmcr_read() & ~PMU_PMCR_E);
> +
> +	return true;
> +}
> +
>  int main(void)
>  {
>  	report_prefix_push("pmu");
>  
> +	/* init for PMU event access, right now only care about cycle count */
> +	pmcntenset_write(1 << PMU_CYCLE_IDX);
> +	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
> +
>  	report("Control register", check_pmcr());
> +	report("Monotonically increasing cycle count", check_cycles_increase());
>  
>  	return report_summary();
>  }
> -- 
> 1.8.3.1

Besides needing to use u64's for registers that return u64's, it
looks good to me.

drew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking
  2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
  (?)
@ 2016-11-11  8:08   ` Andrew Jones
  2016-11-11 16:10     ` Wei Huang
  -1 siblings, 1 reply; 23+ messages in thread
From: Andrew Jones @ 2016-11-11  8:08 UTC (permalink / raw)
  To: Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Calculate the number of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility
> intended for TCG mode with the -icount option set in the configuration file.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> ---
>  arm/pmu.c         | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  arm/unittests.cfg |  14 ++++++++
>  2 files changed, 114 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index d5e3ac3..09aff89 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -15,6 +15,7 @@
>  #include "libcflat.h"
>  
>  #define PMU_PMCR_E         (1 << 0)
> +#define PMU_PMCR_C         (1 << 2)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value)
>  	pmselr_write(PMU_CYCLE_IDX);
>  	pmxevtyper_write(value);
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)

We should probably pick a more descriptive name for this function, as
we intend to add many more PMU tests to this file. While at it, let's
change 'i' to 'n', as it's the number of times to loop.

> +{
> +	asm volatile(
> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
> +	"1:	subs	%[i], %[i], #1\n"
> +	"	bgt	1b\n"
> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
> +	: [i] "+r" (i)
> +	: [pmcr] "r" (pmcr), [z] "r" (0)
> +	: "cc");
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value)
>  {
>  	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	msr	pmcr_el0, %[pmcr]\n"
> +	"1:	subs	%[i], %[i], #1\n"
> +	"	b.gt	1b\n"
> +	"	msr	pmcr_el0, xzr\n"
> +	: [i] "+r" (i)
> +	: [pmcr] "r" (pmcr)
> +	: "cc");
> +}
>  #endif
>  
>  /*
> @@ -156,8 +191,71 @@ static bool check_cycles_increase(void)
>  	return true;
>  }
>  
> -int main(void)
> +/*
> + * Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value (allowing
> + * for example for the cycle counter or event counters to be reset). At the end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> +	int i = (num - 1) / 2;
> +
> +	assert(num >= 3 && ((num - 1) % 2 == 0));
> +	loop(i, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> +	uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E;
> +	
> +	if (cpi > 0)
> +		printf("Checking for CPI=%d.\n", cpi);
> +	printf("instrs : cycles0 cycles1 ...\n");
> +
> +	for (int i = 3; i < 300; i += 32) {
> +		int avg, sum = 0;
> +
> +		printf("%d :", i);
> +		for (int j = 0; j < NR_SAMPLES; j++) {
> +			int cycles;
> +
> +			measure_instrs(i, pmcr);
> +			cycles =pmccntr_read();
                                ^ missing space

> +			printf(" %d", cycles);
> +
> +			if (!cycles || (cpi > 0 && cycles != i * cpi)) {
> +				printf("\n");
> +				return false;
> +			}
> +
> +			sum += cycles;
> +		}
> +		avg = sum / NR_SAMPLES;
> +		printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
> +			sum, avg, i / avg, avg / i);
> +	}
> +
> +	return true;
> +}
> +
> +int main(int argc, char *argv[])
>  {
> +	int cpi = 0;
> +
> +	if (argc >= 1)
> +		cpi = atol(argv[0]);
                                ^ should be '1', otherwise cpi is
getting set to the string "arm/pmu.flat" pointer. How did this
work when you tested it?

> +
>  	report_prefix_push("pmu");
>  
>  	/* init for PMU event access, right now only care about cycle count */
> @@ -166,6 +264,7 @@ int main(void)
>  
>  	report("Control register", check_pmcr());
>  	report("Monotonically increasing cycle count", check_cycles_increase());
> +	report("Cycle/instruction ratio", check_cpi(cpi));
>  
>  	return report_summary();
>  }
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 7645180..2050dc8 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -59,3 +59,17 @@ groups = selftest
>  [pmu]
>  file = pmu.flat
>  groups = pmu
> +
> +# Test PMU support (TCG) with -icount IPC=1
> +[pmu-tcg-icount-1]
> +file = pmu.flat
> +extra_params = -icount 0 -append '1'
> +groups = pmu
> +accel = tcg
> +
> +# Test PMU support (TCG) with -icount IPC=256
> +[pmu-tcg-icount-256]
> +file = pmu.flat
> +extra_params = -icount 8 -append '256'
> +groups = pmu
> +accel = tcg

The unittests.cfg looks good.

Thanks,
drew 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking
  2016-11-11  8:08   ` Andrew Jones
@ 2016-11-11 16:10     ` Wei Huang
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Huang @ 2016-11-11 16:10 UTC (permalink / raw)
  To: Andrew Jones
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm



On 11/11/2016 02:08 AM, Andrew Jones wrote:
> On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Calculate the number of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for TCG mode with the -icount option set in the configuration file.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> ---
>>  arm/pmu.c         | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  arm/unittests.cfg |  14 ++++++++
>>  2 files changed, 114 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index d5e3ac3..09aff89 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -15,6 +15,7 @@
>>  #include "libcflat.h"
>>  
>>  #define PMU_PMCR_E         (1 << 0)
>> +#define PMU_PMCR_C         (1 << 2)
>>  #define PMU_PMCR_N_SHIFT   11
>>  #define PMU_PMCR_N_MASK    0x1f
>>  #define PMU_PMCR_ID_SHIFT  16
>> @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value)
>>  	pmselr_write(PMU_CYCLE_IDX);
>>  	pmxevtyper_write(value);
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting.
>> + */
>> +static inline void loop(int i, uint32_t pmcr)
> 
> We should probably pick a more descriptive name for this function, as
> we intend to add many more PMU tests to this file. While at it, let's
> change 'i' to 'n', as it's the number of times to loop.

I will rename it to fixed_num_loop(). When more tests are added to it,
we can standardize the name, e.g. *_test().
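
For the AArch64 variant that would be roughly the following (just a sketch
of the rename, untested, using 'n' as you suggested):

/* sketch only: same body as loop() in the patch, renamed */
static inline void fixed_num_loop(int n, uint32_t pmcr)
{
	asm volatile(
	"	msr	pmcr_el0, %[pmcr]\n"
	"1:	subs	%[n], %[n], #1\n"
	"	b.gt	1b\n"
	"	msr	pmcr_el0, xzr\n"
	: [n] "+r" (n)
	: [pmcr] "r" (pmcr)
	: "cc");
}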

> 
>> +{
>> +	asm volatile(
>> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
>> +	"1:	subs	%[i], %[i], #1\n"
>> +	"	bgt	1b\n"
>> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
>> +	: [i] "+r" (i)
>> +	: [pmcr] "r" (pmcr), [z] "r" (0)
>> +	: "cc");
>> +}
>>  #elif defined(__aarch64__)
>>  static inline uint32_t pmcr_read(void)
>>  {
>> @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value)
>>  {
>>  	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting.
>> + */
>> +static inline void loop(int i, uint32_t pmcr)
>> +{
>> +	asm volatile(
>> +	"	msr	pmcr_el0, %[pmcr]\n"
>> +	"1:	subs	%[i], %[i], #1\n"
>> +	"	b.gt	1b\n"
>> +	"	msr	pmcr_el0, xzr\n"
>> +	: [i] "+r" (i)
>> +	: [pmcr] "r" (pmcr)
>> +	: "cc");
>> +}
>>  #endif
>>  
>>  /*
>> @@ -156,8 +191,71 @@ static bool check_cycles_increase(void)
>>  	return true;
>>  }
>>  
>> -int main(void)
>> +/*
>> + * Execute a known number of guest instructions. Only odd instruction counts
>> + * greater than or equal to 3 are supported by the in-line assembly code. The
>> + * control register (PMCR_EL0) is initialized with the provided value (allowing
>> + * for example for the cycle counter or event counters to be reset). At the end
>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>> + * counting, allowing the cycle counter or event counters to be read at the
>> + * leisure of the calling code.
>> + */
>> +static void measure_instrs(int num, uint32_t pmcr)
>> +{
>> +	int i = (num - 1) / 2;
>> +
>> +	assert(num >= 3 && ((num - 1) % 2 == 0));
>> +	loop(i, pmcr);
>> +}
>> +
>> +/*
>> + * Measure cycle counts for various known instruction counts. Ensure that the
>> + * cycle counter progresses (similar to check_cycles_increase() but with more
>> + * instructions and using reset and stop controls). If supplied a positive,
>> + * nonzero CPI parameter, also strictly check that every measurement matches
>> + * it. Strict CPI checking is used to test -icount mode.
>> + */
>> +static bool check_cpi(int cpi)
>> +{
>> +	uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E;
>> +	
>> +	if (cpi > 0)
>> +		printf("Checking for CPI=%d.\n", cpi);
>> +	printf("instrs : cycles0 cycles1 ...\n");
>> +
>> +	for (int i = 3; i < 300; i += 32) {
>> +		int avg, sum = 0;
>> +
>> +		printf("%d :", i);
>> +		for (int j = 0; j < NR_SAMPLES; j++) {
>> +			int cycles;
>> +
>> +			measure_instrs(i, pmcr);
>> +			cycles =pmccntr_read();
>                                 ^ missing space
> 
>> +			printf(" %d", cycles);
>> +
>> +			if (!cycles || (cpi > 0 && cycles != i * cpi)) {
>> +				printf("\n");
>> +				return false;
>> +			}
>> +
>> +			sum += cycles;
>> +		}
>> +		avg = sum / NR_SAMPLES;
>> +		printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
>> +			sum, avg, i / avg, avg / i);
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +int main(int argc, char *argv[])
>>  {
>> +	int cpi = 0;
>> +
>> +	if (argc >= 1)
>> +		cpi = atol(argv[0]);
>                                 ^ should be '1', otherwise cpi is
> getting set to the string "arm/pmu.flat" pointer. How did this
> work when you tested it?

This is a bug and the value is always 0. With that said, it didn't
change the test result: other than in printf(), cpi is only used in
"(!cycles || (cpi > 0 && cycles != i * cpi))", where the cpi clause
always evaluates to FALSE when cpi is 0. So check_cpi() won't return
early incorrectly. I will fix it.
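
The fixed main() will look something like this (only the argc/argv handling
changes; argv[0] is the flat file name, as you point out):

int main(int argc, char *argv[])
{
	int cpi = 0;

	if (argc > 1)
		cpi = atol(argv[1]);

	report_prefix_push("pmu");

	/* init for PMU event access, right now only care about cycle count */
	pmcntenset_write(1 << PMU_CYCLE_IDX);
	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */

	report("Control register", check_pmcr());
	report("Monotonically increasing cycle count", check_cycles_increase());
	report("Cycle/instruction ratio", check_cpi(cpi));

	return report_summary();
}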

> 
>> +
>>  	report_prefix_push("pmu");
>>  
>>  	/* init for PMU event access, right now only care about cycle count */
>> @@ -166,6 +264,7 @@ int main(void)
>>  
>>  	report("Control register", check_pmcr());
>>  	report("Monotonically increasing cycle count", check_cycles_increase());
>> +	report("Cycle/instruction ratio", check_cpi(cpi));
>>  
>>  	return report_summary();
>>  }
>> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
>> index 7645180..2050dc8 100644
>> --- a/arm/unittests.cfg
>> +++ b/arm/unittests.cfg
>> @@ -59,3 +59,17 @@ groups = selftest
>>  [pmu]
>>  file = pmu.flat
>>  groups = pmu
>> +
>> +# Test PMU support (TCG) with -icount IPC=1
>> +[pmu-tcg-icount-1]
>> +file = pmu.flat
>> +extra_params = -icount 0 -append '1'
>> +groups = pmu
>> +accel = tcg
>> +
>> +# Test PMU support (TCG) with -icount IPC=256
>> +[pmu-tcg-icount-256]
>> +file = pmu.flat
>> +extra_params = -icount 8 -append '256'
>> +groups = pmu
>> +accel = tcg
> 
> The unittests.cfg looks good.
> 
> Thanks,
> drew 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-11  7:43   ` Andrew Jones
@ 2016-11-11 19:55     ` Wei Huang
  2016-11-14 10:05       ` Andrew Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Huang @ 2016-11-11 19:55 UTC (permalink / raw)
  To: Andrew Jones
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm



On 11/11/2016 01:43 AM, Andrew Jones wrote:
> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>> even for the smallest delta of two subsequent reads.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> ---
>>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 98 insertions(+)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 0b29088..d5e3ac3 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -14,6 +14,7 @@
>>   */
>>  #include "libcflat.h"
>>  
>> +#define PMU_PMCR_E         (1 << 0)
>>  #define PMU_PMCR_N_SHIFT   11
>>  #define PMU_PMCR_N_MASK    0x1f
>>  #define PMU_PMCR_ID_SHIFT  16
>> @@ -21,6 +22,10 @@
>>  #define PMU_PMCR_IMP_SHIFT 24
>>  #define PMU_PMCR_IMP_MASK  0xff
>>  
>> +#define PMU_CYCLE_IDX      31
>> +
>> +#define NR_SAMPLES 10
>> +
>>  #if defined(__arm__)
>>  static inline uint32_t pmcr_read(void)
>>  {
>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>  	return ret;
>>  }
>> +
>> +static inline void pmcr_write(uint32_t value)
>> +{
>> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>> +}
>> +
>> +static inline void pmselr_write(uint32_t value)
>> +{
>> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>> +}
>> +
>> +static inline void pmxevtyper_write(uint32_t value)
>> +{
>> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>> +}
>> +
>> +/*
>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
>> + * bits doesn't seem worth the trouble when differential usage of the result is
>> + * expected (with differences that can easily fit in 32 bits). So just return
>> + * the lower 32 bits of the cycle count in AArch32.
> 
> Like I said in the last review, I'd rather we not do this. We should
> return the full value and then the test case should confirm the upper
> 32 bits are zero.
> 

Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
register. We can force it to a more coarse-grained cycle counter with
PMCR.D bit=1 (see below). But it is still not a 64-bit register. ARMv8
PMCCNTR_EL0 is a 64-bit register.

"The PMCR.D bit configures whether PMCCNTR increments once every clock
cycle, or once every 64 clock cycles. "

So I think the comment above in the code is an overstatement, which
should be deleted or moved down to ARMv8 pmccntr_read() below.
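
(For reference, the 64-bit coprocessor access the comment alludes to would,
if I'm reading the ARMv8 ARM correctly, be the AArch32 MRRC form -- roughly
the untested sketch below -- but that is an ARMv8 AArch32 facility, not
something an ARMv7 guest can rely on, so I'd rather keep the plain 32-bit
read here.)

/* untested sketch: ARMv8 AArch32 64-bit PMCCNTR access, not ARMv7 */
static inline uint64_t pmccntr64_read(void)
{
	uint32_t lo, hi;

	asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
	return ((uint64_t)hi << 32) | lo;
}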

>> + */
>> +static inline uint32_t pmccntr_read(void)
>> +{
>> +	uint32_t cycles;
>> +
>> +	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
>> +	return cycles;
>> +}
>> +
>> +static inline void pmcntenset_write(uint32_t value)
>> +{
>> +	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
>> +}
>> +
>> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
>> +static inline void pmccfiltr_write(uint32_t value)
>> +{
>> +	pmselr_write(PMU_CYCLE_IDX);
>> +	pmxevtyper_write(value);
>> +}
>>  #elif defined(__aarch64__)
>>  static inline uint32_t pmcr_read(void)
>>  {
>> @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void)
>>  	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
>>  	return ret;
>>  }
>> +
>> +static inline void pmcr_write(uint32_t value)
>> +{
>> +	asm volatile("msr pmcr_el0, %0" : : "r" (value));
>> +}
>> +
>> +static inline uint32_t pmccntr_read(void)
>> +{
>> +	uint32_t cycles;
>> +
>> +	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
>> +	return cycles;
>> +}
>> +
>> +static inline void pmcntenset_write(uint32_t value)
>> +{
>> +	asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
>> +}
>> +
>> +static inline void pmccfiltr_write(uint32_t value)
>> +{
>> +	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
>> +}
>>  #endif
>>  
>>  /*
>> @@ -63,11 +132,40 @@ static bool check_pmcr(void)
>>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>>  }
>>  
>> +/*
>> + * Ensure that the cycle counter progresses between back-to-back reads.
>> + */
>> +static bool check_cycles_increase(void)
>> +{
>> +	pmcr_write(pmcr_read() | PMU_PMCR_E);
>> +
>> +	for (int i = 0; i < NR_SAMPLES; i++) {
>> +		unsigned long a, b;
>> +
>> +		a = pmccntr_read();
>> +		b = pmccntr_read();
>> +
>> +		if (a >= b) {
>> +			printf("Read %ld then %ld.\n", a, b);
>> +			return false;
>> +		}
>> +	}
>> +
>> +	pmcr_write(pmcr_read() & ~PMU_PMCR_E);
>> +
>> +	return true;
>> +}
>> +
>>  int main(void)
>>  {
>>  	report_prefix_push("pmu");
>>  
>> +	/* init for PMU event access, right now only care about cycle count */
>> +	pmcntenset_write(1 << PMU_CYCLE_IDX);
>> +	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
>> +
>>  	report("Control register", check_pmcr());
>> +	report("Monotonically increasing cycle count", check_cycles_increase());
>>  
>>  	return report_summary();
>>  }
>> -- 
>> 1.8.3.1
> 
> Besides needing to use u64's for registers that return u64's, it
> looks good to me.
> 
> drew
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-11 19:55     ` Wei Huang
@ 2016-11-14 10:05       ` Andrew Jones
  2016-11-14 15:12         ` Christopher Covington
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Jones @ 2016-11-14 10:05 UTC (permalink / raw)
  To: Wei Huang
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis, cov,
	kvmarm, shannon.zhao

On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
> 
> 
> On 11/11/2016 01:43 AM, Andrew Jones wrote:
> > On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> >> From: Christopher Covington <cov@codeaurora.org>
> >>
> >> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> >> even for the smallest delta of two subsequent reads.
> >>
> >> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> >> Signed-off-by: Wei Huang <wei@redhat.com>
> >> ---
> >>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 98 insertions(+)
> >>
> >> diff --git a/arm/pmu.c b/arm/pmu.c
> >> index 0b29088..d5e3ac3 100644
> >> --- a/arm/pmu.c
> >> +++ b/arm/pmu.c
> >> @@ -14,6 +14,7 @@
> >>   */
> >>  #include "libcflat.h"
> >>  
> >> +#define PMU_PMCR_E         (1 << 0)
> >>  #define PMU_PMCR_N_SHIFT   11
> >>  #define PMU_PMCR_N_MASK    0x1f
> >>  #define PMU_PMCR_ID_SHIFT  16
> >> @@ -21,6 +22,10 @@
> >>  #define PMU_PMCR_IMP_SHIFT 24
> >>  #define PMU_PMCR_IMP_MASK  0xff
> >>  
> >> +#define PMU_CYCLE_IDX      31
> >> +
> >> +#define NR_SAMPLES 10
> >> +
> >>  #if defined(__arm__)
> >>  static inline uint32_t pmcr_read(void)
> >>  {
> >> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
> >>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
> >>  	return ret;
> >>  }
> >> +
> >> +static inline void pmcr_write(uint32_t value)
> >> +{
> >> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> >> +}
> >> +
> >> +static inline void pmselr_write(uint32_t value)
> >> +{
> >> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> >> +}
> >> +
> >> +static inline void pmxevtyper_write(uint32_t value)
> >> +{
> >> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> >> +}
> >> +
> >> +/*
> >> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
> >> + * bits doesn't seem worth the trouble when differential usage of the result is
> >> + * expected (with differences that can easily fit in 32 bits). So just return
> >> + * the lower 32 bits of the cycle count in AArch32.
> > 
> > Like I said in the last review, I'd rather we not do this. We should
> > return the full value and then the test case should confirm the upper
> > 32 bits are zero.
> > 
> 
> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
> register. We can force it to a more coarse-grained cycle counter with
> PMCR.D bit=1 (see below). But it is still not a 64-bit register. ARMv8
> PMCCNTR_EL0 is a 64-bit register.
> 
> "The PMCR.D bit configures whether PMCCNTR increments once every clock
> cycle, or once every 64 clock cycles. "
> 
> So I think the comment above in the code is an overstatement, which
> should be deleted or moved down to ARMv8 pmccntr_read() below.

OK, please fix as appropriate, but for the v8 64-bit register, please
don't drop the upper bits until after a unit test has a chance to check
them.

Thanks,
drew

> 
> >> + */
> >> +static inline uint32_t pmccntr_read(void)
> >> +{
> >> +	uint32_t cycles;
> >> +
> >> +	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> >> +	return cycles;
> >> +}
> >> +
> >> +static inline void pmcntenset_write(uint32_t value)
> >> +{
> >> +	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
> >> +}
> >> +
> >> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> >> +static inline void pmccfiltr_write(uint32_t value)
> >> +{
> >> +	pmselr_write(PMU_CYCLE_IDX);
> >> +	pmxevtyper_write(value);
> >> +}
> >>  #elif defined(__aarch64__)
> >>  static inline uint32_t pmcr_read(void)
> >>  {
> >> @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void)
> >>  	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
> >>  	return ret;
> >>  }
> >> +
> >> +static inline void pmcr_write(uint32_t value)
> >> +{
> >> +	asm volatile("msr pmcr_el0, %0" : : "r" (value));
> >> +}
> >> +
> >> +static inline uint32_t pmccntr_read(void)
> >> +{
> >> +	uint32_t cycles;
> >> +
> >> +	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> >> +	return cycles;
> >> +}
> >> +
> >> +static inline void pmcntenset_write(uint32_t value)
> >> +{
> >> +	asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
> >> +}
> >> +
> >> +static inline void pmccfiltr_write(uint32_t value)
> >> +{
> >> +	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
> >> +}
> >>  #endif
> >>  
> >>  /*
> >> @@ -63,11 +132,40 @@ static bool check_pmcr(void)
> >>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
> >>  }
> >>  
> >> +/*
> >> + * Ensure that the cycle counter progresses between back-to-back reads.
> >> + */
> >> +static bool check_cycles_increase(void)
> >> +{
> >> +	pmcr_write(pmcr_read() | PMU_PMCR_E);
> >> +
> >> +	for (int i = 0; i < NR_SAMPLES; i++) {
> >> +		unsigned long a, b;
> >> +
> >> +		a = pmccntr_read();
> >> +		b = pmccntr_read();
> >> +
> >> +		if (a >= b) {
> >> +			printf("Read %ld then %ld.\n", a, b);
> >> +			return false;
> >> +		}
> >> +	}
> >> +
> >> +	pmcr_write(pmcr_read() & ~PMU_PMCR_E);
> >> +
> >> +	return true;
> >> +}
> >> +
> >>  int main(void)
> >>  {
> >>  	report_prefix_push("pmu");
> >>  
> >> +	/* init for PMU event access, right now only care about cycle count */
> >> +	pmcntenset_write(1 << PMU_CYCLE_IDX);
> >> +	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
> >> +
> >>  	report("Control register", check_pmcr());
> >> +	report("Monotonically increasing cycle count", check_cycles_increase());
> >>  
> >>  	return report_summary();
> >>  }
> >> -- 
> >> 1.8.3.1
> > 
> > Besides needing to use u64's for registers that return u64's, it
> > looks good to me.
> > 
> > drew
> > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-14 10:05       ` Andrew Jones
@ 2016-11-14 15:12         ` Christopher Covington
  2016-11-15 22:50           ` Wei Huang
  0 siblings, 1 reply; 23+ messages in thread
From: Christopher Covington @ 2016-11-14 15:12 UTC (permalink / raw)
  To: Andrew Jones, Wei Huang
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis, kvmarm,
	shannon.zhao

Hi Drew, Wei,

On 11/14/2016 05:05 AM, Andrew Jones wrote:
> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>
>>
>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>>> From: Christopher Covington <cov@codeaurora.org>
>>>>
>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>>> even for the smallest delta of two subsequent reads.
>>>>
>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>>>> Signed-off-by: Wei Huang <wei@redhat.com>
>>>> ---
>>>>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 98 insertions(+)
>>>>
>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>> index 0b29088..d5e3ac3 100644
>>>> --- a/arm/pmu.c
>>>> +++ b/arm/pmu.c
>>>> @@ -14,6 +14,7 @@
>>>>   */
>>>>  #include "libcflat.h"
>>>>  
>>>> +#define PMU_PMCR_E         (1 << 0)
>>>>  #define PMU_PMCR_N_SHIFT   11
>>>>  #define PMU_PMCR_N_MASK    0x1f
>>>>  #define PMU_PMCR_ID_SHIFT  16
>>>> @@ -21,6 +22,10 @@
>>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>>  
>>>> +#define PMU_CYCLE_IDX      31
>>>> +
>>>> +#define NR_SAMPLES 10
>>>> +
>>>>  #if defined(__arm__)
>>>>  static inline uint32_t pmcr_read(void)
>>>>  {
>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>>>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>>>  	return ret;
>>>>  }
>>>> +
>>>> +static inline void pmcr_write(uint32_t value)
>>>> +{
>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>>> +}
>>>> +
>>>> +static inline void pmselr_write(uint32_t value)
>>>> +{
>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>>> +}
>>>> +
>>>> +static inline void pmxevtyper_write(uint32_t value)
>>>> +{
>>>> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>>> +}
>>>> +
>>>> +/*
>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
>>>> + * bits doesn't seem worth the trouble when differential usage of the result is
>>>> + * expected (with differences that can easily fit in 32 bits). So just return
>>>> + * the lower 32 bits of the cycle count in AArch32.
>>>
>>> Like I said in the last review, I'd rather we not do this. We should
>>> return the full value and then the test case should confirm the upper
>>> 32 bits are zero.
>>
>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
>> register. We can force it to a more coarse-grained cycle counter with
>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.

AArch32 System Register Descriptions
Performance Monitors registers
PMCCNTR, Performance Monitors Cycle Count Register

To access the PMCCNTR when accessing as a 32-bit register:
MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged

To access the PMCCNTR when accessing as a 64-bit register:
MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2
MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]
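
Rendered as GCC inline assembly, the two access forms would look roughly
like this (an illustrative sketch, not code from the posted patch; the
function names are made up here):

static inline uint32_t pmccntr32_read(void)
{
	uint32_t cycles;

	/* 32-bit access: reads PMCCNTR[31:0] */
	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
	return cycles;
}

static inline uint64_t pmccntr64_read(void)
{
	uint64_t cycles;

	/* 64-bit access: %Q0/%R0 are the low/high words of the operand */
	asm volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (cycles));
	return cycles;
}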

Regards,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-14 15:12         ` Christopher Covington
@ 2016-11-15 22:50           ` Wei Huang
  2016-11-16 13:01             ` Andrew Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Huang @ 2016-11-15 22:50 UTC (permalink / raw)
  To: Christopher Covington, Andrew Jones
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis, kvmarm,
	shannon.zhao



On 11/14/2016 09:12 AM, Christopher Covington wrote:
> Hi Drew, Wei,
> 
> On 11/14/2016 05:05 AM, Andrew Jones wrote:
>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>>
>>>
>>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>>>> From: Christopher Covington <cov@codeaurora.org>
>>>>>
>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>>>> even for the smallest delta of two subsequent reads.
>>>>>
>>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>>>>> Signed-off-by: Wei Huang <wei@redhat.com>
>>>>> ---
>>>>>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 98 insertions(+)
>>>>>
>>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>>> index 0b29088..d5e3ac3 100644
>>>>> --- a/arm/pmu.c
>>>>> +++ b/arm/pmu.c
>>>>> @@ -14,6 +14,7 @@
>>>>>   */
>>>>>  #include "libcflat.h"
>>>>>  
>>>>> +#define PMU_PMCR_E         (1 << 0)
>>>>>  #define PMU_PMCR_N_SHIFT   11
>>>>>  #define PMU_PMCR_N_MASK    0x1f
>>>>>  #define PMU_PMCR_ID_SHIFT  16
>>>>> @@ -21,6 +22,10 @@
>>>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>>>  
>>>>> +#define PMU_CYCLE_IDX      31
>>>>> +
>>>>> +#define NR_SAMPLES 10
>>>>> +
>>>>>  #if defined(__arm__)
>>>>>  static inline uint32_t pmcr_read(void)
>>>>>  {
>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>>>>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>>>>  	return ret;
>>>>>  }
>>>>> +
>>>>> +static inline void pmcr_write(uint32_t value)
>>>>> +{
>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>>>> +}
>>>>> +
>>>>> +static inline void pmselr_write(uint32_t value)
>>>>> +{
>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>>>> +}
>>>>> +
>>>>> +static inline void pmxevtyper_write(uint32_t value)
>>>>> +{
>>>>> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
>>>>> + * bits doesn't seem worth the trouble when differential usage of the result is
>>>>> + * expected (with differences that can easily fit in 32 bits). So just return
>>>>> + * the lower 32 bits of the cycle count in AArch32.
>>>>
>>>> Like I said in the last review, I'd rather we not do this. We should
>>>> return the full value and then the test case should confirm the upper
>>>> 32 bits are zero.
>>>
>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
>>> register. We can force it to a more coarse-grained cycle counter with
>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
> 
> AArch32 System Register Descriptions
> Performance Monitors registers
> PMCCNTR, Performance Monitors Cycle Count Register
> 
> To access the PMCCNTR when accessing as a 32-bit register:
> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged
> 
> To access the PMCCNTR when accessing as a 64-bit register:
> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2
> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]
> 

Thanks. I did some research based on your info and came back with the
following proposals (Cov, correct me if I am wrong):

By comparing the A57 TRM (page 394 in [1]) with the A15 TRM (page 273 in
[2]), I think this 64-bit cycle register is only available when running
in AArch32 compatibility mode on ARMv8, because it is not specified in
the A15 TRM. To further verify this, I tested the 32-bit PMU code on
QEMU in TCG mode. The result is that accessing the 64-bit PMCCNTR using
the following assembly failed on A15:

   volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
or
   volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));

Given this difference, I think there are two solutions for 64-bit
AArch32 pmccntr_read, as requested by Drew:

1) The PMU unit testing code detects whether it is running under ARMv7
or under AArch32 compatibility mode. When it is running on ARMv7, such
as A15, use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32 bits to 0.
Otherwise use "MRRC p15,0,<Rt>,<Rt2>,c9".

2) Return 64-bit results from ARM pmccntr_read(), but only use "MRC
p15,0,<Rt>,c9,c13,0" and always clear the upper 32 bits to 0. This will
be the same as the original code.

Thoughts?

-Wei

[1] A57 TRM,
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf
[2] A15 TRM,
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438c/DDI0438C_cortex_a15_r2p0_trm.pdf

> Regards,
> Cov
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-15 22:50           ` Wei Huang
@ 2016-11-16 13:01             ` Andrew Jones
  2016-11-16 16:08               ` Christopher Covington
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Jones @ 2016-11-16 13:01 UTC (permalink / raw)
  To: Wei Huang
  Cc: Christopher Covington, alindsay, kvm, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm

On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
> 
> 
> On 11/14/2016 09:12 AM, Christopher Covington wrote:
> > Hi Drew, Wei,
> > 
> > On 11/14/2016 05:05 AM, Andrew Jones wrote:
> >> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
> >>>
> >>>
> >>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
> >>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> >>>>> From: Christopher Covington <cov@codeaurora.org>
> >>>>>
> >>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> >>>>> even for the smallest delta of two subsequent reads.
> >>>>>
> >>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> >>>>> Signed-off-by: Wei Huang <wei@redhat.com>
> >>>>> ---
> >>>>>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>  1 file changed, 98 insertions(+)
> >>>>>
> >>>>> diff --git a/arm/pmu.c b/arm/pmu.c
> >>>>> index 0b29088..d5e3ac3 100644
> >>>>> --- a/arm/pmu.c
> >>>>> +++ b/arm/pmu.c
> >>>>> @@ -14,6 +14,7 @@
> >>>>>   */
> >>>>>  #include "libcflat.h"
> >>>>>  
> >>>>> +#define PMU_PMCR_E         (1 << 0)
> >>>>>  #define PMU_PMCR_N_SHIFT   11
> >>>>>  #define PMU_PMCR_N_MASK    0x1f
> >>>>>  #define PMU_PMCR_ID_SHIFT  16
> >>>>> @@ -21,6 +22,10 @@
> >>>>>  #define PMU_PMCR_IMP_SHIFT 24
> >>>>>  #define PMU_PMCR_IMP_MASK  0xff
> >>>>>  
> >>>>> +#define PMU_CYCLE_IDX      31
> >>>>> +
> >>>>> +#define NR_SAMPLES 10
> >>>>> +
> >>>>>  #if defined(__arm__)
> >>>>>  static inline uint32_t pmcr_read(void)
> >>>>>  {
> >>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
> >>>>>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
> >>>>>  	return ret;
> >>>>>  }
> >>>>> +
> >>>>> +static inline void pmcr_write(uint32_t value)
> >>>>> +{
> >>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> >>>>> +}
> >>>>> +
> >>>>> +static inline void pmselr_write(uint32_t value)
> >>>>> +{
> >>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> >>>>> +}
> >>>>> +
> >>>>> +static inline void pmxevtyper_write(uint32_t value)
> >>>>> +{
> >>>>> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> >>>>> +}
> >>>>> +
> >>>>> +/*
> >>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
> >>>>> + * bits doesn't seem worth the trouble when differential usage of the result is
> >>>>> + * expected (with differences that can easily fit in 32 bits). So just return
> >>>>> + * the lower 32 bits of the cycle count in AArch32.
> >>>>
> >>>> Like I said in the last review, I'd rather we not do this. We should
> >>>> return the full value and then the test case should confirm the upper
> >>>> 32 bits are zero.
> >>>
> >>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
> >>> register. We can force it to a more coarse-grained cycle counter with
> >>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
> > 
> > AArch32 System Register Descriptions
> > Performance Monitors registers
> > PMCCNTR, Performance Monitors Cycle Count Register
> > 
> > To access the PMCCNTR when accessing as a 32-bit register:
> > MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
> > MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged
> > 
> > To access the PMCCNTR when accessing as a 64-bit register:
> > MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2
> > MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]
> > 
> 
> Thanks. I did some research based on your info and came back with the
> following proposals (Cov, correct me if I am wrong):
> 
> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
> think this 64-bit cycle register is only available when running under
> aarch32 compatibility mode on ARMv8 because it is not specified in A15
> TRM.

OK, I hadn't realized that there would be differences between v7 and
AArch32. It looks like we need to add a function to the kvm-unit-tests
framework that enables unit tests to make that distinction, because we'll
want to explicitly test those differences in order to flush out emulation
bugs. I see now that Appendix K5 of the v8 ARM ARM lists some differences,
but this PMCCNTR difference isn't there...

As v8-A32 is an update/extension of v7-A, I'd expect there to be a RES0
bit in some v7 ID register that, on v8, is no longer reserved and reads
as 1. Unfortunately I just did some ARM doc skimming and can't find
anything like that. As we currently only use the Cortex-A15 as our v7
processor, I guess we can just check MIDR, but yuck. Anyway, I'll send
a patch for that.
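
A MIDR-based check could look roughly like this (sketch only; 0x41 is
the ARM implementer code and 0xc0f the Cortex-A15 part number):

static inline bool cpu_is_cortex_a15(void)
{
	uint32_t midr;

	asm volatile("mrc p15, 0, %0, c0, c0, 0" : "=r" (midr));
	return (midr >> 24) == 0x41 && ((midr >> 4) & 0xfff) == 0xc0f;
}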

> To further verify it, I tested 32-bit pmu code on QEMU with TCG
> mode. The result is: accessing 64-bit PMCCNTR using the following
> assembly failed on A15:
> 
>    volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
> or
>    volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));
> 
> Given this difference, I think there are two solutions for 64-bit
> AArch32 pmccntr_read, as requested by Drew:
> 
> 1) The PMU unit testing code tells if it is running under ARMv7 or under
> AArch32-compability mode. When it is running ARMv7, such as A15, let us
> use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32-bit as 0. Otherwise
> use "MRRC p15,0,<Rt>,<Rt2>,c9".
> 
> 2) Returns 64-bit results for ARM pmccntr_read(). But we only uses "MRC
> p15,0,<Rt>,c9,c13,0" and always clear the upper 32-bit as 0. This will
> be the same as the original code.

3) For the basic test do (2), but add an additional test for AArch32
   mode that also does the MRRC. That way on AArch32 we test both access
   types.

Going with (3) means we can finish this series off now and then post
another patch later with the additional access, once my is_aarch32()
patch (which I'll write now) gets merged.
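
For illustration, the additional AArch32 test could be along these
lines (a sketch; the function name is made up and the gating on the
v7-vs-v8 distinction would use the helper mentioned above, which is not
an existing kvm-unit-tests API yet):

static bool check_pmccntr_upper_bits(void)
{
	uint32_t lo, hi;

	/*
	 * 64-bit (MRRC) access form; only architected for AArch32 on
	 * v8, so the caller gates this test on the v7-vs-v8 helper.
	 */
	asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
	return hi == 0;
}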

Thanks,
drew

> 
> Thoughts?
> 
> -Wei
> 
> [1] A57 TRM,
> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf
> [2] A15 TRM,
> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438c/DDI0438C_cortex_a15_r2p0_trm.pdf
> 
> > Regards,
> > Cov
> > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
  (?)
  (?)
@ 2016-11-16 15:40   ` Andrew Jones
  -1 siblings, 0 replies; 23+ messages in thread
From: Andrew Jones @ 2016-11-16 15:40 UTC (permalink / raw)
  To: Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm


Just crossed my mind that we're missing isb's.

On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> ---
>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 98 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 0b29088..d5e3ac3 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -14,6 +14,7 @@
>   */
>  #include "libcflat.h"
>  
> +#define PMU_PMCR_E         (1 << 0)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -21,6 +22,10 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define PMU_CYCLE_IDX      31
> +
> +#define NR_SAMPLES 10
> +
>  #if defined(__arm__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>  	return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> +}
> +
> +static inline void pmselr_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));

Probably want an isb here; users will call this and then immediately
do another PMU reg write, as is done below.
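
If the framework doesn't already provide one, an isb() wrapper is a
one-liner (sketch):

#define isb()	asm volatile("isb" : : : "memory")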

> +}
> +
> +static inline void pmxevtyper_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> +}
> +
> +/*
> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
> + * bits doesn't seem worth the trouble when differential usage of the result is
> + * expected (with differences that can easily fit in 32 bits). So just return
> + * the lower 32 bits of the cycle count in AArch32.

Also, while we're discussing confirming upper bits are as expected, I
guess we should confirm no overflow too. We should clear the overflow
bit PMOVSCLR_EL0.C before we use the counter, and then check it at some
point to confirm it's as expected. I guess that could be separate test
cases though.
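
On the v7 side, that check could look roughly like this (a sketch;
PMOVSR at c9, c12, 3 is the overflow status register, write-1-to-clear,
with the same bit layout as PMCNTENSET):

static inline uint32_t pmovsr_read(void)
{
	uint32_t value;

	asm volatile("mrc p15, 0, %0, c9, c12, 3" : "=r" (value));
	return value;
}

static inline void pmovsr_clear(uint32_t mask)
{
	asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (mask));
}

A test would then do pmovsr_clear(1 << PMU_CYCLE_IDX) before counting
and fail if pmovsr_read() & (1 << PMU_CYCLE_IDX) is set afterwards.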

> + */
> +static inline uint32_t pmccntr_read(void)
> +{
> +	uint32_t cycles;
> +
> +	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> +	return cycles;
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
> +}
> +
> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> +	pmselr_write(PMU_CYCLE_IDX);
> +	pmxevtyper_write(value);
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void)
>  	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
>  	return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> +	asm volatile("msr pmcr_el0, %0" : : "r" (value));
> +}
> +
> +static inline uint32_t pmccntr_read(void)
> +{
> +	uint32_t cycles;
> +
> +	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> +	return cycles;
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> +	asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
> +}
> +
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> +	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
> +}
>  #endif
>  
>  /*
> @@ -63,11 +132,40 @@ static bool check_pmcr(void)
>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>  }
>  
> +/*
> + * Ensure that the cycle counter progresses between back-to-back reads.
> + */
> +static bool check_cycles_increase(void)
> +{
> +	pmcr_write(pmcr_read() | PMU_PMCR_E);

Need isb() here

> +
> +	for (int i = 0; i < NR_SAMPLES; i++) {
> +		unsigned long a, b;
> +
> +		a = pmccntr_read();
> +		b = pmccntr_read();
> +
> +		if (a >= b) {
> +			printf("Read %ld then %ld.\n", a, b);
> +			return false;
> +		}
> +	}
> +
> +	pmcr_write(pmcr_read() & ~PMU_PMCR_E);
> +

Need isb() here

> +	return true;
> +}
> +
>  int main(void)
>  {
>  	report_prefix_push("pmu");
>  
> +	/* init for PMU event access, right now only care about cycle count */
> +	pmcntenset_write(1 << PMU_CYCLE_IDX);
> +	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */

Need isb() here

> +
>  	report("Control register", check_pmcr());
> +	report("Monotonically increasing cycle count", check_cycles_increase());
>  
>  	return report_summary();
>  }
> -- 
> 1.8.3.1
> 
>

Thanks,
drew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking
  2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
  (?)
  (?)
@ 2016-11-16 15:45   ` Andrew Jones
  -1 siblings, 0 replies; 23+ messages in thread
From: Andrew Jones @ 2016-11-16 15:45 UTC (permalink / raw)
  To: Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility
> intended for the -icount option in TCG mode in the configuration file.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> ---
>  arm/pmu.c         | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  arm/unittests.cfg |  14 ++++++++
>  2 files changed, 114 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index d5e3ac3..09aff89 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -15,6 +15,7 @@
>  #include "libcflat.h"
>  
>  #define PMU_PMCR_E         (1 << 0)
> +#define PMU_PMCR_C         (1 << 2)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value)
>  	pmselr_write(PMU_CYCLE_IDX);
>  	pmxevtyper_write(value);
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"

isb

> +	"1:	subs	%[i], %[i], #1\n"
> +	"	bgt	1b\n"
> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"

isb

> +	: [i] "+r" (i)
> +	: [pmcr] "r" (pmcr), [z] "r" (0)
> +	: "cc");
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value)
>  {
>  	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	msr	pmcr_el0, %[pmcr]\n"

isb

> +	"1:	subs	%[i], %[i], #1\n"
> +	"	b.gt	1b\n"
> +	"	msr	pmcr_el0, xzr\n"

isb

> +	: [i] "+r" (i)
> +	: [pmcr] "r" (pmcr)
> +	: "cc");
> +}
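
Folding those in, the AArch64 helper would read roughly:

static inline void loop(int i, uint32_t pmcr)
{
	asm volatile(
	"	msr	pmcr_el0, %[pmcr]\n"
	"	isb\n"
	"1:	subs	%[i], %[i], #1\n"
	"	b.gt	1b\n"
	"	msr	pmcr_el0, xzr\n"
	"	isb\n"
	: [i] "+r" (i)
	: [pmcr] "r" (pmcr)
	: "cc");
}
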
>  #endif
>  
>  /*
> @@ -156,8 +191,71 @@ static bool check_cycles_increase(void)
>  	return true;
>  }
>  
> -int main(void)
> +/*
> + * Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value (allowing
> + * for example for the cycle counter or event counters to be reset). At the end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> +	int i = (num - 1) / 2;
> +
> +	assert(num >= 3 && ((num - 1) % 2 == 0));
> +	loop(i, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> +	uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E;
> +	
> +	if (cpi > 0)
> +		printf("Checking for CPI=%d.\n", cpi);
> +	printf("instrs : cycles0 cycles1 ...\n");
> +
> +	for (int i = 3; i < 300; i += 32) {
> +		int avg, sum = 0;
> +
> +		printf("%d :", i);
> +		for (int j = 0; j < NR_SAMPLES; j++) {
> +			int cycles;
> +
> +			measure_instrs(i, pmcr);
> +			cycles =pmccntr_read();
> +			printf(" %d", cycles);
> +
> +			if (!cycles || (cpi > 0 && cycles != i * cpi)) {
> +				printf("\n");
> +				return false;
> +			}
> +
> +			sum += cycles;
> +		}
> +		avg = sum / NR_SAMPLES;
> +		printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
> +			sum, avg, i / avg, avg / i);
> +	}
> +
> +	return true;
> +}
> +
> +int main(int argc, char *argv[])
>  {
> +	int cpi = 0;
> +
> +	if (argc >= 1)
> +		cpi = atol(argv[0]);
> +
>  	report_prefix_push("pmu");
>  
>  	/* init for PMU event access, right now only care about cycle count */
> @@ -166,6 +264,7 @@ int main(void)
>  
>  	report("Control register", check_pmcr());
>  	report("Monotonically increasing cycle count", check_cycles_increase());
> +	report("Cycle/instruction ratio", check_cpi(cpi));
>  
>  	return report_summary();
>  }
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 7645180..2050dc8 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -59,3 +59,17 @@ groups = selftest
>  [pmu]
>  file = pmu.flat
>  groups = pmu
> +
> +# Test PMU support (TCG) with -icount IPC=1
> +[pmu-tcg-icount-1]
> +file = pmu.flat
> +extra_params = -icount 0 -append '1'
> +groups = pmu
> +accel = tcg
> +
> +# Test PMU support (TCG) with -icount IPC=256
> +[pmu-tcg-icount-256]
> +file = pmu.flat
> +extra_params = -icount 8 -append '256'
> +groups = pmu
> +accel = tcg
> -- 
> 1.8.3.1
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-16 13:01             ` Andrew Jones
@ 2016-11-16 16:08               ` Christopher Covington
  2016-11-16 16:25                 ` Andrew Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Christopher Covington @ 2016-11-16 16:08 UTC (permalink / raw)
  To: Andrew Jones, Wei Huang
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm, Peter Maydell

On 11/16/2016 08:01 AM, Andrew Jones wrote:
> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
>>
>>
>> On 11/14/2016 09:12 AM, Christopher Covington wrote:
>>> Hi Drew, Wei,
>>>
>>> On 11/14/2016 05:05 AM, Andrew Jones wrote:
>>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>>>>
>>>>>
>>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>>>>>> From: Christopher Covington <cov@codeaurora.org>
>>>>>>>
>>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>>>>>> even for the smallest delta of two subsequent reads.
>>>>>>>
>>>>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>>>>>>> Signed-off-by: Wei Huang <wei@redhat.com>
>>>>>>> ---
>>>>>>>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  1 file changed, 98 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>>>>> index 0b29088..d5e3ac3 100644
>>>>>>> --- a/arm/pmu.c
>>>>>>> +++ b/arm/pmu.c
>>>>>>> @@ -14,6 +14,7 @@
>>>>>>>   */
>>>>>>>  #include "libcflat.h"
>>>>>>>  
>>>>>>> +#define PMU_PMCR_E         (1 << 0)
>>>>>>>  #define PMU_PMCR_N_SHIFT   11
>>>>>>>  #define PMU_PMCR_N_MASK    0x1f
>>>>>>>  #define PMU_PMCR_ID_SHIFT  16
>>>>>>> @@ -21,6 +22,10 @@
>>>>>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>>>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>>>>>  
>>>>>>> +#define PMU_CYCLE_IDX      31
>>>>>>> +
>>>>>>> +#define NR_SAMPLES 10
>>>>>>> +
>>>>>>>  #if defined(__arm__)
>>>>>>>  static inline uint32_t pmcr_read(void)
>>>>>>>  {
>>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>>>>>>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>> +
>>>>>>> +static inline void pmcr_write(uint32_t value)
>>>>>>> +{
>>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline void pmselr_write(uint32_t value)
>>>>>>> +{
>>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline void pmxevtyper_write(uint32_t value)
>>>>>>> +{
>>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
>>>>>>> + * bits doesn't seem worth the trouble when differential usage of the result is
>>>>>>> + * expected (with differences that can easily fit in 32 bits). So just return
>>>>>>> + * the lower 32 bits of the cycle count in AArch32.
>>>>>>
>>>>>> Like I said in the last review, I'd rather we not do this. We should
>>>>>> return the full value and then the test case should confirm the upper
>>>>>> 32 bits are zero.
>>>>>
>>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
>>>>> register. We can force it to a more coarse-grained cycle counter with
>>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
>>>
>>> AArch32 System Register Descriptions
>>> Performance Monitors registers
>>> PMCCNTR, Performance Monitors Cycle Count Register
>>>
>>> To access the PMCCNTR when accessing as a 32-bit register:
>>> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
>>> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged
>>>
>>> To access the PMCCNTR when accessing as a 64-bit register:
>>> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2
>>> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]
>>>
>>
>> Thanks. I did some research based on your info and came back with the
>> following proposals (Cov, correct me if I am wrong):
>>
>> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
>> think this 64-bit cycle register is only available when running under
>> aarch32 compatibility mode on ARMv8 because it is not specified in A15
>> TRM.

That interpretation sounds really strange to me. My recollection is that the
cycle counter was available as a 64 bit register in ARMv7 as well. I would
expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference
Manual is the complete and authoritative source.

>> To further verify it, I tested 32-bit pmu code on QEMU with TCG
>> mode. The result is: accessing 64-bit PMCCNTR using the following
>> assembly failed on A15:
>>
>>    volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
>> or
>>    volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));

The PMU implementation on QEMU TCG mode is infantile. (I was trying to
write these tests to help guide fixes and enhancements in a
test-driven-development manner.) I would not trust QEMU TCG to behave
properly here. If you want to execute those instructions, is there anything
preventing you from doing it on hardware, or at least the Foundation Model?

>> Given this difference, I think there are two solutions for 64-bit
>> AArch32 pmccntr_read, as requested by Drew:
>>
>> 1) The PMU unit testing code tells if it is running under ARMv7 or under
>> AArch32-compability mode. When it is running ARMv7, such as A15, let us
>> use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32-bit as 0. Otherwise
>> use "MRRC p15,0,<Rt>,<Rt2>,c9".
>>
>> 2) Returns 64-bit results for ARM pmccntr_read(). But we only uses "MRC
>> p15,0,<Rt>,c9,c13,0" and always clear the upper 32-bit as 0. This will
>> be the same as the original code.
> 
> 3) For the basic test do (2), but add an additional test for AArch32
>    mode that also does the MRRC. That way on AArch32 we test both access
>    types.

The upper bits being non-zero is an insane corner case.

I'd really prefer to first build some momentum with checks for issues that
are A) likely to occur and B) not too difficult to check, like whether PMCR
is writeable (especially relevant to KVM mode, where it's not by default).
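
A minimal version of such a check, reusing the accessors and the
PMU_PMCR_E define from this series, might look like this (a sketch, and
the function name is made up):

static bool check_pmcr_writeable(void)
{
	uint32_t pmcr = pmcr_read();

	pmcr_write(pmcr | PMU_PMCR_E);
	if (!(pmcr_read() & PMU_PMCR_E))
		return false;

	pmcr_write(pmcr & ~PMU_PMCR_E);
	return !(pmcr_read() & PMU_PMCR_E);
}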

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-16 16:08               ` Christopher Covington
@ 2016-11-16 16:25                 ` Andrew Jones
  2016-11-16 16:41                     ` Christopher Covington
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Jones @ 2016-11-16 16:25 UTC (permalink / raw)
  To: Christopher Covington
  Cc: Wei Huang, alindsay, kvm, Peter Maydell, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm

On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote:
> On 11/16/2016 08:01 AM, Andrew Jones wrote:
> > On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
> >>
> >>
> >> On 11/14/2016 09:12 AM, Christopher Covington wrote:
> >>> Hi Drew, Wei,
> >>>
> >>> On 11/14/2016 05:05 AM, Andrew Jones wrote:
> >>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
> >>>>>
> >>>>>
> >>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
> >>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> >>>>>>> From: Christopher Covington <cov@codeaurora.org>
> >>>>>>>
> >>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> >>>>>>> even for the smallest delta of two subsequent reads.
> >>>>>>>
> >>>>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> >>>>>>> Signed-off-by: Wei Huang <wei@redhat.com>
> >>>>>>> ---
> >>>>>>>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>  1 file changed, 98 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c
> >>>>>>> index 0b29088..d5e3ac3 100644
> >>>>>>> --- a/arm/pmu.c
> >>>>>>> +++ b/arm/pmu.c
> >>>>>>> @@ -14,6 +14,7 @@
> >>>>>>>   */
> >>>>>>>  #include "libcflat.h"
> >>>>>>>  
> >>>>>>> +#define PMU_PMCR_E         (1 << 0)
> >>>>>>>  #define PMU_PMCR_N_SHIFT   11
> >>>>>>>  #define PMU_PMCR_N_MASK    0x1f
> >>>>>>>  #define PMU_PMCR_ID_SHIFT  16
> >>>>>>> @@ -21,6 +22,10 @@
> >>>>>>>  #define PMU_PMCR_IMP_SHIFT 24
> >>>>>>>  #define PMU_PMCR_IMP_MASK  0xff
> >>>>>>>  
> >>>>>>> +#define PMU_CYCLE_IDX      31
> >>>>>>> +
> >>>>>>> +#define NR_SAMPLES 10
> >>>>>>> +
> >>>>>>>  #if defined(__arm__)
> >>>>>>>  static inline uint32_t pmcr_read(void)
> >>>>>>>  {
> >>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
> >>>>>>>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
> >>>>>>>  	return ret;
> >>>>>>>  }
> >>>>>>> +
> >>>>>>> +static inline void pmcr_write(uint32_t value)
> >>>>>>> +{
> >>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static inline void pmselr_write(uint32_t value)
> >>>>>>> +{
> >>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static inline void pmxevtyper_write(uint32_t value)
> >>>>>>> +{
> >>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/*
> >>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
> >>>>>>> + * bits doesn't seem worth the trouble when differential usage of the result is
> >>>>>>> + * expected (with differences that can easily fit in 32 bits). So just return
> >>>>>>> + * the lower 32 bits of the cycle count in AArch32.
> >>>>>>
> >>>>>> Like I said in the last review, I'd rather we not do this. We should
> >>>>>> return the full value and then the test case should confirm the upper
> >>>>>> 32 bits are zero.
> >>>>>
> >>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
> >>>>> register. We can force it to a more coarse-grained cycle counter with
> >>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
> >>>
> >>> AArch32 System Register Descriptions
> >>> Performance Monitors registers
> >>> PMCCNTR, Performance Monitors Cycle Count Register
> >>>
> >>> To access the PMCCNTR when accessing as a 32-bit register:
> >>> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
> >>> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged
> >>>
> >>> To access the PMCCNTR when accessing as a 64-bit register:
> >>> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2
> >>> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]
> >>>
> >>
> >> Thanks. I did some research based on your info and came back with the
> >> following proposals (Cov, correct me if I am wrong):
> >>
> >> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
> >> think this 64-bit cycle register is only available when running under
> >> aarch32 compatibility mode on ARMv8 because it is not specified in A15
> >> TRM.
> 
> That interpretation sounds really strange to me. My recollection is that the
> cycle counter was available as a 64 bit register in ARMv7 as well. I would
> expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference
> Manual is the complete and authoritative source.

Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit,
whereas the v8 ARM ARM, with regard to AArch32 mode, says it's both 32
and 64.

> 
> >> To further verify it, I tested 32-bit pmu code on QEMU with TCG
> >> mode. The result is: accessing 64-bit PMCCNTR using the following
> >> assembly failed on A15:
> >>
> >>    volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
> >> or
> >>    volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));
> 
> The PMU implementation on QEMU TCG mode is infantile. (I was trying to
> write these tests to help guide fixes and enhancements in a
> test-driven-development manner.) I would not trust QEMU TCG to behave
> properly here. If you want to execute those instructions, is there anything
> preventing you from doing it on hardware, or at least the Foundation Model?
> 
> >> Given this difference, I think there are two solutions for 64-bit
> >> AArch32 pmccntr_read, as requested by Drew:
> >>
> >> 1) The PMU unit testing code tells if it is running under ARMv7 or under
> >> AArch32-compability mode. When it is running ARMv7, such as A15, let us
> >> use "MRC p15,0,<Rt>,c9,c13,0" and clear the upper 32-bit as 0. Otherwise
> >> use "MRRC p15,0,<Rt>,<Rt2>,c9".
> >>
> >> 2) Returns 64-bit results for ARM pmccntr_read(). But we only uses "MRC
> >> p15,0,<Rt>,c9,c13,0" and always clear the upper 32-bit as 0. This will
> >> be the same as the original code.
> > 
> > 3) For the basic test do (2), but add an additional test for AArch32
> >    mode that also does the MRRC. That way on AArch32 we test both access
> >    types.
> 
> The upper bits being non-zero is an insane corner case.

The bits will most likely be zero, but how do you know without checking?
Also, just invoking an mrrc vs. mrc is a big difference when testing
emulation. We shouldn't skip it just because we expect it to give us a
boring result.

> 
> I'd really prefer to first build some momentum with checks for issues that
> are A) likely to occur and 2) not too difficult to check, like whether PMCR
> is writeable (especially relevant to KVM mode where it's not by default).

Looks like we're on the right track for this starter series then.

Thanks,
drew

> 
> Thanks,
> Cov
> 
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
> Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
> Aurora Forum, a Linux Foundation Collaborative Project.
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
  2016-11-16 16:25                 ` Andrew Jones
@ 2016-11-16 16:41                     ` Christopher Covington
  0 siblings, 0 replies; 23+ messages in thread
From: Christopher Covington @ 2016-11-16 16:41 UTC (permalink / raw)
  To: Andrew Jones
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

On 11/16/2016 11:25 AM, Andrew Jones wrote:
> On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote:
>> On 11/16/2016 08:01 AM, Andrew Jones wrote:
>>> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
>>>>
>>>>
>>>> On 11/14/2016 09:12 AM, Christopher Covington wrote:
>>>>> Hi Drew, Wei,
>>>>>
>>>>> On 11/14/2016 05:05 AM, Andrew Jones wrote:
>>>>>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>>>>>>>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>>>>>>>> From: Christopher Covington <cov@codeaurora.org>
>>>>>>>>>
>>>>>>>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>>>>>>>> even for the smallest delta of two subsequent reads.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>>>>>>>>> Signed-off-by: Wei Huang <wei@redhat.com>
>>>>>>>>> ---
>>>>>>>>>  arm/pmu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>  1 file changed, 98 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>>>>>>>> index 0b29088..d5e3ac3 100644
>>>>>>>>> --- a/arm/pmu.c
>>>>>>>>> +++ b/arm/pmu.c
>>>>>>>>> @@ -14,6 +14,7 @@
>>>>>>>>>   */
>>>>>>>>>  #include "libcflat.h"
>>>>>>>>>  
>>>>>>>>> +#define PMU_PMCR_E         (1 << 0)
>>>>>>>>>  #define PMU_PMCR_N_SHIFT   11
>>>>>>>>>  #define PMU_PMCR_N_MASK    0x1f
>>>>>>>>>  #define PMU_PMCR_ID_SHIFT  16
>>>>>>>>> @@ -21,6 +22,10 @@
>>>>>>>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>>>>>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>>>>>>>  
>>>>>>>>> +#define PMU_CYCLE_IDX      31
>>>>>>>>> +
>>>>>>>>> +#define NR_SAMPLES 10
>>>>>>>>> +
>>>>>>>>>  #if defined(__arm__)
>>>>>>>>>  static inline uint32_t pmcr_read(void)
>>>>>>>>>  {
>>>>>>>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>>>>>>>>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>>>>>>>>  	return ret;
>>>>>>>>>  }
>>>>>>>>> +
>>>>>>>>> +static inline void pmcr_write(uint32_t value)
>>>>>>>>> +{
>>>>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void pmselr_write(uint32_t value)
>>>>>>>>> +{
>>>>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void pmxevtyper_write(uint32_t value)
>>>>>>>>> +{
>>>>>>>>> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/*
>>>>>>>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 64
>>>>>>>>> + * bits doesn't seem worth the trouble when differential usage of the result is
>>>>>>>>> + * expected (with differences that can easily fit in 32 bits). So just return
>>>>>>>>> + * the lower 32 bits of the cycle count in AArch32.
>>>>>>>>
>>>>>>>> Like I said in the last review, I'd rather we not do this. We should
>>>>>>>> return the full value and then the test case should confirm the upper
>>>>>>>> 32 bits are zero.
>>>>>>>
>>>>>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
>>>>>>> register. We can force it to a more coarse-grained cycle counter with
>>>>>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
>>>>>
>>>>> AArch32 System Register Descriptions
>>>>> Performance Monitors registers
>>>>> PMCCNTR, Performance Monitors Cycle Count Register
>>>>>
>>>>> To access the PMCCNTR when accessing as a 32-bit register:
>>>>> MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
>>>>> MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged
>>>>>
>>>>> To access the PMCCNTR when accessing as a 64-bit register:
>>>>> MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2
>>>>> MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]
>>>>>
>>>>
>>>> Thanks. I did some research based on your info and came back with the
>>>> following proposals (Cov, correct me if I am wrong):
>>>>
>>>> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
>>>> think this 64-bit cycle register is only available when running under
>>>> aarch32 compatibility mode on ARMv8 because it is not specified in A15
>>>> TRM.
>>
>> That interpretation sounds really strange to me. My recollection is that the
>> cycle counter was available as a 64 bit register in ARMv7 as well. I would
>> expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference
>> Manual is the complete and authoritative source.
> 
> Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit.
> Whereas the v8 ARM ARM wrt to AArch32 mode says it's both 32 and 64.

Just looked it up as well in the good old ARM DDI 0406C.c and you're absolutely
right. Sorry for the bad recollection.

Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-11-16 16:41 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-08 18:17 [kvm-unit-tests PATCH v8 0/3] ARM PMU tests Wei Huang
2016-11-08 18:17 ` [Qemu-devel] " Wei Huang
2016-11-08 18:17 ` [kvm-unit-tests PATCH v8 1/3] arm: Add PMU test Wei Huang
2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
2016-11-11  7:37   ` Andrew Jones
2016-11-08 18:17 ` [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases Wei Huang
2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
2016-11-11  7:43   ` Andrew Jones
2016-11-11 19:55     ` Wei Huang
2016-11-14 10:05       ` Andrew Jones
2016-11-14 15:12         ` Christopher Covington
2016-11-15 22:50           ` Wei Huang
2016-11-16 13:01             ` Andrew Jones
2016-11-16 16:08               ` Christopher Covington
2016-11-16 16:25                 ` Andrew Jones
2016-11-16 16:41                   ` Christopher Covington
2016-11-16 15:40   ` Andrew Jones
2016-11-08 18:17 ` [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking Wei Huang
2016-11-08 18:17   ` [Qemu-devel] " Wei Huang
2016-11-11  8:08   ` Andrew Jones
2016-11-11 16:10     ` Wei Huang
2016-11-16 15:45   ` Andrew Jones
