* [kvm-unit-tests PATCH v13 0/4] ARM PMU tests
@ 2016-12-01  5:16 ` Wei Huang
From: Wei Huang @ 2016-12-01  5:16 UTC (permalink / raw)
  To: cov
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm

Changes from v12:
* Define macros for system register accessors
* Re-write PMU code using the newly-defined macros
* Code tested under both AArch32 and AArch64 modes

Note:
1) The current KVM code has bugs in handling PMCCFILTR writes. The fix
linked below is required for these unit tests to work correctly under
KVM:
https://lists.cs.columbia.edu/pipermail/kvmarm/2016-November/022134.html

Thanks,
-Wei

Christopher Covington (3):
  arm: Add PMU test
  arm: pmu: Check cycle count increases
  arm: pmu: Add CPI checking

Wei Huang (1):
  arm: Define macros for accessing system registers

 arm/Makefile.common       |   3 +-
 arm/pmu.c                 | 277 ++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg         |  19 ++++
 lib/arm/asm/processor.h   |  37 ++++++-
 lib/arm64/asm/processor.h |  35 ++++--
 5 files changed, 358 insertions(+), 13 deletions(-)
 create mode 100644 arm/pmu.c

-- 
1.8.3.1


* [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers
@ 2016-12-01  5:16   ` Wei Huang
From: Wei Huang @ 2016-12-01  5:16 UTC (permalink / raw)
  To: cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones, andre.przywara

This patch defines four macros to assist in creating system register
accessors under both ARMv7 and AArch64:
   * DEFINE_GET_SYSREG32(name, ...)
   * DEFINE_SET_SYSREG32(name, ...)
   * DEFINE_GET_SYSREG64(name, ...)
   * DEFINE_SET_SYSREG64(name, ...)
These macros expand to inline functions with consistent names,
get_##name() and set_##name(), which C code can call directly.
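For illustration only, here is a host-side sketch of the same pattern that never touches real hardware. A hypothetical fake_sysregs[] array stands in for the CP15/system registers, and the hypothetical FAKE_DEFINE_* macros use token pasting to generate get_<name>()/set_<name>() pairs. The 64-bit variants split and recombine the value as two 32-bit halves, the way the ARMv7 mrrc/mcrr accessors in this patch do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Host-only illustration: fake_sysregs[] stands in for real CP15/system
 * registers, so no mrc/mcr/mrrc/mcrr instructions are emitted. The
 * FAKE_DEFINE_* names and slot numbers are hypothetical.
 */
static uint32_t fake_sysregs[4];

/* "name" is token-pasted into get_<name>() and set_<name>(). */
#define FAKE_DEFINE_GET_SYSREG32(name, slot) \
static inline uint32_t get_##name(void) \
{ \
	return fake_sysregs[slot]; \
}

#define FAKE_DEFINE_SET_SYSREG32(name, slot) \
static inline void set_##name(uint32_t value) \
{ \
	fake_sysregs[slot] = value; \
}

/* 64-bit accessors move the value as two 32-bit halves, like mrrc/mcrr. */
#define FAKE_DEFINE_GET_SYSREG64(name, lo_slot, hi_slot) \
static inline uint64_t get_##name(void) \
{ \
	uint32_t lo = fake_sysregs[lo_slot]; \
	uint32_t hi = fake_sysregs[hi_slot]; \
	return (uint64_t)hi << 32 | lo; \
}

#define FAKE_DEFINE_SET_SYSREG64(name, lo_slot, hi_slot) \
static inline void set_##name(uint64_t value) \
{ \
	fake_sysregs[lo_slot] = (uint32_t)(value & 0xffffffff); \
	fake_sysregs[hi_slot] = (uint32_t)(value >> 32); \
}

FAKE_DEFINE_GET_SYSREG32(pmcr, 0)
FAKE_DEFINE_SET_SYSREG32(pmcr, 0)
FAKE_DEFINE_GET_SYSREG64(pmccntr64, 1, 2)
FAKE_DEFINE_SET_SYSREG64(pmccntr64, 1, 2)

/* Write a 64-bit value and check that its two halves recombine intact. */
static void check_pmccntr64_roundtrip(uint64_t value)
{
	set_pmccntr64(value);
	assert(get_pmccntr64() == value);
}
```

Calling set_pmccntr64(0x123456789abcdef0ULL) stores 0x9abcdef0 in slot 1 and 0x12345678 in slot 2, and get_pmccntr64() reassembles the original value.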

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Wei Huang <wei@redhat.com>
---
 lib/arm/asm/processor.h   | 37 ++++++++++++++++++++++++++++++++-----
 lib/arm64/asm/processor.h | 35 ++++++++++++++++++++++++++++-------
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
index f25e7ee..3ca6b42 100644
--- a/lib/arm/asm/processor.h
+++ b/lib/arm/asm/processor.h
@@ -33,13 +33,40 @@ static inline unsigned long current_cpsr(void)
 
 #define current_mode() (current_cpsr() & MODE_MASK)
 
-static inline unsigned int get_mpidr(void)
-{
-	unsigned int mpidr;
-	asm volatile("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
-	return mpidr;
+#define DEFINE_GET_SYSREG32(name, opc1, crn, crm, opc2)			\
+static inline uint32_t get_##name(void)					\
+{									\
+	uint32_t reg;							\
+	asm volatile("mrc p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
+		     #opc2 : "=r" (reg));				\
+	return reg;							\
+}
+
+#define DEFINE_SET_SYSREG32(name, opc1, crn, crm, opc2)			\
+static inline void set_##name(uint32_t value)				\
+{									\
+	asm volatile("mcr p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
+		     #opc2 :: "r" (value));				\
+}
+
+#define DEFINE_GET_SYSREG64(name, opc, crm)				\
+static inline uint64_t get_##name(void)					\
+{									\
+	uint32_t lo, hi;						\
+	asm volatile("mrrc p15, " #opc ", %0, %1, " #crm		\
+		     : "=r" (lo), "=r" (hi));				\
+	return (uint64_t)hi << 32 | lo;					\
+}
+
+#define DEFINE_SET_SYSREG64(name, opc, crm)				\
+static inline void set_##name(uint64_t value)				\
+{									\
+	asm volatile("mcrr p15, " #opc ", %0, %1, " #crm		\
+		     :: "r" (value & 0xffffffff), "r" (value >> 32));	\
 }
 
+DEFINE_GET_SYSREG32(mpidr, 0, c0, c0, 5)
+
 /* Only support Aff0 for now, up to 4 cpus */
 #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
 
diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index 84d5c7c..dfa75eb 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -66,14 +66,35 @@ static inline unsigned long current_level(void)
 	return el & 0xc;
 }
 
-#define DEFINE_GET_SYSREG32(reg)				\
-static inline unsigned int get_##reg(void)			\
-{								\
-	unsigned int reg;					\
-	asm volatile("mrs %0, " #reg "_el1" : "=r" (reg));	\
-	return reg;						\
+#define DEFINE_GET_SYSREG32(reg, el)					\
+static inline uint32_t get_##reg(void)					\
+{									\
+	uint32_t reg;							\
+	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
+	return reg;							\
 }
-DEFINE_GET_SYSREG32(mpidr)
+
+#define DEFINE_SET_SYSREG32(reg, el)					\
+static inline void set_##reg(uint32_t value)				\
+{									\
+	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
+}
+
+#define DEFINE_GET_SYSREG64(reg, el)					\
+static inline uint64_t get_##reg(void)					\
+{									\
+	uint64_t reg;							\
+	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
+	return reg;							\
+}
+
+#define DEFINE_SET_SYSREG64(reg, el)					\
+static inline void set_##reg(uint64_t value)				\
+{									\
+	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
+}
+
+DEFINE_GET_SYSREG32(mpidr, el1)
 
 /* Only support Aff0 for now, gicv2 only */
 #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
-- 
1.8.3.1



* [kvm-unit-tests PATCH v13 2/4] arm: Add PMU test
@ 2016-12-01  5:16   ` Wei Huang
From: Wei Huang @ 2016-12-01  5:16 UTC (permalink / raw)
  To: cov
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm

From: Christopher Covington <cov@codeaurora.org>

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU). The PMU control
register is read using the newly defined macros.
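A host-side sketch of the field decoding that check_pmcr() performs, reusing the patch's PMU_PMCR_* shifts and masks. The struct pmcr_fields type and the decode_pmcr()/pmcr_sane() helpers are hypothetical names for this illustration; on real hardware the input value would come from get_pmcr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions as defined by the PMU_PMCR_* macros in this patch. */
#define PMU_PMCR_N_SHIFT   11
#define PMU_PMCR_N_MASK    0x1f
#define PMU_PMCR_ID_SHIFT  16
#define PMU_PMCR_ID_MASK   0xff
#define PMU_PMCR_IMP_SHIFT 24
#define PMU_PMCR_IMP_MASK  0xff

/* Hypothetical container for the fields check_pmcr() reports. */
struct pmcr_fields {
	uint32_t implementer;	/* e.g. 0x41 ('A') for ARM */
	uint32_t idcode;
	uint32_t counters;	/* number of event counters (PMCR.N) */
};

static struct pmcr_fields decode_pmcr(uint32_t pmcr)
{
	struct pmcr_fields f = {
		.implementer = (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK,
		.idcode = (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK,
		.counters = (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK,
	};
	return f;
}

/* Same pass criterion as check_pmcr(): the implementer field is nonzero. */
static bool pmcr_sane(uint32_t pmcr)
{
	return decode_pmcr(pmcr).implementer != 0;
}
```

For example, decode_pmcr(0x41013000) yields implementer 0x41 ('A'), ID code 0x01 and 6 event counters, while a value with a zero implementer field fails pmcr_sane().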

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arm/Makefile.common |  3 ++-
 arm/pmu.c           | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  5 +++++
 3 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index f37b5c2..5da2fdd 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -12,7 +12,8 @@ endif
 tests-common = \
 	$(TEST_DIR)/selftest.flat \
 	$(TEST_DIR)/spinlock-test.flat \
-	$(TEST_DIR)/pci-test.flat
+	$(TEST_DIR)/pci-test.flat \
+	$(TEST_DIR)/pmu.flat
 
 all: test_cases
 
diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 0000000..1fe2b1a
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,62 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+#include "asm/barrier.h"
+#include "asm/processor.h"
+
+#define PMU_PMCR_N_SHIFT   11
+#define PMU_PMCR_N_MASK    0x1f
+#define PMU_PMCR_ID_SHIFT  16
+#define PMU_PMCR_ID_MASK   0xff
+#define PMU_PMCR_IMP_SHIFT 24
+#define PMU_PMCR_IMP_MASK  0xff
+
+#if defined(__arm__)
+DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
+#elif defined(__aarch64__)
+DEFINE_GET_SYSREG32(pmcr, el0)
+#endif
+
+/*
+ * As a simple sanity check on PMCR_EL0, ensure the implementer field isn't
+ * zero. Also print out a couple of other interesting fields for diagnostic
+ * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+	uint32_t pmcr;
+
+	pmcr = get_pmcr();
+
+	report_info("PMU implementer/ID code/counters: 0x%x(\"%c\")/0x%x/%d",
+		    (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK,
+		    ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) ? : ' ',
+		    (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK,
+		    (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
+
+	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
+}
+
+int main(void)
+{
+	report_prefix_push("pmu");
+
+	report("Control register", check_pmcr());
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index ae32a42..816f494 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -58,3 +58,8 @@ groups = selftest
 [pci-test]
 file = pci-test.flat
 groups = pci
+
+# Test PMU support
+[pmu]
+file = pmu.flat
+groups = pmu
-- 
1.8.3.1


* [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases
@ 2016-12-01  5:16   ` Wei Huang
From: Wei Huang @ 2016-12-01  5:16 UTC (permalink / raw)
  To: cov
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm

From: Christopher Covington <cov@codeaurora.org>

Ensure that reads of PMCCNTR_EL0 are strictly increasing, even for
the smallest delta between two back-to-back reads.
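The check can be sketched on a host as follows. The counter_read_fn hook and the two fake counters are hypothetical stand-ins for get_pmccntr(); NR_SAMPLES matches the value used in the patch. As in check_cycles_increase(), any pair in which the second read is not strictly greater than the first fails the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SAMPLES 10

/* Hypothetical hook; on hardware this would be get_pmccntr(). */
typedef uint64_t (*counter_read_fn)(void);

/*
 * Take NR_SAMPLES pairs of back-to-back reads and fail as soon as the
 * second read of a pair is not strictly greater than the first.
 */
static bool cycles_strictly_increase(counter_read_fn read_counter)
{
	assert(read_counter != NULL);

	for (int i = 0; i < NR_SAMPLES; i++) {
		uint64_t a = read_counter();
		uint64_t b = read_counter();

		if (a >= b)
			return false;
	}
	return true;
}

/* Hypothetical fake counters for exercising the check on a host. */
static uint64_t fake_cycles;

static uint64_t ticking_counter(void)
{
	fake_cycles += 3;	/* pretend each read costs 3 cycles */
	return fake_cycles;
}

static uint64_t stuck_counter(void)
{
	return 42;		/* never advances, e.g. counter not enabled */
}
```

Because the test resets the counter to zero and sets PMCR.LC first, a wrap-around between two reads is presumably not a concern, so a plain a >= b comparison is enough.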

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arm/pmu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 1fe2b1a..3566a27 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -16,6 +16,9 @@
 #include "asm/barrier.h"
 #include "asm/processor.h"
 
+#define PMU_PMCR_E         (1 << 0)
+#define PMU_PMCR_C         (1 << 2)
+#define PMU_PMCR_LC        (1 << 6)
 #define PMU_PMCR_N_SHIFT   11
 #define PMU_PMCR_N_MASK    0x1f
 #define PMU_PMCR_ID_SHIFT  16
@@ -23,10 +26,57 @@
 #define PMU_PMCR_IMP_SHIFT 24
 #define PMU_PMCR_IMP_MASK  0xff
 
+#define ID_DFR0_PERFMON_SHIFT 24
+#define ID_DFR0_PERFMON_MASK  0xf
+
+#define PMU_CYCLE_IDX         31
+
+#define NR_SAMPLES 10
+
+static unsigned int pmu_version;
 #if defined(__arm__)
 DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
+DEFINE_SET_SYSREG32(pmcr, 0, c9, c12, 0)
+DEFINE_GET_SYSREG32(id_dfr0, 0, c0, c1, 2)
+DEFINE_SET_SYSREG32(pmselr, 0, c9, c12, 5)
+DEFINE_SET_SYSREG32(pmxevtyper, 0, c9, c13, 1)
+DEFINE_GET_SYSREG32(pmccntr32, 0, c9, c13, 0)
+DEFINE_SET_SYSREG32(pmccntr32, 0, c9, c13, 0)
+DEFINE_GET_SYSREG64(pmccntr64, 0, c9)
+DEFINE_SET_SYSREG64(pmccntr64, 0, c9)
+DEFINE_SET_SYSREG32(pmcntenset, 0, c9, c12, 1)
+
+static inline uint64_t get_pmccntr(void)
+{
+	if (pmu_version == 0x3)
+		return get_pmccntr64();
+	else
+		return get_pmccntr32();
+}
+
+static inline void set_pmccntr(uint64_t value)
+{
+	if (pmu_version == 0x3)
+		set_pmccntr64(value);
+	else
+		set_pmccntr32(value & 0xffffffff);
+}
+
+/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
+static inline void set_pmccfiltr(uint32_t value)
+{
+	set_pmselr(PMU_CYCLE_IDX);
+	set_pmxevtyper(value);
+	isb();
+}
 #elif defined(__aarch64__)
 DEFINE_GET_SYSREG32(pmcr, el0)
+DEFINE_SET_SYSREG32(pmcr, el0)
+DEFINE_GET_SYSREG32(id_dfr0, el1)
+DEFINE_GET_SYSREG64(pmccntr, el0);
+DEFINE_SET_SYSREG64(pmccntr, el0);
+DEFINE_SET_SYSREG32(pmcntenset, el0);
+DEFINE_SET_SYSREG32(pmccfiltr, el0);
 #endif
 
 /*
@@ -52,11 +102,55 @@ static bool check_pmcr(void)
 	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+	bool success = true;
+
+	/* init before event access, this test only cares about cycle count */
+	set_pmcntenset(1 << PMU_CYCLE_IDX);
+	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
+	set_pmccntr(0);
+
+	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
+
+	for (int i = 0; i < NR_SAMPLES; i++) {
+		uint64_t a, b;
+
+		a = get_pmccntr();
+		b = get_pmccntr();
+
+		if (a >= b) {
+			printf("Read %"PRId64" then %"PRId64".\n", a, b);
+			success = false;
+			break;
+		}
+	}
+
+	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
+
+	return success;
+}
+
+void pmu_init(void)
+{
+	uint32_t dfr0;
+
+	/* probe pmu version */
+	dfr0 = get_id_dfr0();
+	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
+	report_info("PMU version: %d", pmu_version);
+}
+
 int main(void)
 {
 	report_prefix_push("pmu");
 
+	pmu_init();
 	report("Control register", check_pmcr());
+	report("Monotonically increasing cycle count", check_cycles_increase());
 
 	return report_summary();
 }
-- 
1.8.3.1


* [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
@ 2016-12-01  5:16   ` Wei Huang
From: Wei Huang @ 2016-12-01  5:16 UTC (permalink / raw)
  To: cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones, andre.przywara

From: Christopher Covington <cov@codeaurora.org>

Calculate the number of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for TCG's -icount mode, which the new configuration file
entries exercise.
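The instruction-count arithmetic behind the strict check can be sketched in plain C. loop_for_instrs() mirrors measure_instrs(), and instrs_for_loop() restates the 2 + 2*loop formula from the precise_instrs_loop() comment. expected_cpi() encodes an assumption: that -icount <shift> charges 2^shift ns per instruction and the emulated cycle counter follows that clock, which is consistent with the "-icount 0" / '1' and "-icount 8" / '256' pairs in unittests.cfg. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Loop count for a requested instruction count, as in measure_instrs(). */
static int loop_for_instrs(int num)
{
	assert(num >= 4 && (num - 2) % 2 == 0);
	return (num - 2) / 2;
}

/* Counted instructions per the patch comment: isb + mcr/msr + 2 per iteration. */
static int instrs_for_loop(int loop)
{
	return 2 + 2 * loop;
}

/*
 * Assumption: -icount <shift> advances the virtual clock by 2^shift ns per
 * instruction and the emulated cycle counter follows that clock.
 */
static uint64_t expected_cpi(unsigned int shift)
{
	return (uint64_t)1 << shift;
}

/* Strict check applied by check_cpi() when a CPI argument is supplied. */
static bool cycles_match(uint64_t cycles, unsigned int instrs, uint64_t cpi)
{
	return cycles == (uint64_t)instrs * cpi;
}
```

For the -icount 8 configuration, the largest sample in check_cpi() (292 instructions) would therefore be expected to read 292 * 256 = 74752 cycles.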

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 arm/unittests.cfg |  14 +++++++
 2 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 3566a27..29d7c2c 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
 	set_pmxevtyper(value);
 	isb();
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. isb instructions ensure that the pmccntr value
+ * read after this function returns counts exactly the instructions executed
+ * in the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_instrs_loop(int loop, uint32_t pmcr)
+{
+	asm volatile(
+	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
+	"	isb\n"
+	"1:	subs	%[loop], %[loop], #1\n"
+	"	bgt	1b\n"
+	"	mcr	p15, 0, %[z], c9, c12, 0\n"
+	"	isb\n"
+	: [loop] "+r" (loop)
+	: [pmcr] "r" (pmcr), [z] "r" (0)
+	: "cc");
+}
 #elif defined(__aarch64__)
 DEFINE_GET_SYSREG32(pmcr, el0)
 DEFINE_SET_SYSREG32(pmcr, el0)
@@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
 DEFINE_SET_SYSREG64(pmccntr, el0);
 DEFINE_SET_SYSREG32(pmcntenset, el0);
 DEFINE_SET_SYSREG32(pmccfiltr, el0);
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. isb instructions ensure that the pmccntr value
+ * read after this function returns counts exactly the instructions executed
+ * in the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_instrs_loop(int loop, uint32_t pmcr)
+{
+	asm volatile(
+	"	msr	pmcr_el0, %[pmcr]\n"
+	"	isb\n"
+	"1:	subs	%[loop], %[loop], #1\n"
+	"	b.gt	1b\n"
+	"	msr	pmcr_el0, xzr\n"
+	"	isb\n"
+	: [loop] "+r" (loop)
+	: [pmcr] "r" (pmcr)
+	: "cc");
+}
 #endif
 
 /*
@@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
 	return success;
 }
 
+/*
+ * Execute a known number of guest instructions. Only even instruction counts
+ * greater than or equal to 4 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+	int loop = (num - 2) / 2;
+
+	assert(num >= 4 && ((num - 2) % 2 == 0));
+	precise_instrs_loop(loop, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
+
+	/* init before event access, this test only cares about cycle count */
+	set_pmcntenset(1 << PMU_CYCLE_IDX);
+	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
+
+	if (cpi > 0)
+		printf("Checking for CPI=%d.\n", cpi);
+	printf("instrs : cycles0 cycles1 ...\n");
+
+	for (unsigned int i = 4; i < 300; i += 32) {
+		uint64_t avg, sum = 0;
+
+		printf("%d :", i);
+		for (int j = 0; j < NR_SAMPLES; j++) {
+			uint64_t cycles;
+
+			set_pmccntr(0);
+			measure_instrs(i, pmcr);
+			cycles = get_pmccntr();
+			printf(" %"PRId64"", cycles);
+
+			if (!cycles) {
+				printf("\ncycles not incrementing!\n");
+				return false;
+			} else if (cpi > 0 && cycles != i * cpi) {
+				printf("\nunexpected cycle count received!\n");
+				return false;
+			} else if ((cycles >> 32) != 0) {
+				/* The cycles taken by the loop above should
+				 * fit in 32 bits easily. We check the upper
+				 * 32 bits of the cycle counter to make sure
+				 * there are no surprises. */
+				printf("\ncycle count bigger than 32bit!\n");
+				return false;
+			}
+
+			sum += cycles;
+		}
+		avg = sum / NR_SAMPLES;
+		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
+		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
+	}
+
+	return true;
+}
+
 void pmu_init(void)
 {
 	uint32_t dfr0;
@@ -144,13 +259,19 @@ void pmu_init(void)
 	report_info("PMU version: %d", pmu_version);
 }
 
-int main(void)
+int main(int argc, char *argv[])
 {
+	int cpi = 0;
+
+	if (argc > 1)
+		cpi = atol(argv[1]);
+
 	report_prefix_push("pmu");
 
 	pmu_init();
 	report("Control register", check_pmcr());
 	report("Monotonically increasing cycle count", check_cycles_increase());
+	report("Cycle/instruction ratio", check_cpi(cpi));
 
 	return report_summary();
 }
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 816f494..044d97c 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -63,3 +63,17 @@ groups = pci
 [pmu]
 file = pmu.flat
 groups = pmu
+
+# Test PMU support (TCG) with -icount, expecting CPI=1
+[pmu-tcg-icount-1]
+file = pmu.flat
+extra_params = -icount 0 -append '1'
+groups = pmu
+accel = tcg
+
+# Test PMU support (TCG) with -icount, expecting CPI=256
+[pmu-tcg-icount-256]
+file = pmu.flat
+extra_params = -icount 8 -append '256'
+groups = pmu
+accel = tcg
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
@ 2016-12-01  5:16   ` Wei Huang
  0 siblings, 0 replies; 48+ messages in thread
From: Wei Huang @ 2016-12-01  5:16 UTC (permalink / raw)
  To: cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones, andre.przywara

From: Christopher Covington <cov@codeaurora.org>

Calculate the number of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for use with the TCG -icount option, as configured in
unittests.cfg.
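A minimal host-side sketch of the arithmetic behind the strict check follows. It assumes (this is not stated in the patch) that under "-icount SHIFT" each guest instruction advances virtual time by 2^SHIFT ns and that the emulated cycle counter ticks once per virtual nanosecond. The helper names loop_for_instrs(), instrs_for_loop(), cpi_for_icount_shift() and cycles_match_cpi() are hypothetical. They only mirror measure_instrs() and check_cpi() and touch no PMU register.

```c
#include <assert.h>
#include <stdint.h>

/* Loop iterations needed so that isb + mcr/msr + 2 * loop equals num. */
static inline int loop_for_instrs(int num)
{
	assert(num >= 4 && num % 2 == 0);
	return (num - 2) / 2;
}

/* Inverse mapping: instructions counted for a given loop count. */
static inline int instrs_for_loop(int loop)
{
	return 2 + 2 * loop;
}

/*
 * Assumption: with "-icount SHIFT" each instruction costs 2^SHIFT virtual ns
 * and the cycle counter ticks once per virtual ns, so the strict CPI is
 * 2^SHIFT (1 for -icount 0, 256 for -icount 8).
 */
static inline uint64_t cpi_for_icount_shift(unsigned int shift)
{
	return (uint64_t)1 << shift;
}

/* The strict test in check_cpi(): each sample must equal num * cpi. */
static inline int cycles_match_cpi(uint64_t cycles, unsigned int num,
				   uint64_t cpi)
{
	return cycles == (uint64_t)num * cpi;
}
```

Under that assumption, -icount 0 and -icount 8 yield the CPI values 1 and 256 that the configuration passes to the test via -append.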

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 arm/unittests.cfg |  14 +++++++
 2 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 3566a27..29d7c2c 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
 	set_pmxevtyper(value);
 	isb();
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. isb instructions are inserted so that a PMCCNTR
+ * read after this function returns counts exactly the instructions executed in
+ * the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_instrs_loop(int loop, uint32_t pmcr)
+{
+	asm volatile(
+	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
+	"	isb\n"
+	"1:	subs	%[loop], %[loop], #1\n"
+	"	bgt	1b\n"
+	"	mcr	p15, 0, %[z], c9, c12, 0\n"
+	"	isb\n"
+	: [loop] "+r" (loop)
+	: [pmcr] "r" (pmcr), [z] "r" (0)
+	: "cc");
+}
 #elif defined(__aarch64__)
 DEFINE_GET_SYSREG32(pmcr, el0)
 DEFINE_SET_SYSREG32(pmcr, el0)
@@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
 DEFINE_SET_SYSREG64(pmccntr, el0);
 DEFINE_SET_SYSREG32(pmcntenset, el0);
 DEFINE_SET_SYSREG32(pmccfiltr, el0);
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. isb instructions are inserted so that a PMCCNTR
+ * read after this function returns counts exactly the instructions executed in
+ * the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_instrs_loop(int loop, uint32_t pmcr)
+{
+	asm volatile(
+	"	msr	pmcr_el0, %[pmcr]\n"
+	"	isb\n"
+	"1:	subs	%[loop], %[loop], #1\n"
+	"	b.gt	1b\n"
+	"	msr	pmcr_el0, xzr\n"
+	"	isb\n"
+	: [loop] "+r" (loop)
+	: [pmcr] "r" (pmcr)
+	: "cc");
+}
 #endif
 
 /*
@@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
 	return success;
 }
 
+/*
+ * Execute a known number of guest instructions. Only even instruction counts
+ * greater than or equal to 4 are supported by the inline assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value, which
+ * allows, for example, the cycle counter or event counters to be reset. At the
+ * end of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+	int loop = (num - 2) / 2;
+
+	assert(num >= 4 && ((num - 2) % 2 == 0));
+	precise_instrs_loop(loop, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If a positive CPI value is
+ * supplied, also strictly check that every measurement equals the instruction
+ * count times that CPI. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
+
+	/* init before event access, this test only cares about cycle count */
+	set_pmcntenset(1 << PMU_CYCLE_IDX);
+	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
+
+	if (cpi > 0)
+		printf("Checking for CPI=%d.\n", cpi);
+	printf("instrs : cycles0 cycles1 ...\n");
+
+	for (unsigned int i = 4; i < 300; i += 32) {
+		uint64_t avg, sum = 0;
+
+		printf("%u :", i);
+		for (int j = 0; j < NR_SAMPLES; j++) {
+			uint64_t cycles;
+
+			set_pmccntr(0);
+			measure_instrs(i, pmcr);
+			cycles = get_pmccntr();
+			printf(" %" PRIu64, cycles);
+
+			if (!cycles) {
+				printf("\ncycles not incrementing!\n");
+				return false;
+			} else if (cpi > 0 && cycles != i * cpi) {
+				printf("\nunexpected cycle count received!\n");
+				return false;
+			} else if ((cycles >> 32) != 0) {
+				/* The cycles taken by the loop above should
+				 * easily fit in 32 bits, so check that the
+				 * upper 32 bits of the cycle counter are zero
+				 * to catch any surprises. */
+				printf("\ncycle count bigger than 32bit!\n");
+				return false;
+			}
+
+			sum += cycles;
+		}
+		avg = sum / NR_SAMPLES;
+		printf(" sum=%"PRIu64" avg=%"PRIu64" avg_ipc=%"PRIu64" "
+		       "avg_cpi=%"PRIu64"\n", sum, avg, i / avg, avg / i);
+	}
+
+	return true;
+}
+
 void pmu_init(void)
 {
 	uint32_t dfr0;
@@ -144,13 +259,19 @@ void pmu_init(void)
 	report_info("PMU version: %d", pmu_version);
 }
 
-int main(void)
+int main(int argc, char *argv[])
 {
+	int cpi = 0;
+
+	if (argc > 1)
+		cpi = atol(argv[1]);
+
 	report_prefix_push("pmu");
 
 	pmu_init();
 	report("Control register", check_pmcr());
 	report("Monotonically increasing cycle count", check_cycles_increase());
+	report("Cycle/instruction ratio", check_cpi(cpi));
 
 	return report_summary();
 }
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 816f494..044d97c 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -63,3 +63,17 @@ groups = pci
 [pmu]
 file = pmu.flat
 groups = pmu
+
+# Test PMU support (TCG) with -icount, expecting CPI=1
+[pmu-tcg-icount-1]
+file = pmu.flat
+extra_params = -icount 0 -append '1'
+groups = pmu
+accel = tcg
+
+# Test PMU support (TCG) with -icount, expecting CPI=256
+[pmu-tcg-icount-256]
+file = pmu.flat
+extra_params = -icount 8 -append '256'
+groups = pmu
+accel = tcg
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers
  2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
@ 2016-12-01  8:59     ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01  8:59 UTC (permalink / raw)
  To: Wei Huang
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, kvmarm, shannon.zhao


Should this be From: Andre?

On Wed, Nov 30, 2016 at 11:16:39PM -0600, Wei Huang wrote:
> This patch defines four macros to assist creating system register
> accessors under both ARMv7 and AArch64:
>    * DEFINE_GET_SYSREG32(name, ...)
>    * DEFINE_SET_SYSREG32(name, ...)
>    * DEFINE_GET_SYSREG64(name, ...)
>    * DEFINE_SET_SYSREG64(name, ...)
> These macros are translated to inline functions with consistent naming,
> get_##name() and set_##name(), which can be used by C code directly.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Wei Huang <wei@redhat.com>
> ---
>  lib/arm/asm/processor.h   | 37 ++++++++++++++++++++++++++++++++-----
>  lib/arm64/asm/processor.h | 35 ++++++++++++++++++++++++++++-------
>  2 files changed, 60 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
> index f25e7ee..3ca6b42 100644
> --- a/lib/arm/asm/processor.h
> +++ b/lib/arm/asm/processor.h
> @@ -33,13 +33,40 @@ static inline unsigned long current_cpsr(void)
>  
>  #define current_mode() (current_cpsr() & MODE_MASK)
>  
> -static inline unsigned int get_mpidr(void)
> -{
> -	unsigned int mpidr;
> -	asm volatile("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> -	return mpidr;
> +#define DEFINE_GET_SYSREG32(name, opc1, crn, crm, opc2)			\
> +static inline uint32_t get_##name(void)					\
> +{									\
> +	uint32_t reg;							\
> +	asm volatile("mrc p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> +		     #opc2 : "=r" (reg));				\
> +	return reg;							\
> +}
> +
> +#define DEFINE_SET_SYSREG32(name, opc1, crn, crm, opc2)			\
> +static inline void set_##name(uint32_t value)				\
> +{									\
> +	asm volatile("mcr p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> +		     #opc2 :: "r" (value));				\
                           ^ nit: no space here, checkpatch would complain
> +}
> +
> +#define DEFINE_GET_SYSREG64(name, opc, crm)				\
> +static inline uint64_t get_##name(void)					\
> +{									\
> +	uint32_t lo, hi;						\
> +	asm volatile("mrrc p15, " #opc ", %0, %1, " #crm		\
> +		     : "=r" (lo), "=r" (hi));				\
> +	return (uint64_t)hi << 32 | lo;					\
> +}
> +
> +#define DEFINE_SET_SYSREG64(name, opc, crm)				\
> +static inline void set_##name(uint64_t value)				\
> +{									\
> +	asm volatile("mcrr p15, " #opc ", %0, %1, " #crm		\
> +		     :: "r" (value & 0xffffffff), "r" (value >> 32));	\
>  }
>  
> +DEFINE_GET_SYSREG32(mpidr, 0, c0, c0, 5)
> +
>  /* Only support Aff0 for now, up to 4 cpus */
>  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
>  
> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
> index 84d5c7c..dfa75eb 100644
> --- a/lib/arm64/asm/processor.h
> +++ b/lib/arm64/asm/processor.h
> @@ -66,14 +66,35 @@ static inline unsigned long current_level(void)
>  	return el & 0xc;
>  }
>  
> -#define DEFINE_GET_SYSREG32(reg)				\
> -static inline unsigned int get_##reg(void)			\
> -{								\
> -	unsigned int reg;					\
> -	asm volatile("mrs %0, " #reg "_el1" : "=r" (reg));	\
> -	return reg;						\
> +#define DEFINE_GET_SYSREG32(reg, el)					\
> +static inline uint32_t get_##reg(void)					\
> +{									\
> +	uint32_t reg;							\
> +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
> +	return reg;							\
>  }
> -DEFINE_GET_SYSREG32(mpidr)
> +
> +#define DEFINE_SET_SYSREG32(reg, el)					\
> +static inline void set_##reg(uint32_t value)				\
> +{									\
> +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
> +}
> +
> +#define DEFINE_GET_SYSREG64(reg, el)					\
> +static inline uint64_t get_##reg(void)					\
> +{									\
> +	uint64_t reg;							\
> +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
> +	return reg;							\
> +}
> +
> +#define DEFINE_SET_SYSREG64(reg, el)					\
> +static inline void set_##reg(uint64_t value)				\
> +{									\
> +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
> +}
> +
> +DEFINE_GET_SYSREG32(mpidr, el1)

32-bit mpidr for arm64 isn't right, and it's changed by [1] in the
gic series. However changing it to 64-bit with this patch would result
in a get_mpidr() call that returns uint64_t on arm64 and uint32_t on
arm32, which won't be nice for common code. Andre brought up during the
review of [1] that we should be using the architectural types for register
accessors. That means that while internally all the above functions can
know what's 32-bit and what's 64-bit, using uint32/64_t appropriately,
the external interfaces should be 'unsigned long', 'unsigned int',
'unsigned long long'.

[1] https://github.com/rhdrjones/kvm-unit-tests/commit/57e48b8e6dc2ddf4b2e4eb1ceb5a5f87f2dd074b
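A minimal host-compilable sketch of that split follows. A hypothetical static variable, fake_mpidr_el1, stands in for MPIDR_EL1, where the real helper would use mrs. The internal helper, read_mpidr_el1() (also an illustrative name), keeps the 64-bit architectural width. get_mpidr() exposes 'unsigned long', which is 64 bits on AArch64 and 32 bits on ARMv7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MPIDR_EL1 (bit 31 is RES1 on ARMv8). */
static uint64_t fake_mpidr_el1 = 0x80000003;

/* Internal helper: knows the architectural width (64 bits on AArch64). */
static inline uint64_t read_mpidr_el1(void)
{
	return fake_mpidr_el1;
}

/*
 * External interface: 'unsigned long' matches the register width on both
 * AArch64 and ARMv7, so common code can call get_mpidr() without casts.
 */
static inline unsigned long get_mpidr(void)
{
	return (unsigned long)read_mpidr_el1();
}

/* Same decoding as in the patch: only Aff0 is used. */
#define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
```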

Thanks,
drew

>  
>  /* Only support Aff0 for now, gicv2 only */
>  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
> -- 
> 1.8.3.1
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 2/4] arm: Add PMU test
  2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
@ 2016-12-01  9:03     ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01  9:03 UTC (permalink / raw)
  To: Wei Huang
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, kvmarm, shannon.zhao

On Wed, Nov 30, 2016 at 11:16:40PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Beginning with a simple sanity check of the control register, add
> a unit test for the ARM Performance Monitors Unit (PMU). PMU registers
> are read using the newly defined macros.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arm/Makefile.common |  3 ++-
>  arm/pmu.c           | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg   |  5 +++++
>  3 files changed, 69 insertions(+), 1 deletion(-)
>  create mode 100644 arm/pmu.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index f37b5c2..5da2fdd 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -12,7 +12,8 @@ endif
>  tests-common = \
>  	$(TEST_DIR)/selftest.flat \
>  	$(TEST_DIR)/spinlock-test.flat \
> -	$(TEST_DIR)/pci-test.flat
> +	$(TEST_DIR)/pci-test.flat \
> +	$(TEST_DIR)/pmu.flat
>  
>  all: test_cases
>  
> diff --git a/arm/pmu.c b/arm/pmu.c
> new file mode 100644
> index 0000000..1fe2b1a
> --- /dev/null
> +++ b/arm/pmu.c
> @@ -0,0 +1,62 @@
> +/*
> + * Test the ARM Performance Monitors Unit (PMU).
> + *
> + * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU Lesser General Public License version 2.1 and
> + * only version 2.1 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
> + * for more details.
> + */
> +#include "libcflat.h"
> +#include "asm/barrier.h"
> +#include "asm/processor.h"
> +
> +#define PMU_PMCR_N_SHIFT   11
> +#define PMU_PMCR_N_MASK    0x1f
> +#define PMU_PMCR_ID_SHIFT  16
> +#define PMU_PMCR_ID_MASK   0xff
> +#define PMU_PMCR_IMP_SHIFT 24
> +#define PMU_PMCR_IMP_MASK  0xff
> +
> +#if defined(__arm__)
> +DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
> +#elif defined(__aarch64__)
> +DEFINE_GET_SYSREG32(pmcr, el0)
> +#endif
> +
> +/*
> + * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
> + * zero. Also print out a couple of other interesting fields for diagnostic
> + * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement
> + * event counters and therefore reports zero event counters, but hopefully
> + * support for at least the instructions event will be added in the future and
> + * the reported number of event counters will become nonzero.
> + */
> +static bool check_pmcr(void)
> +{
> +	uint32_t pmcr;

So based on my comments from the previous patch, pmcr should be
'unsigned int'.

> +
> +	pmcr = get_pmcr();
> +
> +	report_info("PMU implementer/ID code/counters: 0x%x(\"%c\")/0x%x/%d",
> +		    (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK,
> +		    ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) ? : ' ',
> +		    (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK,
> +		    (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
> +
> +	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
> +}
> +
> +int main(void)
> +{
> +	report_prefix_push("pmu");
> +
> +	report("Control register", check_pmcr());
> +
> +	return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index ae32a42..816f494 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -58,3 +58,8 @@ groups = selftest
>  [pci-test]
>  file = pci-test.flat
>  groups = pci
> +
> +# Test PMU support
> +[pmu]
> +file = pmu.flat
> +groups = pmu
> -- 
> 1.8.3.1
> 
>

drew 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases
  2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
@ 2016-12-01  9:18     ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01  9:18 UTC (permalink / raw)
  To: Wei Huang
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, kvmarm, shannon.zhao

On Wed, Nov 30, 2016 at 11:16:41PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arm/pmu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 1fe2b1a..3566a27 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -16,6 +16,9 @@
>  #include "asm/barrier.h"
>  #include "asm/processor.h"
>  
> +#define PMU_PMCR_E         (1 << 0)
> +#define PMU_PMCR_C         (1 << 2)
> +#define PMU_PMCR_LC        (1 << 6)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -23,10 +26,57 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define ID_DFR0_PERFMON_SHIFT 24
> +#define ID_DFR0_PERFMON_MASK  0xf
> +
> +#define PMU_CYCLE_IDX         31
> +
> +#define NR_SAMPLES 10
> +
> +static unsigned int pmu_version;
>  #if defined(__arm__)
>  DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
> +DEFINE_SET_SYSREG32(pmcr, 0, c9, c12, 0)
> +DEFINE_GET_SYSREG32(id_dfr0, 0, c0, c1, 2)
> +DEFINE_SET_SYSREG32(pmselr, 0, c9, c12, 5)
> +DEFINE_SET_SYSREG32(pmxevtyper, 0, c9, c13, 1)
> +DEFINE_GET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> +DEFINE_SET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> +DEFINE_GET_SYSREG64(pmccntr64, 0, c9)
> +DEFINE_SET_SYSREG64(pmccntr64, 0, c9)
> +DEFINE_SET_SYSREG32(pmcntenset, 0, c9, c12, 1)

Seeing how we get lots of redundant looking lines, I think instead
of defining DEFINE_SET/GET_SYSREG32/64, we should instead have

DEFINE_SYSREG32/64      ... creates both get_<reg> and set_<reg>
DEFINE_SYSREG32/64_RO   ... creates just get_<reg>
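One possible shape for that macro family is sketched below so it can run on a host. Each register is backed by a hypothetical static variable instead of mrc/mcr or mrs/msr. The encoding arguments (opc1, crn, crm, opc2, or the _elN suffix) are left out. The helper macro names SYSREG_GET and SYSREG_SET are also illustrative. Only the get/set generation pattern is meant to carry over.

```c
#include <assert.h>
#include <stdint.h>

#define SYSREG_GET(name, type)						\
static inline type get_##name(void) { return sysreg_##name; }

#define SYSREG_SET(name, type)						\
static inline void set_##name(type value) { sysreg_##name = value; }

/* Read/write registers: one line creates both get_<reg> and set_<reg>. */
#define DEFINE_SYSREG32(name)						\
static uint32_t sysreg_##name;						\
SYSREG_GET(name, uint32_t)						\
SYSREG_SET(name, uint32_t)

#define DEFINE_SYSREG64(name)						\
static uint64_t sysreg_##name;						\
SYSREG_GET(name, uint64_t)						\
SYSREG_SET(name, uint64_t)

/* Read-only registers: only get_<reg> is created. */
#define DEFINE_SYSREG32_RO(name)					\
static uint32_t sysreg_##name;						\
SYSREG_GET(name, uint32_t)

#define DEFINE_SYSREG64_RO(name)					\
static uint64_t sysreg_##name;						\
SYSREG_GET(name, uint64_t)

DEFINE_SYSREG32(pmcr)
DEFINE_SYSREG32_RO(id_dfr0)
DEFINE_SYSREG64(pmccntr)
```

With this layout, calling set_id_dfr0() would fail to compile, which makes read-only registers self-documenting.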

> +
> +static inline uint64_t get_pmccntr(void)
> +{
> +	if (pmu_version == 0x3)
> +		return get_pmccntr64();
> +	else
> +		return get_pmccntr32();
> +}
> +
> +static inline void set_pmccntr(uint64_t value)
> +{
> +	if (pmu_version == 0x3)
> +		set_pmccntr64(value);
> +	else
> +		set_pmccntr32(value & 0xffffffff);
> +}

So the two accessors above are exceptional, which is why we don't
use SYSREG for them. These can have uint64_t for their external
interface. We can't require 'unsigned long' or 'unsigned long long' here.
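The host-side model below shows why these two accessors keep uint64_t. With PMU version 0x3 (the value the patch tests for), the full 64-bit count is visible. Otherwise only the low 32 bits survive a write. The variable pmccntr_backing is a hypothetical stand-in for the real register.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int pmu_version;
/* Hypothetical stand-in for PMCCNTR. */
static uint64_t pmccntr_backing;

static inline uint64_t get_pmccntr(void)
{
	if (pmu_version == 0x3)
		return pmccntr_backing;
	/* Older PMUs: only the 32-bit view is available. */
	return (uint32_t)pmccntr_backing;
}

static inline void set_pmccntr(uint64_t value)
{
	if (pmu_version == 0x3)
		pmccntr_backing = value;
	else
		pmccntr_backing = value & 0xffffffff;
}
```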

> +
> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> +static inline void set_pmccfiltr(uint32_t value)
> +{
> +	set_pmselr(PMU_CYCLE_IDX);
> +	set_pmxevtyper(value);
> +	isb();
> +}
>  #elif defined(__aarch64__)
>  DEFINE_GET_SYSREG32(pmcr, el0)
> +DEFINE_SET_SYSREG32(pmcr, el0)
> +DEFINE_GET_SYSREG32(id_dfr0, el1)
> +DEFINE_GET_SYSREG64(pmccntr, el0);
> +DEFINE_SET_SYSREG64(pmccntr, el0);
> +DEFINE_SET_SYSREG32(pmcntenset, el0);
> +DEFINE_SET_SYSREG32(pmccfiltr, el0);
>  #endif
>  
>  /*
> @@ -52,11 +102,55 @@ static bool check_pmcr(void)
>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>  }
>  
> +/*
> + * Ensure that the cycle counter progresses between back-to-back reads.
> + */
> +static bool check_cycles_increase(void)
> +{
> +	bool success = true;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> +	set_pmccntr(0);
> +
> +	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
> +
> +	for (int i = 0; i < NR_SAMPLES; i++) {
> +		uint64_t a, b;
> +
> +		a = get_pmccntr();
> +		b = get_pmccntr();
> +
> +		if (a >= b) {
> +			printf("Read %"PRIu64" then %"PRIu64".\n", a, b);
> +			success = false;
> +			break;
> +		}
> +	}
> +
> +	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
> +
> +	return success;
> +}
> +
> +void pmu_init(void)
> +{
> +	uint32_t dfr0;
> +
> +	/* probe pmu version */
> +	dfr0 = get_id_dfr0();
> +	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
> +	report_info("PMU version: %d", pmu_version);
> +}
> +
>  int main(void)
>  {
>  	report_prefix_push("pmu");
>  
> +	pmu_init();
>  	report("Control register", check_pmcr());
> +	report("Monotonically increasing cycle count", check_cycles_increase());
>  
>  	return report_summary();
>  }
> -- 
> 1.8.3.1
> 
>

drew 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases
@ 2016-12-01  9:18     ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01  9:18 UTC (permalink / raw)
  To: Wei Huang
  Cc: cov, alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm

On Wed, Nov 30, 2016 at 11:16:41PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arm/pmu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 1fe2b1a..3566a27 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -16,6 +16,9 @@
>  #include "asm/barrier.h"
>  #include "asm/processor.h"
>  
> +#define PMU_PMCR_E         (1 << 0)
> +#define PMU_PMCR_C         (1 << 2)
> +#define PMU_PMCR_LC        (1 << 6)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -23,10 +26,57 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define ID_DFR0_PERFMON_SHIFT 24
> +#define ID_DFR0_PERFMON_MASK  0xf
> +
> +#define PMU_CYCLE_IDX         31
> +
> +#define NR_SAMPLES 10
> +
> +static unsigned int pmu_version;
>  #if defined(__arm__)
>  DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
> +DEFINE_SET_SYSREG32(pmcr, 0, c9, c12, 0)
> +DEFINE_GET_SYSREG32(id_dfr0, 0, c0, c1, 2)
> +DEFINE_SET_SYSREG32(pmselr, 0, c9, c12, 5)
> +DEFINE_SET_SYSREG32(pmxevtyper, 0, c9, c13, 1)
> +DEFINE_GET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> +DEFINE_SET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> +DEFINE_GET_SYSREG64(pmccntr64, 0, c9)
> +DEFINE_SET_SYSREG64(pmccntr64, 0, c9)
> +DEFINE_SET_SYSREG32(pmcntenset, 0, c9, c12, 1)

Seeing how we get lots of redundant looking lines, I think instead
of defining DEFINE_SET/GET_SYSREG32/64, we should instead have

DEFINE_SYSREG32/64      ... creates both get_<reg> and set_<reg>
DEFINE_SYSREG32/64_RO   ... creates just get_<reg>
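The combined-macro idea above can be sketched as follows. This is a host-side model only: a hypothetical `sim_<name>` variable stands in for the register, and the encoding arguments (opc1, crn, crm, opc2) are omitted. The real accessors would emit mrc/mcr (arm) or mrs/msr (arm64) instead. The point is that the read/write macro is built from the `_RO` one, so each register needs only one line.

```c
#include <stdint.h>

/*
 * Hypothetical host-side model: sim_<name> stands in for the system
 * register. Real accessors would use mrc/mcr (arm) or mrs/msr (arm64)
 * and take the encoding arguments, which are omitted here.
 */
#define DECLARE_SIM_REG32(name) static uint32_t sim_##name

/* Read-only variant: generates get_<name>() only. */
#define DEFINE_SYSREG32_RO(name)					\
static inline uint32_t get_##name(void)					\
{									\
	return sim_##name;						\
}

/* Read/write variant: reuses the _RO macro, then adds set_<name>(). */
#define DEFINE_SYSREG32(name)						\
DEFINE_SYSREG32_RO(name)						\
static inline void set_##name(uint32_t value)				\
{									\
	sim_##name = value;						\
}

DECLARE_SIM_REG32(pmcr);
DECLARE_SIM_REG32(id_dfr0);

DEFINE_SYSREG32(pmcr)		/* get_pmcr() and set_pmcr() */
DEFINE_SYSREG32_RO(id_dfr0)	/* get_id_dfr0() only */
```

With this split, the ARMv7 list in the patch would shrink to one line per register, and read-only registers such as ID_DFR0 could not be written by mistake.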

> +
> +static inline uint64_t get_pmccntr(void)
> +{
> +	if (pmu_version == 0x3)
> +		return get_pmccntr64();
> +	else
> +		return get_pmccntr32();
> +}
> +
> +static inline void set_pmccntr(uint64_t value)
> +{
> +	if (pmu_version == 0x3)
> +		set_pmccntr64(value);
> +	else
> +		set_pmccntr32(value & 0xffffffff);
> +}

So the two accessors above are exceptional, which is why we don't
use SYSREG for them. These can use uint64_t for their external
interface; we can't require 'unsigned long' or 'unsigned long long'
here.
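To illustrate why these two accessors keep a fixed uint64_t interface, here is a host-side sketch of the version dispatch. The `sim_pmccntr` backing variable and the stub 32/64-bit accessors are hypothetical stand-ins for the MRC/MRRC-based ones in the patch; the dispatch logic mirrors `get_pmccntr()`/`set_pmccntr()` as posted.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real accessors and probed state. */
static unsigned int pmu_version;	/* ID_DFR0.PerfMon, probed at init */
static uint64_t sim_pmccntr;		/* models the cycle counter */

static uint32_t get_pmccntr32(void) { return (uint32_t)sim_pmccntr; }
static void set_pmccntr32(uint32_t v) { sim_pmccntr = v; }
static uint64_t get_pmccntr64(void) { return sim_pmccntr; }
static void set_pmccntr64(uint64_t v) { sim_pmccntr = v; }

/* External interface is always uint64_t; access width depends on PMU. */
static inline uint64_t get_pmccntr(void)
{
	if (pmu_version == 0x3)		/* PMUv3: 64-bit (MRRC) access */
		return get_pmccntr64();
	return get_pmccntr32();		/* older PMUs: 32-bit (MRC) access */
}

static inline void set_pmccntr(uint64_t value)
{
	if (pmu_version == 0x3)
		set_pmccntr64(value);
	else
		set_pmccntr32(value & 0xffffffff);	/* drop upper half */
}
```

Because the width is only known at run time, neither 'unsigned long' nor 'unsigned long long' would describe the register on every configuration, so a fixed-width type is the simpler contract.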

> +
> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> +static inline void set_pmccfiltr(uint32_t value)
> +{
> +	set_pmselr(PMU_CYCLE_IDX);
> +	set_pmxevtyper(value);
> +	isb();
> +}
>  #elif defined(__aarch64__)
>  DEFINE_GET_SYSREG32(pmcr, el0)
> +DEFINE_SET_SYSREG32(pmcr, el0)
> +DEFINE_GET_SYSREG32(id_dfr0, el1)
> +DEFINE_GET_SYSREG64(pmccntr, el0);
> +DEFINE_SET_SYSREG64(pmccntr, el0);
> +DEFINE_SET_SYSREG32(pmcntenset, el0);
> +DEFINE_SET_SYSREG32(pmccfiltr, el0);
>  #endif
>  
>  /*
> @@ -52,11 +102,55 @@ static bool check_pmcr(void)
>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>  }
>  
> +/*
> + * Ensure that the cycle counter progresses between back-to-back reads.
> + */
> +static bool check_cycles_increase(void)
> +{
> +	bool success = true;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> +	set_pmccntr(0);
> +
> +	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
> +
> +	for (int i = 0; i < NR_SAMPLES; i++) {
> +		uint64_t a, b;
> +
> +		a = get_pmccntr();
> +		b = get_pmccntr();
> +
> +		if (a >= b) {
> +			printf("Read %"PRId64" then %"PRId64".\n", a, b);
> +			success = false;
> +			break;
> +		}
> +	}
> +
> +	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
> +
> +	return success;
> +}
> +
> +void pmu_init(void)
> +{
> +	uint32_t dfr0;
> +
> +	/* probe pmu version */
> +	dfr0 = get_id_dfr0();
> +	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
> +	report_info("PMU version: %d", pmu_version);
> +}
> +
>  int main(void)
>  {
>  	report_prefix_push("pmu");
>  
> +	pmu_init();
>  	report("Control register", check_pmcr());
> +	report("Monotonically increasing cycle count", check_cycles_increase());
>  
>  	return report_summary();
>  }
> -- 
> 1.8.3.1
> 
>

drew 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
  (?)
@ 2016-12-01  9:26   ` Andrew Jones
  2016-12-01 10:19     ` Andre Przywara
  -1 siblings, 1 reply; 48+ messages in thread
From: Andrew Jones @ 2016-12-01  9:26 UTC (permalink / raw)
  To: Wei Huang
  Cc: cov, alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm

On Wed, Nov 30, 2016 at 11:16:42PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Calculate the number of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility,
> intended for use with the -icount option in TCG mode, which is enabled
> from the configuration file.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  arm/unittests.cfg |  14 +++++++
>  2 files changed, 136 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 3566a27..29d7c2c 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>  	set_pmxevtyper(value);
>  	isb();
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. isb instructions are inserted to make sure
> + * that, once this function returns, a pmccntr read reflects exactly the
> + * instructions executed in the controlled block.
> + * Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
> + */
> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	bgt	1b\n"
> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr), [z] "r" (0)
> +	: "cc");
> +}
>  #elif defined(__aarch64__)
>  DEFINE_GET_SYSREG32(pmcr, el0)
>  DEFINE_SET_SYSREG32(pmcr, el0)
> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>  DEFINE_SET_SYSREG64(pmccntr, el0);
>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. isb instructions are inserted to make sure
> + * that, once this function returns, a pmccntr read reflects exactly the
> + * instructions executed in the controlled block.
> + * Total instrs = isb + msr + 2*loop = 2 + 2*loop.
> + */
> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	msr	pmcr_el0, %[pmcr]\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	b.gt	1b\n"
> +	"	msr	pmcr_el0, xzr\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr)
> +	: "cc");
> +}
>  #endif
>  
>  /*
> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>  	return success;
>  }
>  
> +/*
> + * Execute a known number of guest instructions. Only even instruction counts
> + * greater than or equal to 4 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value (allowing,
> + * for example, the cycle counter or event counters to be reset). At the end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> +	int loop = (num - 2) / 2;
> +
> +	assert(num >= 4 && ((num - 2) % 2 == 0));
> +	precise_instrs_loop(loop, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive
> + * CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> +	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> +
> +	if (cpi > 0)
> +		printf("Checking for CPI=%d.\n", cpi);
> +	printf("instrs : cycles0 cycles1 ...\n");
> +
> +	for (unsigned int i = 4; i < 300; i += 32) {
> +		uint64_t avg, sum = 0;
> +
> +		printf("%d :", i);
> +		for (int j = 0; j < NR_SAMPLES; j++) {
> +			uint64_t cycles;
> +
> +			set_pmccntr(0);
> +			measure_instrs(i, pmcr);
> +			cycles = get_pmccntr();
> +			printf(" %"PRId64"", cycles);
> +
> +			if (!cycles) {
> +				printf("\ncycles not incrementing!\n");
> +				return false;
> +			} else if (cpi > 0 && cycles != i * cpi) {
> +				printf("\nunexpected cycle count received!\n");
> +				return false;
> +			} else if ((cycles >> 32) != 0) {
> +				/* The cycles taken by the loop above should
> +				 * fit in 32 bits easily. We check the upper
> +				 * 32 bits of the cycle counter to make sure
> +				 * there is no surprise. */
> +				printf("\ncycle count bigger than 32bit!\n");
> +				return false;
> +			}
> +
> +			sum += cycles;
> +		}
> +		avg = sum / NR_SAMPLES;
> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
> +	}
> +
> +	return true;
> +}
> +
>  void pmu_init(void)
>  {
>  	uint32_t dfr0;
> @@ -144,13 +259,19 @@ void pmu_init(void)
>  	report_info("PMU version: %d", pmu_version);
>  }
>  
> -int main(void)
> +int main(int argc, char *argv[])
>  {
> +	int cpi = 0;
> +
> +	if (argc > 1)
> +		cpi = atol(argv[1]);
> +
>  	report_prefix_push("pmu");
>  
>  	pmu_init();
>  	report("Control register", check_pmcr());
>  	report("Monotonically increasing cycle count", check_cycles_increase());
> +	report("Cycle/instruction ratio", check_cpi(cpi));
>  
>  	return report_summary();
>  }
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 816f494..044d97c 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -63,3 +63,17 @@ groups = pci
>  [pmu]
>  file = pmu.flat
>  groups = pmu
> +
> +# Test PMU support (TCG) with -icount CPI=1
> +[pmu-tcg-icount-1]
> +file = pmu.flat
> +extra_params = -icount 0 -append '1'
> +groups = pmu
> +accel = tcg
> +
> +# Test PMU support (TCG) with -icount CPI=256
> +[pmu-tcg-icount-256]
> +file = pmu.flat
> +extra_params = -icount 8 -append '256'
> +groups = pmu
> +accel = tcg
> -- 
> 1.8.3.1
> 
>
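The instruction accounting in this patch can be checked with plain arithmetic. The sketch below restates it as hypothetical helpers (none of these names are in the patch): two instructions outside the loop plus two per iteration, the strict CPI comparison from `check_cpi()`, and an `icount_expected_cpi()` helper that assumes CPI = 2^shift. That assumption is only inferred from the two config entries above (`-icount 0` with '1', `-icount 8` with '256').

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per the patch comment: total = 2 + 2 * loop (subs + branch per loop). */
static int instrs_for_loop(int loop)
{
	return 2 + 2 * loop;
}

/* Inverse used by measure_instrs(): only even counts >= 4 are valid. */
static int loop_count_for(int num)
{
	assert(num >= 4 && (num - 2) % 2 == 0);
	return (num - 2) / 2;
}

/* Strict check: only enforced when a positive CPI was supplied. */
static bool cpi_matches(uint64_t cycles, unsigned int instrs, int cpi)
{
	return cpi <= 0 || cycles == (uint64_t)instrs * (uint64_t)cpi;
}

/* Assumption: -icount <shift> yields 2^shift cycles per instruction. */
static unsigned int icount_expected_cpi(unsigned int shift)
{
	return 1u << shift;
}
```

Every count visited by the `for (i = 4; i < 300; i += 32)` loop is even, so the assertion in `measure_instrs()` always holds for this test.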

As we work out how best to handle tcg-only tests in order to get Alex
Bennee's MTTCG tests merged, we'll probably revisit this file, factoring
out common PMU code and pulling the tcg-only code out to its own unit
test file (maybe arm/tcg/pmu.c). But that's future work.

Thanks,
drew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers
  2016-12-01  8:59     ` Andrew Jones
@ 2016-12-01  9:38       ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01  9:38 UTC (permalink / raw)
  To: Wei Huang
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, kvmarm, shannon.zhao

On Thu, Dec 01, 2016 at 09:59:03AM +0100, Andrew Jones wrote:
> 
> Should this be From: Andre?
> 
> On Wed, Nov 30, 2016 at 11:16:39PM -0600, Wei Huang wrote:
> > This patch defines four macros to assist in creating system register
> > accessors under both ARMv7 and AArch64:
> >    * DEFINE_GET_SYSREG32(name, ...)
> >    * DEFINE_SET_SYSREG32(name, ...)
> >    * DEFINE_GET_SYSREG64(name, ...)
> >    * DEFINE_SET_SYSREG64(name, ...)
> > These macros are translated to inline functions with consistent naming,
> > get_##name() and set_##name(), which can be used by C code directly.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > Signed-off-by: Wei Huang <wei@redhat.com>
> > ---
> >  lib/arm/asm/processor.h   | 37 ++++++++++++++++++++++++++++++++-----
> >  lib/arm64/asm/processor.h | 35 ++++++++++++++++++++++++++++-------
> >  2 files changed, 60 insertions(+), 12 deletions(-)
> > 
> > diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
> > index f25e7ee..3ca6b42 100644
> > --- a/lib/arm/asm/processor.h
> > +++ b/lib/arm/asm/processor.h
> > @@ -33,13 +33,40 @@ static inline unsigned long current_cpsr(void)
> >  
> >  #define current_mode() (current_cpsr() & MODE_MASK)
> >  
> > -static inline unsigned int get_mpidr(void)
> > -{
> > -	unsigned int mpidr;
> > -	asm volatile("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> > -	return mpidr;
> > +#define DEFINE_GET_SYSREG32(name, opc1, crn, crm, opc2)			\
> > +static inline uint32_t get_##name(void)					\
> > +{									\
> > +	uint32_t reg;							\
> > +	asm volatile("mrc p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> > +		     #opc2 : "=r" (reg));				\
> > +	return reg;							\
> > +}
> > +
> > +#define DEFINE_SET_SYSREG32(name, opc1, crn, crm, opc2)			\
> > +static inline void set_##name(uint32_t value)				\
> > +{									\
> > +	asm volatile("mcr p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> > +		     #opc2 :: "r" (value));				\
>                            ^ nit: no space here, checkpatch would complain
> > +}
> > +
> > +#define DEFINE_GET_SYSREG64(name, opc, crm)				\
> > +static inline uint64_t get_##name(void)					\
> > +{									\
> > +	uint32_t lo, hi;						\
> > +	asm volatile("mrrc p15, " #opc ", %0, %1, " #crm		\
> > +		     : "=r" (lo), "=r" (hi));				\
> > +	return (uint64_t)hi << 32 | lo;					\
> > +}
> > +
> > +#define DEFINE_SET_SYSREG64(name, opc, crm)				\
> > +static inline void set_##name(uint64_t value)				\
> > +{									\
> > +	asm volatile("mcrr p15, " #opc ", %0, %1, " #crm		\
> > +		     :: "r" (value & 0xffffffff), "r" (value >> 32));	\
> >  }
> >  
> > +DEFINE_GET_SYSREG32(mpidr, 0, c0, c0, 5)
> > +
> >  /* Only support Aff0 for now, up to 4 cpus */
> >  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
> >  
> > diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
> > index 84d5c7c..dfa75eb 100644
> > --- a/lib/arm64/asm/processor.h
> > +++ b/lib/arm64/asm/processor.h
> > @@ -66,14 +66,35 @@ static inline unsigned long current_level(void)
> >  	return el & 0xc;
> >  }
> >  
> > -#define DEFINE_GET_SYSREG32(reg)				\
> > -static inline unsigned int get_##reg(void)			\
> > -{								\
> > -	unsigned int reg;					\
> > -	asm volatile("mrs %0, " #reg "_el1" : "=r" (reg));	\
> > -	return reg;						\
> > +#define DEFINE_GET_SYSREG32(reg, el)					\
> > +static inline uint32_t get_##reg(void)					\
> > +{									\
> > +	uint32_t reg;							\
> > +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
> > +	return reg;							\
> >  }
> > -DEFINE_GET_SYSREG32(mpidr)
> > +
> > +#define DEFINE_SET_SYSREG32(reg, el)					\
> > +static inline void set_##reg(uint32_t value)				\
> > +{									\
> > +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
> > +}

Another comment for arm64's SYSREG32 accessors. Technically there's no
32-bit register in AArch64, so the asm should always use an unsigned long
and explicit casts. That's what I did in [1].
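A host-side sketch of that suggestion: the value moves through an `unsigned long` (matching the 64-bit register encoding that mrs/msr use on AArch64) and is narrowed or widened with explicit casts at the interface. The `sim_<reg>_<el>` backing variable is a hypothetical stand-in for the real `mrs`/`msr` asm.

```c
#include <stdint.h>

/* Hypothetical model: sim_pmcr_el0 stands in for the PMCR_EL0 register. */
static unsigned long sim_pmcr_el0;

/*
 * The register is accessed as an unsigned long (real code: mrs/msr with
 * an unsigned long operand) and converted explicitly at the interface.
 */
#define DEFINE_GET_SYSREG32(reg, el)					\
static inline uint32_t get_##reg(void)					\
{									\
	unsigned long r = sim_##reg##_##el;				\
	return (uint32_t)r;						\
}

#define DEFINE_SET_SYSREG32(reg, el)					\
static inline void set_##reg(uint32_t value)				\
{									\
	unsigned long r = (unsigned long)value;				\
	sim_##reg##_##el = r;						\
}

DEFINE_GET_SYSREG32(pmcr, el0)
DEFINE_SET_SYSREG32(pmcr, el0)
```

The explicit `(uint32_t)` cast makes the truncation of the upper half visible at the one place it happens, instead of relying on the asm operand type.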

> > +
> > +#define DEFINE_GET_SYSREG64(reg, el)					\
> > +static inline uint64_t get_##reg(void)					\
> > +{									\
> > +	uint64_t reg;							\
> > +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
> > +	return reg;							\
> > +}
> > +
> > +#define DEFINE_SET_SYSREG64(reg, el)					\
> > +static inline void set_##reg(uint64_t value)				\
> > +{									\
> > +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
> > +}
> > +
> > +DEFINE_GET_SYSREG32(mpidr, el1)
> 
> 32-bit mpidr for arm64 isn't right, and it's changed by [1] in the
> gic series. However changing it to 64-bit with this patch would result
> in a get_mpidr() call that returns uint64_t on arm64 and uint32_t on
> arm32, which won't be nice for common code. Andre brought up during the
> review of [1] that we should be using the architectural types for register
> accessors. That means that, while internally all the above functions can
> know what's 32-bit and what's 64-bit, using uint32/64_t appropriately,
> the external interfaces should be 'unsigned long', 'unsigned int',
> 'unsigned long long'.
> 
> [1] https://github.com/rhdrjones/kvm-unit-tests/commit/57e48b8e6dc2ddf4b2e4eb1ceb5a5f87f2dd074b
> 
> Thanks,
> drew
> 
> >  
> >  /* Only support Aff0 for now, gicv2 only */
> >  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
> > -- 
> > 1.8.3.1
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01  9:26   ` Andrew Jones
@ 2016-12-01 10:19     ` Andre Przywara
  2016-12-01 13:47       ` Andrew Jones
  0 siblings, 1 reply; 48+ messages in thread
From: Andre Przywara @ 2016-12-01 10:19 UTC (permalink / raw)
  To: Andrew Jones, Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

Hi Drew,

this is actually unrelated to this patch, but since you mentioned it:

> As we work out how best to handle tcg-only tests in order to get Alex
> Bennee's MTTCG tests merged, we'll probably revisit this file,

So when I was experimenting with kvmtool, I realized that some tests
would need to be QEMU only. Also I was tempted to try some tests either
on bare metal machines or in a Fast Model.
So I wonder if we should have some constraints or tags on the tests, so
that a certain backend can filter on this and skip if it's not capable?

Just wanted to mention this so that we can use this refactoring
opportunity to come up with something more generic than just a boolean
TCG vs. KVM.
Maybe we should introduce the notion of a "test backend"? That could be
QEMU/KVM and TCG for now, but later extended to cover kvmtool and
probably other hypervisors like Xen as well.
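One way to picture the "test backend" idea is a per-test capability mask that a runner checks before launching. The sketch below is purely hypothetical (none of these names exist in kvm-unit-tests; today the runner reads keys such as `accel = tcg` from unittests.cfg), and it is written in C only to make the filtering rule concrete.

```c
#include <stdbool.h>

/* Hypothetical backend identifiers, one bit each. */
enum test_backend {
	BACKEND_QEMU_KVM = 1 << 0,
	BACKEND_QEMU_TCG = 1 << 1,
	BACKEND_KVMTOOL  = 1 << 2,
	BACKEND_XEN      = 1 << 3,
};

/* Hypothetical test descriptor: which backends can run this test. */
struct test_desc {
	const char *name;
	unsigned int backends;	/* bitmask of enum test_backend values */
};

/* A runner would skip any test whose mask excludes its own backend. */
static bool test_runs_on(const struct test_desc *t, enum test_backend b)
{
	return (t->backends & (unsigned int)b) != 0;
}
```

Compared with a single TCG/KVM boolean, a mask lets a test opt into several backends (for example QEMU/KVM and kvmtool) without new special cases.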

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers
  2016-12-01  8:59     ` Andrew Jones
  (?)
  (?)
@ 2016-12-01 11:11     ` Andre Przywara
  2016-12-01 13:16       ` Andrew Jones
  -1 siblings, 1 reply; 48+ messages in thread
From: Andre Przywara @ 2016-12-01 11:11 UTC (permalink / raw)
  To: Andrew Jones, Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

Hi,

On 01/12/16 08:59, Andrew Jones wrote:
> 
> Should this be From: Andre?

No need from my side; this way all the bug reports are sent to Wei ;-)

> On Wed, Nov 30, 2016 at 11:16:39PM -0600, Wei Huang wrote:
>> This patch defines four macros to assist in creating system register
>> accessors under both ARMv7 and AArch64:
>>    * DEFINE_GET_SYSREG32(name, ...)
>>    * DEFINE_SET_SYSREG32(name, ...)
>>    * DEFINE_GET_SYSREG64(name, ...)
>>    * DEFINE_SET_SYSREG64(name, ...)
>> These macros are translated to inline functions with consistent naming,
>> get_##name() and set_##name(), which can be used by C code directly.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> ---
>>  lib/arm/asm/processor.h   | 37 ++++++++++++++++++++++++++++++++-----
>>  lib/arm64/asm/processor.h | 35 ++++++++++++++++++++++++++++-------
>>  2 files changed, 60 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
>> index f25e7ee..3ca6b42 100644
>> --- a/lib/arm/asm/processor.h
>> +++ b/lib/arm/asm/processor.h
>> @@ -33,13 +33,40 @@ static inline unsigned long current_cpsr(void)
>>  
>>  #define current_mode() (current_cpsr() & MODE_MASK)
>>  
>> -static inline unsigned int get_mpidr(void)
>> -{
>> -	unsigned int mpidr;
>> -	asm volatile("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
>> -	return mpidr;
>> +#define DEFINE_GET_SYSREG32(name, opc1, crn, crm, opc2)			\
>> +static inline uint32_t get_##name(void)					\
>> +{									\
>> +	uint32_t reg;							\
>> +	asm volatile("mrc p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
>> +		     #opc2 : "=r" (reg));				\
>> +	return reg;							\
>> +}
>> +
>> +#define DEFINE_SET_SYSREG32(name, opc1, crn, crm, opc2)			\
>> +static inline void set_##name(uint32_t value)				\
>> +{									\
>> +	asm volatile("mcr p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
>> +		     #opc2 :: "r" (value));				\
>                            ^ nit: no space here, checkpatch would complain
>> +}
>> +
>> +#define DEFINE_GET_SYSREG64(name, opc, crm)				\
>> +static inline uint64_t get_##name(void)					\
>> +{									\
>> +	uint32_t lo, hi;						\
>> +	asm volatile("mrrc p15, " #opc ", %0, %1, " #crm		\
>> +		     : "=r" (lo), "=r" (hi));				\
>> +	return (uint64_t)hi << 32 | lo;					\
>> +}
>> +
>> +#define DEFINE_SET_SYSREG64(name, opc, crm)				\
>> +static inline void set_##name(uint64_t value)				\
>> +{									\
>> +	asm volatile("mcrr p15, " #opc ", %0, %1, " #crm		\
>> +		     :: "r" (value & 0xffffffff), "r" (value >> 32));	\
>>  }
>>  
>> +DEFINE_GET_SYSREG32(mpidr, 0, c0, c0, 5)
>> +
>>  /* Only support Aff0 for now, up to 4 cpus */
>>  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
>>  
>> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
>> index 84d5c7c..dfa75eb 100644
>> --- a/lib/arm64/asm/processor.h
>> +++ b/lib/arm64/asm/processor.h
>> @@ -66,14 +66,35 @@ static inline unsigned long current_level(void)
>>  	return el & 0xc;
>>  }
>>  
>> -#define DEFINE_GET_SYSREG32(reg)				\
>> -static inline unsigned int get_##reg(void)			\
>> -{								\
>> -	unsigned int reg;					\
>> -	asm volatile("mrs %0, " #reg "_el1" : "=r" (reg));	\
>> -	return reg;						\
>> +#define DEFINE_GET_SYSREG32(reg, el)					\
>> +static inline uint32_t get_##reg(void)					\
>> +{									\
>> +	uint32_t reg;							\
>> +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
>> +	return reg;							\
>>  }
>> -DEFINE_GET_SYSREG32(mpidr)
>> +
>> +#define DEFINE_SET_SYSREG32(reg, el)					\
>> +static inline void set_##reg(uint32_t value)				\
>> +{									\
>> +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
>> +}
>> +
>> +#define DEFINE_GET_SYSREG64(reg, el)					\
>> +static inline uint64_t get_##reg(void)					\
>> +{									\
>> +	uint64_t reg;							\
>> +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
>> +	return reg;							\
>> +}
>> +
>> +#define DEFINE_SET_SYSREG64(reg, el)					\
>> +static inline void set_##reg(uint64_t value)				\
>> +{									\
>> +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
>> +}
>> +
>> +DEFINE_GET_SYSREG32(mpidr, el1)
> 
> 32-bit mpidr for arm64 isn't right, and it's changed by [1] in the
> gic series.

So how are we actually handling this? Waiting for the GIC series to be
merged, then rebasing on top of that (which is what I thought we'd do)?

Or make a combined patch (taking it out of both series), and merge it
before?

> However changing it to 64-bit with this patch would result
> in a get_mpidr() call that returns uint64_t on arm64 and uint32_t on
> arm32, which won't be nice for common code. Andre brought up during the
> review of [1] that we should be using the architectural types for register
> accessors.

At least for registers that differ in size between A32 and A64. Many
system registers are actually 32-bit wide and are explicitly stated as
such in the ARM(v8) ARM (for instance MIDR_EL1).
Yes, the A64 msr/mrs instructions only know a 64-bit register encoding,
but the actual content is often confined to 32 bits (in MIDR or PMCR,
for instance).
So I wonder if we should take care of those with an explicit uint32_t
return type?

> That means that, while internally all the above functions can
> know what's 32-bit and what's 64-bit, using uint32/64_t appropriately,
> the external interfaces should be 'unsigned long', 'unsigned int',
> 'unsigned long long'.

Not so sure about that.
I think we may need _three_ types of system register accessors?
1) always 32-bit (e.g. MIDR_EL1): use uint32_t
2) always 64-bit (e.g. CNTVCT_EL0): use uint64_t
3) natural register size (e.g. MPIDR_EL1): use unsigned long

Does that make sense or is that overkill?
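A sketch of the three accessor families proposed above, as a host-side model: the `sim_<reg>` variables are hypothetical stand-ins for mrs/msr, and only read accessors are shown. The key property is that each family's return type encodes the architectural width, so callers of the "natural size" family get 32 bits on arm and 64 bits on arm64 without casts.

```c
#include <stdint.h>

/* Hypothetical backing storage standing in for mrs on each register. */
static uint64_t sim_midr_el1, sim_cntvct_el0, sim_mpidr_el1;

/* 1) Architecturally 32-bit content (e.g. MIDR_EL1): uint32_t. */
#define DEFINE_SYSREG32_RO(reg)						\
static inline uint32_t get_##reg(void) { return (uint32_t)sim_##reg; }

/* 2) Always 64-bit (e.g. CNTVCT_EL0): uint64_t. */
#define DEFINE_SYSREG64_RO(reg)						\
static inline uint64_t get_##reg(void) { return sim_##reg; }

/* 3) Natural register size (e.g. MPIDR_EL1): unsigned long. */
#define DEFINE_SYSREG_NATURAL_RO(reg)					\
static inline unsigned long get_##reg(void)				\
{									\
	return (unsigned long)sim_##reg;				\
}

DEFINE_SYSREG32_RO(midr_el1)
DEFINE_SYSREG64_RO(cntvct_el0)
DEFINE_SYSREG_NATURAL_RO(mpidr_el1)
```

Whether three families are worth it over two is exactly the open question; the sketch only shows that the split costs one extra macro.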

Cheers,
Andre.

> [1] https://github.com/rhdrjones/kvm-unit-tests/commit/57e48b8e6dc2ddf4b2e4eb1ceb5a5f87f2dd074b
> 
> Thanks,
> drew
> 
>>  
>>  /* Only support Aff0 for now, gicv2 only */
>>  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
>> -- 
>> 1.8.3.1
>>
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases
  2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
@ 2016-12-01 11:27     ` Andre Przywara
  -1 siblings, 0 replies; 48+ messages in thread
From: Andre Przywara @ 2016-12-01 11:27 UTC (permalink / raw)
  To: Wei Huang, cov
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

Hi,

On 01/12/16 05:16, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arm/pmu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 1fe2b1a..3566a27 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -16,6 +16,9 @@
>  #include "asm/barrier.h"
>  #include "asm/processor.h"
>  
> +#define PMU_PMCR_E         (1 << 0)
> +#define PMU_PMCR_C         (1 << 2)
> +#define PMU_PMCR_LC        (1 << 6)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -23,10 +26,57 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define ID_DFR0_PERFMON_SHIFT 24
> +#define ID_DFR0_PERFMON_MASK  0xf
> +
> +#define PMU_CYCLE_IDX         31
> +
> +#define NR_SAMPLES 10
> +
> +static unsigned int pmu_version;
>  #if defined(__arm__)
>  DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
> +DEFINE_SET_SYSREG32(pmcr, 0, c9, c12, 0)
> +DEFINE_GET_SYSREG32(id_dfr0, 0, c0, c1, 2)
> +DEFINE_SET_SYSREG32(pmselr, 0, c9, c12, 5)
> +DEFINE_SET_SYSREG32(pmxevtyper, 0, c9, c13, 1)
> +DEFINE_GET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> +DEFINE_SET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> +DEFINE_GET_SYSREG64(pmccntr64, 0, c9)
> +DEFINE_SET_SYSREG64(pmccntr64, 0, c9)
> +DEFINE_SET_SYSREG32(pmcntenset, 0, c9, c12, 1)
> +
> +static inline uint64_t get_pmccntr(void)
> +{
> +	if (pmu_version == 0x3)
> +		return get_pmccntr64();
> +	else
> +		return get_pmccntr32();
> +}
> +
> +static inline void set_pmccntr(uint64_t value)
> +{
> +	if (pmu_version == 0x3)
> +		set_pmccntr64(value);
> +	else
> +		set_pmccntr32(value & 0xffffffff);
> +}
> +
> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> +static inline void set_pmccfiltr(uint32_t value)
> +{
> +	set_pmselr(PMU_CYCLE_IDX);
> +	set_pmxevtyper(value);
> +	isb();
> +}
>  #elif defined(__aarch64__)
>  DEFINE_GET_SYSREG32(pmcr, el0)
> +DEFINE_SET_SYSREG32(pmcr, el0)
> +DEFINE_GET_SYSREG32(id_dfr0, el1)
> +DEFINE_GET_SYSREG64(pmccntr, el0)
> +DEFINE_SET_SYSREG64(pmccntr, el0)
> +DEFINE_SET_SYSREG32(pmcntenset, el0)
> +DEFINE_SET_SYSREG32(pmccfiltr, el0)
>  #endif
>  
>  /*
> @@ -52,11 +102,55 @@ static bool check_pmcr(void)
>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>  }
>  
> +/*
> + * Ensure that the cycle counter progresses between back-to-back reads.
> + */
> +static bool check_cycles_increase(void)
> +{
> +	bool success = true;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	set_pmcntenset(1U << PMU_CYCLE_IDX);
> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> +	set_pmccntr(0);

Why do we need this? Shouldn't PMU_PMCR_C below take care of that?
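
Here is a rough host-side model of that question. It assumes the architected behaviour: PMCR.C is write-only (it always reads as zero), and writing 1 to it resets PMCCNTR to zero. struct mock_pmu and the mock_*() functions are hypothetical stand-ins for the real register accessors, not kvm-unit-tests APIs:

```c
#include <stdint.h>

#define PMU_PMCR_E  (1U << 0)
#define PMU_PMCR_C  (1U << 2)
#define PMU_PMCR_LC (1U << 6)

/* Hypothetical software model of PMCR and PMCCNTR. */
struct mock_pmu {
	uint32_t pmcr;
	uint64_t pmccntr;
};

/* Model of a PMCR write: C is a write-only "reset cycle counter" bit. */
static void mock_write_pmcr(struct mock_pmu *pmu, uint32_t value)
{
	if (value & PMU_PMCR_C)
		pmu->pmccntr = 0;		/* writing C=1 zeroes PMCCNTR */
	pmu->pmcr = value & ~PMU_PMCR_C;	/* C always reads as zero */
}

/* The PMCR write from check_cycles_increase(), without set_pmccntr(0). */
static void mock_start_cycle_counter(struct mock_pmu *pmu)
{
	mock_write_pmcr(pmu, pmu->pmcr | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
}
```

If real hardware and KVM's emulation behave like this model, the single PMCR write already zeroes the counter. In that case the explicit set_pmccntr(0) is redundant.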

> +
> +	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
> +
> +	for (int i = 0; i < NR_SAMPLES; i++) {
> +		uint64_t a, b;
> +
> +		a = get_pmccntr();
> +		b = get_pmccntr();
> +
> +		if (a >= b) {
> +			printf("Read %"PRIu64" then %"PRIu64".\n", a, b);
> +			success = false;
> +			break;
> +		}
> +	}
> +
> +	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
> +
> +	return success;
> +}
> +
> +void pmu_init(void)

Mmh, this function doesn't really initialize anything, does it?
Should it be named pmu_available() or pmu_version() or the like?

And should we bail out early here (or rather at the caller) if this
register reports that no PMU is available? For instance by making it
return a boolean?
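
One possible shape for that is a pure function, so the decoding can be checked on its own. The sketch assumes the ID_DFR0.PerfMon encodings from the ARM ARM: 0x0 means the PMU system registers are not implemented, and 0xf means only an IMPLEMENTATION DEFINED (non-architected) PMU is present. The name pmu_probe_dfr0() is made up for this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

#define ID_DFR0_PERFMON_SHIFT 24
#define ID_DFR0_PERFMON_MASK  0xf

/*
 * Hypothetical replacement for pmu_init(): decode the PerfMon field of an
 * ID_DFR0 value, store it in *version and report whether an architected
 * PMU is present.
 */
static bool pmu_probe_dfr0(uint32_t dfr0, unsigned int *version)
{
	*version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;

	/* 0x0: PMU registers not implemented; 0xf: IMPLEMENTATION DEFINED only */
	return *version != 0x0 && *version != 0xf;
}
```

main() could then pass it get_id_dfr0() and return early when it yields false.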

> +{
> +	uint32_t dfr0;
> +
> +	/* probe pmu version */
> +	dfr0 = get_id_dfr0();
> +	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
> +	report_info("PMU version: %u", pmu_version);
> +}
> +
>  int main(void)
>  {
>  	report_prefix_push("pmu");
>  
> +	pmu_init();
>  	report("Control register", check_pmcr());
> +	report("Monotonically increasing cycle count", check_cycles_increase());

I wonder if we should skip this test if check_pmcr() has returned false
before? We let it return a boolean, so it seems quite natural to use
this information here.
This would avoid a lot of false FAILs due to the PMU not being available
(because QEMU is too old, for instance).
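
Something along these lines, perhaps. This is a host-runnable sketch, not the test's actual code. pmcr_has_implementer() applies the same criterion as check_pmcr() to a value that has already been read. run_if_pmu(), enum test_result and fake_cycle_test() are hypothetical names used only to show the gating:

```c
#include <stdbool.h>
#include <stdint.h>

#define PMU_PMCR_IMP_SHIFT 24
#define PMU_PMCR_IMP_MASK  0xff

enum test_result { TEST_PASS, TEST_FAIL, TEST_SKIP };

/* Same criterion as check_pmcr(), applied to an already-read PMCR value. */
static bool pmcr_has_implementer(uint32_t pmcr)
{
	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
}

/*
 * Hypothetical gate: run a dependent test only if the PMCR sanity check
 * passed, otherwise classify it as skipped rather than failed.
 */
static enum test_result run_if_pmu(uint32_t pmcr, bool (*test)(void))
{
	if (!pmcr_has_implementer(pmcr))
		return TEST_SKIP;
	return test() ? TEST_PASS : TEST_FAIL;
}

/* Stand-in for check_cycles_increase() so the sketch runs on a host. */
static bool fake_cycle_test(void)
{
	return true;
}
```

In pmu.c, the TEST_SKIP case would map to whatever skip reporting libcflat offers. An old QEMU without a PMU would then show up as skipped rather than failed.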

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 2/4] arm: Add PMU test
  2016-12-01  9:03     ` Andrew Jones
  (?)
@ 2016-12-01 11:28     ` Andre Przywara
  2016-12-01 12:02         ` Peter Maydell
  -1 siblings, 1 reply; 48+ messages in thread
From: Andre Przywara @ 2016-12-01 11:28 UTC (permalink / raw)
  To: Andrew Jones, Wei Huang
  Cc: cov, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

Hi,

On 01/12/16 09:03, Andrew Jones wrote:
> On Wed, Nov 30, 2016 at 11:16:40PM -0600, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Beginning with a simple sanity check of the control register, add
>> a unit test for the ARM Performance Monitors Unit (PMU). The PMU register
>> is read using the newly defined macros.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arm/Makefile.common |  3 ++-
>>  arm/pmu.c           | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  arm/unittests.cfg   |  5 +++++
>>  3 files changed, 69 insertions(+), 1 deletion(-)
>>  create mode 100644 arm/pmu.c
>>
>> diff --git a/arm/Makefile.common b/arm/Makefile.common
>> index f37b5c2..5da2fdd 100644
>> --- a/arm/Makefile.common
>> +++ b/arm/Makefile.common
>> @@ -12,7 +12,8 @@ endif
>>  tests-common = \
>>  	$(TEST_DIR)/selftest.flat \
>>  	$(TEST_DIR)/spinlock-test.flat \
>> -	$(TEST_DIR)/pci-test.flat
>> +	$(TEST_DIR)/pci-test.flat \
>> +	$(TEST_DIR)/pmu.flat
>>  
>>  all: test_cases
>>  
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> new file mode 100644
>> index 0000000..1fe2b1a
>> --- /dev/null
>> +++ b/arm/pmu.c
>> @@ -0,0 +1,62 @@
>> +/*
>> + * Test the ARM Performance Monitors Unit (PMU).
>> + *
>> + * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU Lesser General Public License version 2.1 and
>> + * only version 2.1 as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
>> + * for more details.
>> + */
>> +#include "libcflat.h"
>> +#include "asm/barrier.h"
>> +#include "asm/processor.h"
>> +
>> +#define PMU_PMCR_N_SHIFT   11
>> +#define PMU_PMCR_N_MASK    0x1f
>> +#define PMU_PMCR_ID_SHIFT  16
>> +#define PMU_PMCR_ID_MASK   0xff
>> +#define PMU_PMCR_IMP_SHIFT 24
>> +#define PMU_PMCR_IMP_MASK  0xff
>> +
>> +#if defined(__arm__)
>> +DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
>> +#elif defined(__aarch64__)
>> +DEFINE_GET_SYSREG32(pmcr, el0)
>> +#endif
>> +
>> +/*
>> + * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
>> + * zero. Also print out a couple of other interesting fields for diagnostic
>> + * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement
>> + * event counters and therefore reports zero event counters, but hopefully
>> + * support for at least the instructions event will be added in the future and
>> + * the reported number of event counters will become nonzero.
>> + */
>> +static bool check_pmcr(void)
>> +{
>> +	uint32_t pmcr;
> 
> So based on my comments from the previous patch, pmcr should be
> 'unsigned int'

I don't think so. At least here as a _variable_ type uint32_t is
probably the right one, as the ARMv8 ARM explicitly says that PMCR is a
32-bit register, for both bitnesses. I find it only natural to express
it here accordingly.
I believe this is a different (though related) discussion from the
return type of the accessor functions.

Cheers,
Andre.

>> +
>> +	pmcr = get_pmcr();
>> +
>> +	report_info("PMU implementer/ID code/counters: 0x%x(\"%c\")/0x%x/%d",
>> +		    (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK,
>> +		    ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) ? : ' ',
>> +		    (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK,
>> +		    (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
>> +
>> +	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>> +}
>> +
>> +int main(void)
>> +{
>> +	report_prefix_push("pmu");
>> +
>> +	report("Control register", check_pmcr());
>> +
>> +	return report_summary();
>> +}
>> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
>> index ae32a42..816f494 100644
>> --- a/arm/unittests.cfg
>> +++ b/arm/unittests.cfg
>> @@ -58,3 +58,8 @@ groups = selftest
>>  [pci-test]
>>  file = pci-test.flat
>>  groups = pci
>> +
>> +# Test PMU support
>> +[pmu]
>> +file = pmu.flat
>> +groups = pmu
>> -- 
>> 1.8.3.1
>>
>>
> 
> drew 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 2/4] arm: Add PMU test
  2016-12-01 11:28     ` Andre Przywara
@ 2016-12-01 12:02         ` Peter Maydell
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2016-12-01 12:02 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Aaron Lindsay, kvm-devel, croberts, QEMU Developers,
	Alistair Francis, Shannon Zhao, kvmarm

On 1 December 2016 at 11:28, Andre Przywara <andre.przywara@arm.com> wrote:
> I don't think so. At least here as a _variable_ type uint32_t is
> probably the right one, as the ARMv8 ARM explicitly says that PMCR is a
> 32-bit register, for both bitnesses.

For 64-bit ARM this is strictly speaking just shorthand for "64-bit
register with the top 32 bits being RES0". It is in theory possible that
a future architecture extension might define uses for those RES0
bits.
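
In accessor terms, this suggests that on AArch64 such registers should be read into a 64-bit variable (an MRS always transfers 64 bits) and then narrowed explicitly. The hypothetical helpers below only sketch the narrowing step on a raw value. They do not perform the MRS themselves:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits [63:32] of a register described as 32-bit are currently RES0. */
#define SYSREG32_RES0_MASK 0xffffffff00000000ULL

/* Hypothetical helper: narrow a raw 64-bit MRS result to the 32-bit view. */
static inline uint32_t sysreg32_from_raw(uint64_t raw)
{
	return (uint32_t)raw;
}

/* Hypothetical helper: report whether any currently-RES0 bit reads as set. */
static inline bool sysreg32_upper_set(uint64_t raw)
{
	return (raw & SYSREG32_RES0_MASK) != 0;
}
```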

thanks
-- PMM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 2/4] arm: Add PMU test
  2016-12-01 12:02         ` Peter Maydell
  (?)
@ 2016-12-01 12:19         ` Andre Przywara
  2016-12-01 12:36           ` Peter Maydell
  -1 siblings, 1 reply; 48+ messages in thread
From: Andre Przywara @ 2016-12-01 12:19 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Andrew Jones, Wei Huang, Aaron Lindsay, kvm-devel, croberts,
	QEMU Developers, Alistair Francis, kvmarm, Shannon Zhao

Hi,

On 01/12/16 12:02, Peter Maydell wrote:
> On 1 December 2016 at 11:28, Andre Przywara <andre.przywara@arm.com> wrote:
>> I don't think so. At least here as a _variable_ type uint32_t is
>> probably the right one, as the ARMv8 ARM explicitly says that PMCR is a
>> 32-bit register, for both bitnesses.
> 
> For 64-bit ARM this is strictly speaking just shorthand for "64-bit
> register with the top 32 bits being RES0". It is in theory possible that
> a future architecture extension might define uses for those RES0
> bits.

I trade: "in theory possible that a future architecture extension might"
(that's four speculative terms, right?) against:

ARMv8 ARM, D7.4.7  PMCR_EL0, Performance Monitors Control Register:
Attributes
            PMCR_EL0 is a 32-bit register.

If this ever gets extended, we would need extra code to deal with the
new bits, so we would need to touch the code anyway. And again, it's
just a local variable, not an interface.

Cheers,
Andre.

P.S. We really should save this discussion for a Friday afternoon ;-)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 2/4] arm: Add PMU test
  2016-12-01 12:19         ` Andre Przywara
@ 2016-12-01 12:36           ` Peter Maydell
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2016-12-01 12:36 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Andrew Jones, Wei Huang, Aaron Lindsay, kvm-devel, croberts,
	QEMU Developers, Alistair Francis, kvmarm, Shannon Zhao

On 1 December 2016 at 12:19, Andre Przywara <andre.przywara@arm.com> wrote:
> Hi,
>
> On 01/12/16 12:02, Peter Maydell wrote:
>> On 1 December 2016 at 11:28, Andre Przywara <andre.przywara@arm.com> wrote:
>>> I don't think so. At least here as a _variable_ type uint32_t is
>>> probably the right one, as the ARMv8 ARM explicitly says that PMCR is a
>>> 32-bit register, for both bitnesses.
>>
>> For 64-bit ARM this is strictly speaking just shorthand for "64-bit
>> register with the top 32 bits being RES0". It is in theory possible that
>> a future architecture extension might define uses for those RES0
>> bits.
>
> I trade: "in theory possible that a future architecture extension might"
> (that's four speculative terms, right?) against:
>
> ARMv8 ARM, D7.4.7  PMCR_EL0, Performance Monitors Control Register:
> Attributes
>             PMCR_EL0 is a 32-bit register.

As I say, this just means "64 bit with 32 RES0 bits". See DDI0487A.k
C5.1.1 "System register width":

# In AArch64 state, each encoding in the System instruction space can
# provide access to a 64-bit register. An AArch64 System register is
# described as either a 32-bit register or a 64-bit register. For
# a 32-bit register, the upper bits, bits[63:32], are RES0.

(i.e. the register is 64 bits; it's just "described as" 32 bits for
convenience.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers
  2016-12-01 11:11     ` Andre Przywara
@ 2016-12-01 13:16       ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01 13:16 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Wei Huang, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	cov, kvmarm, shannon.zhao

On Thu, Dec 01, 2016 at 11:11:55AM +0000, Andre Przywara wrote:
> Hi,
> 
> On 01/12/16 08:59, Andrew Jones wrote:
> > 
> > Should this be From: Andre?
> 
> No need from my side, this way all the bug reports are sent to Wei ;-)
> 
> > On Wed, Nov 30, 2016 at 11:16:39PM -0600, Wei Huang wrote:
> >> This patch defines four macros to assist in creating system register
> >> accessors under both ARMv7 and AArch64:
> >>    * DEFINE_GET_SYSREG32(name, ...)
> >>    * DEFINE_SET_SYSREG32(name, ...)
> >>    * DEFINE_GET_SYSREG64(name, ...)
> >>    * DEFINE_SET_SYSREG64(name, ...)
> >> These macros are translated to inline functions with consistent naming,
> >> get_##name() and set_##name(), which can be used by C code directly.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> Signed-off-by: Wei Huang <wei@redhat.com>
> >> ---
> >>  lib/arm/asm/processor.h   | 37 ++++++++++++++++++++++++++++++++-----
> >>  lib/arm64/asm/processor.h | 35 ++++++++++++++++++++++++++++-------
> >>  2 files changed, 60 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
> >> index f25e7ee..3ca6b42 100644
> >> --- a/lib/arm/asm/processor.h
> >> +++ b/lib/arm/asm/processor.h
> >> @@ -33,13 +33,40 @@ static inline unsigned long current_cpsr(void)
> >>  
> >>  #define current_mode() (current_cpsr() & MODE_MASK)
> >>  
> >> -static inline unsigned int get_mpidr(void)
> >> -{
> >> -	unsigned int mpidr;
> >> -	asm volatile("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> >> -	return mpidr;
> >> +#define DEFINE_GET_SYSREG32(name, opc1, crn, crm, opc2)			\
> >> +static inline uint32_t get_##name(void)					\
> >> +{									\
> >> +	uint32_t reg;							\
> >> +	asm volatile("mrc p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> >> +		     #opc2 : "=r" (reg));				\
> >> +	return reg;							\
> >> +}
> >> +
> >> +#define DEFINE_SET_SYSREG32(name, opc1, crn, crm, opc2)			\
> >> +static inline void set_##name(uint32_t value)				\
> >> +{									\
> >> +	asm volatile("mcr p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> >> +		     #opc2 :: "r" (value));				\
> >                            ^ nit: no space here, checkpatch would complain
> >> +}
> >> +
> >> +#define DEFINE_GET_SYSREG64(name, opc, crm)				\
> >> +static inline uint64_t get_##name(void)					\
> >> +{									\
> >> +	uint32_t lo, hi;						\
> >> +	asm volatile("mrrc p15, " #opc ", %0, %1, " #crm		\
> >> +		     : "=r" (lo), "=r" (hi));				\
> >> +	return (uint64_t)hi << 32 | lo;					\
> >> +}
> >> +
> >> +#define DEFINE_SET_SYSREG64(name, opc, crm)				\
> >> +static inline void set_##name(uint64_t value)				\
> >> +{									\
> >> +	asm volatile("mcrr p15, " #opc ", %0, %1, " #crm		\
> >> +		     :: "r" (value & 0xffffffff), "r" (value >> 32));	\
> >>  }
> >>  
> >> +DEFINE_GET_SYSREG32(mpidr, 0, c0, c0, 5)
> >> +
> >>  /* Only support Aff0 for now, up to 4 cpus */
> >>  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
> >>  
> >> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
> >> index 84d5c7c..dfa75eb 100644
> >> --- a/lib/arm64/asm/processor.h
> >> +++ b/lib/arm64/asm/processor.h
> >> @@ -66,14 +66,35 @@ static inline unsigned long current_level(void)
> >>  	return el & 0xc;
> >>  }
> >>  
> >> -#define DEFINE_GET_SYSREG32(reg)				\
> >> -static inline unsigned int get_##reg(void)			\
> >> -{								\
> >> -	unsigned int reg;					\
> >> -	asm volatile("mrs %0, " #reg "_el1" : "=r" (reg));	\
> >> -	return reg;						\
> >> +#define DEFINE_GET_SYSREG32(reg, el)					\
> >> +static inline uint32_t get_##reg(void)					\
> >> +{									\
> >> +	uint32_t reg;							\
> >> +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
> >> +	return reg;							\
> >>  }
> >> -DEFINE_GET_SYSREG32(mpidr)
> >> +
> >> +#define DEFINE_SET_SYSREG32(reg, el)					\
> >> +static inline void set_##reg(uint32_t value)				\
> >> +{									\
> >> +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
> >> +}
> >> +
> >> +#define DEFINE_GET_SYSREG64(reg, el)					\
> >> +static inline uint64_t get_##reg(void)					\
> >> +{									\
> >> +	uint64_t reg;							\
> >> +	asm volatile("mrs %0, " #reg "_" #el : "=r" (reg));		\
> >> +	return reg;							\
> >> +}
> >> +
> >> +#define DEFINE_SET_SYSREG64(reg, el)					\
> >> +static inline void set_##reg(uint64_t value)				\
> >> +{									\
> >> +	asm volatile("msr " #reg "_" #el ", %0" :: "r" (value));	\
> >> +}
> >> +
> >> +DEFINE_GET_SYSREG32(mpidr, el1)
> > 
> > 32-bit mpidr for arm64 isn't right, and it's changed by [1] in the
> > gic series.
> 
> So how are we actually handling this? Waiting for the GIC series to be
> merged, then rebasing on top of that (which is what I thought we'd do)?
> 
> Or make a combined patch (taking it out of both series), and merge it
> before?

Let's make sure this one addresses the need of the gic one, and then
I'll merge this one into arm/next first and drop the other one.

> 
> > However changing it to 64-bit with this patch would result
> > in a get_mpidr() call that returns uint64_t on arm64 and uint32_t on
> > arm32, which won't be nice for common code. Andre brought up during the
> > review of [1] that we should be using the architectural types for register
> > accessors.
> 
> At least for registers that differ in size between A32 and A64. Many
> system registers are actually 32-bit wide and are explicitly stated as
> such in the ARM(v8) ARM (for instance MIDR_EL1).
> Yes, the A64 msr/mrs instructions only know a 64-bit register encoding,
> but the actual content is often confined to 32 bits (in MIDR or PMCR,
> for instance).
> So I wonder if we should take care of those with an explicit uint32_t
> return type?
> 
> > That means, that while internally all the above functions can
> > know what's 32-bit and what's 64-bit, using uint32/64_t appropriately,
> > the external interfaces should be 'unsigned long', 'unsigned int',
> > 'unsigned long long'.
> 
> Not so sure about that.
> I think we may need _three_ types of system register accessors?
> 1) always 32-bit (e.g. MIDR_EL1): use uint32_t
> 2) always 64-bit (e.g. CNTVCT_EL0): use uint64_t
> 3) natural register size (e.g. MPIDR_EL1): use unsigned long
> 
> Does that make sense or is that overkill?

OK, now that I've rethought this, the natural sizes weren't
a good idea. They don't really give us better common code anyway,
because 64-bit code may want to look at the upper 32 bits. I think
we need to forget (3) above, and special case MPIDR for arm32 accesses.
MPIDR is special because it gets used by core smp and gic code, and we
don't want #ifdefs everywhere.

So, for this patch, I think we only need to add

DEFINE_GET_SYSREG32(mpidr32, 0, c0, c0, 5)
static inline uint64_t get_mpidr(void)
{
	return get_mpidr32();
}

to the 32-bit processor.h

and in the 64-bit file we need to switch DEFINE_GET_SYSREG32(mpidr, el1)
to DEFINE_GET_SYSREG64(mpidr, el1). Also, the second reply I made still
holds: always use 64-bit (unsigned long) internally for AArch64
registers, but just return the lower 32 bits for the "32-bit"
ones (Peter's arguments weren't lost on me :-)
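
For reference, here is why a 32-bit get_mpidr() is wrong on arm64. MPIDR_EL1 keeps Aff0 to Aff2 in bits [23:0], but Aff3 lives in bits [39:32]. The hypothetical mpidr_aff() helper below (not the library's mpidr_to_cpu()) sketches how the fields decode and what truncating to 32 bits loses:

```c
#include <stdint.h>

/* Bit position of each MPIDR_EL1 affinity field; Aff3 sits above bit 31. */
static const unsigned int mpidr_aff_shift[4] = { 0, 8, 16, 32 };

/* Hypothetical helper: extract affinity level 0..3 from an MPIDR value. */
static inline unsigned int mpidr_aff(uint64_t mpidr, unsigned int level)
{
	return (unsigned int)((mpidr >> mpidr_aff_shift[level]) & 0xff);
}
```

With mpidr = 0x280000103, mpidr_aff(mpidr, 3) is 2, but passing (uint32_t)mpidr gives 0.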

How's that sound?

Thanks,
drew

(Changing get_mpidr() to return a uint64_t means I'll need to fixup [2],
 but no biggy...)

[2] https://github.com/rhdrjones/kvm-unit-tests/commit/9f3f7f8141a98d0cd5175ad6cb491a4e1c5f7cd9

> 
> Cheers,
> Andre.
> 
> > [1] https://github.com/rhdrjones/kvm-unit-tests/commit/57e48b8e6dc2ddf4b2e4eb1ceb5a5f87f2dd074b
> > 
> > Thanks,
> > drew
> > 
> >>  
> >>  /* Only support Aff0 for now, gicv2 only */
> >>  #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff))
> >> -- 
> >> 1.8.3.1
> >>
> >>
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01 10:19     ` Andre Przywara
@ 2016-12-01 13:47       ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01 13:47 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Wei Huang, alindsay, kvm, croberts, qemu-devel, alistair.francis,
	cov, kvmarm, shannon.zhao

On Thu, Dec 01, 2016 at 10:19:13AM +0000, Andre Przywara wrote:
> Hi Drew,
> 
> actually unrelated to this particular patch, but since you mentioned it:
> 
> > As we work out how best to handle tcg-only tests in order to get Alex
> > Bennee's MTTCG tests merged, we'll probably revisit this file,
> 
> So when I was experimenting with kvmtool, I realized that some tests
> would need to be QEMU only. Also I was tempted to try some tests either
> on bare metal machines or in a Fast Model.
> So I wonder if we should have some constraints or tags on the tests, so
> that a certain backend can filter on this and skip if it's not capable?

So far we've been using unittests.cfg flags for filtering, but we could
also teach configure how to set up the build for only certain tests, i.e.
new config options and makefile targets could take us further. The
unittests.cfg file could be split into multiple cfg files as well,
teaching run_tests.sh how to select the right one.

> 
> Just wanted to mention this so that we can use this refactoring
> opportunity to come up with something more generic than just a boolean
> TCG vs. KVM.
> Maybe we should introduce the notion of a "test backend"? That could be
> QEMU/KVM and TCG for now, but later extended to cover kvmtool and
>

So the $TEST_DIR/run (e.g. arm/run) script is currently qemu focused,
and is learning how to deal with both KVM and TCG. We haven't been
trying to keep qemu knowledge in that one file, but we could probably
do it, i.e. rename it to run-qemu and make sure all common script
files are kvm userspace agnostic. I don't think that should be too
difficult. We'll likely need more build config options for this though
too, as load addresses, etc. may change.

> probably other hypervisors like Xen as well.

Also doable once we isolate the hypervisor userspace specifics from
everything else. Same goes for getting tests to run on bare-metal,
launched from the grub prompt or UEFI. But, eventually people will
be confused with the project's name *kvm*-unit-tests :-)

Thanks,
drew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers
  2016-12-01  8:59     ` Andrew Jones
@ 2016-12-01 15:27       ` Wei Huang
  -1 siblings, 0 replies; 48+ messages in thread
From: Wei Huang @ 2016-12-01 15:27 UTC (permalink / raw)
  To: Andrew Jones
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, kvmarm, shannon.zhao



On 12/01/2016 02:59 AM, Andrew Jones wrote:
> 
> Should this be From: Andre?
> 
> On Wed, Nov 30, 2016 at 11:16:39PM -0600, Wei Huang wrote:
>> This patch defines four macros to assist in creating system register
>> accessors under both ARMv7 and AArch64:
>>    * DEFINE_GET_SYSREG32(name, ...)
>>    * DEFINE_SET_SYSREG32(name, ...)
>>    * DEFINE_GET_SYSREG64(name, ...)
>>    * DEFINE_SET_SYSREG64(name, ...)
>> These macros are translated to inline functions with consistent naming,
>> get_##name() and set_##name(), which can be used by C code directly.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> ---
>>  lib/arm/asm/processor.h   | 37 ++++++++++++++++++++++++++++++++-----
>>  lib/arm64/asm/processor.h | 35 ++++++++++++++++++++++++++++-------
>>  2 files changed, 60 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
>> index f25e7ee..3ca6b42 100644
>> --- a/lib/arm/asm/processor.h
>> +++ b/lib/arm/asm/processor.h
>> @@ -33,13 +33,40 @@ static inline unsigned long current_cpsr(void)
>>  
>>  #define current_mode() (current_cpsr() & MODE_MASK)
>>  
>> -static inline unsigned int get_mpidr(void)
>> -{
>> -	unsigned int mpidr;
>> -	asm volatile("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
>> -	return mpidr;
>> +#define DEFINE_GET_SYSREG32(name, opc1, crn, crm, opc2)			\
>> +static inline uint32_t get_##name(void)					\
>> +{									\
>> +	uint32_t reg;							\
>> +	asm volatile("mrc p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
>> +		     #opc2 : "=r" (reg));				\
>> +	return reg;							\
>> +}
>> +
>> +#define DEFINE_SET_SYSREG32(name, opc1, crn, crm, opc2)			\
>> +static inline void set_##name(uint32_t value)				\
>> +{									\
>> +	asm volatile("mcr p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
>> +		     #opc2 :: "r" (value));				\
>                            ^ nit: no space here, checkpatch would complain

Which checkpatch script are you using? I didn't find one in
kvm-unit-tests. I tried the kernel's checkpatch script, but it didn't
complain about anything in this patch.

>> +}
>> +

<snip>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers
  2016-12-01 15:27       ` Wei Huang
  (?)
@ 2016-12-01 15:50       ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-01 15:50 UTC (permalink / raw)
  To: Wei Huang
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, cov, kvmarm, shannon.zhao

On Thu, Dec 01, 2016 at 09:27:59AM -0600, Wei Huang wrote:
> 
> 
> On 12/01/2016 02:59 AM, Andrew Jones wrote:
> > 
> > Should this be From: Andre?
> > 
> > On Wed, Nov 30, 2016 at 11:16:39PM -0600, Wei Huang wrote:
> >> This patch defines four macros to assist creating system register
> >> accessors under both ARMv7 and AArch64:
> >>    * DEFINE_GET_SYSREG32(name, ...)
> >>    * DEFINE_SET_SYSREG32(name, ...)
> >>    * DEFINE_GET_SYSREG64(name, ...)
> >>    * DEFINE_SET_SYSREG64(name, ...)
> >> These macros are translated to inline functions with consistent naming,
> >> get_##name() and set_##name(), which can be used by C code directly.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> Signed-off-by: Wei Huang <wei@redhat.com>
> >> ---
> >>  lib/arm/asm/processor.h   | 37 ++++++++++++++++++++++++++++++++-----
> >>  lib/arm64/asm/processor.h | 35 ++++++++++++++++++++++++++++-------
> >>  2 files changed, 60 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
> >> index f25e7ee..3ca6b42 100644
> >> --- a/lib/arm/asm/processor.h
> >> +++ b/lib/arm/asm/processor.h
> >> @@ -33,13 +33,40 @@ static inline unsigned long current_cpsr(void)
> >>  
> >>  #define current_mode() (current_cpsr() & MODE_MASK)
> >>  
> >> -static inline unsigned int get_mpidr(void)
> >> -{
> >> -	unsigned int mpidr;
> >> -	asm volatile("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> >> -	return mpidr;
> >> +#define DEFINE_GET_SYSREG32(name, opc1, crn, crm, opc2)			\
> >> +static inline uint32_t get_##name(void)					\
> >> +{									\
> >> +	uint32_t reg;							\
> >> +	asm volatile("mrc p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> >> +		     #opc2 : "=r" (reg));				\
> >> +	return reg;							\
> >> +}
> >> +
> >> +#define DEFINE_SET_SYSREG32(name, opc1, crn, crm, opc2)			\
> >> +static inline void set_##name(uint32_t value)				\
> >> +{									\
> >> +	asm volatile("mcr p15, " #opc1 ", %0, " #crn ", " #crm ", "	\
> >> +		     #opc2 :: "r" (value));				\
> >                            ^ nit: no space here, checkpatch would complain
> 
> Which checkpatch script are you using? I didn't find one in
> kvm-unit-tests. I tried the kernel's checkpatch script, but it didn't
> complain about anything in this patch.

I use the kernel's, and it complains (or at least used to complain) about
missing spaces between the ':'s in asms. If it doesn't complain now, then
it doesn't matter. Actually, it didn't really matter before :-)
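For reference, the accessor macros build the instruction text by stringizing each encoding operand (#opc1, #crn, #crm, #opc2), letting adjacent string literals concatenate, and pasting the register name into the function name (get_##name). The host-side sketch below uses a hypothetical DEFINE_SYSREG32_TEXT macro that returns the instruction string instead of executing it, so it can be checked off-target; it is an illustration only, not part of the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, host-only: same stringizing/pasting scheme as
 * DEFINE_GET_SYSREG32, but returns the "mrc" text rather than
 * emitting it as inline asm.
 */
#define DEFINE_SYSREG32_TEXT(name, opc1, crn, crm, opc2)		\
static inline const char *get_##name##_insn(void)			\
{									\
	return "mrc p15, " #opc1 ", %0, " #crn ", " #crm ", " #opc2;	\
}

DEFINE_SYSREG32_TEXT(mpidr, 0, c0, c0, 5)	/* get_mpidr_insn() */
DEFINE_SYSREG32_TEXT(pmcr, 0, c9, c12, 0)	/* get_pmcr_insn() */
```

get_mpidr_insn() yields the same string as the hand-written get_mpidr() that the patch removes, which is a quick way to sanity-check an encoding before switching to the generated accessor.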

Thanks,
drew
> 
> >> +}
> >> +
> 
> <snip>
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases
  2016-12-01  9:18     ` Andrew Jones
  (?)
@ 2016-12-01 17:36     ` Wei Huang
  2016-12-02  9:58         ` Andrew Jones
  -1 siblings, 1 reply; 48+ messages in thread
From: Wei Huang @ 2016-12-01 17:36 UTC (permalink / raw)
  To: Andrew Jones
  Cc: cov, alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, shannon.zhao, kvmarm



On 12/01/2016 03:18 AM, Andrew Jones wrote:
> On Wed, Nov 30, 2016 at 11:16:41PM -0600, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>> even for the smallest delta of two subsequent reads.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arm/pmu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 94 insertions(+)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 1fe2b1a..3566a27 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -16,6 +16,9 @@
>>  #include "asm/barrier.h"
>>  #include "asm/processor.h"
>>  
>> +#define PMU_PMCR_E         (1 << 0)
>> +#define PMU_PMCR_C         (1 << 2)
>> +#define PMU_PMCR_LC        (1 << 6)
>>  #define PMU_PMCR_N_SHIFT   11
>>  #define PMU_PMCR_N_MASK    0x1f
>>  #define PMU_PMCR_ID_SHIFT  16
>> @@ -23,10 +26,57 @@
>>  #define PMU_PMCR_IMP_SHIFT 24
>>  #define PMU_PMCR_IMP_MASK  0xff
>>  
>> +#define ID_DFR0_PERFMON_SHIFT 24
>> +#define ID_DFR0_PERFMON_MASK  0xf
>> +
>> +#define PMU_CYCLE_IDX         31
>> +
>> +#define NR_SAMPLES 10
>> +
>> +static unsigned int pmu_version;
>>  #if defined(__arm__)
>>  DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
>> +DEFINE_SET_SYSREG32(pmcr, 0, c9, c12, 0)
>> +DEFINE_GET_SYSREG32(id_dfr0, 0, c0, c1, 2)
>> +DEFINE_SET_SYSREG32(pmselr, 0, c9, c12, 5)
>> +DEFINE_SET_SYSREG32(pmxevtyper, 0, c9, c13, 1)
>> +DEFINE_GET_SYSREG32(pmccntr32, 0, c9, c13, 0)
>> +DEFINE_SET_SYSREG32(pmccntr32, 0, c9, c13, 0)
>> +DEFINE_GET_SYSREG64(pmccntr64, 0, c9)
>> +DEFINE_SET_SYSREG64(pmccntr64, 0, c9)
>> +DEFINE_SET_SYSREG32(pmcntenset, 0, c9, c12, 1)
> 
> Seeing how we get lots of redundant looking lines, I think instead
> of defining DEFINE_SET/GET_SYSREG32/64, we should instead have
> 
> DEFINE_SYSREG32/64      ... creates both get_<reg> and set_<reg>
> DEFINE_SYSREG32/64_RO   ... creates just get_<reg>

I don't like the naming. I think we can create a new macro instead,
named DEFINE_GET_SET_SYSREG32/64. I know it is boring, but readers should
get the idea easily.
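A minimal host-side sketch of the proposed DEFINE_GET_SET_SYSREG32, assuming it simply expands to the existing GET and SET macros. To make it runnable off-target, a static variable (fake_<name>, hypothetical) stands in for the register and the opc1/crn/crm/opc2 parameters are dropped; the real macros would emit mrc/mcr or mrs/msr.

```c
#include <assert.h>
#include <stdint.h>

/* Host-only stand-ins: read/write a fake_<name> variable, not hardware. */
#define DEFINE_GET_SYSREG32(name)					\
static inline uint32_t get_##name(void)					\
{									\
	return fake_##name;						\
}

#define DEFINE_SET_SYSREG32(name)					\
static inline void set_##name(uint32_t value)				\
{									\
	fake_##name = value;						\
}

/* One line per read/write register instead of two. */
#define DEFINE_GET_SET_SYSREG32(name)					\
static uint32_t fake_##name;						\
DEFINE_GET_SYSREG32(name)						\
DEFINE_SET_SYSREG32(name)

DEFINE_GET_SET_SYSREG32(pmcr)
DEFINE_GET_SET_SYSREG32(pmselr)
```

Registers that are only read or only written (id_dfr0, pmcntenset) would keep using the single-direction macros.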

> 
>> +
>> +static inline uint64_t get_pmccntr(void)
>> +{
>> +	if (pmu_version == 0x3)
>> +		return get_pmccntr64();
>> +	else
>> +		return get_pmccntr32();
>> +}
>> +
>> +static inline void set_pmccntr(uint64_t value)
>> +{
>> +	if (pmu_version == 0x3)
>> +		set_pmccntr64(value);
>> +	else
>> +		set_pmccntr32(value & 0xffffffff);
>> +}
> 
> So the two accessors above are exceptional, which is why we don't
> use SYSREG for them. These can have uint64_t for their external
> interface. We can't require 'unsigned long' or 'unsigned long long'.
> 
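A host-side sketch of the version dispatch in get_pmccntr()/set_pmccntr(), with a plain variable (fake_pmccntr, hypothetical) standing in for the cycle counter. It shows why a uint64_t external interface works for both cases: on PMUv3 (PerfMon == 3) the full 64-bit value is used, otherwise only the low 32 bits survive.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int pmu_version;
static uint64_t fake_pmccntr;	/* stands in for PMCCNTR, host-only */

static inline uint64_t get_pmccntr(void)
{
	if (pmu_version == 0x3)
		return fake_pmccntr;		/* 64-bit accessor */
	return (uint32_t)fake_pmccntr;		/* 32-bit accessor */
}

static inline void set_pmccntr(uint64_t value)
{
	if (pmu_version == 0x3)
		fake_pmccntr = value;
	else
		fake_pmccntr = value & 0xffffffff;	/* upper bits dropped */
}
```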
>> +
>> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
>> +static inline void set_pmccfiltr(uint32_t value)
>> +{
>> +	set_pmselr(PMU_CYCLE_IDX);
>> +	set_pmxevtyper(value);
>> +	isb();
>> +}
>>  #elif defined(__aarch64__)
>>  DEFINE_GET_SYSREG32(pmcr, el0)
>> +DEFINE_SET_SYSREG32(pmcr, el0)
>> +DEFINE_GET_SYSREG32(id_dfr0, el1)
>> +DEFINE_GET_SYSREG64(pmccntr, el0);
>> +DEFINE_SET_SYSREG64(pmccntr, el0);
>> +DEFINE_SET_SYSREG32(pmcntenset, el0);
>> +DEFINE_SET_SYSREG32(pmccfiltr, el0);
>>  #endif
>>  
>>  /*
>> @@ -52,11 +102,55 @@ static bool check_pmcr(void)
>>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>>  }
>>  
>> +/*
>> + * Ensure that the cycle counter progresses between back-to-back reads.
>> + */
>> +static bool check_cycles_increase(void)
>> +{
>> +	bool success = true;
>> +
>> +	/* init before event access, this test only cares about cycle count */
>> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
>> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>> +	set_pmccntr(0);
>> +
>> +	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
>> +
>> +	for (int i = 0; i < NR_SAMPLES; i++) {
>> +		uint64_t a, b;
>> +
>> +		a = get_pmccntr();
>> +		b = get_pmccntr();
>> +
>> +		if (a >= b) {
>> +			printf("Read %"PRId64" then %"PRId64".\n", a, b);
>> +			success = false;
>> +			break;
>> +		}
>> +	}
>> +
>> +	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
>> +
>> +	return success;
>> +}
>> +
>> +void pmu_init(void)
>> +{
>> +	uint32_t dfr0;
>> +
>> +	/* probe pmu version */
>> +	dfr0 = get_id_dfr0();
>> +	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
>> +	report_info("PMU version: %d", pmu_version);
>> +}
>> +
>>  int main(void)
>>  {
>>  	report_prefix_push("pmu");
>>  
>> +	pmu_init();
>>  	report("Control register", check_pmcr());
>> +	report("Monotonically increasing cycle count", check_cycles_increase());
>>  
>>  	return report_summary();
>>  }
>> -- 
>> 1.8.3.1
>>
>>
> 
> drew 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases
  2016-12-01 11:27     ` [Qemu-devel] " Andre Przywara
@ 2016-12-01 17:39       ` Wei Huang
  -1 siblings, 0 replies; 48+ messages in thread
From: Wei Huang @ 2016-12-01 17:39 UTC (permalink / raw)
  To: Andre Przywara, cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones



On 12/01/2016 05:27 AM, Andre Przywara wrote:
> Hi,
> 
> On 01/12/16 05:16, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>> even for the smallest delta of two subsequent reads.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arm/pmu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 94 insertions(+)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 1fe2b1a..3566a27 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -16,6 +16,9 @@
>>  #include "asm/barrier.h"
>>  #include "asm/processor.h"
>>  
>> +#define PMU_PMCR_E         (1 << 0)
>> +#define PMU_PMCR_C         (1 << 2)
>> +#define PMU_PMCR_LC        (1 << 6)
>>  #define PMU_PMCR_N_SHIFT   11
>>  #define PMU_PMCR_N_MASK    0x1f
>>  #define PMU_PMCR_ID_SHIFT  16
>> @@ -23,10 +26,57 @@
>>  #define PMU_PMCR_IMP_SHIFT 24
>>  #define PMU_PMCR_IMP_MASK  0xff
>>  
>> +#define ID_DFR0_PERFMON_SHIFT 24
>> +#define ID_DFR0_PERFMON_MASK  0xf
>> +
>> +#define PMU_CYCLE_IDX         31
>> +
>> +#define NR_SAMPLES 10
>> +
>> +static unsigned int pmu_version;
>>  #if defined(__arm__)
>>  DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
>> +DEFINE_SET_SYSREG32(pmcr, 0, c9, c12, 0)
>> +DEFINE_GET_SYSREG32(id_dfr0, 0, c0, c1, 2)
>> +DEFINE_SET_SYSREG32(pmselr, 0, c9, c12, 5)
>> +DEFINE_SET_SYSREG32(pmxevtyper, 0, c9, c13, 1)
>> +DEFINE_GET_SYSREG32(pmccntr32, 0, c9, c13, 0)
>> +DEFINE_SET_SYSREG32(pmccntr32, 0, c9, c13, 0)
>> +DEFINE_GET_SYSREG64(pmccntr64, 0, c9)
>> +DEFINE_SET_SYSREG64(pmccntr64, 0, c9)
>> +DEFINE_SET_SYSREG32(pmcntenset, 0, c9, c12, 1)
>> +
>> +static inline uint64_t get_pmccntr(void)
>> +{
>> +	if (pmu_version == 0x3)
>> +		return get_pmccntr64();
>> +	else
>> +		return get_pmccntr32();
>> +}
>> +
>> +static inline void set_pmccntr(uint64_t value)
>> +{
>> +	if (pmu_version == 0x3)
>> +		set_pmccntr64(value);
>> +	else
>> +		set_pmccntr32(value & 0xffffffff);
>> +}
>> +
>> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
>> +static inline void set_pmccfiltr(uint32_t value)
>> +{
>> +	set_pmselr(PMU_CYCLE_IDX);
>> +	set_pmxevtyper(value);
>> +	isb();
>> +}
>>  #elif defined(__aarch64__)
>>  DEFINE_GET_SYSREG32(pmcr, el0)
>> +DEFINE_SET_SYSREG32(pmcr, el0)
>> +DEFINE_GET_SYSREG32(id_dfr0, el1)
>> +DEFINE_GET_SYSREG64(pmccntr, el0);
>> +DEFINE_SET_SYSREG64(pmccntr, el0);
>> +DEFINE_SET_SYSREG32(pmcntenset, el0);
>> +DEFINE_SET_SYSREG32(pmccfiltr, el0);
>>  #endif
>>  
>>  /*
>> @@ -52,11 +102,55 @@ static bool check_pmcr(void)
>>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>>  }
>>  
>> +/*
>> + * Ensure that the cycle counter progresses between back-to-back reads.
>> + */
>> +static bool check_cycles_increase(void)
>> +{
>> +	bool success = true;
>> +
>> +	/* init before event access, this test only cares about cycle count */
>> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
>> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>> +	set_pmccntr(0);
> 
> Why do we need this? Shouldn't PMU_PMCR_C below take care of that?

PMU_PMCR_C does reset the cycle counter, so I can remove this one.
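The start and stop values written to PMCR can be sketched as plain bit arithmetic, reusing the patch's PMU_PMCR_* definitions. The single start write sets LC (64-bit cycle counter overflow), C (reset the cycle counter) and E (enable), which is why a separate set_pmccntr(0) beforehand is redundant. Note that C is write-only and reads as zero, so on hardware the stop value is computed from a fresh get_pmcr() that no longer shows it; the helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PMU_PMCR_E         (1 << 0)	/* enable counters */
#define PMU_PMCR_C         (1 << 2)	/* reset cycle counter (write-only) */
#define PMU_PMCR_LC        (1 << 6)	/* 64-bit cycle counter overflow */

/* Keep the existing PMCR bits, then reset and start the cycle counter. */
static uint32_t pmcr_start(uint32_t pmcr)
{
	return pmcr | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
}

/* Stop counting by clearing only the enable bit. */
static uint32_t pmcr_stop(uint32_t pmcr)
{
	return pmcr & ~PMU_PMCR_E;
}
```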

> 
>> +
>> +	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
>> +
>> +	for (int i = 0; i < NR_SAMPLES; i++) {
>> +		uint64_t a, b;
>> +
>> +		a = get_pmccntr();
>> +		b = get_pmccntr();
>> +
>> +		if (a >= b) {
>> +			printf("Read %"PRId64" then %"PRId64".\n", a, b);
>> +			success = false;
>> +			break;
>> +		}
>> +	}
>> +
>> +	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
>> +
>> +	return success;
>> +}
>> +
>> +void pmu_init(void)
> 
> Mmh, this function doesn't really initialize anything, does it?
> Should it be named pmu_available() or pmu_version() or the like?
> 

This function used to contain cycle counter configuration code that set
up PMCCFILTR, PMCNTENSET, etc. Since then, the configuration code has
been moved into the individual sub-tests. We can rename it to something
like pmu_probe().

> And should we bail out early here (or rather at the caller) if this
> register reports that no PMU is available? For instance by making it
> return a boolean?

We could do that.
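A possible shape for a boolean pmu_probe(), sketched so it runs on a host by taking the raw ID_DFR0 value as a parameter (the real code would call get_id_dfr0()). It assumes that PerfMon values 0x0 (no PMU) and 0xf (IMPLEMENTATION DEFINED, no architected PMU) both mean the tests cannot run; that mapping should be checked against the ID_DFR0 description in the ARM ARM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_DFR0_PERFMON_SHIFT 24
#define ID_DFR0_PERFMON_MASK  0xf

static unsigned int pmu_version;

/* Hypothetical: record the PerfMon field and report whether a PMU exists. */
static bool pmu_probe(uint32_t dfr0)
{
	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
	return pmu_version != 0x0 && pmu_version != 0xf;
}
```

main() could then return early (with a skip message) when pmu_probe() fails, instead of running tests that are bound to fail.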

> 
>> +{
>> +	uint32_t dfr0;
>> +
>> +	/* probe pmu version */
>> +	dfr0 = get_id_dfr0();
>> +	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
>> +	report_info("PMU version: %d", pmu_version);
>> +}
>> +
>>  int main(void)
>>  {
>>  	report_prefix_push("pmu");
>>  
>> +	pmu_init();
>>  	report("Control register", check_pmcr());
>> +	report("Monotonically increasing cycle count", check_cycles_increase());
> 
> I wonder if we should skip this test if check_pmcr() has returned false
> before? We let it return a boolean, so it seems quite natural to use
> this information here.
> This would avoid a lot of false FAILs due to the PMU not being available
> (because QEMU is too old, for instance).
> 
> Cheers,
> Andre.
> 
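Andre's suggestion of skipping dependent tests can be sketched as follows. Stub functions stand in for check_pmcr() and check_cycles_increase(), and the record()/results bookkeeping is hypothetical; the real test would use report() and the harness's own skip reporting. The point is that the cycle counter test is never executed when the control register check fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct results {
	int pass, fail, skip;
};

static void record(struct results *r, const char *name, bool ok)
{
	printf("%s: %s\n", ok ? "PASS" : "FAIL", name);
	if (ok)
		r->pass++;
	else
		r->fail++;
}

static struct results run_pmu_tests(bool (*check_pmcr)(void),
				    bool (*check_cycles_increase)(void))
{
	struct results r = { 0, 0, 0 };
	bool pmcr_ok = check_pmcr();

	record(&r, "Control register", pmcr_ok);
	if (!pmcr_ok) {
		/* No usable PMU: skip instead of reporting a false FAIL. */
		printf("SKIP: Monotonically increasing cycle count\n");
		r.skip++;
		return r;
	}
	record(&r, "Monotonically increasing cycle count",
	       check_cycles_increase());
	return r;
}

/* Stub checks, for illustration only. */
static int cycles_calls;
static bool stub_pass(void) { return true; }
static bool stub_fail(void) { return false; }
static bool stub_cycles(void) { cycles_calls++; return true; }
```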

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
@ 2016-12-01 20:27     ` Andre Przywara
  -1 siblings, 0 replies; 48+ messages in thread
From: Andre Przywara @ 2016-12-01 20:27 UTC (permalink / raw)
  To: Wei Huang, cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

Hi,

On 01/12/16 05:16, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Calculate the number of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility
> intended for the -icount option in TCG mode in the configuration file.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  arm/unittests.cfg |  14 +++++++
>  2 files changed, 136 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 3566a27..29d7c2c 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>  	set_pmxevtyper(value);
>  	isb();
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. isb instructions are inserted so that a
> + * pmccntr read after this function returns covers exactly the instructions
> + * executed in the controlled block. Total instrs = isb + mcr + 2*loop
> + * = 2 + 2*loop.
> + */
> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	bgt	1b\n"
> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr), [z] "r" (0)
> +	: "cc");
> +}
>  #elif defined(__aarch64__)
>  DEFINE_GET_SYSREG32(pmcr, el0)
>  DEFINE_SET_SYSREG32(pmcr, el0)
> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>  DEFINE_SET_SYSREG64(pmccntr, el0);
>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. isb instructions are inserted so that a
> + * pmccntr read after this function returns covers exactly the instructions
> + * executed in the controlled block. Total instrs = isb + msr + 2*loop
> + * = 2 + 2*loop.
> + */
> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	msr	pmcr_el0, %[pmcr]\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	b.gt	1b\n"
> +	"	msr	pmcr_el0, xzr\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr)
> +	: "cc");
> +}
>  #endif
>  
>  /*
> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>  	return success;
>  }
>  
> +/*
> + * Execute a known number of guest instructions. Only even instruction counts
> + * greater than or equal to 4 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value (allowing
> + * for example for the cycle counter or event counters to be reset). At the end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> +	int loop = (num - 2) / 2;
> +
> +	assert(num >= 4 && ((num - 2) % 2 == 0));
> +	precise_instrs_loop(loop, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> +	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> +
> +	if (cpi > 0)
> +		printf("Checking for CPI=%d.\n", cpi);
> +	printf("instrs : cycles0 cycles1 ...\n");

Do we really need this line?

In general I find the output quite confusing; it actually distracts from
the other, real tests. To make it more readable, I tweaked it a bit to
look like:
  4: 9996  173  222  122  118  119  120  212  240  233 avg=1155: 288 cpi
 36:  773  282  291  314  291  335  315  264  162  308 avg= 333:   9 cpi
 68:  229  356  400  339  203  201  335  233  201  372 avg= 286:   4 cpi
....
This adds some padding hints and limits each line to at most 80
characters, using the following changes:

> +
> +	for (unsigned int i = 4; i < 300; i += 32) {
> +		uint64_t avg, sum = 0;
> +
> +		printf("%d :", i);

                printf("%3d: ", i);

> +		for (int j = 0; j < NR_SAMPLES; j++) {
> +			uint64_t cycles;
> +
> +			set_pmccntr(0);
> +			measure_instrs(i, pmcr);
> +			cycles = get_pmccntr();
> +			printf(" %"PRId64"", cycles);

                        printf(" %4"PRId64"", cycles);

> +
> +			if (!cycles) {
> +				printf("\ncycles not incrementing!\n");
> +				return false;
> +			} else if (cpi > 0 && cycles != i * cpi) {
> +				printf("\nunexpected cycle count received!\n");
> +				return false;
> +			} else if ((cycles >> 32) != 0) {
> +				/* The cycles taken by the loop above should
> +				 * fit in 32 bits easily. We check the upper
> +				 * 32 bits of the cycle counter to make sure
> +				 * there is no surprise. */
> +				printf("\ncycle count bigger than 32bit!\n");
> +				return false;
> +			}
> +
> +			sum += cycles;
> +		}
> +		avg = sum / NR_SAMPLES;
> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);

                printf(" avg=%4"PRId64": %3"PRId64" %s\n",
                       sum / NR_SAMPLES, i > avg ? i / avg : avg / i,
                       i > avg ? "ipc" : "cpi");

In general I question the usefulness of the cpi/ipc output; it didn't
seem meaningful to me in any way, either in KVM or in TCG.
See the last line (68: ...) in the example above: we shouldn't use an
average with that much deviation for statistical purposes.
For KVM I get values ranging from 60 to 4383 cpi, which doesn't convey
any real information to me. In fact, the actual cycle counts look
constant to me, probably due to emulation overhead.

So what are we supposed to learn from those numbers?
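To make the averaging concern concrete, here is a sketch of the summary the suggested printf computes: the integer average of NR_SAMPLES cycle counts and whichever of IPC or CPI is at least 1. With the 4-instruction row from the example above, the single 9996-cycle sample pulls the average to 1155 (288 cpi), although the other nine samples all lie between 118 and 240. The function name cycle_ratio() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_SAMPLES 10

/*
 * Average the samples into *avg, then return IPC (instrs / avg) when
 * instrs > avg, else CPI (avg / instrs); *unit names the ratio.
 */
static uint64_t cycle_ratio(unsigned int instrs,
			    const uint64_t cycles[NR_SAMPLES],
			    uint64_t *avg, const char **unit)
{
	uint64_t sum = 0;

	assert(instrs > 0);
	for (int j = 0; j < NR_SAMPLES; j++)
		sum += cycles[j];
	*avg = sum / NR_SAMPLES;

	if (*avg == 0) {	/* the patch already rejects zero counts */
		*unit = "n/a";
		return 0;
	}
	if (instrs > *avg) {
		*unit = "ipc";
		return instrs / *avg;
	}
	*unit = "cpi";
	return *avg / instrs;
}
```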

Cheers,
Andre.

> +	}
> +
> +	return true;
> +}
> +
>  void pmu_init(void)
>  {
>  	uint32_t dfr0;
> @@ -144,13 +259,19 @@ void pmu_init(void)
>  	report_info("PMU version: %d", pmu_version);
>  }
>  
> -int main(void)
> +int main(int argc, char *argv[])
>  {
> +	int cpi = 0;
> +
> +	if (argc > 1)
> +		cpi = atol(argv[1]);
> +
>  	report_prefix_push("pmu");
>  
>  	pmu_init();
>  	report("Control register", check_pmcr());
>  	report("Monotonically increasing cycle count", check_cycles_increase());
> +	report("Cycle/instruction ratio", check_cpi(cpi));
>  
>  	return report_summary();
>  }
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 816f494..044d97c 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -63,3 +63,17 @@ groups = pci
>  [pmu]
>  file = pmu.flat
>  groups = pmu
> +
> +# Test PMU support (TCG) with -icount IPC=1
> +[pmu-tcg-icount-1]
> +file = pmu.flat
> +extra_params = -icount 0 -append '1'
> +groups = pmu
> +accel = tcg
> +
> +# Test PMU support (TCG) with -icount IPC=256
> +[pmu-tcg-icount-256]
> +file = pmu.flat
> +extra_params = -icount 8 -append '256'
> +groups = pmu
> +accel = tcg
> 
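The pairing of -icount 0 with -append '1' and -icount 8 with -append '256' can be sketched as arithmetic, under two assumptions: QEMU's -icount N charges 2^N ns of virtual time per instruction, and the TCG-emulated cycle counter advances at 1 GHz of virtual time (one cycle per ns). Under those assumptions the expected cycles per instruction is 2^N. expected_cpi() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Expected CPI for "-icount <shift>", assuming a 1 GHz emulated counter. */
static uint64_t expected_cpi(unsigned int icount_shift)
{
	const uint64_t cycles_per_ns = 1;	/* assumed 1 GHz counter */
	uint64_t ns_per_insn = UINT64_C(1) << icount_shift;

	return ns_per_insn * cycles_per_ns;
}
```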

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
@ 2016-12-01 20:27     ` Andre Przywara
  0 siblings, 0 replies; 48+ messages in thread
From: Andre Przywara @ 2016-12-01 20:27 UTC (permalink / raw)
  To: Wei Huang, cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

Hi,

On 01/12/16 05:16, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility
> intended for the -icount option in TCG mode in the configuration file.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  arm/unittests.cfg |  14 +++++++
>  2 files changed, 136 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 3566a27..29d7c2c 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>  	set_pmxevtyper(value);
>  	isb();
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. isb instructions were inserted to make sure
> + * pmccntr read after this function returns the exact instructions executed in
> + * the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
> + */
> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	bgt	1b\n"
> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr), [z] "r" (0)
> +	: "cc");
> +}
>  #elif defined(__aarch64__)
>  DEFINE_GET_SYSREG32(pmcr, el0)
>  DEFINE_SET_SYSREG32(pmcr, el0)
> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>  DEFINE_SET_SYSREG64(pmccntr, el0);
>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. isb instructions are inserted to make sure
> + * pmccntr read after this function returns the exact instructions executed
> + * in the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
> + */
> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	msr	pmcr_el0, %[pmcr]\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	b.gt	1b\n"
> +	"	msr	pmcr_el0, xzr\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr)
> +	: "cc");
> +}
>  #endif
>  
>  /*
> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>  	return success;
>  }
>  
> +/*
> + * Execute a known number of guest instructions. Only even instruction counts
> + * greater than or equal to 4 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value (allowing
> + * for example for the cycle counter or event counters to be reset). At the end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> +	int loop = (num - 2) / 2;
> +
> +	assert(num >= 4 && ((num - 2) % 2 == 0));
> +	precise_instrs_loop(loop, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> +	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> +
> +	if (cpi > 0)
> +		printf("Checking for CPI=%d.\n", cpi);
> +	printf("instrs : cycles0 cycles1 ...\n");

Do we really need this line?

In general I find the output quite confusing, actually distracting from
the other, actual tests. To make it more readable, I tweaked it a bit to
look like:
  4: 9996  173  222  122  118  119  120  212  240  233 avg=1155: 288 cpi
 36:  773  282  291  314  291  335  315  264  162  308 avg= 333:   9 cpi
 68:  229  356  400  339  203  201  335  233  201  372 avg= 286:   4 cpi
....
with some padding hints and limiting the line to at most 80 characters, by:

> +
> +	for (unsigned int i = 4; i < 300; i += 32) {
> +		uint64_t avg, sum = 0;
> +
> +		printf("%d :", i);

                printf("%3d: ", i);

> +		for (int j = 0; j < NR_SAMPLES; j++) {
> +			uint64_t cycles;
> +
> +			set_pmccntr(0);
> +			measure_instrs(i, pmcr);
> +			cycles = get_pmccntr();
> +			printf(" %"PRId64"", cycles);

                        printf(" %4"PRId64"", cycles);

> +
> +			if (!cycles) {
> +				printf("\ncycles not incrementing!\n");
> +				return false;
> +			} else if (cpi > 0 && cycles != i * cpi) {
> +				printf("\nunexpected cycle count received!\n");
> +				return false;
> +			} else if ((cycles >> 32) != 0) {
> +				/* The cycles taken by the loop above should
> +				 * fit in 32 bits easily. We check the upper
> +				 * 32 bits of the cycle counter to make sure
> +				 * there is no surprise. */
> +				printf("\ncycle count bigger than 32bit!\n");
> +				return false;
> +			}
> +
> +			sum += cycles;
> +		}
> +		avg = sum / NR_SAMPLES;
> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);

                printf(" avg=%4"PRId64": %3"PRId64" %s\n",
                       sum / NR_SAMPLES, i > avg ? i / avg : avg / i,
                       i > avg ? "ipc" : "cpi");
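For reference, the padded row format suggested above can be written as a standalone formatter and checked against the sample rows. This is a minimal sketch, not code from the patch. `format_row()` and `append()` are hypothetical helpers, `NR_SAMPLES` is assumed to be 10 (the number of columns shown), and `PRIu64` is used in place of `PRId64` because the counts are unsigned.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NR_SAMPLES 10	/* assumption: ten samples per row, as shown above */

/* Append printf-style text at buf[*pos]; returns false on error or truncation. */
static bool append(char *buf, size_t len, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*pos >= len)
		return false;
	va_start(ap, fmt);
	n = vsnprintf(buf + *pos, len - *pos, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= len - *pos)
		return false;
	*pos += (size_t)n;
	return true;
}

/*
 * Format one row: padded instruction count, a 4-wide column per sample,
 * the integer average, and whichever of IPC or CPI is >= 1 (the smaller
 * ratio would truncate to 0 in integer division).
 */
static bool format_row(char *buf, size_t len, unsigned int instrs,
		       const uint64_t samples[NR_SAMPLES])
{
	uint64_t sum = 0, avg;
	size_t pos = 0;

	if (!append(buf, len, &pos, "%3u:", instrs))
		return false;
	for (int j = 0; j < NR_SAMPLES; j++) {
		sum += samples[j];
		if (!append(buf, len, &pos, " %4" PRIu64, samples[j]))
			return false;
	}
	avg = sum / NR_SAMPLES;
	/* zero counts are expected to be rejected earlier, as check_cpi() does */
	assert(avg > 0 && instrs > 0);
	return append(buf, len, &pos, " avg=%4" PRIu64 ": %3" PRIu64 " %s",
		      avg, instrs > avg ? instrs / avg : avg / instrs,
		      instrs > avg ? "ipc" : "cpi");
}
```

Fed the "4:" samples from the example, this reproduces the first line of the example output exactly, which suggests the format strings above match the ones used to produce it.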

In general I question the usefulness of the cpi/ipc output; it didn't
seem meaningful in any way to me, neither in KVM nor in TCG.
See the last line (68: ...) in the example above: we shouldn't use an
average with that much deviation for statistical purposes.
For KVM I get values ranging from 60 to 4383 cpi, which doesn't convey
any real information to me; in fact the actual cycle counts look
constant to me, probably due to emulation overhead.
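To illustrate the deviation point, here is a sketch that summarizes one row of samples with its min, max, integer mean and median. The median is a swapped-in robust statistic that the patch does not compute, and `summarize()` and `struct sample_stats` are hypothetical names. The sample values are the "4:" and "68:" rows quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_SAMPLES 10	/* assumption: ten samples per row */

_Static_assert(NR_SAMPLES % 2 == 0, "median below assumes an even count");

struct sample_stats {
	uint64_t min, max, mean, median;
};

/* qsort() comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* Summarize one row, working on a sorted copy so the caller's data is untouched. */
static struct sample_stats summarize(const uint64_t s[NR_SAMPLES])
{
	struct sample_stats st;
	uint64_t tmp[NR_SAMPLES], sum = 0;

	memcpy(tmp, s, sizeof(tmp));
	qsort(tmp, NR_SAMPLES, sizeof(tmp[0]), cmp_u64);
	for (int j = 0; j < NR_SAMPLES; j++)
		sum += tmp[j];

	st.min = tmp[0];
	st.max = tmp[NR_SAMPLES - 1];
	st.mean = sum / NR_SAMPLES;	/* integer mean, as check_cpi() computes */
	/* integer midpoint of the two middle samples */
	st.median = (tmp[NR_SAMPLES / 2 - 1] + tmp[NR_SAMPLES / 2]) / 2;
	return st;
}
```

For the "4:" row, the first sample (9996) pulls the mean up to 1155 while the median is 192, i.e. 288 versus 48 cpi. For the "68:" row the mean and median agree (286 vs. 284), but the samples still span 201 to 400.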

So what are we supposed to learn from those numbers?

Cheers,
Andre.

> +	}
> +
> +	return true;
> +}
> +
>  void pmu_init(void)
>  {
>  	uint32_t dfr0;
> @@ -144,13 +259,19 @@ void pmu_init(void)
>  	report_info("PMU version: %d", pmu_version);
>  }
>  
> -int main(void)
> +int main(int argc, char *argv[])
>  {
> +	int cpi = 0;
> +
> +	if (argc > 1)
> +		cpi = atol(argv[1]);
> +
>  	report_prefix_push("pmu");
>  
>  	pmu_init();
>  	report("Control register", check_pmcr());
>  	report("Monotonically increasing cycle count", check_cycles_increase());
> +	report("Cycle/instruction ratio", check_cpi(cpi));
>  
>  	return report_summary();
>  }
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 816f494..044d97c 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -63,3 +63,17 @@ groups = pci
>  [pmu]
>  file = pmu.flat
>  groups = pmu
> +
> +# Test PMU support (TCG) with -icount IPC=1
> +[pmu-tcg-icount-1]
> +file = pmu.flat
> +extra_params = -icount 0 -append '1'
> +groups = pmu
> +accel = tcg
> +
> +# Test PMU support (TCG) with -icount CPI=256
> +[pmu-tcg-icount-256]
> +file = pmu.flat
> +extra_params = -icount 8 -append '256'
> +groups = pmu
> +accel = tcg
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01 20:27     ` [Qemu-devel] " Andre Przywara
@ 2016-12-01 21:12       ` Wei Huang
  -1 siblings, 0 replies; 48+ messages in thread
From: Wei Huang @ 2016-12-01 21:12 UTC (permalink / raw)
  To: Andre Przywara, cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones



On 12/01/2016 02:27 PM, Andre Przywara wrote:
> Hi,
> 
> On 01/12/16 05:16, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Calculate the number of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for the -icount option in TCG mode in the configuration file.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  arm/unittests.cfg |  14 +++++++
>>  2 files changed, 136 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 3566a27..29d7c2c 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>>  	set_pmxevtyper(value);
>>  	isb();
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting. isb instructions were inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed in
>> + * the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +	asm volatile(
>> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
>> +	"	isb\n"
>> +	"1:	subs	%[loop], %[loop], #1\n"
>> +	"	bgt	1b\n"
>> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
>> +	"	isb\n"
>> +	: [loop] "+r" (loop)
>> +	: [pmcr] "r" (pmcr), [z] "r" (0)
>> +	: "cc");
>> +}
>>  #elif defined(__aarch64__)
>>  DEFINE_GET_SYSREG32(pmcr, el0)
>>  DEFINE_SET_SYSREG32(pmcr, el0)
>> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting. isb instructions are inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed
>> + * in the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +	asm volatile(
>> +	"	msr	pmcr_el0, %[pmcr]\n"
>> +	"	isb\n"
>> +	"1:	subs	%[loop], %[loop], #1\n"
>> +	"	b.gt	1b\n"
>> +	"	msr	pmcr_el0, xzr\n"
>> +	"	isb\n"
>> +	: [loop] "+r" (loop)
>> +	: [pmcr] "r" (pmcr)
>> +	: "cc");
>> +}
>>  #endif
>>  
>>  /*
>> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>>  	return success;
>>  }
>>  
>> +/*
>> + * Execute a known number of guest instructions. Only even instruction counts
>> + * greater than or equal to 4 are supported by the in-line assembly code. The
>> + * control register (PMCR_EL0) is initialized with the provided value (allowing
>> + * for example for the cycle counter or event counters to be reset). At the end
>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>> + * counting, allowing the cycle counter or event counters to be read at the
>> + * leisure of the calling code.
>> + */
>> +static void measure_instrs(int num, uint32_t pmcr)
>> +{
>> +	int loop = (num - 2) / 2;
>> +
>> +	assert(num >= 4 && ((num - 2) % 2 == 0));
>> +	precise_instrs_loop(loop, pmcr);
>> +}
>> +
>> +/*
>> + * Measure cycle counts for various known instruction counts. Ensure that the
>> + * cycle counter progresses (similar to check_cycles_increase() but with more
>> + * instructions and using reset and stop controls). If supplied a positive,
>> + * nonzero CPI parameter, also strictly check that every measurement matches
>> + * it. Strict CPI checking is used to test -icount mode.
>> + */
>> +static bool check_cpi(int cpi)
>> +{
>> +	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
>> +
>> +	/* init before event access, this test only cares about cycle count */
>> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
>> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>> +
>> +	if (cpi > 0)
>> +		printf("Checking for CPI=%d.\n", cpi);
>> +	printf("instrs : cycles0 cycles1 ...\n");
> 
> Do we really need this line?
> 
> In general I find the output quite confusing, actually distracting from
> the other, actual tests. To make it more readable, I tweaked it a bit to
> look like:

Formatting the output can be useful and it indeed makes the output
easier to read.

>   4: 9996  173  222  122  118  119  120  212  240  233 avg=1155: 288 cpi
>  36:  773  282  291  314  291  335  315  264  162  308 avg= 333:   9 cpi
>  68:  229  356  400  339  203  201  335  233  201  372 avg= 286:   4 cpi
> ....
> with some padding hints and limiting the line to at most 80 characters, by:
> 
>> +
>> +	for (unsigned int i = 4; i < 300; i += 32) {
>> +		uint64_t avg, sum = 0;
>> +
>> +		printf("%d :", i);
> 
>                 printf("%3d: ", i);
> 
>> +		for (int j = 0; j < NR_SAMPLES; j++) {
>> +			uint64_t cycles;
>> +
>> +			set_pmccntr(0);
>> +			measure_instrs(i, pmcr);
>> +			cycles = get_pmccntr();
>> +			printf(" %"PRId64"", cycles);
> 
>                         printf(" %4"PRId64"", cycles);
> 
>> +
>> +			if (!cycles) {
>> +				printf("\ncycles not incrementing!\n");
>> +				return false;
>> +			} else if (cpi > 0 && cycles != i * cpi) {
>> +				printf("\nunexpected cycle count received!\n");
>> +				return false;
>> +			} else if ((cycles >> 32) != 0) {
>> +				/* The cycles taken by the loop above should
>> +				 * fit in 32 bits easily. We check the upper
>> +				 * 32 bits of the cycle counter to make sure
>> +				 * there is no surprise. */
>> +				printf("\ncycle count bigger than 32bit!\n");
>> +				return false;
>> +			}
>> +
>> +			sum += cycles;
>> +		}
>> +		avg = sum / NR_SAMPLES;
>> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
>> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
> 
>                 printf(" avg=%4"PRId64": %3"PRId64" %s\n",
>                        sum / NR_SAMPLES, i > avg ? i / avg : avg / i,
>                        i > avg ? "ipc" : "cpi");
> 
> In general I question the usefulness of the cpi/ipc output; it didn't
> seem meaningful in any way to me, neither in KVM nor in TCG.

For KVM, CPI is useful for (roughly) estimating the total time spent
on emulation: the KVM exit, the perf_event calls, and returning the
results. This is especially true when i is small. For TCG, CPI is tied
to the cpi parameter passed in through main(). Under TCG, the average
CPI in check_cpi() should always equal the value passed to main();
otherwise QEMU is wrong. So I think CPI is still useful, but I agree
that IPC can be removed.
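A minimal sketch of the arithmetic behind the strict TCG check. It assumes (this is not stated in the patch) that with `-icount N` each instruction advances QEMU's virtual clock by 2^N ns and that the emulated cycle counter follows that clock. That assumption is consistent with unittests.cfg pairing `-icount 0` with cpi 1 and `-icount 8` with cpi 256. `icount_cpi()`, `loop_for()`, `instrs_for()` and `cycles_match()` are hypothetical helpers that mirror `measure_instrs()` and the comparison in `check_cpi()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: with "-icount N" one instruction executes per 2^N ns of
 * virtual time, and the emulated cycle counter ticks once per virtual ns.
 */
static uint64_t icount_cpi(unsigned int shift)
{
	return UINT64_C(1) << shift;
}

/* Loop iterations measure_instrs() derives from an instruction count. */
static int loop_for(int num)
{
	assert(num >= 4 && num % 2 == 0);
	return (num - 2) / 2;
}

/* Counted instructions per the patch comment: isb + mcr/msr + 2 per iteration. */
static int instrs_for(int loop)
{
	return 2 + 2 * loop;
}

/* The strict comparison check_cpi() applies only when cpi > 0. */
static bool cycles_match(uint64_t cycles, int num, int cpi)
{
	return cpi <= 0 || cycles == (uint64_t)num * (uint64_t)cpi;
}
```

Under that assumption, with the `-icount 8` configuration the 36-instruction step would be expected to read exactly 36 * 256 = 9216 cycles on every sample.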

> See the last line (68: ...) in the example above: we shouldn't use an
> average with that much deviation for statistical purposes.
> For KVM I get values ranging from 60 to 4383 cpi, which doesn't convey
> any real information to me; in fact the actual cycle counts look
> constant to me, probably due to emulation overhead.

Constant values should only occur in TCG mode, which is expected.

> 
> So what are we supposed to learn from those numbers?
> 
> Cheers,
> Andre.
> 
>> +	}
>> +
>> +	return true;
>> +}
>> +
>>  void pmu_init(void)
>>  {
>>  	uint32_t dfr0;
>> @@ -144,13 +259,19 @@ void pmu_init(void)
>>  	report_info("PMU version: %d", pmu_version);
>>  }
>>  
>> -int main(void)
>> +int main(int argc, char *argv[])
>>  {
>> +	int cpi = 0;
>> +
>> +	if (argc > 1)
>> +		cpi = atol(argv[1]);
>> +
>>  	report_prefix_push("pmu");
>>  
>>  	pmu_init();
>>  	report("Control register", check_pmcr());
>>  	report("Monotonically increasing cycle count", check_cycles_increase());
>> +	report("Cycle/instruction ratio", check_cpi(cpi));
>>  
>>  	return report_summary();
>>  }
>> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
>> index 816f494..044d97c 100644
>> --- a/arm/unittests.cfg
>> +++ b/arm/unittests.cfg
>> @@ -63,3 +63,17 @@ groups = pci
>>  [pmu]
>>  file = pmu.flat
>>  groups = pmu
>> +
>> +# Test PMU support (TCG) with -icount IPC=1
>> +[pmu-tcg-icount-1]
>> +file = pmu.flat
>> +extra_params = -icount 0 -append '1'
>> +groups = pmu
>> +accel = tcg
>> +
>> +# Test PMU support (TCG) with -icount CPI=256
>> +[pmu-tcg-icount-256]
>> +file = pmu.flat
>> +extra_params = -icount 8 -append '256'
>> +groups = pmu
>> +accel = tcg
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01 20:27     ` [Qemu-devel] " Andre Przywara
@ 2016-12-01 21:18       ` Christopher Covington
  -1 siblings, 0 replies; 48+ messages in thread
From: Christopher Covington @ 2016-12-01 21:18 UTC (permalink / raw)
  To: Andre Przywara, Wei Huang
  Cc: alindsay, kvm, croberts, qemu-devel, alistair.francis,
	shannon.zhao, kvmarm

On 12/01/2016 03:27 PM, Andre Przywara wrote:
> Hi,
> 
> On 01/12/16 05:16, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Calculate the number of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for the -icount option in TCG mode in the configuration file.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  arm/unittests.cfg |  14 +++++++
>>  2 files changed, 136 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 3566a27..29d7c2c 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>>  	set_pmxevtyper(value);
>>  	isb();
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting. isb instructions were inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed in
>> + * the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +	asm volatile(
>> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
>> +	"	isb\n"
>> +	"1:	subs	%[loop], %[loop], #1\n"
>> +	"	bgt	1b\n"
>> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
>> +	"	isb\n"
>> +	: [loop] "+r" (loop)
>> +	: [pmcr] "r" (pmcr), [z] "r" (0)
>> +	: "cc");
>> +}
>>  #elif defined(__aarch64__)
>>  DEFINE_GET_SYSREG32(pmcr, el0)
>>  DEFINE_SET_SYSREG32(pmcr, el0)
>> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting. isb instructions are inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed
>> + * in the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +	asm volatile(
>> +	"	msr	pmcr_el0, %[pmcr]\n"
>> +	"	isb\n"
>> +	"1:	subs	%[loop], %[loop], #1\n"
>> +	"	b.gt	1b\n"
>> +	"	msr	pmcr_el0, xzr\n"
>> +	"	isb\n"
>> +	: [loop] "+r" (loop)
>> +	: [pmcr] "r" (pmcr)
>> +	: "cc");
>> +}
>>  #endif
>>  
>>  /*
>> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>>  	return success;
>>  }
>>  
>> +/*
>> + * Execute a known number of guest instructions. Only even instruction counts
>> + * greater than or equal to 4 are supported by the in-line assembly code. The
>> + * control register (PMCR_EL0) is initialized with the provided value (allowing
>> + * for example for the cycle counter or event counters to be reset). At the end
>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>> + * counting, allowing the cycle counter or event counters to be read at the
>> + * leisure of the calling code.
>> + */
>> +static void measure_instrs(int num, uint32_t pmcr)
>> +{
>> +	int loop = (num - 2) / 2;
>> +
>> +	assert(num >= 4 && ((num - 2) % 2 == 0));
>> +	precise_instrs_loop(loop, pmcr);
>> +}
>> +
>> +/*
>> + * Measure cycle counts for various known instruction counts. Ensure that the
>> + * cycle counter progresses (similar to check_cycles_increase() but with more
>> + * instructions and using reset and stop controls). If supplied a positive,
>> + * nonzero CPI parameter, also strictly check that every measurement matches
>> + * it. Strict CPI checking is used to test -icount mode.
>> + */
>> +static bool check_cpi(int cpi)
>> +{
>> +	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
>> +
>> +	/* init before event access, this test only cares about cycle count */
>> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
>> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>> +
>> +	if (cpi > 0)
>> +		printf("Checking for CPI=%d.\n", cpi);
>> +	printf("instrs : cycles0 cycles1 ...\n");
> 
> Do we really need this line?
> 
> In general I find the output quite confusing, actually distracting from
> the other, actual tests. To make it more readable, I tweaked it a bit to
> look like:
>   4: 9996  173  222  122  118  119  120  212  240  233 avg=1155: 288 cpi
>  36:  773  282  291  314  291  335  315  264  162  308 avg= 333:   9 cpi
>  68:  229  356  400  339  203  201  335  233  201  372 avg= 286:   4 cpi
> ....
> with some padding hints and limiting the line to at most 80 characters, by:
> 
>> +
>> +	for (unsigned int i = 4; i < 300; i += 32) {
>> +		uint64_t avg, sum = 0;
>> +
>> +		printf("%d :", i);
> 
>                 printf("%3d: ", i);
> 
>> +		for (int j = 0; j < NR_SAMPLES; j++) {
>> +			uint64_t cycles;
>> +
>> +			set_pmccntr(0);
>> +			measure_instrs(i, pmcr);
>> +			cycles = get_pmccntr();
>> +			printf(" %"PRId64"", cycles);
> 
>                         printf(" %4"PRId64"", cycles);
> 
>> +
>> +			if (!cycles) {
>> +				printf("\ncycles not incrementing!\n");
>> +				return false;
>> +			} else if (cpi > 0 && cycles != i * cpi) {
>> +				printf("\nunexpected cycle count received!\n");
>> +				return false;
>> +			} else if ((cycles >> 32) != 0) {
>> +				/* The cycles taken by the loop above should
>> +				 * fit in 32 bits easily. We check the upper
>> +				 * 32 bits of the cycle counter to make sure
>> +				 * there is no surprise. */
>> +				printf("\ncycle count bigger than 32bit!\n");
>> +				return false;
>> +			}
>> +
>> +			sum += cycles;
>> +		}
>> +		avg = sum / NR_SAMPLES;
>> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
>> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
> 
>                 printf(" avg=%4"PRId64": %3"PRId64" %s\n",
>                        sum / NR_SAMPLES, i > avg ? i / avg : avg / i,
>                        i > avg ? "ipc" : "cpi");
> 
> In general I question the usefulness of the cpi/ipc output; it didn't
> seem meaningful in any way to me, neither in KVM nor in TCG.
> See the last line (68: ...) in the example above: we shouldn't use an
> average with that much deviation for statistical purposes.
> For KVM I get values ranging from 60 to 4383 cpi, which doesn't convey
> any real information to me; in fact the actual cycle counts look
> constant to me, probably due to emulation overhead.
> 
> So what are we supposed to learn from those numbers?

I think they were mostly useful in debugging the checking of TCG's
-icount mode, where the numbers are precise.

I think seeing variable numbers from TCG when -icount is off illustrates
why -icount is useful. But justifying TCG best practices is a non-goal of
kvm-unit-tests.

I'd like to think it is possible to spot anomalies in the KVM output that
are due to bugs, but perhaps that's unrealistic or unlikely.

Feel free to drop the prints, print only in -icount mode, or print only
when there's an error in -icount mode.
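One way to follow the last suggestion, sketched under assumptions: keep the same checks, in the same order, as the sample loop in `check_cpi()`, but print nothing for a passing sample and a single diagnostic line for a failing one. `enum cpi_status`, `classify()` and `check_sample_quiet()` are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum cpi_status { CPI_OK, CPI_ZERO, CPI_MISMATCH, CPI_OVERFLOW };

/* Same checks, in the same order, as the sample loop in check_cpi(). */
static enum cpi_status classify(uint64_t cycles, unsigned int instrs, int cpi)
{
	if (!cycles)
		return CPI_ZERO;
	if (cpi > 0 && cycles != (uint64_t)instrs * (uint64_t)cpi)
		return CPI_MISMATCH;
	if (cycles >> 32)
		return CPI_OVERFLOW;
	return CPI_OK;
}

/* Quiet variant: silent on success, one diagnostic line on failure. */
static bool check_sample_quiet(uint64_t cycles, unsigned int instrs, int cpi)
{
	switch (classify(cycles, instrs, cpi)) {
	case CPI_OK:
		return true;
	case CPI_ZERO:
		printf("cycles not incrementing!\n");
		break;
	case CPI_MISMATCH:	/* only reachable when cpi > 0 */
		printf("instrs=%u: expected %" PRIu64 " cycles, got %" PRIu64 "\n",
		       instrs, (uint64_t)instrs * (uint64_t)cpi, cycles);
		break;
	case CPI_OVERFLOW:
		printf("cycle count bigger than 32bit!\n");
		break;
	}
	return false;
}
```

Separating the classification from the printing keeps the pass/fail logic identical to the patch while letting the caller decide how much to print.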

Regards,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
@ 2016-12-01 21:18       ` Christopher Covington
  0 siblings, 0 replies; 48+ messages in thread
From: Christopher Covington @ 2016-12-01 21:18 UTC (permalink / raw)
  To: Andre Przywara, Wei Huang
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

On 12/01/2016 03:27 PM, Andre Przywara wrote:
> Hi,
> 
> On 01/12/16 05:16, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
>>
>> Calculate the number of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for the -icount option in TCG mode in the configuration file.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  arm/unittests.cfg |  14 +++++++
>>  2 files changed, 136 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 3566a27..29d7c2c 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>>  	set_pmxevtyper(value);
>>  	isb();
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting. isb instructions were inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed in
>> + * the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +	asm volatile(
>> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
>> +	"	isb\n"
>> +	"1:	subs	%[loop], %[loop], #1\n"
>> +	"	bgt	1b\n"
>> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
>> +	"	isb\n"
>> +	: [loop] "+r" (loop)
>> +	: [pmcr] "r" (pmcr), [z] "r" (0)
>> +	: "cc");
>> +}
>>  #elif defined(__aarch64__)
>>  DEFINE_GET_SYSREG32(pmcr, el0)
>>  DEFINE_SET_SYSREG32(pmcr, el0)
>> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG64(pmccntr, el0);
>>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting. isb instructions are inserted to make sure
>> + * pmccntr read after this function returns the exact instructions executed
>> + * in the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
>> + */
>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>> +{
>> +	asm volatile(
>> +	"	msr	pmcr_el0, %[pmcr]\n"
>> +	"	isb\n"
>> +	"1:	subs	%[loop], %[loop], #1\n"
>> +	"	b.gt	1b\n"
>> +	"	msr	pmcr_el0, xzr\n"
>> +	"	isb\n"
>> +	: [loop] "+r" (loop)
>> +	: [pmcr] "r" (pmcr)
>> +	: "cc");
>> +}
>>  #endif
>>  
>>  /*
>> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>>  	return success;
>>  }
>>  
>> +/*
>> + * Execute a known number of guest instructions. Only even instruction counts
>> + * greater than or equal to 4 are supported by the in-line assembly code. The
>> + * control register (PMCR_EL0) is initialized with the provided value (allowing
>> + * for example for the cycle counter or event counters to be reset). At the end
>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>> + * counting, allowing the cycle counter or event counters to be read at the
>> + * leisure of the calling code.
>> + */
>> +static void measure_instrs(int num, uint32_t pmcr)
>> +{
>> +	int loop = (num - 2) / 2;
>> +
>> +	assert(num >= 4 && ((num - 2) % 2 == 0));
>> +	precise_instrs_loop(loop, pmcr);
>> +}
>> +
>> +/*
>> + * Measure cycle counts for various known instruction counts. Ensure that the
>> + * cycle counter progresses (similar to check_cycles_increase() but with more
>> + * instructions and using reset and stop controls). If supplied a positive,
>> + * nonzero CPI parameter, also strictly check that every measurement matches
>> + * it. Strict CPI checking is used to test -icount mode.
>> + */
>> +static bool check_cpi(int cpi)
>> +{
>> +	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
>> +
>> +	/* init before event access, this test only cares about cycle count */
>> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
>> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>> +
>> +	if (cpi > 0)
>> +		printf("Checking for CPI=%d.\n", cpi);
>> +	printf("instrs : cycles0 cycles1 ...\n");
> 
> Do we really need this line?
> 
> In general I find the output quite confusing, actually distracting from
> the other, actual tests. To make it more readable, I tweaked it a bit to
> look like:
>   4: 9996  173  222  122  118  119  120  212  240  233 avg=1155: 288 cpi
>  36:  773  282  291  314  291  335  315  264  162  308 avg= 333:   9 cpi
>  68:  229  356  400  339  203  201  335  233  201  372 avg= 286:   4 cpi
> ....
> with some padding hints and limiting the line to at most 80 characters, by:
> 
>> +
>> +	for (unsigned int i = 4; i < 300; i += 32) {
>> +		uint64_t avg, sum = 0;
>> +
>> +		printf("%d :", i);
> 
>                 printf("%3d: ", i);
> 
>> +		for (int j = 0; j < NR_SAMPLES; j++) {
>> +			uint64_t cycles;
>> +
>> +			set_pmccntr(0);
>> +			measure_instrs(i, pmcr);
>> +			cycles = get_pmccntr();
>> +			printf(" %"PRId64"", cycles);
> 
>                         printf(" %4"PRId64"", cycles);
> 
>> +
>> +			if (!cycles) {
>> +				printf("\ncycles not incrementing!\n");
>> +				return false;
>> +			} else if (cpi > 0 && cycles != i * cpi) {
>> +				printf("\nunexpected cycle count received!\n");
>> +				return false;
>> +			} else if ((cycles >> 32) != 0) {
>> +				/* The cycles taken by the loop above should
>> +				 * fit in 32 bits easily. We check the upper
>> +				 * 32 bits of the cycle counter to make sure
>> +				 * there is no supprise. */
>> +				printf("\ncycle count bigger than 32bit!\n");
>> +				return false;
>> +			}
>> +
>> +			sum += cycles;
>> +		}
>> +		avg = sum / NR_SAMPLES;
>> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
>> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
> 
>                 printf(" avg=%4"PRId64": %3"PRId64" %s\n",
>                        sum / NR_SAMPLES, i > avg ? i / avg : avg / i,
>                        i > avg ? "ipc" : "cpi");
> 
> In general I question the usefulness of the cpi/ipc output, it didn't
> seem meaningful in any way to me, neither in KVM or in TCG.
> See the last line (68: ...) in the example above, we shouldn't use an
> average with that deviation for statistical purposes.
> For KVM I get values ranging from 60 to 4383 cpi, which doesn't convey
> any real information to me, in fact the actual cycles look like constant
> to me, probably due to emulation overhead.
> 
> So what are we supposed to learn from those numbers?

I think they were mostly useful in debugging the checking of TCG's
-icount mode, where the numbers are precise.

I think seeing variable numbers from TCG when -icount is off illustrates
why -icount is useful. But justifying TCG best practices is a non-goal of
kvm-unit-tests.

I'd like to think is possible to see anomalies in the KVM info which are
due to bugs, but perhaps that's unrealistic or unlikely.

Feel free to drop the prints, or only print in -icount mode, or only print
when there's error in -icount mode.

Regards,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01 21:18       ` [Qemu-devel] " Christopher Covington
@ 2016-12-01 22:04         ` André Przywara
  -1 siblings, 0 replies; 48+ messages in thread
From: André Przywara @ 2016-12-01 22:04 UTC (permalink / raw)
  To: Christopher Covington, Wei Huang
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

On 01/12/16 21:18, Christopher Covington wrote:
> On 12/01/2016 03:27 PM, Andre Przywara wrote:

Hi,

....

>>> +		}
>>> +		avg = sum / NR_SAMPLES;
>>> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
>>> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
>>
>>                 printf(" avg=%4"PRId64": %3"PRId64" %s\n",
>>                        sum / NR_SAMPLES, i > avg ? i / avg : avg / i,
>>                        i > avg ? "ipc" : "cpi");
>>
>> In general I question the usefulness of the cpi/ipc output; it didn't
>> seem meaningful in any way to me, neither in KVM nor in TCG.
>> See the last line (68: ...) in the example above: we shouldn't use an
>> average with that much deviation for statistical purposes.
>> For KVM I get values ranging from 60 to 4383 cpi, which doesn't convey
>> any real information to me; in fact the actual cycles look constant
>> to me, probably due to emulation overhead.
>>
>> So what are we supposed to learn from those numbers?
> 
> I think they were mostly useful in debugging the checking of TCG's
> -icount mode, where the numbers are precise.
> 
> I think seeing variable numbers from TCG when -icount is off illustrates
> why -icount is useful. But justifying TCG best practices is a non-goal of
> kvm-unit-tests.
> 
> I'd like to think it is possible to spot anomalies in the KVM output that are
> due to bugs, but perhaps that's unrealistic or unlikely.
> 
> Feel free to drop the prints, print only in -icount mode, or print only
> when there's an error in -icount mode.

No, it's totally fine to keep them, especially since they only appear
when one runs a test manually.

So thanks for the explanation, those numbers _are_ useful, it was just
me being clueless and/or ignorant ;-)

Cheers,
Andre

P.S. It looks like we should have some documentation explaining, for
instance, these numbers and the expected results, along with explanations
of what all the other tests do, possibly with pointers on where to look in
case of failures. </enthusiasm>
And no, I am not volunteering, at least not for the PMU ...

* Re: [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking
  2016-12-01 21:12       ` [Qemu-devel] " Wei Huang
@ 2016-12-01 22:11         ` André Przywara
  -1 siblings, 0 replies; 48+ messages in thread
From: André Przywara @ 2016-12-01 22:11 UTC (permalink / raw)
  To: Wei Huang, cov
  Cc: qemu-devel, kvm, kvmarm, shannon.zhao, alistair.francis,
	croberts, alindsay, drjones

On 01/12/16 21:12, Wei Huang wrote:

Hi Wei,

> On 12/01/2016 02:27 PM, Andre Przywara wrote:
>> Hi,
>>
>> On 01/12/16 05:16, Wei Huang wrote:
>>> From: Christopher Covington <cov@codeaurora.org>
>>>
>>> Calculate the number of cycles per instruction (CPI) implied by ARM
>>> PMU cycle counter values. The code includes a strict checking facility
>>> intended for the -icount option in TCG mode in the configuration file.
>>>
>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>>> Signed-off-by: Wei Huang <wei@redhat.com>
>>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>>> ---
>>>  arm/pmu.c         | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  arm/unittests.cfg |  14 +++++++
>>>  2 files changed, 136 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>> index 3566a27..29d7c2c 100644
>>> --- a/arm/pmu.c
>>> +++ b/arm/pmu.c
>>> @@ -69,6 +69,27 @@ static inline void set_pmccfiltr(uint32_t value)
>>>  	set_pmxevtyper(value);
>>>  	isb();
>>>  }
>>> +
>>> +/*
>>> + * Extra instructions inserted by the compiler would be difficult to compensate
>>> + * for, so hand assemble everything between, and including, the PMCR accesses
>>> + * to start and stop counting. isb instructions were inserted to make sure
>>> + * pmccntr read after this function returns the exact instructions executed in
>>> + * the controlled block. Total instrs = isb + mcr + 2*loop = 2 + 2*loop.
>>> + */
>>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>>> +{
>>> +	asm volatile(
>>> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
>>> +	"	isb\n"
>>> +	"1:	subs	%[loop], %[loop], #1\n"
>>> +	"	bgt	1b\n"
>>> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
>>> +	"	isb\n"
>>> +	: [loop] "+r" (loop)
>>> +	: [pmcr] "r" (pmcr), [z] "r" (0)
>>> +	: "cc");
>>> +}
>>>  #elif defined(__aarch64__)
>>>  DEFINE_GET_SYSREG32(pmcr, el0)
>>>  DEFINE_SET_SYSREG32(pmcr, el0)
>>> @@ -77,6 +98,27 @@ DEFINE_GET_SYSREG64(pmccntr, el0);
>>>  DEFINE_SET_SYSREG64(pmccntr, el0);
>>>  DEFINE_SET_SYSREG32(pmcntenset, el0);
>>>  DEFINE_SET_SYSREG32(pmccfiltr, el0);
>>> +
>>> +/*
>>> + * Extra instructions inserted by the compiler would be difficult to compensate
>>> + * for, so hand assemble everything between, and including, the PMCR accesses
>>> + * to start and stop counting. isb instructions are inserted to make sure
>>> + * pmccntr read after this function returns the exact instructions executed
>>> + * in the controlled block. Total instrs = isb + msr + 2*loop = 2 + 2*loop.
>>> + */
>>> +static inline void precise_instrs_loop(int loop, uint32_t pmcr)
>>> +{
>>> +	asm volatile(
>>> +	"	msr	pmcr_el0, %[pmcr]\n"
>>> +	"	isb\n"
>>> +	"1:	subs	%[loop], %[loop], #1\n"
>>> +	"	b.gt	1b\n"
>>> +	"	msr	pmcr_el0, xzr\n"
>>> +	"	isb\n"
>>> +	: [loop] "+r" (loop)
>>> +	: [pmcr] "r" (pmcr)
>>> +	: "cc");
>>> +}
>>>  #endif
>>>  
>>>  /*
>>> @@ -134,6 +176,79 @@ static bool check_cycles_increase(void)
>>>  	return success;
>>>  }
>>>  
>>> +/*
>>> + * Execute a known number of guest instructions. Only even instruction counts
>>> + * greater than or equal to 4 are supported by the in-line assembly code. The
>>> + * control register (PMCR_EL0) is initialized with the provided value (allowing
>>> + * for example for the cycle counter or event counters to be reset). At the end
>>> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
>>> + * counting, allowing the cycle counter or event counters to be read at the
>>> + * leisure of the calling code.
>>> + */
>>> +static void measure_instrs(int num, uint32_t pmcr)
>>> +{
>>> +	int loop = (num - 2) / 2;
>>> +
>>> +	assert(num >= 4 && ((num - 2) % 2 == 0));
>>> +	precise_instrs_loop(loop, pmcr);
>>> +}
>>> +
>>> +/*
>>> + * Measure cycle counts for various known instruction counts. Ensure that the
>>> + * cycle counter progresses (similar to check_cycles_increase() but with more
>>> + * instructions and using reset and stop controls). If supplied a positive,
>>> + * nonzero CPI parameter, also strictly check that every measurement matches
>>> + * it. Strict CPI checking is used to test -icount mode.
>>> + */
>>> +static bool check_cpi(int cpi)
>>> +{
>>> +	uint32_t pmcr = get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
>>> +
>>> +	/* init before event access, this test only cares about cycle count */
>>> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
>>> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
>>> +
>>> +	if (cpi > 0)
>>> +		printf("Checking for CPI=%d.\n", cpi);
>>> +	printf("instrs : cycles0 cycles1 ...\n");
>>
>> Do we really need this line?
>>
>> In general I find the output quite confusing, actually distracting from
>> the other, actual tests. To make it more readable, I tweaked it a bit to
>> look like:
> 
> Formatting the output can be useful and it indeed makes the output
> easier to read.
> 
>>   4: 9996  173  222  122  118  119  120  212  240  233 avg=1155: 288 cpi
>>  36:  773  282  291  314  291  335  315  264  162  308 avg= 333:   9 cpi
>>  68:  229  356  400  339  203  201  335  233  201  372 avg= 286:   4 cpi
>> ....
>> with some padding hints and limiting the line to at most 80 characters, by:
>>
>>> +
>>> +	for (unsigned int i = 4; i < 300; i += 32) {
>>> +		uint64_t avg, sum = 0;
>>> +
>>> +		printf("%d :", i);
>>
>>                 printf("%3d: ", i);
>>
>>> +		for (int j = 0; j < NR_SAMPLES; j++) {
>>> +			uint64_t cycles;
>>> +
>>> +			set_pmccntr(0);
>>> +			measure_instrs(i, pmcr);
>>> +			cycles = get_pmccntr();
>>> +			printf(" %"PRId64"", cycles);
>>
>>                         printf(" %4"PRId64"", cycles);
>>
>>> +
>>> +			if (!cycles) {
>>> +				printf("\ncycles not incrementing!\n");
>>> +				return false;
>>> +			} else if (cpi > 0 && cycles != i * cpi) {
>>> +				printf("\nunexpected cycle count received!\n");
>>> +				return false;
>>> +			} else if ((cycles >> 32) != 0) {
>>> +				/* The cycles taken by the loop above should
>>> +				 * fit in 32 bits easily. We check the upper
>>> +				 * 32 bits of the cycle counter to make sure
>>> +				 * there is no surprise. */
>>> +				printf("\ncycle count bigger than 32bit!\n");
>>> +				return false;
>>> +			}
>>> +
>>> +			sum += cycles;
>>> +		}
>>> +		avg = sum / NR_SAMPLES;
>>> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
>>> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
>>
>>                 printf(" avg=%4"PRId64": %3"PRId64" %s\n",
>>                        sum / NR_SAMPLES, i > avg ? i / avg : avg / i,
>>                        i > avg ? "ipc" : "cpi");
>>
>> In general I question the usefulness of the cpi/ipc output; it didn't
>> seem meaningful in any way to me, neither in KVM nor in TCG.
> 
> For KVM, CPI is useful for (vaguely) figuring out the total time spent
> on emulation: KVM exit, perf_event calls, returning results. This is
> especially true when i is small. For TCG, CPI is related to the cpi
> parameter passed in from main(). The average CPI in check_cpi()
> should always be the same as the one from main() in TCG mode;
> otherwise QEMU is wrong. So I think CPI is still useful, but I agree
> that IPC can be removed.

If you follow my snippet above, it gives you both. One of them is always
zero anyway, so we just need to print one number and the proper unit.
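A minimal sketch of that idea, with hypothetical helper names ratio_unit() and print_summary(). Both divisions truncate, so unless instrs == avg at most one of instrs / avg and avg / instrs is non-zero; the helper reports that one with its unit, and reports 1 cpi when the two are equal. It assumes avg is non-zero, which check_cpi() already guarantees by failing on a zero cycle count.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Pick whichever truncated ratio is meaningful. instrs and avg must both
 * be non-zero; when they are equal this reports 1 cpi.
 */
static const char *ratio_unit(uint64_t instrs, uint64_t avg, uint64_t *ratio)
{
	if (instrs > avg) {
		*ratio = instrs / avg;
		return "ipc";
	}
	*ratio = avg / instrs;
	return "cpi";
}

/* Print the tail of one output line in the padded format suggested above. */
static void print_summary(uint64_t instrs, uint64_t sum, unsigned int samples)
{
	uint64_t ratio, avg = sum / samples;
	const char *unit = ratio_unit(instrs, avg, &ratio);

	printf(" avg=%4" PRIu64 ": %3" PRIu64 " %s\n", avg, ratio, unit);
}
```

For the first sample row above (4 instructions, ten samples summing to 11555 cycles), print_summary(4, 11555, 10) prints " avg=1155: 288 cpi".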

>> See the last line (68: ...) in the example above: we shouldn't use an
>> average with that much deviation for statistical purposes.
>> For KVM I get values ranging from 60 to 4383 cpi, which doesn't convey
>> any real information to me; in fact the actual cycles look constant
>> to me, probably due to emulation overhead.
> 
> Constant values should only appear in TCG mode, which is expected.

Ah, OK, thanks for the explanation. So I am absolutely fine with keeping
those numbers in then.

Sorry for the noise and my rather harsh comments!

Thanks!
Andre.

>> So what are we supposed to learn from those numbers?
>>
>> Cheers,
>> Andre.
>>
>>> +	}
>>> +
>>> +	return true;
>>> +}
>>> +
>>>  void pmu_init(void)
>>>  {
>>>  	uint32_t dfr0;
>>> @@ -144,13 +259,19 @@ void pmu_init(void)
>>>  	report_info("PMU version: %d", pmu_version);
>>>  }
>>>  
>>> -int main(void)
>>> +int main(int argc, char *argv[])
>>>  {
>>> +	int cpi = 0;
>>> +
>>> +	if (argc > 1)
>>> +		cpi = atol(argv[1]);
>>> +
>>>  	report_prefix_push("pmu");
>>>  
>>>  	pmu_init();
>>>  	report("Control register", check_pmcr());
>>>  	report("Monotonically increasing cycle count", check_cycles_increase());
>>> +	report("Cycle/instruction ratio", check_cpi(cpi));
>>>  
>>>  	return report_summary();
>>>  }
>>> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
>>> index 816f494..044d97c 100644
>>> --- a/arm/unittests.cfg
>>> +++ b/arm/unittests.cfg
>>> @@ -63,3 +63,17 @@ groups = pci
>>>  [pmu]
>>>  file = pmu.flat
>>>  groups = pmu
>>> +
>>> +# Test PMU support (TCG) with -icount CPI=1
>>> +[pmu-tcg-icount-1]
>>> +file = pmu.flat
>>> +extra_params = -icount 0 -append '1'
>>> +groups = pmu
>>> +accel = tcg
>>> +
>>> +# Test PMU support (TCG) with -icount CPI=256
>>> +[pmu-tcg-icount-256]
>>> +file = pmu.flat
>>> +extra_params = -icount 8 -append '256'
>>> +groups = pmu
>>> +accel = tcg
>>>
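The pairing of -icount shifts with -append values in the two new unittests.cfg entries can be checked with a little arithmetic. This is a sketch under two stated assumptions: that -icount shift=N advances virtual time by 2^N ns per instruction (as QEMU documents for -icount), and that the emulated cycle counter runs at 1 GHz, so every instruction costs 2^N cycles. It also reuses the patch's instruction count, num = 2 + 2*loop, to show that the largest expected count in the 4..292 sweep fits easily in 32 bits. The helper names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Assumed model: 2^shift ns per instruction, cycle counter at 1 GHz. */
static uint64_t expected_cpi(unsigned int icount_shift)
{
	assert(icount_shift < 32);	/* guard against an unreasonable shift */
	return UINT64_C(1) << icount_shift;
}

/* Same arithmetic as measure_instrs(): num = 2 + 2 * loop. */
static unsigned int loop_iterations(unsigned int num)
{
	assert(num >= 4 && num % 2 == 0);
	return (num - 2) / 2;
}

/* Print the expected cycle count for each step of check_cpi()'s sweep. */
static uint64_t print_expected_sweep(unsigned int icount_shift)
{
	uint64_t cpi = expected_cpi(icount_shift), max = 0;

	for (unsigned int i = 4; i < 300; i += 32) {
		uint64_t cycles = (uint64_t)i * cpi;

		printf("%3u instrs (%3u loops): %6" PRIu64 " cycles\n",
		       i, loop_iterations(i), cycles);
		if (cycles > max)
			max = cycles;
	}
	return max;	/* largest expected count in the sweep */
}
```

Under these assumptions, -icount 8 gives a largest expected count of 292 * 256 = 74752 cycles, far below the 32-bit limit the patch checks for.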

* Re: [Qemu-devel] [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases
  2016-12-01 17:36     ` Wei Huang
@ 2016-12-02  9:58         ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2016-12-02  9:58 UTC (permalink / raw)
  To: Wei Huang
  Cc: alindsay, kvm, andre.przywara, croberts, qemu-devel,
	alistair.francis, kvmarm, shannon.zhao

On Thu, Dec 01, 2016 at 11:36:55AM -0600, Wei Huang wrote:
> 
> 
> On 12/01/2016 03:18 AM, Andrew Jones wrote:
> > On Wed, Nov 30, 2016 at 11:16:41PM -0600, Wei Huang wrote:
> >> From: Christopher Covington <cov@codeaurora.org>
> >>
> >> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> >> even for the smallest delta of two subsequent reads.
> >>
> >> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> >> Signed-off-by: Wei Huang <wei@redhat.com>
> >> Reviewed-by: Andrew Jones <drjones@redhat.com>
> >> ---
> >>  arm/pmu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 94 insertions(+)
> >>
> >> diff --git a/arm/pmu.c b/arm/pmu.c
> >> index 1fe2b1a..3566a27 100644
> >> --- a/arm/pmu.c
> >> +++ b/arm/pmu.c
> >> @@ -16,6 +16,9 @@
> >>  #include "asm/barrier.h"
> >>  #include "asm/processor.h"
> >>  
> >> +#define PMU_PMCR_E         (1 << 0)
> >> +#define PMU_PMCR_C         (1 << 2)
> >> +#define PMU_PMCR_LC        (1 << 6)
> >>  #define PMU_PMCR_N_SHIFT   11
> >>  #define PMU_PMCR_N_MASK    0x1f
> >>  #define PMU_PMCR_ID_SHIFT  16
> >> @@ -23,10 +26,57 @@
> >>  #define PMU_PMCR_IMP_SHIFT 24
> >>  #define PMU_PMCR_IMP_MASK  0xff
> >>  
> >> +#define ID_DFR0_PERFMON_SHIFT 24
> >> +#define ID_DFR0_PERFMON_MASK  0xf
> >> +
> >> +#define PMU_CYCLE_IDX         31
> >> +
> >> +#define NR_SAMPLES 10
> >> +
> >> +static unsigned int pmu_version;
> >>  #if defined(__arm__)
> >>  DEFINE_GET_SYSREG32(pmcr, 0, c9, c12, 0)
> >> +DEFINE_SET_SYSREG32(pmcr, 0, c9, c12, 0)
> >> +DEFINE_GET_SYSREG32(id_dfr0, 0, c0, c1, 2)
> >> +DEFINE_SET_SYSREG32(pmselr, 0, c9, c12, 5)
> >> +DEFINE_SET_SYSREG32(pmxevtyper, 0, c9, c13, 1)
> >> +DEFINE_GET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> >> +DEFINE_SET_SYSREG32(pmccntr32, 0, c9, c13, 0)
> >> +DEFINE_GET_SYSREG64(pmccntr64, 0, c9)
> >> +DEFINE_SET_SYSREG64(pmccntr64, 0, c9)
> >> +DEFINE_SET_SYSREG32(pmcntenset, 0, c9, c12, 1)
> > 
> > Seeing how we get lots of redundant-looking lines, I think that instead
> > of defining DEFINE_SET/GET_SYSREG32/64, we should have
> > 
> > DEFINE_SYSREG32/64      ... creates both get_<reg> and set_<reg>
> > DEFINE_SYSREG32/64_RO   ... creates just get_<reg>
> 
> Don't like the naming. I think we can create a new macro, named
> DEFINE_GET_SET_SYSREG32/64. I know it is boring, but readers should get
> the idea easily.

I don't like the looks of DEFINE_GET_SET_SYSREG32/64... But we don't
need the _RO version I proposed. Just DEFINE_SYSREG32/64, which makes
both get/set, is fine. Unit tests shouldn't be restricted from attempting
to write r/o registers: they're testing precisely that type of thing,
and they shouldn't have to write their own set accessors to do it either.

> 
> > 
> >> +
> >> +static inline uint64_t get_pmccntr(void)
> >> +{
> >> +	if (pmu_version == 0x3)
> >> +		return get_pmccntr64();
> >> +	else
> >> +		return get_pmccntr32();
> >> +}
> >> +
> >> +static inline void set_pmccntr(uint64_t value)
> >> +{
> >> +	if (pmu_version == 0x3)
> >> +		set_pmccntr64(value);
> >> +	else
> >> +		set_pmccntr32(value & 0xffffffff);
> >> +}
> > 
> > So the two accessors above are exceptional, which is why we don't
> > use SYSREG for them. These can have uint64_t for their external
> > interface. We can't require 'unsigned long' or 'unsigned long long'.
> > 
> >> +
> >> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> >> +static inline void set_pmccfiltr(uint32_t value)
> >> +{
> >> +	set_pmselr(PMU_CYCLE_IDX);
> >> +	set_pmxevtyper(value);
> >> +	isb();
> >> +}
> >>  #elif defined(__aarch64__)
> >>  DEFINE_GET_SYSREG32(pmcr, el0)
> >> +DEFINE_SET_SYSREG32(pmcr, el0)
> >> +DEFINE_GET_SYSREG32(id_dfr0, el1)
> >> +DEFINE_GET_SYSREG64(pmccntr, el0)
> >> +DEFINE_SET_SYSREG64(pmccntr, el0)
> >> +DEFINE_SET_SYSREG32(pmcntenset, el0)
> >> +DEFINE_SET_SYSREG32(pmccfiltr, el0)
> >>  #endif
> >>  
> >>  /*
> >> @@ -52,11 +102,55 @@ static bool check_pmcr(void)
> >>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
> >>  }
> >>  
> >> +/*
> >> + * Ensure that the cycle counter progresses between back-to-back reads.
> >> + */
> >> +static bool check_cycles_increase(void)
> >> +{
> >> +	bool success = true;
> >> +
> >> +	/* init before event access, this test only cares about cycle count */
> >> +	set_pmcntenset(1 << PMU_CYCLE_IDX);
> >> +	set_pmccfiltr(0); /* count cycles in EL0, EL1, but not EL2 */
> >> +	set_pmccntr(0);
> >> +
> >> +	set_pmcr(get_pmcr() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
> >> +
> >> +	for (int i = 0; i < NR_SAMPLES; i++) {
> >> +		uint64_t a, b;
> >> +
> >> +		a = get_pmccntr();
> >> +		b = get_pmccntr();
> >> +
> >> +		if (a >= b) {
> >> +			printf("Read %"PRIu64" then %"PRIu64".\n", a, b);
> >> +			success = false;
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +	set_pmcr(get_pmcr() & ~PMU_PMCR_E);
> >> +
> >> +	return success;
> >> +}
> >> +
> >> +void pmu_init(void)
> >> +{
> >> +	uint32_t dfr0;
> >> +
> >> +	/* probe pmu version */
> >> +	dfr0 = get_id_dfr0();
> >> +	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
> >> +	report_info("PMU version: %u", pmu_version);
> >> +}
> >> +
> >>  int main(void)
> >>  {
> >>  	report_prefix_push("pmu");
> >>  
> >> +	pmu_init();
> >>  	report("Control register", check_pmcr());
> >> +	report("Monotonically increasing cycle count", check_cycles_increase());
> >>  
> >>  	return report_summary();
> >>  }
> >> -- 
> >> 1.8.3.1
> >>
> >>
> > 
> > drew 
> > 
> 

end of thread

Thread overview: 48+ messages
2016-12-01  5:16 [kvm-unit-tests PATCH v13 0/4] ARM PMU tests Wei Huang
2016-12-01  5:16 ` [Qemu-devel] " Wei Huang
2016-12-01  5:16 ` [kvm-unit-tests PATCH v13 1/4] arm: Define macros for accessing system registers Wei Huang
2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
2016-12-01  8:59   ` Andrew Jones
2016-12-01  8:59     ` Andrew Jones
2016-12-01  9:38     ` Andrew Jones
2016-12-01  9:38       ` Andrew Jones
2016-12-01 11:11     ` Andre Przywara
2016-12-01 13:16       ` Andrew Jones
2016-12-01 15:27     ` Wei Huang
2016-12-01 15:27       ` Wei Huang
2016-12-01 15:50       ` Andrew Jones
2016-12-01  5:16 ` [kvm-unit-tests PATCH v13 2/4] arm: Add PMU test Wei Huang
2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
2016-12-01  9:03   ` Andrew Jones
2016-12-01  9:03     ` Andrew Jones
2016-12-01 11:28     ` Andre Przywara
2016-12-01 12:02       ` Peter Maydell
2016-12-01 12:02         ` Peter Maydell
2016-12-01 12:19         ` Andre Przywara
2016-12-01 12:36           ` Peter Maydell
2016-12-01  5:16 ` [kvm-unit-tests PATCH v13 3/4] arm: pmu: Check cycle count increases Wei Huang
2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
2016-12-01  9:18   ` Andrew Jones
2016-12-01  9:18     ` Andrew Jones
2016-12-01 17:36     ` Wei Huang
2016-12-02  9:58       ` Andrew Jones
2016-12-02  9:58         ` Andrew Jones
2016-12-01 11:27   ` Andre Przywara
2016-12-01 11:27     ` [Qemu-devel] " Andre Przywara
2016-12-01 17:39     ` Wei Huang
2016-12-01 17:39       ` [Qemu-devel] " Wei Huang
2016-12-01  5:16 ` [kvm-unit-tests PATCH v13 4/4] arm: pmu: Add CPI checking Wei Huang
2016-12-01  5:16   ` [Qemu-devel] " Wei Huang
2016-12-01  9:26   ` Andrew Jones
2016-12-01 10:19     ` Andre Przywara
2016-12-01 13:47       ` Andrew Jones
2016-12-01 20:27   ` Andre Przywara
2016-12-01 20:27     ` [Qemu-devel] " Andre Przywara
2016-12-01 21:12     ` Wei Huang
2016-12-01 21:12       ` [Qemu-devel] " Wei Huang
2016-12-01 22:11       ` André Przywara
2016-12-01 22:11         ` [Qemu-devel] " André Przywara
2016-12-01 21:18     ` Christopher Covington
2016-12-01 21:18       ` [Qemu-devel] " Christopher Covington
2016-12-01 22:04       ` André Przywara
2016-12-01 22:04         ` [Qemu-devel] " André Przywara
