linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/5] Cavium ThunderX PMU support
@ 2016-02-03 17:11 Jan Glauber
  2016-02-03 17:11 ` [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events Jan Glauber
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Jan Glauber @ 2016-02-03 17:11 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Hi,

I'm reposting the whole series just in case my previous attempt to
repost only the broken patch was confusing. Patches are based on
4.5-rc2.

Patches 1-3 add support for ThunderX specific PMU events.

Patch 4 changes the cycle counter to overflow on 64 bit but tries to minimize
code changes. Without this change perf does not work at all on ThunderX.

Patch 5 extends the event mask according to ARMv8.1 and also affects arm32.

Changes since v2:
- fixed arm compile errors

Changes since v1:
- renamed the thunderx devicetree PMU binding to thunder

--Jan

Jan Glauber (5):
  arm64/perf: Rename Cortex A57 events
  arm64/perf: Add Cavium ThunderX PMU support
  arm64: dts: Add Cavium ThunderX specific PMU
  arm64/perf: Enable PMCR long cycle counter bit
  arm64/perf: Extend event mask for ARMv8.1

 Documentation/devicetree/bindings/arm/pmu.txt |   1 +
 arch/arm/kernel/perf_event_v6.c               |   6 +-
 arch/arm/kernel/perf_event_v7.c               |  29 ++++--
 arch/arm/kernel/perf_event_xscale.c           |   4 +-
 arch/arm64/boot/dts/cavium/thunder-88xx.dtsi  |   5 +
 arch/arm64/kernel/perf_event.c                | 145 ++++++++++++++++++++------
 drivers/perf/arm_pmu.c                        |   5 +-
 include/linux/perf/arm_pmu.h                  |   4 +-
 8 files changed, 151 insertions(+), 48 deletions(-)

-- 
1.9.1


* [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events
  2016-02-03 17:11 [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
@ 2016-02-03 17:11 ` Jan Glauber
  2016-02-15 19:40   ` Will Deacon
  2016-02-03 17:11 ` [PATCH v3 2/5] arm64/perf: Add Cavium ThunderX PMU support Jan Glauber
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jan Glauber @ 2016-02-03 17:11 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

The implemented Cortex A57 events are not A57-specific. They are
events recommended by ARM and can also be found on other ARMv8
SoCs like Cavium ThunderX. Therefore move these events to the
common PMUv3 table.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 arch/arm64/kernel/perf_event.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index f7ab14c..32fe656 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -87,17 +87,17 @@
 #define ARMV8_PMUV3_PERFCTR_L2D_TLB				0x2F
 #define ARMV8_PMUV3_PERFCTR_L21_TLB				0x30
 
+/* Recommended events. */
+#define ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS_LD			0x40
+#define ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS_ST			0x41
+#define ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL_LD			0x42
+#define ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL_ST			0x43
+#define ARMV8_PMUV3_PERFCTR_DTLB_REFILL_LD			0x4C
+#define ARMV8_PMUV3_PERFCTR_DTLB_REFILL_ST			0x4D
+
 /* ARMv8 Cortex-A53 specific event types. */
 #define ARMV8_A53_PERFCTR_PREFETCH_LINEFILL			0xC2
 
-/* ARMv8 Cortex-A57 and Cortex-A72 specific event types. */
-#define ARMV8_A57_PERFCTR_L1_DCACHE_ACCESS_LD			0x40
-#define ARMV8_A57_PERFCTR_L1_DCACHE_ACCESS_ST			0x41
-#define ARMV8_A57_PERFCTR_L1_DCACHE_REFILL_LD			0x42
-#define ARMV8_A57_PERFCTR_L1_DCACHE_REFILL_ST			0x43
-#define ARMV8_A57_PERFCTR_DTLB_REFILL_LD			0x4c
-#define ARMV8_A57_PERFCTR_DTLB_REFILL_ST			0x4d
-
 /* PMUv3 HW events mapping. */
 static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
 	PERF_MAP_ALL_UNSUPPORTED,
@@ -174,16 +174,16 @@ static const unsigned armv8_a57_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 					      [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
 	PERF_CACHE_MAP_ALL_UNSUPPORTED,
 
-	[C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_A57_PERFCTR_L1_DCACHE_ACCESS_LD,
-	[C(L1D)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_A57_PERFCTR_L1_DCACHE_REFILL_LD,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_A57_PERFCTR_L1_DCACHE_ACCESS_ST,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_A57_PERFCTR_L1_DCACHE_REFILL_ST,
+	[C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS_LD,
+	[C(L1D)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL_LD,
+	[C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS_ST,
+	[C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL_ST,
 
 	[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1_ICACHE_ACCESS,
 	[C(L1I)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1_ICACHE_REFILL,
 
-	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_A57_PERFCTR_DTLB_REFILL_LD,
-	[C(DTLB)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_A57_PERFCTR_DTLB_REFILL_ST,
+	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_DTLB_REFILL_LD,
+	[C(DTLB)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_DTLB_REFILL_ST,
 
 	[C(ITLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_ITLB_REFILL,
 
-- 
1.9.1


* [PATCH v3 2/5] arm64/perf: Add Cavium ThunderX PMU support
  2016-02-03 17:11 [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
  2016-02-03 17:11 ` [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events Jan Glauber
@ 2016-02-03 17:11 ` Jan Glauber
  2016-02-03 17:11 ` [PATCH v3 3/5] arm64: dts: Add Cavium ThunderX specific PMU Jan Glauber
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jan Glauber @ 2016-02-03 17:11 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Support PMU events on Cavium's ThunderX SoC. ThunderX implements
several counters in addition to the default ARMv8 PMUv3 set:

- branch instructions counter
- stall frontend & backend counters
- L1 dcache load & store counters
- L1 icache counters
- iTLB & dTLB counters
- L1 dcache & icache prefetch counters

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 arch/arm64/kernel/perf_event.c | 69 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 32fe656..c038e4e 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -94,10 +94,19 @@
 #define ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL_ST			0x43
 #define ARMV8_PMUV3_PERFCTR_DTLB_REFILL_LD			0x4C
 #define ARMV8_PMUV3_PERFCTR_DTLB_REFILL_ST			0x4D
+#define ARMV8_PMUV3_PERFCTR_DTLB_ACCESS_LD			0x4E
+#define ARMV8_PMUV3_PERFCTR_DTLB_ACCESS_ST			0x4F
 
 /* ARMv8 Cortex-A53 specific event types. */
 #define ARMV8_A53_PERFCTR_PREFETCH_LINEFILL			0xC2
 
+/* ARMv8 Cavium Thunder specific event types. */
+#define ARMV8_THUNDER_PERFCTR_L1_DCACHE_MISS_ST			0xE9
+#define ARMV8_THUNDER_PERFCTR_L1_DCACHE_PREF_ACCESS		0xEA
+#define ARMV8_THUNDER_PERFCTR_L1_DCACHE_PREF_MISS		0xEB
+#define ARMV8_THUNDER_PERFCTR_L1_ICACHE_PREF_ACCESS		0xEC
+#define ARMV8_THUNDER_PERFCTR_L1_ICACHE_PREF_MISS		0xED
+
 /* PMUv3 HW events mapping. */
 static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
 	PERF_MAP_ALL_UNSUPPORTED,
@@ -131,6 +140,18 @@ static const unsigned armv8_a57_perf_map[PERF_COUNT_HW_MAX] = {
 	[PERF_COUNT_HW_BUS_CYCLES]		= ARMV8_PMUV3_PERFCTR_BUS_CYCLES,
 };
 
+static const unsigned armv8_thunder_perf_map[PERF_COUNT_HW_MAX] = {
+	PERF_MAP_ALL_UNSUPPORTED,
+	[PERF_COUNT_HW_CPU_CYCLES]		= ARMV8_PMUV3_PERFCTR_CLOCK_CYCLES,
+	[PERF_COUNT_HW_INSTRUCTIONS]		= ARMV8_PMUV3_PERFCTR_INSTR_EXECUTED,
+	[PERF_COUNT_HW_CACHE_REFERENCES]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS,
+	[PERF_COUNT_HW_CACHE_MISSES]		= ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= ARMV8_PMUV3_PERFCTR_PC_WRITE,
+	[PERF_COUNT_HW_BRANCH_MISSES]		= ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED,
+	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV8_PMUV3_PERFCTR_STALL_FRONTEND,
+	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND]	= ARMV8_PMUV3_PERFCTR_STALL_BACKEND,
+};
+
 static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 						[PERF_COUNT_HW_CACHE_OP_MAX]
 						[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
@@ -193,6 +214,36 @@ static const unsigned armv8_a57_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 	[C(BPU)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED,
 };
 
+static const unsigned armv8_thunder_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
+						   [PERF_COUNT_HW_CACHE_OP_MAX]
+						   [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+	PERF_CACHE_MAP_ALL_UNSUPPORTED,
+
+	[C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS_LD,
+	[C(L1D)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL_LD,
+	[C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS_ST,
+	[C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_THUNDER_PERFCTR_L1_DCACHE_MISS_ST,
+	[C(L1D)][C(OP_PREFETCH)][C(RESULT_ACCESS)] = ARMV8_THUNDER_PERFCTR_L1_DCACHE_PREF_ACCESS,
+	[C(L1D)][C(OP_PREFETCH)][C(RESULT_MISS)] = ARMV8_THUNDER_PERFCTR_L1_DCACHE_PREF_MISS,
+
+	[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1_ICACHE_ACCESS,
+	[C(L1I)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1_ICACHE_REFILL,
+	[C(L1I)][C(OP_PREFETCH)][C(RESULT_ACCESS)] = ARMV8_THUNDER_PERFCTR_L1_ICACHE_PREF_ACCESS,
+	[C(L1I)][C(OP_PREFETCH)][C(RESULT_MISS)] = ARMV8_THUNDER_PERFCTR_L1_ICACHE_PREF_MISS,
+
+	[C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_DTLB_ACCESS_LD,
+	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_DTLB_REFILL_LD,
+	[C(DTLB)][C(OP_WRITE)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_DTLB_ACCESS_ST,
+	[C(DTLB)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_DTLB_REFILL_ST,
+
+	[C(ITLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_ITLB_REFILL,
+
+	[C(BPU)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_PC_BRANCH_PRED,
+	[C(BPU)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED,
+	[C(BPU)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_PC_BRANCH_PRED,
+	[C(BPU)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED,
+};
+
 #define ARMV8_EVENT_ATTR_RESOLVE(m) #m
 #define ARMV8_EVENT_ATTR(name, config) \
 	PMU_EVENT_ATTR_STRING(name, armv8_event_attr_##name, \
@@ -324,7 +375,6 @@ static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
 	NULL,
 };
 
-
 /*
  * Perf Events' indices
  */
@@ -743,6 +793,13 @@ static int armv8_a57_map_event(struct perf_event *event)
 				ARMV8_EVTYPE_EVENT);
 }
 
+static int armv8_thunder_map_event(struct perf_event *event)
+{
+	return armpmu_map_event(event, &armv8_thunder_perf_map,
+				&armv8_thunder_perf_cache_map,
+				ARMV8_EVTYPE_EVENT);
+}
+
 static void armv8pmu_read_num_pmnc_events(void *info)
 {
 	int *nb_cnt = info;
@@ -811,11 +868,21 @@ static int armv8_a72_pmu_init(struct arm_pmu *cpu_pmu)
 	return armv8pmu_probe_num_events(cpu_pmu);
 }
 
+static int armv8_thunder_pmu_init(struct arm_pmu *cpu_pmu)
+{
+	armv8_pmu_init(cpu_pmu);
+	cpu_pmu->name			= "armv8_cavium_thunder";
+	cpu_pmu->map_event		= armv8_thunder_map_event;
+	cpu_pmu->pmu.attr_groups	= armv8_pmuv3_attr_groups;
+	return armv8pmu_probe_num_events(cpu_pmu);
+}
+
 static const struct of_device_id armv8_pmu_of_device_ids[] = {
 	{.compatible = "arm,armv8-pmuv3",	.data = armv8_pmuv3_init},
 	{.compatible = "arm,cortex-a53-pmu",	.data = armv8_a53_pmu_init},
 	{.compatible = "arm,cortex-a57-pmu",	.data = armv8_a57_pmu_init},
 	{.compatible = "arm,cortex-a72-pmu",	.data = armv8_a72_pmu_init},
+	{.compatible = "cavium,thunder-pmu",	.data = armv8_thunder_pmu_init},
 	{},
 };
 
-- 
1.9.1


* [PATCH v3 3/5] arm64: dts: Add Cavium ThunderX specific PMU
  2016-02-03 17:11 [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
  2016-02-03 17:11 ` [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events Jan Glauber
  2016-02-03 17:11 ` [PATCH v3 2/5] arm64/perf: Add Cavium ThunderX PMU support Jan Glauber
@ 2016-02-03 17:11 ` Jan Glauber
  2016-02-03 17:11 ` [PATCH v3 4/5] arm64/perf: Enable PMCR long cycle counter bit Jan Glauber
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jan Glauber @ 2016-02-03 17:11 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Document the "cavium,thunder-pmu" compatible string and add the
corresponding PMU node to the ThunderX dtsi.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 Documentation/devicetree/bindings/arm/pmu.txt | 1 +
 arch/arm64/boot/dts/cavium/thunder-88xx.dtsi  | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/pmu.txt b/Documentation/devicetree/bindings/arm/pmu.txt
index 5651883..d3999a1 100644
--- a/Documentation/devicetree/bindings/arm/pmu.txt
+++ b/Documentation/devicetree/bindings/arm/pmu.txt
@@ -25,6 +25,7 @@ Required properties:
 	"qcom,scorpion-pmu"
 	"qcom,scorpion-mp-pmu"
 	"qcom,krait-pmu"
+	"cavium,thunder-pmu"
 - interrupts : 1 combined interrupt or 1 per core. If the interrupt is a per-cpu
                interrupt (PPI) then 1 interrupt should be specified.
 
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi b/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
index 9cb7cf9..2eb9b22 100644
--- a/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
@@ -360,6 +360,11 @@
 		             <1 10 0xff01>;
 	};
 
+	pmu {
+		compatible = "cavium,thunder-pmu", "arm,armv8-pmuv3";
+		interrupts = <1 7 4>;
+	};
+
 	soc {
 		compatible = "simple-bus";
 		#address-cells = <2>;
-- 
1.9.1


* [PATCH v3 4/5] arm64/perf: Enable PMCR long cycle counter bit
  2016-02-03 17:11 [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
                   ` (2 preceding siblings ...)
  2016-02-03 17:11 ` [PATCH v3 3/5] arm64: dts: Add Cavium ThunderX specific PMU Jan Glauber
@ 2016-02-03 17:11 ` Jan Glauber
  2016-02-15 19:55   ` Will Deacon
  2016-02-03 17:12 ` [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1 Jan Glauber
  2016-02-11 13:28 ` [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
  5 siblings, 1 reply; 18+ messages in thread
From: Jan Glauber @ 2016-02-03 17:11 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

With the long cycle counter bit (LC) cleared, the cycle counter does
not work on the ThunderX SoC (ThunderX implements only AArch64).
Also, according to the documentation, LC == 0 is deprecated.

To keep the code simple, this patch does not introduce 64-bit-wide
counter functions. Instead, writing the cycle counter always sets the
upper 32 bits, so overflow interrupts are generated as before.

Original patch from Andrew Pinski <Andrew.Pinksi@caviumnetworks.com>

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 arch/arm64/kernel/perf_event.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index c038e4e..5e4275e 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -405,6 +405,7 @@ static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
 #define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
 #define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
 #define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
+#define ARMV8_PMCR_LC		(1 << 6) /* Overflow on 64 bit cycle counter */
 #define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
 #define	ARMV8_PMCR_N_MASK	0x1f
 #define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */
@@ -494,9 +495,16 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u writing wrong counter %d\n",
 			smp_processor_id(), idx);
-	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
-		asm volatile("msr pmccntr_el0, %0" :: "r" (value));
-	else if (armv8pmu_select_counter(idx) == idx)
+	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
+		/*
+		 * Set the upper 32bits as this is a 64bit counter but we only
+		 * count using the lower 32bits and we want an interrupt when
+		 * it overflows.
+		 */
+		u64 value64 = 0xffffffff00000000ULL | value;
+
+		asm volatile("msr pmccntr_el0, %0" :: "r" (value64));
+	} else if (armv8pmu_select_counter(idx) == idx)
 		asm volatile("msr pmxevcntr_el0, %0" :: "r" (value));
 }
 
@@ -768,8 +776,11 @@ static void armv8pmu_reset(void *info)
 		armv8pmu_disable_intens(idx);
 	}
 
-	/* Initialize & Reset PMNC: C and P bits. */
-	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C);
+	/*
+	 * Initialize & Reset PMNC. Request overflow on 64 bit but
+	 * cheat in armv8pmu_write_counter().
+	 */
+	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C | ARMV8_PMCR_LC);
 }
 
 static int armv8_pmuv3_map_event(struct perf_event *event)
-- 
1.9.1


* [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1
  2016-02-03 17:11 [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
                   ` (3 preceding siblings ...)
  2016-02-03 17:11 ` [PATCH v3 4/5] arm64/perf: Enable PMCR long cycle counter bit Jan Glauber
@ 2016-02-03 17:12 ` Jan Glauber
  2016-02-15 20:04   ` Will Deacon
  2016-02-11 13:28 ` [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
  5 siblings, 1 reply; 18+ messages in thread
From: Jan Glauber @ 2016-02-03 17:12 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

ARMv8.1 widens the PMU event number space from 10 to 16 bits. Detect
the presence of this PMUv3 version and extend the event mask
accordingly.

The event mask is moved into struct arm_pmu so that different event
masks can coexist, depending on the PMU type.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 arch/arm/kernel/perf_event_v6.c     |  6 ++++--
 arch/arm/kernel/perf_event_v7.c     | 29 +++++++++++++++++++----------
 arch/arm/kernel/perf_event_xscale.c |  4 +++-
 arch/arm64/kernel/perf_event.c      | 33 +++++++++++++++++++--------------
 drivers/perf/arm_pmu.c              |  5 +++--
 include/linux/perf/arm_pmu.h        |  4 ++--
 6 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index 09413e7..d6769f5 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -481,7 +481,7 @@ static void armv6mpcore_pmu_disable_event(struct perf_event *event)
 static int armv6_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv6_perf_map,
-				&armv6_perf_cache_map, 0xFF);
+				&armv6_perf_cache_map);
 }
 
 static void armv6pmu_init(struct arm_pmu *cpu_pmu)
@@ -494,6 +494,7 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
 	cpu_pmu->start		= armv6pmu_start;
 	cpu_pmu->stop		= armv6pmu_stop;
+	cpu_pmu->event_mask	= 0xFF;
 	cpu_pmu->map_event	= armv6_map_event;
 	cpu_pmu->num_events	= 3;
 	cpu_pmu->max_period	= (1LLU << 32) - 1;
@@ -531,7 +532,7 @@ static int armv6_1176_pmu_init(struct arm_pmu *cpu_pmu)
 static int armv6mpcore_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv6mpcore_perf_map,
-				&armv6mpcore_perf_cache_map, 0xFF);
+				&armv6mpcore_perf_cache_map);
 }
 
 static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
@@ -545,6 +546,7 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
 	cpu_pmu->start		= armv6pmu_start;
 	cpu_pmu->stop		= armv6pmu_stop;
+	cpu_pmu->event_mask	= 0xFF;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
 	cpu_pmu->num_events	= 3;
 	cpu_pmu->max_period	= (1LLU << 32) - 1;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 4152158..8aab098 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1042,7 +1042,7 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	int idx;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	unsigned long evtype = hwc->config_base & ARMV7_EVTYPE_EVENT;
+	unsigned long evtype = hwc->config_base & cpu_pmu->event_mask;
 
 	/* Always place a cycle counter into the cycle counter. */
 	if (evtype == ARMV7_PERFCTR_CPU_CYCLES) {
@@ -1109,55 +1109,55 @@ static void armv7pmu_reset(void *info)
 static int armv7_a8_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv7_a8_perf_map,
-				&armv7_a8_perf_cache_map, 0xFF);
+				&armv7_a8_perf_cache_map);
 }
 
 static int armv7_a9_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv7_a9_perf_map,
-				&armv7_a9_perf_cache_map, 0xFF);
+				&armv7_a9_perf_cache_map);
 }
 
 static int armv7_a5_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv7_a5_perf_map,
-				&armv7_a5_perf_cache_map, 0xFF);
+				&armv7_a5_perf_cache_map);
 }
 
 static int armv7_a15_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv7_a15_perf_map,
-				&armv7_a15_perf_cache_map, 0xFF);
+				&armv7_a15_perf_cache_map);
 }
 
 static int armv7_a7_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv7_a7_perf_map,
-				&armv7_a7_perf_cache_map, 0xFF);
+				&armv7_a7_perf_cache_map);
 }
 
 static int armv7_a12_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv7_a12_perf_map,
-				&armv7_a12_perf_cache_map, 0xFF);
+				&armv7_a12_perf_cache_map);
 }
 
 static int krait_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &krait_perf_map,
-				&krait_perf_cache_map, 0xFFFFF);
+				&krait_perf_cache_map);
 }
 
 static int krait_map_event_no_branch(struct perf_event *event)
 {
 	return armpmu_map_event(event, &krait_perf_map_no_branch,
-				&krait_perf_cache_map, 0xFFFFF);
+				&krait_perf_cache_map);
 }
 
 static int scorpion_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &scorpion_perf_map,
-				&scorpion_perf_cache_map, 0xFFFFF);
+				&scorpion_perf_cache_map);
 }
 
 static void armv7pmu_init(struct arm_pmu *cpu_pmu)
@@ -1196,6 +1196,7 @@ static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a8";
+	cpu_pmu->event_mask	= ARMV7_EVTYPE_EVENT;
 	cpu_pmu->map_event	= armv7_a8_map_event;
 	cpu_pmu->pmu.attr_groups = armv7_pmuv1_attr_groups;
 	return armv7_probe_num_events(cpu_pmu);
@@ -1205,6 +1206,7 @@ static int armv7_a9_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a9";
+	cpu_pmu->event_mask	= ARMV7_EVTYPE_EVENT;
 	cpu_pmu->map_event	= armv7_a9_map_event;
 	cpu_pmu->pmu.attr_groups = armv7_pmuv1_attr_groups;
 	return armv7_probe_num_events(cpu_pmu);
@@ -1214,6 +1216,7 @@ static int armv7_a5_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a5";
+	cpu_pmu->event_mask	= ARMV7_EVTYPE_EVENT;
 	cpu_pmu->map_event	= armv7_a5_map_event;
 	cpu_pmu->pmu.attr_groups = armv7_pmuv1_attr_groups;
 	return armv7_probe_num_events(cpu_pmu);
@@ -1223,6 +1226,7 @@ static int armv7_a15_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a15";
+	cpu_pmu->event_mask	= ARMV7_EVTYPE_EVENT;
 	cpu_pmu->map_event	= armv7_a15_map_event;
 	cpu_pmu->set_event_filter = armv7pmu_set_event_filter;
 	cpu_pmu->pmu.attr_groups = armv7_pmuv2_attr_groups;
@@ -1233,6 +1237,7 @@ static int armv7_a7_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a7";
+	cpu_pmu->event_mask	= ARMV7_EVTYPE_EVENT;
 	cpu_pmu->map_event	= armv7_a7_map_event;
 	cpu_pmu->set_event_filter = armv7pmu_set_event_filter;
 	cpu_pmu->pmu.attr_groups = armv7_pmuv2_attr_groups;
@@ -1243,6 +1248,7 @@ static int armv7_a12_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a12";
+	cpu_pmu->event_mask	= ARMV7_EVTYPE_EVENT;
 	cpu_pmu->map_event	= armv7_a12_map_event;
 	cpu_pmu->set_event_filter = armv7pmu_set_event_filter;
 	cpu_pmu->pmu.attr_groups = armv7_pmuv2_attr_groups;
@@ -1628,6 +1634,7 @@ static int krait_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_krait";
+	cpu_pmu->event_mask	= 0xFFFFF;
 	/* Some early versions of Krait don't support PC write events */
 	if (of_property_read_bool(cpu_pmu->plat_device->dev.of_node,
 				  "qcom,no-pc-write"))
@@ -1957,6 +1964,7 @@ static int scorpion_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_scorpion";
+	cpu_pmu->event_mask	= 0xFFFFF;
 	cpu_pmu->map_event	= scorpion_map_event;
 	cpu_pmu->reset		= scorpion_pmu_reset;
 	cpu_pmu->enable		= scorpion_pmu_enable_event;
@@ -1970,6 +1978,7 @@ static int scorpion_mp_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_scorpion_mp";
+	cpu_pmu->event_mask	= 0xFFFFF;
 	cpu_pmu->map_event	= scorpion_map_event;
 	cpu_pmu->reset		= scorpion_pmu_reset;
 	cpu_pmu->enable		= scorpion_pmu_enable_event;
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index aa0499e..8708691 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -358,7 +358,7 @@ static inline void xscale1pmu_write_counter(struct perf_event *event, u32 val)
 static int xscale_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &xscale_perf_map,
-				&xscale_perf_cache_map, 0xFF);
+				&xscale_perf_cache_map);
 }
 
 static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
@@ -372,6 +372,7 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->get_event_idx	= xscale1pmu_get_event_idx;
 	cpu_pmu->start		= xscale1pmu_start;
 	cpu_pmu->stop		= xscale1pmu_stop;
+	cpu_pmu->event_mask	= 0xFF;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 3;
 	cpu_pmu->max_period	= (1LLU << 32) - 1;
@@ -742,6 +743,7 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->get_event_idx	= xscale2pmu_get_event_idx;
 	cpu_pmu->start		= xscale2pmu_start;
 	cpu_pmu->stop		= xscale2pmu_stop;
+	cpu_pmu->event_mask	= 0xFF;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 5;
 	cpu_pmu->max_period	= (1LLU << 32) - 1;
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 5e4275e..78b24cb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -419,7 +419,7 @@ static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
 /*
  * PMXEVTYPER: Event selection reg
  */
-#define	ARMV8_EVTYPE_MASK	0xc80003ff	/* Mask for writable bits */
+#define	ARMV8_EVTYPE_FLT_MASK	0xc8000000	/* Writable filter bits */
 #define	ARMV8_EVTYPE_EVENT	0x3ff		/* Mask for EVENT bits */
 
 /*
@@ -510,10 +510,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
 {
-	if (armv8pmu_select_counter(idx) == idx) {
-		val &= ARMV8_EVTYPE_MASK;
+	if (armv8pmu_select_counter(idx) == idx)
 		asm volatile("msr pmxevtyper_el0, %0" :: "r" (val));
-	}
 }
 
 static inline int armv8pmu_enable_counter(int idx)
@@ -570,6 +568,7 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
+	u32 val;
 
 	/*
 	 * Enable counter and interrupt, and set the counter to count
@@ -585,7 +584,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	/*
 	 * Set event (if destined for PMNx counters).
 	 */
-	armv8pmu_write_evtype(idx, hwc->config_base);
+	val = hwc->config_base & (ARMV8_EVTYPE_FLT_MASK | cpu_pmu->event_mask);
+	armv8pmu_write_evtype(idx, val);
 
 	/*
 	 * Enable interrupt for this counter
@@ -716,7 +716,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	int idx;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	unsigned long evtype = hwc->config_base & ARMV8_EVTYPE_EVENT;
+	unsigned long evtype = hwc->config_base & cpu_pmu->event_mask;
 
 	/* Always place a cycle counter into the cycle counter. */
 	if (evtype == ARMV8_PMUV3_PERFCTR_CLOCK_CYCLES) {
@@ -786,29 +786,25 @@ static void armv8pmu_reset(void *info)
 static int armv8_pmuv3_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv8_pmuv3_perf_map,
-				&armv8_pmuv3_perf_cache_map,
-				ARMV8_EVTYPE_EVENT);
+				&armv8_pmuv3_perf_cache_map);
 }
 
 static int armv8_a53_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv8_a53_perf_map,
-				&armv8_a53_perf_cache_map,
-				ARMV8_EVTYPE_EVENT);
+				&armv8_a53_perf_cache_map);
 }
 
 static int armv8_a57_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv8_a57_perf_map,
-				&armv8_a57_perf_cache_map,
-				ARMV8_EVTYPE_EVENT);
+				&armv8_a57_perf_cache_map);
 }
 
 static int armv8_thunder_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv8_thunder_perf_map,
-				&armv8_thunder_perf_cache_map,
-				ARMV8_EVTYPE_EVENT);
+				&armv8_thunder_perf_cache_map);
 }
 
 static void armv8pmu_read_num_pmnc_events(void *info)
@@ -831,6 +827,8 @@ static int armv8pmu_probe_num_events(struct arm_pmu *arm_pmu)
 
 static void armv8_pmu_init(struct arm_pmu *cpu_pmu)
 {
+	u64 id;
+
 	cpu_pmu->handle_irq		= armv8pmu_handle_irq,
 	cpu_pmu->enable			= armv8pmu_enable_event,
 	cpu_pmu->disable		= armv8pmu_disable_event,
@@ -842,6 +840,13 @@ static void armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->reset			= armv8pmu_reset,
 	cpu_pmu->max_period		= (1LLU << 32) - 1,
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
+
+	/* detect ARMv8.1 PMUv3 with extended event mask */
+	id = read_cpuid(ID_AA64DFR0_EL1);
+	if (((id >> 8) & 0xf) == 4)
+		cpu_pmu->event_mask = 0xffff;	/* ARMv8.1 extended events */
+	else
+		cpu_pmu->event_mask = ARMV8_EVTYPE_EVENT;
 }
 
 static int armv8_pmuv3_init(struct arm_pmu *cpu_pmu)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 166637f..79e681f 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -79,9 +79,10 @@ armpmu_map_event(struct perf_event *event,
 		 const unsigned (*cache_map)
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
-				[PERF_COUNT_HW_CACHE_RESULT_MAX],
-		 u32 raw_event_mask)
+				[PERF_COUNT_HW_CACHE_RESULT_MAX])
 {
+	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+	u32 raw_event_mask = armpmu->event_mask;
 	u64 config = event->attr.config;
 	int type = event->attr.type;
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 83b5e34..9a4c3a9 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -101,6 +101,7 @@ struct arm_pmu {
 	void		(*free_irq)(struct arm_pmu *);
 	int		(*map_event)(struct perf_event *event);
 	int		num_events;
+	int		event_mask;
 	atomic_t	active_events;
 	struct mutex	reserve_mutex;
 	u64		max_period;
@@ -119,8 +120,7 @@ int armpmu_map_event(struct perf_event *event,
 		     const unsigned (*event_map)[PERF_COUNT_HW_MAX],
 		     const unsigned (*cache_map)[PERF_COUNT_HW_CACHE_MAX]
 						[PERF_COUNT_HW_CACHE_OP_MAX]
-						[PERF_COUNT_HW_CACHE_RESULT_MAX],
-		     u32 raw_event_mask);
+						[PERF_COUNT_HW_CACHE_RESULT_MAX]);
 
 struct pmu_probe_info {
 	unsigned int cpuid;
-- 
1.9.1


* Re: [PATCH v3 0/5] Cavium ThunderX PMU support
  2016-02-03 17:11 [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
                   ` (4 preceding siblings ...)
  2016-02-03 17:12 ` [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1 Jan Glauber
@ 2016-02-11 13:28 ` Jan Glauber
  5 siblings, 0 replies; 18+ messages in thread
From: Jan Glauber @ 2016-02-11 13:28 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: linux-kernel, linux-arm-kernel

On Wed, Feb 03, 2016 at 06:11:55PM +0100, Jan Glauber wrote:
> Hi,
> 
> I'm reposting the whole series just in case my previous attempt to
> repost only the broken patch was confusing. Patches are based on
> 4.5-rc2.

Can I get a review for these patches?

thanks,
Jan

> Patches 1-3 add support for ThunderX specific PMU events.
> 
> Patch 4 changes the cycle counter to overflow on 64 bit but tries to minimize
> code changes. Without this change perf does not work at all on ThunderX.
> 
> Patch 5 extends the event mask according to ARMv8.1 and also affects arm32.
> 
> Changes to v2:
> - fixed arm compile errors
> 
> Changes to v1:
> - renamed thunderx dt pmu binding to thunder
> 
> --Jan
> 
> Jan Glauber (5):
>   arm64/perf: Rename Cortex A57 events
>   arm64/perf: Add Cavium ThunderX PMU support
>   arm64: dts: Add Cavium ThunderX specific PMU
>   arm64/perf: Enable PMCR long cycle counter bit
>   arm64/perf: Extend event mask for ARMv8.1
> 
>  Documentation/devicetree/bindings/arm/pmu.txt |   1 +
>  arch/arm/kernel/perf_event_v6.c               |   6 +-
>  arch/arm/kernel/perf_event_v7.c               |  29 ++++--
>  arch/arm/kernel/perf_event_xscale.c           |   4 +-
>  arch/arm64/boot/dts/cavium/thunder-88xx.dtsi  |   5 +
>  arch/arm64/kernel/perf_event.c                | 145 ++++++++++++++++++++------
>  drivers/perf/arm_pmu.c                        |   5 +-
>  include/linux/perf/arm_pmu.h                  |   4 +-
>  8 files changed, 151 insertions(+), 48 deletions(-)
> 
> -- 
> 1.9.1


* Re: [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events
  2016-02-03 17:11 ` [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events Jan Glauber
@ 2016-02-15 19:40   ` Will Deacon
  2016-02-15 20:06     ` Will Deacon
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2016-02-15 19:40 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Wed, Feb 03, 2016 at 06:11:56PM +0100, Jan Glauber wrote:
> The implemented Cortex A57 events are not A57 specific.
> They are recommended by ARM and can be found on other
> ARMv8 SOCs like Cavium ThunderX too. Therefore move
> these events to the common PMUv3 table.

I can't find anything in the architecture that suggests these event
numbers are necessarily portable between implementations. Am I missing
something?

Will


* Re: [PATCH v3 4/5] arm64/perf: Enable PMCR long cycle counter bit
  2016-02-03 17:11 ` [PATCH v3 4/5] arm64/perf: Enable PMCR long cycle counter bit Jan Glauber
@ 2016-02-15 19:55   ` Will Deacon
  2016-02-16  8:04     ` Jan Glauber
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2016-02-15 19:55 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Wed, Feb 03, 2016 at 06:11:59PM +0100, Jan Glauber wrote:
> With the long cycle counter bit (LC) disabled the cycle counter is not
> > working on ThunderX SOC (ThunderX only implements AArch64).
> Also, according to documentation LC == 0 is deprecated.
> 
> To keep the code simple the patch does not introduce 64 bit wide counter
> functions. Instead writing the cycle counter always sets the upper
> 32 bits so overflow interrupts are generated as before.

I guess we could look into chained events if we wanted to go the whole
hog and support 64-bit counters, but this looks fine for now.

> Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
> 
> Signed-off-by: Jan Glauber <jglauber@cavium.com>
> ---
>  arch/arm64/kernel/perf_event.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index c038e4e..5e4275e 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -405,6 +405,7 @@ static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
>  #define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
>  #define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
>  #define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
> +#define ARMV8_PMCR_LC		(1 << 6) /* Overflow on 64 bit cycle counter */
>  #define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
>  #define	ARMV8_PMCR_N_MASK	0x1f
>  #define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */
> @@ -494,9 +495,16 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
>  	if (!armv8pmu_counter_valid(cpu_pmu, idx))
>  		pr_err("CPU%u writing wrong counter %d\n",
>  			smp_processor_id(), idx);
> -	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
> -		asm volatile("msr pmccntr_el0, %0" :: "r" (value));
> -	else if (armv8pmu_select_counter(idx) == idx)
> +	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
> +		/*
> +		 * Set the upper 32bits as this is a 64bit counter but we only
> +		 * count using the lower 32bits and we want an interrupt when
> +		 * it overflows.
> +		 */
> +		u64 value64 = 0xffffffff00000000ULL | value;
> +
> +		asm volatile("msr pmccntr_el0, %0" :: "r" (value64));
> +	} else if (armv8pmu_select_counter(idx) == idx)
>  		asm volatile("msr pmxevcntr_el0, %0" :: "r" (value));
>  }
>  
> @@ -768,8 +776,11 @@ static void armv8pmu_reset(void *info)
>  		armv8pmu_disable_intens(idx);
>  	}
>  
> -	/* Initialize & Reset PMNC: C and P bits. */
> -	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C);
> +	/*
> +	 * Initialize & Reset PMNC. Request overflow on 64 bit but
> +	 * cheat in armv8pmu_write_counter().

Can you expand the comment to mention that the 64-bit overflow is only
for the cycle counter, please?

Will


* Re: [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1
  2016-02-03 17:12 ` [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1 Jan Glauber
@ 2016-02-15 20:04   ` Will Deacon
  2016-02-16  8:00     ` Jan Glauber
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2016-02-15 20:04 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Wed, Feb 03, 2016 at 06:12:00PM +0100, Jan Glauber wrote:
> ARMv8.1 increases the PMU event number space. Detect the
> presence of this PMUv3 type and extend the event mask.
> 
> The event mask is moved to struct arm_pmu so different event masks
> can exist, depending on the PMU type.
> 
> Signed-off-by: Jan Glauber <jglauber@cavium.com>
> ---
>  arch/arm/kernel/perf_event_v6.c     |  6 ++++--
>  arch/arm/kernel/perf_event_v7.c     | 29 +++++++++++++++++++----------
>  arch/arm/kernel/perf_event_xscale.c |  4 +++-
>  arch/arm64/kernel/perf_event.c      | 33 +++++++++++++++++++--------------
>  drivers/perf/arm_pmu.c              |  5 +++--
>  include/linux/perf/arm_pmu.h        |  4 ++--
>  6 files changed, 50 insertions(+), 31 deletions(-)

[...]

>  static void armv8_pmu_init(struct arm_pmu *cpu_pmu)
>  {
> +	u64 id;
> +
>  	cpu_pmu->handle_irq		= armv8pmu_handle_irq,
>  	cpu_pmu->enable			= armv8pmu_enable_event,
>  	cpu_pmu->disable		= armv8pmu_disable_event,
> @@ -842,6 +840,13 @@ static void armv8_pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->reset			= armv8pmu_reset,
>  	cpu_pmu->max_period		= (1LLU << 32) - 1,
>  	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
> +
> +	/* detect ARMv8.1 PMUv3 with extended event mask */
> +	id = read_cpuid(ID_AA64DFR0_EL1);
> +	if (((id >> 8) & 0xf) == 4)

We have helpers for this stuff (cpuid_feature_extract_field)...

> +		cpu_pmu->event_mask = 0xffff;	/* ARMv8.1 extended events */
> +	else
> +		cpu_pmu->event_mask = ARMV8_EVTYPE_EVENT;

... although can't we just update ARMV8_EVTYPE_EVENT to be 0xffff now?
AFAICT, that just eats into bits that used to be RES0, so we shouldn't
see any problems. That should make your patch *much* simpler!

Will


* Re: [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events
  2016-02-15 19:40   ` Will Deacon
@ 2016-02-15 20:06     ` Will Deacon
  2016-02-18  9:13       ` Jan Glauber
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2016-02-15 20:06 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Mon, Feb 15, 2016 at 07:40:37PM +0000, Will Deacon wrote:
> On Wed, Feb 03, 2016 at 06:11:56PM +0100, Jan Glauber wrote:
> > The implemented Cortex A57 events are not A57 specific.
> > They are recommended by ARM and can be found on other
> > ARMv8 SOCs like Cavium ThunderX too. Therefore move
> > these events to the common PMUv3 table.
> 
> I can't find anything in the architecture that suggests these event
> numbers are necessarily portable between implementations. Am I missing
> something?

Aha, I just noticed appendix K3.1 (silly me for missing it...).

Lemme check whether or not that mandates that those encodings can't be
used for wildly different things.

Will


* Re: [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1
  2016-02-15 20:04   ` Will Deacon
@ 2016-02-16  8:00     ` Jan Glauber
  2016-02-16 15:12       ` Will Deacon
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Glauber @ 2016-02-16  8:00 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Mon, Feb 15, 2016 at 08:04:04PM +0000, Will Deacon wrote:

[...]

> On Wed, Feb 03, 2016 at 06:12:00PM +0100, Jan Glauber wrote:
> > +		cpu_pmu->event_mask = 0xffff;	/* ARMv8.1 extended events */
> > +	else
> > +		cpu_pmu->event_mask = ARMV8_EVTYPE_EVENT;
> 
> ... although can't we just update ARMV8_EVTYPE_EVENT to be 0xffff now?
> AFAICT, that just eats into bits that used to be RES0, so we shouldn't
> see any problems. That should make your patch *much* simpler!

That would of course be easier, but I just can't assess the implications.

Probably I'm missing something but to me it looks like the event mask is the
only verification we do for the user-space selectable events. Is it safe for
implementations that only support 0x3ff events to allow access to the
whole 0xffff range? What memory would be accessed for non-existing
events?

Jan

> Will


* Re: [PATCH v3 4/5] arm64/perf: Enable PMCR long cycle counter bit
  2016-02-15 19:55   ` Will Deacon
@ 2016-02-16  8:04     ` Jan Glauber
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Glauber @ 2016-02-16  8:04 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Mon, Feb 15, 2016 at 07:55:29PM +0000, Will Deacon wrote:
> On Wed, Feb 03, 2016 at 06:11:59PM +0100, Jan Glauber wrote:
> > @@ -768,8 +776,11 @@ static void armv8pmu_reset(void *info)
> >  		armv8pmu_disable_intens(idx);
> >  	}
> >  
> > -	/* Initialize & Reset PMNC: C and P bits. */
> > -	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C);
> > +	/*
> > +	 * Initialize & Reset PMNC. Request overflow on 64 bit but
> > +	 * cheat in armv8pmu_write_counter().
> 
> Can you expand the comment to mention that the 64-bit overflow is only
> for the cycle counter, please?

OK, how about:

/*
 * Initialize & Reset PMNC. Request overflow interrupt for
 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
 */

Jan

> Will


* Re: [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1
  2016-02-16  8:00     ` Jan Glauber
@ 2016-02-16 15:12       ` Will Deacon
  2016-02-17 10:47         ` Jan Glauber
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2016-02-16 15:12 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Tue, Feb 16, 2016 at 09:00:15AM +0100, Jan Glauber wrote:
> On Mon, Feb 15, 2016 at 08:04:04PM +0000, Will Deacon wrote:
> 
> [...]
> 
> > On Wed, Feb 03, 2016 at 06:12:00PM +0100, Jan Glauber wrote:
> > > +		cpu_pmu->event_mask = 0xffff;	/* ARMv8.1 extended events */
> > > +	else
> > > +		cpu_pmu->event_mask = ARMV8_EVTYPE_EVENT;
> > 
> > ... although can't we just update ARMV8_EVTYPE_EVENT to be 0xffff now?
> > AFAICT, that just eats into bits that used to be RES0, so we shouldn't
> > see any problems. That should make your patch *much* simpler!
> 
> That would of course be easier, but I just can't assess the implications.
> 
> Probably I'm missing something but to me it looks like the event mask is the
> only verification we do for the user-space selectable events. Is it safe for
> implementations that only support 0x3ff events to allow access to the
> whole 0xffff range? What memory would be accessed for non-existing
> events?

Which memory? The worst-case is that we end up writing to some bits in
a register (e.g. PMXEVTYPER) that are RES0 in ARMv8 afaict.

Will


* Re: [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1
  2016-02-16 15:12       ` Will Deacon
@ 2016-02-17 10:47         ` Jan Glauber
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Glauber @ 2016-02-17 10:47 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Tue, Feb 16, 2016 at 03:12:53PM +0000, Will Deacon wrote:
> On Tue, Feb 16, 2016 at 09:00:15AM +0100, Jan Glauber wrote:
> > On Mon, Feb 15, 2016 at 08:04:04PM +0000, Will Deacon wrote:
> > 
> > [...]
> > 
> > > On Wed, Feb 03, 2016 at 06:12:00PM +0100, Jan Glauber wrote:
> > > > +		cpu_pmu->event_mask = 0xffff;	/* ARMv8.1 extended events */
> > > > +	else
> > > > +		cpu_pmu->event_mask = ARMV8_EVTYPE_EVENT;
> > > 
> > > ... although can't we just update ARMV8_EVTYPE_EVENT to be 0xffff now?
> > > AFAICT, that just eats into bits that used to be RES0, so we shouldn't
> > > see any problems. That should make your patch *much* simpler!
> > 
> > That would of course be easier, but I just can't assess the implications.
> > 
> > Probably I'm missing something but to me it looks like the event mask is the
> > only verification we do for the user-space selectable events. Is it safe for
> > implementations that only support 0x3ff events to allow access to the
> > whole 0xffff range? What memory would be accessed for non-existing
> > events?
> 
> Which memory? The worst-case is that we end up writing to some bits in
> a register (e.g. PMXEVTYPER) that are RES0 in ARMv8 afaict.

OK, I see. Then I'm happy to drop 99% of that patch and just increase
the mask.

Jan

> Will


* Re: [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events
  2016-02-15 20:06     ` Will Deacon
@ 2016-02-18  9:13       ` Jan Glauber
  2016-02-18 11:24         ` Will Deacon
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Glauber @ 2016-02-18  9:13 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Mon, Feb 15, 2016 at 08:06:13PM +0000, Will Deacon wrote:
> On Mon, Feb 15, 2016 at 07:40:37PM +0000, Will Deacon wrote:
> > On Wed, Feb 03, 2016 at 06:11:56PM +0100, Jan Glauber wrote:
> > > The implemented Cortex A57 events are not A57 specific.
> > > They are recommended by ARM and can be found on other
> > > ARMv8 SOCs like Cavium ThunderX too. Therefore move
> > > these events to the common PMUv3 table.
> > 
> > I can't find anything in the architecture that suggests these event
> > numbers are necessarily portable between implementations. Am I missing
> > something?
> 
> Aha, I just noticed appendix K3.1 (silly me for missing it...).
> 
> Lemme check whether or not that mandates that those encodings can't be
> used for wildly different things.

To me it looks like we would just have duplicated code without the patch,
and at least the event types (e.g. L1D_CACHE_RD) should be identical
across implementations.

But I don't care too much, so please tell me if should drop the patch or
keep it.

thanks,
Jan

> Will


* Re: [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events
  2016-02-18  9:13       ` Jan Glauber
@ 2016-02-18 11:24         ` Will Deacon
  2016-02-18 13:45           ` Jan Glauber
  0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2016-02-18 11:24 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Thu, Feb 18, 2016 at 10:13:07AM +0100, Jan Glauber wrote:
> On Mon, Feb 15, 2016 at 08:06:13PM +0000, Will Deacon wrote:
> > On Mon, Feb 15, 2016 at 07:40:37PM +0000, Will Deacon wrote:
> > > On Wed, Feb 03, 2016 at 06:11:56PM +0100, Jan Glauber wrote:
> > > > The implemented Cortex A57 events are not A57 specific.
> > > > They are recommended by ARM and can be found on other
> > > > ARMv8 SOCs like Cavium ThunderX too. Therefore move
> > > > these events to the common PMUv3 table.
> > > 
> > > I can't find anything in the architecture that suggests these event
> > > numbers are necessarily portable between implementations. Am I missing
> > > something?
> > 
> > Aha, I just noticed appendix K3.1 (silly me for missing it...).
> > 
> > Lemme check whether or not that mandates that those encodings can't be
> > used for wildly different things.
> 
> To me it looks like we would just have duplicated code without the patch,
> and at least the event types (e.g. L1D_CACHE_RD) should be identical
> across implementations.
> 
> But I don't care too much, so please tell me if should drop the patch or
> keep it.

Tell you what then -- how about we simply rename those to ARMV8_IMPDEF_*
instead of ARMV8_A57_*? That way, we can easily identify them as distinct
from the architected events if we need to in future.

Will


* Re: [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events
  2016-02-18 11:24         ` Will Deacon
@ 2016-02-18 13:45           ` Jan Glauber
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Glauber @ 2016-02-18 13:45 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Thu, Feb 18, 2016 at 11:24:29AM +0000, Will Deacon wrote:
> On Thu, Feb 18, 2016 at 10:13:07AM +0100, Jan Glauber wrote:
> > On Mon, Feb 15, 2016 at 08:06:13PM +0000, Will Deacon wrote:
> > > On Mon, Feb 15, 2016 at 07:40:37PM +0000, Will Deacon wrote:
> > > > On Wed, Feb 03, 2016 at 06:11:56PM +0100, Jan Glauber wrote:
> > > > > The implemented Cortex A57 events are not A57 specific.
> > > > > They are recommended by ARM and can be found on other
> > > > > ARMv8 SOCs like Cavium ThunderX too. Therefore move
> > > > > these events to the common PMUv3 table.
> > > > 
> > > > I can't find anything in the architecture that suggests these event
> > > > numbers are necessarily portable between implementations. Am I missing
> > > > something?
> > > 
> > > Aha, I just noticed appendix K3.1 (silly me for missing it...).
> > > 
> > > Lemme check whether or not that mandates that those encodings can't be
> > > used for wildly different things.
> > 
> > To me it looks like we would just have duplicated code without the patch,
> > and at least the event types (e.g. L1D_CACHE_RD) should be identical
> > across implementations.
> > 
> > But I don't care too much, so please tell me if should drop the patch or
> > keep it.
> 
> Tell you what then -- how about we simply rename those to ARMV8_IMPDEF_*
> instead of ARMV8_A57_*? That way, we can easily identify them as distinct
> from the architected events if we need to in future.

Sounds good. I'll refresh and re-post the whole series then.

Jan

> Will


Thread overview: 18+ messages
2016-02-03 17:11 [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
2016-02-03 17:11 ` [PATCH v3 1/5] arm64/perf: Rename Cortex A57 events Jan Glauber
2016-02-15 19:40   ` Will Deacon
2016-02-15 20:06     ` Will Deacon
2016-02-18  9:13       ` Jan Glauber
2016-02-18 11:24         ` Will Deacon
2016-02-18 13:45           ` Jan Glauber
2016-02-03 17:11 ` [PATCH v3 2/5] arm64/perf: Add Cavium ThunderX PMU support Jan Glauber
2016-02-03 17:11 ` [PATCH v3 3/5] arm64: dts: Add Cavium ThunderX specific PMU Jan Glauber
2016-02-03 17:11 ` [PATCH v3 4/5] arm64/perf: Enable PMCR long cycle counter bit Jan Glauber
2016-02-15 19:55   ` Will Deacon
2016-02-16  8:04     ` Jan Glauber
2016-02-03 17:12 ` [PATCH v3 5/5] arm64/perf: Extend event mask for ARMv8.1 Jan Glauber
2016-02-15 20:04   ` Will Deacon
2016-02-16  8:00     ` Jan Glauber
2016-02-16 15:12       ` Will Deacon
2016-02-17 10:47         ` Jan Glauber
2016-02-11 13:28 ` [PATCH v3 0/5] Cavium ThunderX PMU support Jan Glauber
