* [PATCH 0/6] arm64: perf: Support for chaining event counters
From: Suzuki K Poulose @ 2018-05-18 10:22 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, robin.murphy, Suzuki K Poulose


This series adds support for counting PMU events using chained counters.
The Arm v8 PMUv3 allows combining two adjacent 32bit counters (an even/odd
pair) to count a given event as a single 64bit counter. This series adds
support for this mode to the core arm_pmu driver infrastructure and to the
armv8 (64bit) kernel PMU driver. It also removes the restriction of always
operating the (naturally 64bit) CPU cycle counter in 32bit mode.
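
With the "chain" format attribute added in patch 6, a chained 64bit count
can be requested from userspace roughly as follows (the PMU instance name
and the raw event number here are illustrative, not taken from this series):

  # count retired instructions (raw event 0x08) on a chained counter pair
  perf stat -a -e armv8_pmuv3/event=0x08,chain/ sleep 1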

Tested on Juno and Fast Models. Applies on top of 4.17-rc4.

Suzuki K Poulose (6):
  arm_pmu: Refactor maximum period handling
  arm_pmu: Change API to support 64bit counter values
  arm_pmu: Add support for long event counters
  arm64: perf: Make the cycle counter 64bit by default
  arm_pmu: Tidy up clear_event_idx call backs
  arm64: perf: Add support for chaining counters

 arch/arm/kernel/perf_event_v6.c     |   8 +-
 arch/arm/kernel/perf_event_v7.c     |   8 +-
 arch/arm/kernel/perf_event_xscale.c |   8 +-
 arch/arm64/kernel/perf_event.c      | 258 ++++++++++++++++++++++++++++++------
 drivers/perf/arm_pmu.c              |  52 ++++++--
 include/linux/perf/arm_pmu.h        |  12 +-
 6 files changed, 281 insertions(+), 65 deletions(-)

-- 
2.7.4

* [PATCH 1/6] arm_pmu: Refactor maximum period handling
From: Suzuki K Poulose @ 2018-05-18 10:22 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, robin.murphy, Suzuki K Poulose

Each PMU currently defines max_period as the maximum value its counters
can count. In order to support chaining of counters, replace this
parameter with the counter width, from which max_period can be deduced.
This makes it straightforward to compute the max_period for chained
counters.
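
For example, counter_width = 32 yields a max_period of
(1ULL << 32) - 1 = 0xffffffff, matching the values replaced below; a
chained pair of counters is effectively 64 bits wide, and patch 3 builds
its mask without shifting by the full type width.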

No functional changes.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kernel/perf_event_v6.c     |  4 ++--
 arch/arm/kernel/perf_event_v7.c     |  2 +-
 arch/arm/kernel/perf_event_xscale.c |  4 ++--
 arch/arm64/kernel/perf_event.c      |  2 +-
 drivers/perf/arm_pmu.c              | 16 ++++++++++++----
 include/linux/perf/arm_pmu.h        |  2 +-
 6 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index 1d7061a..d52a3fa 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -497,7 +497,7 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
+	cpu_pmu->counter_width	= 32;
 }
 
 static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
@@ -548,7 +548,7 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
+	cpu_pmu->counter_width  = 32;
 
 	return 0;
 }
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 870b66c..3d8ec6a 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1171,7 +1171,7 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start		= armv7pmu_start;
 	cpu_pmu->stop		= armv7pmu_stop;
 	cpu_pmu->reset		= armv7pmu_reset;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
+	cpu_pmu->counter_width	= 32;
 };
 
 static void armv7_read_num_pmnc_events(void *info)
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index fcf218d..6eb0e21 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -375,7 +375,7 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale1pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
+	cpu_pmu->counter_width	= 32;
 
 	return 0;
 }
@@ -745,7 +745,7 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale2pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 5;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
+	cpu_pmu->counter_width	= 32;
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 85a251b..408f92c 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -961,7 +961,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
-	cpu_pmu->max_period		= (1LLU << 32) - 1,
+	cpu_pmu->counter_width		= 32;
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
 
 	return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 1a0d340..e23e1a1 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,6 +28,11 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
+static inline u64 arm_pmu_max_period(struct arm_pmu *pmu)
+{
+	return (((u64)1) << (pmu->counter_width)) - 1;
+}
+
 static int
 armpmu_map_cache_event(const unsigned (*cache_map)
 				      [PERF_COUNT_HW_CACHE_MAX]
@@ -114,8 +119,10 @@ int armpmu_event_set_period(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	s64 left = local64_read(&hwc->period_left);
 	s64 period = hwc->sample_period;
+	u64 max_period;
 	int ret = 0;
 
+	max_period = arm_pmu_max_period(armpmu);
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -136,8 +143,8 @@ int armpmu_event_set_period(struct perf_event *event)
 	 * effect we are reducing max_period to account for
 	 * interrupt latency (and we are being very conservative).
 	 */
-	if (left > (armpmu->max_period >> 1))
-		left = armpmu->max_period >> 1;
+	if (left > (max_period >> 1))
+		left = (max_period >> 1);
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
@@ -153,6 +160,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
+	u64 max_period = arm_pmu_max_period(armpmu);
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -162,7 +170,7 @@ u64 armpmu_event_update(struct perf_event *event)
 			     new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
+	delta = (new_raw_count - prev_raw_count) & max_period;
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -402,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = armpmu->max_period >> 1;
+		hwc->sample_period  = arm_pmu_max_period(armpmu) >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 40036a5..c8c31cf 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -94,7 +94,7 @@ struct arm_pmu {
 	void		(*reset)(void *);
 	int		(*map_event)(struct perf_event *event);
 	int		num_events;
-	u64		max_period;
+	u8		counter_width;
 	bool		secure_access; /* 32-bit ARM only */
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
 	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
-- 
2.7.4

* [PATCH 2/6] arm_pmu: Change API to support 64bit counter values
From: Suzuki K Poulose @ 2018-05-18 10:22 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, robin.murphy, Suzuki K Poulose

Convert the {read,write}_counter APIs to handle 64bit values, in
preparation for supporting chained event counters.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kernel/perf_event_v6.c     | 4 ++--
 arch/arm/kernel/perf_event_v7.c     | 4 ++--
 arch/arm/kernel/perf_event_xscale.c | 4 ++--
 arch/arm64/kernel/perf_event.c      | 9 ++++-----
 include/linux/perf/arm_pmu.h        | 4 ++--
 5 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index d52a3fa..720c19e 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -233,7 +233,7 @@ armv6_pmcr_counter_has_overflowed(unsigned long pmcr,
 	return ret;
 }
 
-static inline u32 armv6pmu_read_counter(struct perf_event *event)
+static inline u64 armv6pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -251,7 +251,7 @@ static inline u32 armv6pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv6pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv6pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 3d8ec6a..2f3c06d 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -743,7 +743,7 @@ static inline void armv7_pmnc_select_counter(int idx)
 	isb();
 }
 
-static inline u32 armv7pmu_read_counter(struct perf_event *event)
+static inline u64 armv7pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -763,7 +763,7 @@ static inline u32 armv7pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv7pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv7pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 6eb0e21..3f2e398 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -317,7 +317,7 @@ static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale1pmu_read_counter(struct perf_event *event)
+static inline u64 xscale1pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -338,7 +338,7 @@ static inline u32 xscale1pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale1pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale1pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 408f92c..7660b7a 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -512,7 +512,7 @@ static inline int armv8pmu_select_counter(int idx)
 	return idx;
 }
 
-static inline u32 armv8pmu_read_counter(struct perf_event *event)
+static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -530,7 +530,7 @@ static inline u32 armv8pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -545,9 +545,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 		 * count using the lower 32bits and we want an interrupt when
 		 * it overflows.
 		 */
-		u64 value64 = 0xffffffff00000000ULL | value;
-
-		write_sysreg(value64, pmccntr_el0);
+		value |= 0xffffffff00000000ULL;
+		write_sysreg(value, pmccntr_el0);
 	} else if (armv8pmu_select_counter(idx) == idx)
 		write_sysreg(value, pmxevcntr_el0);
 }
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index c8c31cf..705e8c3 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -87,8 +87,8 @@ struct arm_pmu {
 					 struct perf_event *event);
 	int		(*set_event_filter)(struct hw_perf_event *evt,
 					    struct perf_event_attr *attr);
-	u32		(*read_counter)(struct perf_event *event);
-	void		(*write_counter)(struct perf_event *event, u32 val);
+	u64		(*read_counter)(struct perf_event *event);
+	void		(*write_counter)(struct perf_event *event, u64 val);
 	void		(*start)(struct arm_pmu *);
 	void		(*stop)(struct arm_pmu *);
 	void		(*reset)(void *);
-- 
2.7.4

* [PATCH 3/6] arm_pmu: Add support for long event counters
From: Suzuki K Poulose @ 2018-05-18 10:22 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, robin.murphy, Suzuki K Poulose

Each PMU has a set of fixed width event counters. In some special cases,
however, an event can be counted with a counter that is effectively twice
the normal width. e.g., Arm v8 PMUv3 has a 64bit cycle counter which can
count only CPU cycles, and the PMU can also chain a pair of event counters
to count as a single 64bit counter.

Add support for tracking events that use double the normal counter width.
This only affects the periods set for each counter.
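
The only subtle point is how the 64bit mask for such an event is built.
A minimal sketch of the reasoning, assuming a 32bit base counter width:

	u64 period = ((u64)1 << 32) - 1;  /* 0x00000000ffffffff */
	/*
	 * (1ULL << 64) - 1 would shift by the full type width (undefined
	 * behaviour), so widen the mask by a shift + OR sequence instead:
	 */
	period = (period << 32) | period; /* 0xffffffffffffffff */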

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/perf/arm_pmu.c       | 25 ++++++++++++++++++++++---
 include/linux/perf/arm_pmu.h |  6 ++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index e23e1a1..1adabb5 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -33,6 +33,21 @@ static inline u64 arm_pmu_max_period(struct arm_pmu *pmu)
 	return (((u64)1) << (pmu->counter_width)) - 1;
 }
 
+static inline u64 arm_pmu_get_event_max_period(struct arm_pmu *pmu,
+					       struct perf_event *event)
+{
+	u64 period = arm_pmu_max_period(pmu);
+
+	/*
+	 * To prevent shift-counter-overflow warning, create the
+	 * mask, by shift + OR sequence.
+	 */
+	if (event->hw.flags & ARMPMU_EVT_LONG)
+		period = (period << pmu->counter_width) | period;
+
+	return period;
+}
+
 static int
 armpmu_map_cache_event(const unsigned (*cache_map)
 				      [PERF_COUNT_HW_CACHE_MAX]
@@ -122,7 +137,7 @@ int armpmu_event_set_period(struct perf_event *event)
 	u64 max_period;
 	int ret = 0;
 
-	max_period = arm_pmu_max_period(armpmu);
+	max_period = arm_pmu_get_event_max_period(armpmu, event);
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -148,7 +163,7 @@ int armpmu_event_set_period(struct perf_event *event)
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
-	armpmu->write_counter(event, (u64)(-left) & 0xffffffff);
+	armpmu->write_counter(event, (u64)(-left) & max_period);
 
 	perf_event_update_userpage(event);
 
@@ -160,7 +175,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
-	u64 max_period = arm_pmu_max_period(armpmu);
+	u64 max_period = arm_pmu_get_event_max_period(armpmu, event);
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -368,6 +383,7 @@ __hw_perf_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int mapping;
 
+	hwc->flags = 0;
 	mapping = armpmu->map_event(event);
 
 	if (mapping < 0) {
@@ -670,6 +686,9 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
 			continue;
 
 		event = hw_events->events[idx];
+		/* Chained events could use multiple counters */
+		if (!event)
+			continue;
 
 		switch (cmd) {
 		case CPU_PM_ENTER:
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 705e8c3..ed7e3f7 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -25,6 +25,12 @@
  */
 #define ARMPMU_MAX_HWEVENTS		32
 
+/*
+ * ARM PMU hw_event flags
+ */
+/* Event uses a counter with double the normal width */
+#define ARMPMU_EVT_LONG			1
+
 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
 #define CACHE_OP_UNSUPPORTED		0xFFFF
-- 
2.7.4

* [PATCH 4/6] arm64: perf: Make the cycle counter 64bit by default
From: Suzuki K Poulose @ 2018-05-18 10:22 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, robin.murphy, Suzuki K Poulose

Make the cycle counter 64bit by default, by setting the ARMPMU_EVT_LONG
flag to indicate that the event uses a 64bit counter.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 7660b7a..ea8e060 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -517,7 +517,7 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
-	u32 value = 0;
+	u64 value = 0;
 
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u reading wrong counter %d\n",
@@ -539,15 +539,9 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u writing wrong counter %d\n",
 			smp_processor_id(), idx);
-	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
-		/*
-		 * Set the upper 32bits as this is a 64bit counter but we only
-		 * count using the lower 32bits and we want an interrupt when
-		 * it overflows.
-		 */
-		value |= 0xffffffff00000000ULL;
+	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		write_sysreg(value, pmccntr_el0);
-	} else if (armv8pmu_select_counter(idx) == idx)
+	else if (armv8pmu_select_counter(idx) == idx)
 		write_sysreg(value, pmxevcntr_el0);
 }
 
@@ -851,6 +845,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 				       &armv8_pmuv3_perf_cache_map,
 				       ARMV8_PMU_EVTYPE_EVENT);
 
+	if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
+		event->hw.flags |= ARMPMU_EVT_LONG;
+
 	/* Onl expose micro/arch events supported by this PMU */
 	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
 	    && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
-- 
2.7.4

* [PATCH 5/6] arm_pmu: Tidy up clear_event_idx call backs
From: Suzuki K Poulose @ 2018-05-18 10:22 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, robin.murphy, Suzuki K Poulose

The arm_pmu core uses the get_event_idx callback to allocate an event
counter for a given event and marks the selected counter as "used". When
we delete the event, however, the core clears the "used" bit itself and
then invokes the "clear_event_idx" callback, which splits the bookkeeping
between the core code and the backend. Tidy this up by relying on
clear_event_idx to do the bookkeeping when it is provided; otherwise, let
the core driver perform the default "clear bit" operation. This will be
useful for adding chained event support, where the event idx maintenance
is left entirely to the backend.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kernel/perf_event_v7.c |  2 ++
 drivers/perf/arm_pmu.c          | 15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 2f3c06d..1c28b23 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1639,6 +1639,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool krait_event = EVENT_CPU(hwc->config_base);
 
+	clear_bit(hwc->idx, cpuc->used_mask);
 	if (venum_event || krait_event) {
 		bit = krait_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
@@ -1968,6 +1969,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool scorpion_event = EVENT_CPU(hwc->config_base);
 
+	clear_bit(hwc->idx, cpuc->used_mask);
 	if (venum_event || scorpion_event) {
 		bit = scorpion_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 1adabb5..7207d01 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -240,6 +240,16 @@ static void armpmu_start(struct perf_event *event, int flags)
 	armpmu->enable(event);
 }
 
+static void armpmu_clear_event_idx(struct arm_pmu *armpmu,
+				   struct pmu_hw_events *hw_events,
+				   struct perf_event *event)
+{
+	if (armpmu->clear_event_idx)
+		armpmu->clear_event_idx(hw_events, event);
+	else
+		clear_bit(event->hw.idx, hw_events->used_mask);
+}
+
 static void
 armpmu_del(struct perf_event *event, int flags)
 {
@@ -250,10 +260,7 @@ armpmu_del(struct perf_event *event, int flags)
 
 	armpmu_stop(event, PERF_EF_UPDATE);
 	hw_events->events[idx] = NULL;
-	clear_bit(idx, hw_events->used_mask);
-	if (armpmu->clear_event_idx)
-		armpmu->clear_event_idx(hw_events, event);
-
+	armpmu_clear_event_idx(armpmu, hw_events, event);
 	perf_event_update_userpage(event);
 }
 
-- 
2.7.4

* [PATCH 6/6] arm64: perf: Add support for chaining counters
From: Suzuki K Poulose @ 2018-05-18 10:22 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, robin.murphy, Suzuki K Poulose

Add support for chained event counters. PMUv3 allows chaining a pair of
adjacent PMU counters, where the lower counter number is always even. The
low (even) counter is programmed to count the event of interest, and the
high (odd) counter is programmed with the special CHAIN event (0x1e).
This requires a dedicated allocation scheme to make full use of the
available counters: chained counters are allocated in pairs from the
lower end of the counter space, while normal counters are allocated from
the higher end. Also make the necessary changes to handle a chained event
as a single event backed by two counters.
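
As an illustration of the allocation scheme (the counter numbers below are
purely illustrative; the dedicated cycle counter is handled separately):

	/*
	 * 6 event counters, events opened in the order A, B, C, D:
	 *   chained event A -> counters (0, 1)  (low counts A, high counts CHAIN)
	 *   single  event B -> counter  5
	 *   chained event C -> counters (2, 3)
	 *   single  event D -> counter  4
	 * Taking pairs from the low end and single events from the high end
	 * keeps even/odd pairs free for chained events as long as possible.
	 */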

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 226 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 202 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index ea8e060..5f81cd0 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -446,9 +446,11 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
 };
 
 PMU_FORMAT_ATTR(event, "config:0-15");
+PMU_FORMAT_ATTR(chain, "config1:0");
 
 static struct attribute *armv8_pmuv3_format_attrs[] = {
 	&format_attr_event.attr,
+	&format_attr_chain.attr,
 	NULL,
 };
 
@@ -457,6 +459,12 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
 	.attrs = armv8_pmuv3_format_attrs,
 };
 
+static bool armv8pmu_event_is_chained(struct perf_event *event)
+{
+	return event->attr.config1 & 0x1;
+}
+
+
 /*
  * Perf Events' indices
  */
@@ -512,6 +520,36 @@ static inline int armv8pmu_select_counter(int idx)
 	return idx;
 }
 
+static inline u32 armv8pmu_read_evcntr(int idx)
+{
+	return (armv8pmu_select_counter(idx) == idx) ?
+	       read_sysreg(pmxevcntr_el0) : 0;
+}
+
+static inline u64 armv8pmu_read_chain_counter(int idx)
+{
+	u64 prev_hi, hi, lo;
+
+	do {
+		prev_hi = armv8pmu_read_evcntr(idx);
+		isb();
+		lo = armv8pmu_read_evcntr(idx - 1);
+		isb();
+		hi = armv8pmu_read_evcntr(idx);
+		isb();
+	} while (prev_hi != hi);
+
+	return (hi << 32) | lo;
+}
+
+static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	return armv8pmu_event_is_chained(event) ?
+	       armv8pmu_read_chain_counter(idx) : armv8pmu_read_evcntr(idx);
+}
+
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -524,12 +562,37 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
-	else if (armv8pmu_select_counter(idx) == idx)
-		value = read_sysreg(pmxevcntr_el0);
+	else
+		value = armv8pmu_read_hw_counter(event);
 
 	return value;
 }
 
+static inline void armv8pmu_write_evcntr(int idx, u32 value)
+{
+	if (armv8pmu_select_counter(idx) == idx)
+		write_sysreg(value, pmxevcntr_el0);
+}
+
+static inline void armv8pmu_write_chain_counter(int idx, u64 value)
+{
+	armv8pmu_write_evcntr(idx, value >> 32);
+	isb();
+	armv8pmu_write_evcntr(idx - 1, value);
+	isb();
+}
+
+static inline void armv8pmu_write_hw_counter(struct perf_event *event,
+					     u64 value)
+{
+	int idx = event->hw.idx;
+
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_write_chain_counter(idx, value);
+	else
+		armv8pmu_write_evcntr(idx, value);
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -541,8 +604,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		write_sysreg(value, pmccntr_el0);
-	else if (armv8pmu_select_counter(idx) == idx)
-		write_sysreg(value, pmxevcntr_el0);
+	else
+		armv8pmu_write_hw_counter(event, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
@@ -553,6 +616,28 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
 	}
 }
 
+static inline void armv8pmu_write_event_type(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	/*
+	 * For chained events, write the high counter event type
+	 * followed by the low counter.
+	 */
+	if (armv8pmu_event_is_chained(event)) {
+		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN;
+
+		/* Set the filters as that of the main event for chain */
+		chain_evt |= hwc->config_base & ~ARMV8_PMU_EVTYPE_EVENT;
+		armv8pmu_write_evtype(idx, chain_evt);
+		isb();
+		idx--;
+	}
+
+	armv8pmu_write_evtype(idx, hwc->config_base);
+}
+
 static inline int armv8pmu_enable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -560,6 +645,23 @@ static inline int armv8pmu_enable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_enable_event_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	/*
+	 * For chained events, we enable the high counter followed by
+	 * the low counter.
+	 */
+	armv8pmu_enable_counter(idx);
+
+	if (armv8pmu_event_is_chained(event)) {
+		isb();
+		armv8pmu_enable_counter(idx - 1);
+	}
+
+}
+
 static inline int armv8pmu_disable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -567,6 +669,24 @@ static inline int armv8pmu_disable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_disable_event_counter(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	/*
+	 * Disable the low counter followed by the high counter
+	 * for chained events.
+	 */
+	if (armv8pmu_event_is_chained(event)) {
+		armv8pmu_disable_counter(idx - 1);
+		isb();
+	}
+
+	armv8pmu_disable_counter(idx);
+}
+
+
 static inline int armv8pmu_enable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -574,6 +694,12 @@ static inline int armv8pmu_enable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_enable_event_irq(struct perf_event *event)
+{
+	/* For chained events, enable the interrupt for only the high counter */
+	return armv8pmu_enable_intens(event->hw.idx);
+}
+
 static inline int armv8pmu_disable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -586,6 +712,11 @@ static inline int armv8pmu_disable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_disable_event_irq(struct perf_event *event)
+{
+	return armv8pmu_disable_intens(event->hw.idx);
+}
+
 static inline u32 armv8pmu_getreset_flags(void)
 {
 	u32 value;
@@ -603,10 +734,8 @@ static inline u32 armv8pmu_getreset_flags(void)
 static void armv8pmu_enable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Enable counter and interrupt, and set the counter to count
@@ -617,22 +746,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Set event (if destined for PMNx counters).
 	 */
-	armv8pmu_write_evtype(idx, hwc->config_base);
+	armv8pmu_write_event_type(event);
 
 	/*
 	 * Enable interrupt for this counter
 	 */
-	armv8pmu_enable_intens(idx);
+	armv8pmu_enable_event_irq(event);
 
 	/*
 	 * Enable counter
 	 */
-	armv8pmu_enable_counter(idx);
+	armv8pmu_enable_event_counter(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -640,10 +769,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
 static void armv8pmu_disable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Disable counter and interrupt
@@ -651,14 +778,14 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 
 	/*
-	 * Disable counter
+	 * Disable counter for this event
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
-	 * Disable interrupt for this counter
+	 * Disable interrupt for this event counter
 	 */
-	armv8pmu_disable_intens(idx);
+	armv8pmu_disable_event_irq(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -747,6 +874,39 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
+static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
+				    struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	for (idx = cpu_pmu->num_events - 1; idx >= ARMV8_IDX_COUNTER0; --idx)
+		if (!test_and_set_bit(idx, cpuc->used_mask))
+			return idx;
+	return -EAGAIN;
+}
+
+static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
+				   struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	/*
+	 * Chaining requires two consecutive event counters, where
+	 * the lower idx must be even. We allocate chain events
+	 * from the lower index (i.e, counter0) and the single events
+	 * from the higher end to maximise the utilisation.
+	 */
+	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2)
+		if (!test_and_set_bit(idx, cpuc->used_mask)) {
+			/* Check if the preceding even counter is available */
+			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
+				return idx;
+			/* Release the Odd counter */
+			clear_bit(idx, cpuc->used_mask);
+		}
+	return -EAGAIN;
+}
+
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
@@ -755,7 +915,10 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
 
-	/* Always prefer to place a cycle counter into the cycle counter. */
+	/*
+	 * Always prefer to place a cycle counter into the cycle counter
+	 * irrespective of whether we are counting 32bit/64bit
+	 */
 	if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
 		if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
 			return ARMV8_IDX_CYCLE_COUNTER;
@@ -764,13 +927,21 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	/*
 	 * Otherwise use events counters
 	 */
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
+	idx = armv8pmu_event_is_chained(event) ?
+		armv8pmu_get_chain_idx(cpuc, cpu_pmu) :
+		armv8pmu_get_single_idx(cpuc, cpu_pmu);
 
-	/* The counters are all in use. */
-	return -EAGAIN;
+	return idx;
+}
+
+static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	clear_bit(hwc->idx, cpuc->used_mask);
+	if (armv8pmu_event_is_chained(event))
+		clear_bit(hwc->idx - 1, cpuc->used_mask);
 }
 
 /*
@@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 				       &armv8_pmuv3_perf_cache_map,
 				       ARMV8_PMU_EVTYPE_EVENT);
 
-	if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
+	if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
+		/* Prevent chaining for cycle counter */
+		if (armv8pmu_event_is_chained(event))
+			return -EINVAL;
 		event->hw.flags |= ARMPMU_EVT_LONG;
+	} else if (armv8pmu_event_is_chained(event)) {
+		event->hw.flags |= ARMPMU_EVT_LONG;
+	}
 
 	/* Onl expose micro/arch events supported by this PMU */
 	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
@@ -954,6 +1131,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter		= armv8pmu_read_counter,
 	cpu_pmu->write_counter		= armv8pmu_write_counter,
 	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
+	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/6] arm_pmu: Refactor maximum period handling
  2018-05-18 10:22   ` Suzuki K Poulose
@ 2018-05-18 13:10     ` Robin Murphy
  -1 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-18 13:10 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon

Hi Suzuki,

On 18/05/18 11:22, Suzuki K Poulose wrote:
> Each PMU defines their max_period of the counter as the maximum
> value that can be counted. In order to support chaining of the
> counters, change this parameter to indicate the counter width
> to deduce the max_period. This will be useful to compute the
> max_period for chained counters.
> 
> No functional changes.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arch/arm/kernel/perf_event_v6.c     |  4 ++--
>   arch/arm/kernel/perf_event_v7.c     |  2 +-
>   arch/arm/kernel/perf_event_xscale.c |  4 ++--
>   arch/arm64/kernel/perf_event.c      |  2 +-
>   drivers/perf/arm_pmu.c              | 16 ++++++++++++----
>   include/linux/perf/arm_pmu.h        |  2 +-
>   6 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
> index 1d7061a..d52a3fa 100644
> --- a/arch/arm/kernel/perf_event_v6.c
> +++ b/arch/arm/kernel/perf_event_v6.c
> @@ -497,7 +497,7 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->stop		= armv6pmu_stop;
>   	cpu_pmu->map_event	= armv6_map_event;
>   	cpu_pmu->num_events	= 3;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
> +	cpu_pmu->counter_width	= 32;
>   }
>   
>   static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
> @@ -548,7 +548,7 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->stop		= armv6pmu_stop;
>   	cpu_pmu->map_event	= armv6mpcore_map_event;
>   	cpu_pmu->num_events	= 3;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
> +	cpu_pmu->counter_width  = 32;
>   
>   	return 0;
>   }
> diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
> index 870b66c..3d8ec6a 100644
> --- a/arch/arm/kernel/perf_event_v7.c
> +++ b/arch/arm/kernel/perf_event_v7.c
> @@ -1171,7 +1171,7 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->start		= armv7pmu_start;
>   	cpu_pmu->stop		= armv7pmu_stop;
>   	cpu_pmu->reset		= armv7pmu_reset;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
> +	cpu_pmu->counter_width	= 32;
>   };
>   
>   static void armv7_read_num_pmnc_events(void *info)
> diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
> index fcf218d..6eb0e21 100644
> --- a/arch/arm/kernel/perf_event_xscale.c
> +++ b/arch/arm/kernel/perf_event_xscale.c
> @@ -375,7 +375,7 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->stop		= xscale1pmu_stop;
>   	cpu_pmu->map_event	= xscale_map_event;
>   	cpu_pmu->num_events	= 3;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
> +	cpu_pmu->counter_width	= 32;
>   
>   	return 0;
>   }
> @@ -745,7 +745,7 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->stop		= xscale2pmu_stop;
>   	cpu_pmu->map_event	= xscale_map_event;
>   	cpu_pmu->num_events	= 5;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
> +	cpu_pmu->counter_width	= 32;
>   
>   	return 0;
>   }
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 85a251b..408f92c 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -961,7 +961,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->start			= armv8pmu_start,
>   	cpu_pmu->stop			= armv8pmu_stop,
>   	cpu_pmu->reset			= armv8pmu_reset,
> -	cpu_pmu->max_period		= (1LLU << 32) - 1,
> +	cpu_pmu->counter_width		= 32;

Given that none of the 6 instances above differ, this looks suspiciously 
redundant. AFAICS max_period has been there from the very beginning with 
no explicit justification, so I can only assume it was anticipating more 
future variability than actually turned out. With 8 years of hindsight 
now, I think it would be reasonable to assume that counters are 32-bit 
except in certain special cases where they might be 64-bit; since that 
can't be described by a single "counter size" value anyway, and by the 
end of this series we have the means to handle it correctly via flags, I 
propose that we just get rid of this and hard-code 32 in 
arm_pmu_max_period().
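
Roughly something like the below - just an untested sketch, and the
arm_pmu_event_max_period() name plus the GENMASK_ULL usage are purely
illustrative, piggybacking on the ARMPMU_EVT_LONG flag that patch 3
introduces:

static inline u64 arm_pmu_max_period(void)
{
	/* Counters are assumed to be 32-bit unless flagged otherwise */
	return GENMASK_ULL(31, 0);
}

static inline u64 arm_pmu_event_max_period(struct perf_event *event)
{
	/* The special 64-bit cases are described per-event via hw.flags */
	if (event->hw.flags & ARMPMU_EVT_LONG)
		return GENMASK_ULL(63, 0);

	return arm_pmu_max_period();
}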

>   	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
>   
>   	return 0;
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 1a0d340..e23e1a1 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -28,6 +28,11 @@
>   static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
>   static DEFINE_PER_CPU(int, cpu_irq);
>   
> +static inline u64 arm_pmu_max_period(struct arm_pmu *pmu)
> +{
> +	return (((u64)1) << (pmu->counter_width)) - 1;

Nit: "1ULL << ..."

Otherwise, looks fine to me.

Robin.

> +}
> +
>   static int
>   armpmu_map_cache_event(const unsigned (*cache_map)
>   				      [PERF_COUNT_HW_CACHE_MAX]
> @@ -114,8 +119,10 @@ int armpmu_event_set_period(struct perf_event *event)
>   	struct hw_perf_event *hwc = &event->hw;
>   	s64 left = local64_read(&hwc->period_left);
>   	s64 period = hwc->sample_period;
> +	u64 max_period;
>   	int ret = 0;
>   
> +	max_period = arm_pmu_max_period(armpmu);
>   	if (unlikely(left <= -period)) {
>   		left = period;
>   		local64_set(&hwc->period_left, left);
> @@ -136,8 +143,8 @@ int armpmu_event_set_period(struct perf_event *event)
>   	 * effect we are reducing max_period to account for
>   	 * interrupt latency (and we are being very conservative).
>   	 */
> -	if (left > (armpmu->max_period >> 1))
> -		left = armpmu->max_period >> 1;
> +	if (left > (max_period >> 1))
> +		left = (max_period >> 1);
>   
>   	local64_set(&hwc->prev_count, (u64)-left);
>   
> @@ -153,6 +160,7 @@ u64 armpmu_event_update(struct perf_event *event)
>   	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
>   	struct hw_perf_event *hwc = &event->hw;
>   	u64 delta, prev_raw_count, new_raw_count;
> +	u64 max_period = arm_pmu_max_period(armpmu);
>   
>   again:
>   	prev_raw_count = local64_read(&hwc->prev_count);
> @@ -162,7 +170,7 @@ u64 armpmu_event_update(struct perf_event *event)
>   			     new_raw_count) != prev_raw_count)
>   		goto again;
>   
> -	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
> +	delta = (new_raw_count - prev_raw_count) & max_period;
>   
>   	local64_add(delta, &event->count);
>   	local64_sub(delta, &hwc->period_left);
> @@ -402,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
>   		 * is far less likely to overtake the previous one unless
>   		 * you have some serious IRQ latency issues.
>   		 */
> -		hwc->sample_period  = armpmu->max_period >> 1;
> +		hwc->sample_period  = arm_pmu_max_period(armpmu) >> 1;
>   		hwc->last_period    = hwc->sample_period;
>   		local64_set(&hwc->period_left, hwc->sample_period);
>   	}
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index 40036a5..c8c31cf 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -94,7 +94,7 @@ struct arm_pmu {
>   	void		(*reset)(void *);
>   	int		(*map_event)(struct perf_event *event);
>   	int		num_events;
> -	u64		max_period;
> +	u8		counter_width;
>   	bool		secure_access; /* 32-bit ARM only */
>   #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
>   	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/6] arm_pmu: Add support for long event counters
  2018-05-18 10:22   ` Suzuki K Poulose
@ 2018-05-18 13:22     ` Robin Murphy
  -1 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-18 13:22 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon

On 18/05/18 11:22, Suzuki K Poulose wrote:
> Each PMU has a set of fixed width event counters. But in some
> special cases, the events could be counted using a counter which
> effectively has twice the normal width of a coutner.
> e.g, Arm V8 PMUv3 has a 64 bit cycle counter which can count
> only the CPU cylces. Also, the PMU can chain the event counters
> to effectively count as a 64bit counter.

Nit: a few typos in that paragraph.

> Add support for tracking the events that uses double the normal
> counter size. This only affects the periods set for each counter.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   drivers/perf/arm_pmu.c       | 25 ++++++++++++++++++++++---
>   include/linux/perf/arm_pmu.h |  6 ++++++
>   2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index e23e1a1..1adabb5 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -33,6 +33,21 @@ static inline u64 arm_pmu_max_period(struct arm_pmu *pmu)
>   	return (((u64)1) << (pmu->counter_width)) - 1;
>   }
>   
> +static inline u64 arm_pmu_get_event_max_period(struct arm_pmu *pmu,

The "get_" here seems a bit at odds with arm_pmu_max_period() - I'd be 
inclined to go for slightly more consistent naming (with a slight 
personal preference towards removing it here rather than adding it there).

> +					       struct perf_event *event)
> +{
> +	u64 period = arm_pmu_max_period(pmu);
> +
> +	/*
> +	 * To prevent shift-counter-overflow warning, create the
> +	 * mask, by shift + OR sequence.
> +	 */
> +	if (event->hw.flags & ARMPMU_EVT_LONG)
> +		period = (period << pmu->counter_width) | period;
> +
> +	return period;
> +}
> +
>   static int
>   armpmu_map_cache_event(const unsigned (*cache_map)
>   				      [PERF_COUNT_HW_CACHE_MAX]
> @@ -122,7 +137,7 @@ int armpmu_event_set_period(struct perf_event *event)
>   	u64 max_period;
>   	int ret = 0;
>   
> -	max_period = arm_pmu_max_period(armpmu);
> +	max_period = arm_pmu_get_event_max_period(armpmu, event);
>   	if (unlikely(left <= -period)) {
>   		left = period;
>   		local64_set(&hwc->period_left, left);
> @@ -148,7 +163,7 @@ int armpmu_event_set_period(struct perf_event *event)
>   
>   	local64_set(&hwc->prev_count, (u64)-left);
>   
> -	armpmu->write_counter(event, (u64)(-left) & 0xffffffff);
> +	armpmu->write_counter(event, (u64)(-left) & max_period);
>   
>   	perf_event_update_userpage(event);
>   
> @@ -160,7 +175,7 @@ u64 armpmu_event_update(struct perf_event *event)
>   	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
>   	struct hw_perf_event *hwc = &event->hw;
>   	u64 delta, prev_raw_count, new_raw_count;
> -	u64 max_period = arm_pmu_max_period(armpmu);
> +	u64 max_period = arm_pmu_get_event_max_period(armpmu, event);
>   
>   again:
>   	prev_raw_count = local64_read(&hwc->prev_count);
> @@ -368,6 +383,7 @@ __hw_perf_event_init(struct perf_event *event)
>   	struct hw_perf_event *hwc = &event->hw;
>   	int mapping;
>   
> +	hwc->flags = 0;
>   	mapping = armpmu->map_event(event);
>   
>   	if (mapping < 0) {
> @@ -670,6 +686,9 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>   			continue;
>   
>   		event = hw_events->events[idx];
> +		/* Chained events could use multiple counters */
> +		if (!event)
> +			continue;

This hunk looks a little out of place; does it perhaps belong to patch #6?

Robin.

>   
>   		switch (cmd) {
>   		case CPU_PM_ENTER:
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index 705e8c3..ed7e3f7 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -25,6 +25,12 @@
>    */
>   #define ARMPMU_MAX_HWEVENTS		32
>   
> +/*
> + * ARM PMU hw_event flags
> + */
> +/* Event uses a counter with double the normal width */
> +#define ARMPMU_EVT_LONG			1
> +
>   #define HW_OP_UNSUPPORTED		0xFFFF
>   #define C(_x)				PERF_COUNT_HW_CACHE_##_x
>   #define CACHE_OP_UNSUPPORTED		0xFFFF
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-18 10:22   ` Suzuki K Poulose
@ 2018-05-18 13:49     ` Robin Murphy
  -1 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-18 13:49 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon

On 18/05/18 11:22, Suzuki K Poulose wrote:
> Add support for chained event counters. PMUv3 allows chaining
> a pair of adjacent PMU counters (with the lower counter number
> being always "even"). The low counter is programmed to count
> the event of interest and the high counter(odd numbered) is
> programmed with a special event code (0x1e - Chain). Thus
> we need special allocation schemes to make the full use of
> available counters. So, we allocate the counters from either
> ends. i.e, chained counters are allocated from the lower
> end in pairs of two and the normal counters are allocated
> from the higher number. Also makes necessary changes to
> handle the chained events as a single event with 2 counters.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arch/arm64/kernel/perf_event.c | 226 ++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 202 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index ea8e060..5f81cd0 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -446,9 +446,11 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
>   };
>   
>   PMU_FORMAT_ATTR(event, "config:0-15");
> +PMU_FORMAT_ATTR(chain, "config1:0");
>   
>   static struct attribute *armv8_pmuv3_format_attrs[] = {
>   	&format_attr_event.attr,
> +	&format_attr_chain.attr,
>   	NULL,
>   };
>   
> @@ -457,6 +459,12 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
>   	.attrs = armv8_pmuv3_format_attrs,
>   };
>   
> +static bool armv8pmu_event_is_chained(struct perf_event *event)
> +{
> +	return event->attr.config1 & 0x1;
> +}
> +
> +
>   /*
>    * Perf Events' indices
>    */
> @@ -512,6 +520,36 @@ static inline int armv8pmu_select_counter(int idx)
>   	return idx;
>   }
>   
> +static inline u32 armv8pmu_read_evcntr(int idx)
> +{
> +	return (armv8pmu_select_counter(idx) == idx) ?
> +	       read_sysreg(pmxevcntr_el0) : 0;
> +}
> +
> +static inline u64 armv8pmu_read_chain_counter(int idx)
> +{
> +	u64 prev_hi, hi, lo;
> +
> +	do {
> +		prev_hi = armv8pmu_read_evcntr(idx);
> +		isb();
> +		lo = armv8pmu_read_evcntr(idx - 1);
> +		isb();
> +		hi = armv8pmu_read_evcntr(idx);
> +		isb();
> +	} while (prev_hi != hi);

Is it worth trying to elide that last isb() in the highly likely case 
that we don't need it?
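
e.g. restructured so there's no trailing isb() once the high half has
proven stable (untested sketch, reusing the patch's helpers):

static inline u64 armv8pmu_read_chain_counter(int idx)
{
	u64 prev_hi, hi, lo;

	hi = armv8pmu_read_evcntr(idx);
	do {
		prev_hi = hi;
		isb();
		lo = armv8pmu_read_evcntr(idx - 1);
		isb();
		/* Re-read the high half to detect a race with overflow */
		hi = armv8pmu_read_evcntr(idx);
	} while (prev_hi != hi);

	return (hi << 32) | lo;
}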

> +
> +	return (hi << 32) | lo;
> +}
> +
> +static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	return armv8pmu_event_is_chained(event) ?
> +	       armv8pmu_read_chain_counter(idx) : armv8pmu_read_evcntr(idx);
> +}
> +
>   static inline u64 armv8pmu_read_counter(struct perf_event *event)
>   {
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -524,12 +562,37 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
>   			smp_processor_id(), idx);
>   	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>   		value = read_sysreg(pmccntr_el0);
> -	else if (armv8pmu_select_counter(idx) == idx)
> -		value = read_sysreg(pmxevcntr_el0);
> +	else
> +		value = armv8pmu_read_hw_counter(event);
>   
>   	return value;
>   }
>   
> +static inline void armv8pmu_write_evcntr(int idx, u32 value)
> +{
> +	if (armv8pmu_select_counter(idx) == idx)
> +		write_sysreg(value, pmxevcntr_el0);
> +}
> +
> +static inline void armv8pmu_write_chain_counter(int idx, u64 value)
> +{
> +	armv8pmu_write_evcntr(idx, value >> 32);
> +	isb();
> +	armv8pmu_write_evcntr(idx - 1, value);
> +	isb();

Either that isb() is unnecessary, or we are (and have been) missing one 
after a non-chained write.
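
If it's the former, then presumably the chained write could just be
(sketch):

	armv8pmu_write_evcntr(idx, value >> 32);
	isb();
	armv8pmu_write_evcntr(idx - 1, value);

i.e. keep the barrier between the two halves but drop the trailing one.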

> +}
> +
> +static inline void armv8pmu_write_hw_counter(struct perf_event *event,
> +					     u64 value)
> +{
> +	int idx = event->hw.idx;
> +
> +	if (armv8pmu_event_is_chained(event))
> +		armv8pmu_write_chain_counter(idx, value);
> +	else
> +		armv8pmu_write_evcntr(idx, value);
> +}
> +
>   static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>   {
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -541,8 +604,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>   			smp_processor_id(), idx);
>   	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>   		write_sysreg(value, pmccntr_el0);
> -	else if (armv8pmu_select_counter(idx) == idx)
> -		write_sysreg(value, pmxevcntr_el0);
> +	else
> +		armv8pmu_write_hw_counter(event, value);
>   }
>   
>   static inline void armv8pmu_write_evtype(int idx, u32 val)
> @@ -553,6 +616,28 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
>   	}
>   }
>   
> +static inline void armv8pmu_write_event_type(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	/*
> +	 * For chained events, write the high counter event type
> +	 * followed by the low counter.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN;
> +
> +		/* Set the filters as that of the main event for chain */
> +		chain_evt |= hwc->config_base & ~ARMV8_PMU_EVTYPE_EVENT;
> +		armv8pmu_write_evtype(idx, chain_evt);
> +		isb();
> +		idx--;
> +	}
> +
> +	armv8pmu_write_evtype(idx, hwc->config_base);
> +}
> +
>   static inline int armv8pmu_enable_counter(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -560,6 +645,23 @@ static inline int armv8pmu_enable_counter(int idx)
>   	return idx;
>   }
>   
> +static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	/*
> +	 * For chained events, we enable the high counter followed by
> +	 * the low counter.
> +	 */
> +	armv8pmu_enable_counter(idx);
> +
> +	if (armv8pmu_event_is_chained(event)) {
> +		isb();
> +		armv8pmu_enable_counter(idx - 1);
> +	}
> +
> +}
> +
>   static inline int armv8pmu_disable_counter(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -567,6 +669,24 @@ static inline int armv8pmu_disable_counter(int idx)
>   	return idx;
>   }
>   
> +static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;

Nit: might as well drop this and be consistent with the enable case.

> +	int idx = hwc->idx;
> +
> +	/*
> +	 * Disable the low counter followed by the high counter
> +	 * for chained events.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		armv8pmu_disable_counter(idx - 1);
> +		isb();
> +	}
> +
> +	armv8pmu_disable_counter(idx);
> +}
> +
> +
>   static inline int armv8pmu_enable_intens(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -574,6 +694,12 @@ static inline int armv8pmu_enable_intens(int idx)
>   	return idx;
>   }
>   
> +static inline int armv8pmu_enable_event_irq(struct perf_event *event)
> +{
> +	/* For chained events, enable the interrupt for only the high counter */
> +	return armv8pmu_enable_intens(event->hw.idx);
> +}
> +
>   static inline int armv8pmu_disable_intens(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -586,6 +712,11 @@ static inline int armv8pmu_disable_intens(int idx)
>   	return idx;
>   }
>   
> +static inline int armv8pmu_disable_event_irq(struct perf_event *event)
> +{
> +	return armv8pmu_disable_intens(event->hw.idx);
> +}
> +
>   static inline u32 armv8pmu_getreset_flags(void)
>   {
>   	u32 value;
> @@ -603,10 +734,8 @@ static inline u32 armv8pmu_getreset_flags(void)
>   static void armv8pmu_enable_event(struct perf_event *event)
>   {
>   	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>   	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>   
>   	/*
>   	 * Enable counter and interrupt, and set the counter to count
> @@ -617,22 +746,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
>   	/*
>   	 * Disable counter
>   	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>   
>   	/*
>   	 * Set event (if destined for PMNx counters).
>   	 */
> -	armv8pmu_write_evtype(idx, hwc->config_base);
> +	armv8pmu_write_event_type(event);
>   
>   	/*
>   	 * Enable interrupt for this counter
>   	 */
> -	armv8pmu_enable_intens(idx);
> +	armv8pmu_enable_event_irq(event);
>   
>   	/*
>   	 * Enable counter
>   	 */
> -	armv8pmu_enable_counter(idx);
> +	armv8pmu_enable_event_counter(event);
>   
>   	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>   }
> @@ -640,10 +769,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
>   static void armv8pmu_disable_event(struct perf_event *event)
>   {
>   	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>   	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>   
>   	/*
>   	 * Disable counter and interrupt
> @@ -651,14 +778,14 @@ static void armv8pmu_disable_event(struct perf_event *event)
>   	raw_spin_lock_irqsave(&events->pmu_lock, flags);
>   
>   	/*
> -	 * Disable counter
> +	 * Disable counter for this event
>   	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>   
>   	/*
> -	 * Disable interrupt for this counter
> +	 * Disable interrupt for this event counter
>   	 */
> -	armv8pmu_disable_intens(idx);
> +	armv8pmu_disable_event_irq(event);
>   
>   	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>   }
> @@ -747,6 +874,39 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
>   	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>   }
>   
> +static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
> +				    struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	for (idx = cpu_pmu->num_events - 1; idx >= ARMV8_IDX_COUNTER0; --idx)
> +		if (!test_and_set_bit(idx, cpuc->used_mask))
> +			return idx;
> +	return -EAGAIN;
> +}
> +
> +static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> +				   struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	/*
> +	 * Chaining requires two consecutive event counters, where
> +	 * the lower idx must be even. We allocate chain events
> +	 * from the lower index (i.e, counter0) and the single events
> +	 * from the higher end to maximise the utilisation.
> +	 */
> +	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2)
> +		if (!test_and_set_bit(idx, cpuc->used_mask)) {
> +			/* Check if the preceding even counter is available */
> +			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
> +				return idx;
> +			/* Release the Odd counter */
> +			clear_bit(idx, cpuc->used_mask);
> +		}
> +	return -EAGAIN;
> +}
> +
>   static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   				  struct perf_event *event)
>   {
> @@ -755,7 +915,10 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   	struct hw_perf_event *hwc = &event->hw;
>   	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
>   
> -	/* Always prefer to place a cycle counter into the cycle counter. */
> +	/*
> +	 * Always prefer to place a cycle counter into the cycle counter
> +	 * irrespective of whether we are counting 32bit/64bit

I don't think that comment change adds much :/

> +	 */
>   	if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>   		if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
>   			return ARMV8_IDX_CYCLE_COUNTER;
> @@ -764,13 +927,21 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   	/*
>   	 * Otherwise use events counters
>   	 */
> -	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
> -		if (!test_and_set_bit(idx, cpuc->used_mask))
> -			return idx;
> -	}
> +	idx = armv8pmu_event_is_chained(event) ?
> +		armv8pmu_get_chain_idx(cpuc, cpu_pmu) :
> +		armv8pmu_get_single_idx(cpuc, cpu_pmu);
>   
> -	/* The counters are all in use. */
> -	return -EAGAIN;
> +	return idx;
> +}
> +
> +static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> +				     struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	clear_bit(hwc->idx, cpuc->used_mask);
> +	if (armv8pmu_event_is_chained(event))
> +		clear_bit(hwc->idx - 1, cpuc->used_mask);
>   }
>   
>   /*
> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>   				       &armv8_pmuv3_perf_cache_map,
>   				       ARMV8_PMU_EVTYPE_EVENT);
>   
> -	if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
> +	if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
> +		/* Prevent chaining for cycle counter */

Why? Sure, we want to avoid executing the chaining logic if we're 
scheduling a cycles event in the dedicated counter (which is perhaps 
what the comment above wanted to say), but if one ends up allocated into 
a regular counter (e.g. if the user asks for multiple cycle counts with 
different filters), then I don't see any reason to forbid that being 
chained.
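
i.e. (sketch) something like:

	if (armv8pmu_event_is_chained(event) ||
	    hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
		event->hw.flags |= ARMPMU_EVT_LONG;

with the "is it actually in the cycle counter?" decision left to
get_event_idx, rather than rejecting the combination up front.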

Robin.

> +		if (armv8pmu_event_is_chained(event))
> +			return -EINVAL;
>   		event->hw.flags |= ARMPMU_EVT_LONG;
> +	} else if (armv8pmu_event_is_chained(event)) {
> +		event->hw.flags |= ARMPMU_EVT_LONG;
> +	}
>   
>   	/* Onl expose micro/arch events supported by this PMU */
>   	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
> @@ -954,6 +1131,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->read_counter		= armv8pmu_read_counter,
>   	cpu_pmu->write_counter		= armv8pmu_write_counter,
>   	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
> +	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
>   	cpu_pmu->start			= armv8pmu_start,
>   	cpu_pmu->stop			= armv8pmu_stop,
>   	cpu_pmu->reset			= armv8pmu_reset,
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 6/6] arm64: perf: Add support for chaining counters
@ 2018-05-18 13:49     ` Robin Murphy
  0 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-18 13:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 18/05/18 11:22, Suzuki K Poulose wrote:
> Add support for chained event counters. PMUv3 allows chaining
> a pair of adjacent PMU counters (with the lower counter number
> being always "even"). The low counter is programmed to count
> the event of interest and the high counter(odd numbered) is
> programmed with a special event code (0x1e - Chain). Thus
> we need special allocation schemes to make the full use of
> available counters. So, we allocate the counters from either
> ends. i.e, chained counters are allocated from the lower
> end in pairs of two and the normal counters are allocated
> from the higher number. Also makes necessary changes to
> handle the chained events as a single event with 2 counters.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arch/arm64/kernel/perf_event.c | 226 ++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 202 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index ea8e060..5f81cd0 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -446,9 +446,11 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
>   };
>   
>   PMU_FORMAT_ATTR(event, "config:0-15");
> +PMU_FORMAT_ATTR(chain, "config1:0");
>   
>   static struct attribute *armv8_pmuv3_format_attrs[] = {
>   	&format_attr_event.attr,
> +	&format_attr_chain.attr,
>   	NULL,
>   };
>   
> @@ -457,6 +459,12 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
>   	.attrs = armv8_pmuv3_format_attrs,
>   };
>   
> +static bool armv8pmu_event_is_chained(struct perf_event *event)
> +{
> +	return event->attr.config1 & 0x1;
> +}
> +
> +
>   /*
>    * Perf Events' indices
>    */
> @@ -512,6 +520,36 @@ static inline int armv8pmu_select_counter(int idx)
>   	return idx;
>   }
>   
> +static inline u32 armv8pmu_read_evcntr(int idx)
> +{
> +	return (armv8pmu_select_counter(idx) == idx) ?
> +	       read_sysreg(pmxevcntr_el0) : 0;
> +}
> +
> +static inline u64 armv8pmu_read_chain_counter(int idx)
> +{
> +	u64 prev_hi, hi, lo;
> +
> +	do {
> +		prev_hi = armv8pmu_read_evcntr(idx);
> +		isb();
> +		lo = armv8pmu_read_evcntr(idx - 1);
> +		isb();
> +		hi = armv8pmu_read_evcntr(idx);
> +		isb();
> +	} while (prev_hi != hi);

Is it worth trying to elide that last isb() in the highly likely case 
that we don't need it?

> +
> +	return (hi << 32) | lo;
> +}
> +
> +static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	return armv8pmu_event_is_chained(event) ?
> +	       armv8pmu_read_chain_counter(idx) : armv8pmu_read_evcntr(idx);
> +}
> +
>   static inline u64 armv8pmu_read_counter(struct perf_event *event)
>   {
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -524,12 +562,37 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
>   			smp_processor_id(), idx);
>   	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>   		value = read_sysreg(pmccntr_el0);
> -	else if (armv8pmu_select_counter(idx) == idx)
> -		value = read_sysreg(pmxevcntr_el0);
> +	else
> +		value = armv8pmu_read_hw_counter(event);
>   
>   	return value;
>   }
>   
> +static inline void armv8pmu_write_evcntr(int idx, u32 value)
> +{
> +	if (armv8pmu_select_counter(idx) == idx)
> +		write_sysreg(value, pmxevcntr_el0);
> +}
> +
> +static inline void armv8pmu_write_chain_counter(int idx, u64 value)
> +{
> +	armv8pmu_write_evcntr(idx, value >> 32);
> +	isb();
> +	armv8pmu_write_evcntr(idx - 1, value);
> +	isb();

Either that isb() is unnecessary, or we are (and have been) missing one 
after a non-chained write.

> +}
> +
> +static inline void armv8pmu_write_hw_counter(struct perf_event *event,
> +					     u64 value)
> +{
> +	int idx = event->hw.idx;
> +
> +	if (armv8pmu_event_is_chained(event))
> +		armv8pmu_write_chain_counter(idx, value);
> +	else
> +		armv8pmu_write_evcntr(idx, value);
> +}
> +
>   static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>   {
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -541,8 +604,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>   			smp_processor_id(), idx);
>   	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>   		write_sysreg(value, pmccntr_el0);
> -	else if (armv8pmu_select_counter(idx) == idx)
> -		write_sysreg(value, pmxevcntr_el0);
> +	else
> +		armv8pmu_write_hw_counter(event, value);
>   }
>   
>   static inline void armv8pmu_write_evtype(int idx, u32 val)
> @@ -553,6 +616,28 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
>   	}
>   }
>   
> +static inline void armv8pmu_write_event_type(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	/*
> +	 * For chained events, write the high counter event type
> +	 * followed by the low counter.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN;
> +
> +		/* Set the filters as that of the main event for chain */
> +		chain_evt |= hwc->config_base & ~ARMV8_PMU_EVTYPE_EVENT;
> +		armv8pmu_write_evtype(idx, chain_evt);
> +		isb();
> +		idx--;
> +	}
> +
> +	armv8pmu_write_evtype(idx, hwc->config_base);
> +}
> +
>   static inline int armv8pmu_enable_counter(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -560,6 +645,23 @@ static inline int armv8pmu_enable_counter(int idx)
>   	return idx;
>   }
>   
> +static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	/*
> +	 * For chained events, we enable the high counter followed by
> +	 * the low counter.
> +	 */
> +	armv8pmu_enable_counter(idx);
> +
> +	if (armv8pmu_event_is_chained(event)) {
> +		isb();
> +		armv8pmu_enable_counter(idx - 1);
> +	}
> +
> +}
> +
>   static inline int armv8pmu_disable_counter(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -567,6 +669,24 @@ static inline int armv8pmu_disable_counter(int idx)
>   	return idx;
>   }
>   
> +static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;

Nit: might as well drop this and be consistent with the enable case.

> +	int idx = hwc->idx;
> +
> +	/*
> +	 * Disable the low counter followed by the high counter
> +	 * for chained events.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		armv8pmu_disable_counter(idx - 1);
> +		isb();
> +	}
> +
> +	armv8pmu_disable_counter(idx);
> +}
> +
> +
>   static inline int armv8pmu_enable_intens(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -574,6 +694,12 @@ static inline int armv8pmu_enable_intens(int idx)
>   	return idx;
>   }
>   
> +static inline int armv8pmu_enable_event_irq(struct perf_event *event)
> +{
> +	/* For chained events, enable the interrupt for only the high counter */
> +	return armv8pmu_enable_intens(event->hw.idx);
> +}
> +
>   static inline int armv8pmu_disable_intens(int idx)
>   {
>   	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -586,6 +712,11 @@ static inline int armv8pmu_disable_intens(int idx)
>   	return idx;
>   }
>   
> +static inline int armv8pmu_disable_event_irq(struct perf_event *event)
> +{
> +	return armv8pmu_disable_intens(event->hw.idx);
> +}
> +
>   static inline u32 armv8pmu_getreset_flags(void)
>   {
>   	u32 value;
> @@ -603,10 +734,8 @@ static inline u32 armv8pmu_getreset_flags(void)
>   static void armv8pmu_enable_event(struct perf_event *event)
>   {
>   	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>   	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>   
>   	/*
>   	 * Enable counter and interrupt, and set the counter to count
> @@ -617,22 +746,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
>   	/*
>   	 * Disable counter
>   	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>   
>   	/*
>   	 * Set event (if destined for PMNx counters).
>   	 */
> -	armv8pmu_write_evtype(idx, hwc->config_base);
> +	armv8pmu_write_event_type(event);
>   
>   	/*
>   	 * Enable interrupt for this counter
>   	 */
> -	armv8pmu_enable_intens(idx);
> +	armv8pmu_enable_event_irq(event);
>   
>   	/*
>   	 * Enable counter
>   	 */
> -	armv8pmu_enable_counter(idx);
> +	armv8pmu_enable_event_counter(event);
>   
>   	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>   }
> @@ -640,10 +769,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
>   static void armv8pmu_disable_event(struct perf_event *event)
>   {
>   	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>   	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>   
>   	/*
>   	 * Disable counter and interrupt
> @@ -651,14 +778,14 @@ static void armv8pmu_disable_event(struct perf_event *event)
>   	raw_spin_lock_irqsave(&events->pmu_lock, flags);
>   
>   	/*
> -	 * Disable counter
> +	 * Disable counter for this event
>   	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>   
>   	/*
> -	 * Disable interrupt for this counter
> +	 * Disable interrupt for this event counter
>   	 */
> -	armv8pmu_disable_intens(idx);
> +	armv8pmu_disable_event_irq(event);
>   
>   	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>   }
> @@ -747,6 +874,39 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
>   	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>   }
>   
> +static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
> +				    struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	for (idx = cpu_pmu->num_events - 1; idx >= ARMV8_IDX_COUNTER0; --idx)
> +		if (!test_and_set_bit(idx, cpuc->used_mask))
> +			return idx;
> +	return -EAGAIN;
> +}
> +
> +static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> +				   struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	/*
> +	 * Chaining requires two consecutive event counters, where
> +	 * the lower idx must be even. We allocate chain events
> +	 * from the lower index (i.e, counter0) and the single events
> +	 * from the higher end to maximise the utilisation.
> +	 */
> +	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2)
> +		if (!test_and_set_bit(idx, cpuc->used_mask)) {
> +			/* Check if the preceding even counter is available */
> +			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
> +				return idx;
> +			/* Release the Odd counter */
> +			clear_bit(idx, cpuc->used_mask);
> +		}
> +	return -EAGAIN;
> +}
> +
>   static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   				  struct perf_event *event)
>   {
> @@ -755,7 +915,10 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   	struct hw_perf_event *hwc = &event->hw;
>   	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
>   
> -	/* Always prefer to place a cycle counter into the cycle counter. */
> +	/*
> +	 * Always prefer to place a cycle counter into the cycle counter
> +	 * irrespective of whether we are counting 32bit/64bit

I don't think that comment change adds much :/

> +	 */
>   	if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>   		if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
>   			return ARMV8_IDX_CYCLE_COUNTER;
> @@ -764,13 +927,21 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   	/*
>   	 * Otherwise use events counters
>   	 */
> -	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
> -		if (!test_and_set_bit(idx, cpuc->used_mask))
> -			return idx;
> -	}
> +	idx = armv8pmu_event_is_chained(event) ?
> +		armv8pmu_get_chain_idx(cpuc, cpu_pmu) :
> +		armv8pmu_get_single_idx(cpuc, cpu_pmu);
>   
> -	/* The counters are all in use. */
> -	return -EAGAIN;
> +	return idx;
> +}
> +
> +static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> +				     struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	clear_bit(hwc->idx, cpuc->used_mask);
> +	if (armv8pmu_event_is_chained(event))
> +		clear_bit(hwc->idx - 1, cpuc->used_mask);
>   }
>   
>   /*
> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>   				       &armv8_pmuv3_perf_cache_map,
>   				       ARMV8_PMU_EVTYPE_EVENT);
>   
> -	if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
> +	if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
> +		/* Prevent chaining for cycle counter */

Why? Sure, we want to avoid executing the chaining logic if we're 
scheduling a cycles event in the dedicated counter (which is perhaps 
what the comment above wanted to say), but if one ends up allocated into 
a regular counter (e.g. if the user asks for multiple cycle counts with 
different filters), then I don't see any reason to forbid that being 
chained.

Robin.

> +		if (armv8pmu_event_is_chained(event))
> +			return -EINVAL;
>   		event->hw.flags |= ARMPMU_EVT_LONG;
> +	} else if (armv8pmu_event_is_chained(event)) {
> +		event->hw.flags |= ARMPMU_EVT_LONG;
> +	}
>   
>   	/* Only expose micro/arch events supported by this PMU */
>   	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
> @@ -954,6 +1131,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
>   	cpu_pmu->read_counter		= armv8pmu_read_counter,
>   	cpu_pmu->write_counter		= armv8pmu_write_counter,
>   	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
> +	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
>   	cpu_pmu->start			= armv8pmu_start,
>   	cpu_pmu->stop			= armv8pmu_stop,
>   	cpu_pmu->reset			= armv8pmu_reset,
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-18 10:22   ` Suzuki K Poulose
@ 2018-05-18 14:57     ` Robin Murphy
  -1 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-18 14:57 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon

One more thing now that I've actually looked at the Arm ARM...

On 18/05/18 11:22, Suzuki K Poulose wrote:
[...]
> +static inline void armv8pmu_write_event_type(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	/*
> +	 * For chained events, write the high counter event type
> +	 * followed by the low counter.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN;
> +
> +		/* Set the filters as that of the main event for chain */
> +		chain_evt |= hwc->config_base & ~ARMV8_PMU_EVTYPE_EVENT;

The description of the chain event says that the filtering must only be 
set on the lower counter, and that the chain event itself should be set 
to count everything.

> +		armv8pmu_write_evtype(idx, chain_evt);
> +		isb();
> +		idx--;
> +	}
> +
> +	armv8pmu_write_evtype(idx, hwc->config_base);

It also says that the 'real' event should be set up first and the chain 
event second, with the rather ominous warning of "If software does not 
program the event in this way, the count becomes UNPREDICTABLE."

Robin.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-18 13:49     ` Robin Murphy
@ 2018-05-18 15:57       ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 15:57 UTC (permalink / raw)
  To: Robin Murphy, linux-arm-kernel; +Cc: linux-kernel, mark.rutland, will.deacon

Hi Robin,

On 18/05/18 14:49, Robin Murphy wrote:
> On 18/05/18 11:22, Suzuki K Poulose wrote:
>> Add support for chained event counters. PMUv3 allows chaining
>> a pair of adjacent PMU counters (with the lower counter number
>> being always "even"). The low counter is programmed to count
>> the event of interest and the high counter(odd numbered) is
>> programmed with a special event code (0x1e - Chain). Thus
>> we need special allocation schemes to make the full use of
>> available counters. So, we allocate the counters from either
>> ends. i.e, chained counters are allocated from the lower
>> end in pairs of two and the normal counters are allocated
>> from the higher number. Also makes necessary changes to
>> handle the chained events as a single event with 2 counters.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   arch/arm64/kernel/perf_event.c | 226 ++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 202 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
>> index ea8e060..5f81cd0 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -446,9 +446,11 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {

..

>> +static inline u64 armv8pmu_read_chain_counter(int idx)
>> +{
>> +    u64 prev_hi, hi, lo;
>> +
>> +    do {
>> +        prev_hi = armv8pmu_read_evcntr(idx);
>> +        isb();
>> +        lo = armv8pmu_read_evcntr(idx - 1);
>> +        isb();
>> +        hi = armv8pmu_read_evcntr(idx);
>> +        isb();
>> +    } while (prev_hi != hi);
> 
> Is it worth trying to elide that last isb() in the highly likely case that we don't need it?

You're right. Also, I will rework the code to reuse the "hi".
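
Something along these lines, perhaps (only a sketch, not tested yet), which
reuses "hi" and drops the trailing isb():

	static inline u64 armv8pmu_read_chain_counter(int idx)
	{
		u64 prev_hi, hi, lo;

		hi = armv8pmu_read_evcntr(idx);
		do {
			prev_hi = hi;
			isb();
			lo = armv8pmu_read_evcntr(idx - 1);
			isb();
			hi = armv8pmu_read_evcntr(idx);
		} while (prev_hi != hi);

		return (hi << 32) | lo;
	}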

>> +static inline void armv8pmu_write_chain_counter(int idx, u64 value)
>> +{
>> +    armv8pmu_write_evcntr(idx, value >> 32);
>> +    isb();
>> +    armv8pmu_write_evcntr(idx - 1, value);
>> +    isb();
> 
> Either that isb() is unnecessary, or we are (and have been) missing one after a non-chained write.

That's right, it is not necessary; I will remove it.
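
i.e. the chained write simply becomes (sketch):

	static inline void armv8pmu_write_chain_counter(int idx, u64 value)
	{
		armv8pmu_write_evcntr(idx, value >> 32);
		isb();
		armv8pmu_write_evcntr(idx - 1, value);
	}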


>>   static inline int armv8pmu_disable_counter(int idx)
>>   {
>>       u32 counter = ARMV8_IDX_TO_COUNTER(idx);
>> @@ -567,6 +669,24 @@ static inline int armv8pmu_disable_counter(int idx)
>>       return idx;
>>   }
>> +static inline void armv8pmu_disable_event_counter(struct perf_event *event)
>> +{
>> +    struct hw_perf_event *hwc = &event->hw;
> 
> Nit: might as well drop this and be consistent with the enable case.

Sure.

>>   static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>>                     struct perf_event *event)
>>   {
>> @@ -755,7 +915,10 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>>       struct hw_perf_event *hwc = &event->hw;
>>       unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
>> -    /* Always prefer to place a cycle counter into the cycle counter. */
>> +    /*
>> +     * Always prefer to place a cycle counter into the cycle counter
>> +     * irrespective of whether we are counting 32bit/64bit
> 
> I don't think that comment change adds much :/
> 

That's a leftover from rebasing. Thanks for spotting it.

>>   /*
>> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>>                          &armv8_pmuv3_perf_cache_map,
>>                          ARMV8_PMU_EVTYPE_EVENT);
>> -    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
>> +    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>> +        /* Prevent chaining for cycle counter */
> 
> Why? Sure, we want to avoid executing the chaining logic if we're scheduling a cycles event in the dedicated counter (which is perhaps what the comment above wanted to say), but if one ends up allocated into a regular counter (e.g. if the user asks for multiple cycle counts with different filters), then I don't see any reason to forbid that being chained.

Ah, I didn't think about that case. I was under the assumption that
cycles events are *only* placed on the cycle counter. I will take care of that.
Thanks for the review.

Suzuki

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-18 14:57     ` Robin Murphy
@ 2018-05-21 10:49       ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-21 10:49 UTC (permalink / raw)
  To: Robin Murphy, linux-arm-kernel; +Cc: linux-kernel, mark.rutland, will.deacon

On 18/05/18 15:57, Robin Murphy wrote:
> One more thing now that I've actually looked at the Arm ARM...
> 
> On 18/05/18 11:22, Suzuki K Poulose wrote:
> [...]
>> +static inline void armv8pmu_write_event_type(struct perf_event *event)
>> +{
>> +    struct hw_perf_event *hwc = &event->hw;
>> +    int idx = hwc->idx;
>> +
>> +    /*
>> +     * For chained events, write the high counter event type
>> +     * followed by the low counter.
>> +     */
>> +    if (armv8pmu_event_is_chained(event)) {
>> +        u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN;
>> +
>> +        /* Set the filters as that of the main event for chain */
>> +        chain_evt |= hwc->config_base & ~ARMV8_PMU_EVTYPE_EVENT;
> 
> The description of the chain event says that the filtering must only be set on the lower counter, and that the chain event itself should be set to count everything.

You're right. I intended to fix it, but missed it in the rebases.
Thanks for pointing it out.

> 
>> +        armv8pmu_write_evtype(idx, chain_evt);
>> +        isb();
>> +        idx--;
>> +    }
>> +
>> +    armv8pmu_write_evtype(idx, hwc->config_base);
> 
> It also says that the 'real' event should be set up first and the chain event second, with the rather ominous warning of "If software does not program the event in this way, the count becomes UNPREDICTABLE."
> 

Yep, will fix it.
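
For v2 I am planning something along these lines (sketch only): program the
event of interest on the low counter first, then the CHAIN event on the
high counter with its filters set to count everything (assuming that
including EL2 and leaving EL0/EL1 unfiltered is the right way to express
"count at all ELs" here):

	static inline void armv8pmu_write_event_type(struct perf_event *event)
	{
		struct hw_perf_event *hwc = &event->hw;
		int idx = hwc->idx;

		/*
		 * For chained events, the low counter counts the event of
		 * interest (with the user's filters) and the high counter
		 * counts the CHAIN event, unfiltered.
		 */
		if (armv8pmu_event_is_chained(event)) {
			u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
					ARMV8_PMU_INCLUDE_EL2;

			armv8pmu_write_evtype(idx - 1, hwc->config_base);
			isb();
			armv8pmu_write_evtype(idx, chain_evt);
		} else {
			armv8pmu_write_evtype(idx, hwc->config_base);
		}
	}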

Thanks for the review

Suzuki

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-18 15:57       ` Suzuki K Poulose
@ 2018-05-21 13:42         ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-21 13:42 UTC (permalink / raw)
  To: Robin Murphy, linux-arm-kernel; +Cc: linux-kernel, mark.rutland, will.deacon

On 18/05/18 16:57, Suzuki K Poulose wrote:
> Hi Robin,
> 
> On 18/05/18 14:49, Robin Murphy wrote:
>> On 18/05/18 11:22, Suzuki K Poulose wrote:
>>> Add support for chained event counters. PMUv3 allows chaining
>>> a pair of adjacent PMU counters (with the lower counter number
>>> being always "even"). The low counter is programmed to count
>>> the event of interest and the high counter(odd numbered) is
>>> programmed with a special event code (0x1e - Chain). Thus
>>> we need special allocation schemes to make the full use of
>>> available counters. So, we allocate the counters from either
>>> ends. i.e, chained counters are allocated from the lower
>>> end in pairs of two and the normal counters are allocated
>>> from the higher number. Also makes necessary changes to
>>> handle the chained events as a single event with 2 counters.
>>>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

> 
>>>   /*
>>> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>>>                          &armv8_pmuv3_perf_cache_map,
>>>                          ARMV8_PMU_EVTYPE_EVENT);
>>> -    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
>>> +    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>>> +        /* Prevent chaining for cycle counter */
>>
>> Why? Sure, we want to avoid executing the chaining logic if we're scheduling a cycles event in the dedicated counter (which is perhaps what the comment above wanted to say), but if one ends up allocated into a regular counter (e.g. if the user asks for multiple cycle counts with different filters), then I don't see any reason to forbid that being chained.
> 
> Ah, I didn't think about that case. I was under the assumption that the
> cycles are *only* placed on the cycle counter. I will take care of that.
> Thanks for the review.

Robin, Mark, Will

One potential problem I see with allowing chaining of the cycles event
*and* promoting the cycles event to 64bit by default is that the user
may end up being able to count one less event (since the 64bit promotion
forces the event onto a chained pair of counters whenever the dedicated
cycle counter is unavailable).

So one option is to drop the automatic promotion of the cycles event to
64bit and do it only when it is requested by the user, using either the
cycle counter (preferred) or falling back to chaining. That way, the user
has control over the number of events that can be counted with the given
set of counters.
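
i.e. in __armv8_pmuv3_map_event() we would only do something like the
below (sketch), rather than keying off the CPU_CYCLES event id:

	/* Promote to 64bit only when the user explicitly asks for chaining */
	if (armv8pmu_event_is_chained(event))
		event->hw.flags |= ARMPMU_EVT_LONG;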

Let me know your thoughts.

Suzuki

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-21 13:42         ` Suzuki K Poulose
@ 2018-05-21 14:00           ` Robin Murphy
  -1 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-21 14:00 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon

On 21/05/18 14:42, Suzuki K Poulose wrote:
> On 18/05/18 16:57, Suzuki K Poulose wrote:
>> Hi Robin,
>>
>> On 18/05/18 14:49, Robin Murphy wrote:
>>> On 18/05/18 11:22, Suzuki K Poulose wrote:
>>>> Add support for chained event counters. PMUv3 allows chaining
>>>> a pair of adjacent PMU counters (with the lower counter number
>>>> being always "even"). The low counter is programmed to count
>>>> the event of interest and the high counter(odd numbered) is
>>>> programmed with a special event code (0x1e - Chain). Thus
>>>> we need special allocation schemes to make the full use of
>>>> available counters. So, we allocate the counters from either
>>>> ends. i.e, chained counters are allocated from the lower
>>>> end in pairs of two and the normal counters are allocated
>>>> from the higher number. Also makes necessary changes to
>>>> handle the chained events as a single event with 2 counters.
>>>>
>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
>>
>>>>   /*
>>>> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct 
>>>> perf_event *event,
>>>>                          &armv8_pmuv3_perf_cache_map,
>>>>                          ARMV8_PMU_EVTYPE_EVENT);
>>>> -    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
>>>> +    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>>>> +        /* Prevent chaining for cycle counter */
>>>
>>> Why? Sure, we want to avoid executing the chaining logic if we're 
>>> scheduling a cycles event in the dedicated counter (which is perhaps 
>>> what the comment above wanted to say), but if one ends up allocated 
>>> into a regular counter (e.g. if the user asks for multiple cycle 
>>> counts with different filters), then I don't see any reason to forbid 
>>> that being chained.
>>
>> Ah, I didn't think about that case. I was under the assumption that the
>> cycles are *only* placed on the cycle counter. I will take care of that.
>> Thanks for the review.
> 
> Robin, Mark, Will
> 
> One potential problem I see with allowing chaining of the cycle counter
> *and* the promotion of cycle event to 64bit by default is when the user
> may actually be able to count 1 less event (due to the promotion of
> cycle event to 64bit and thus forcing to use chain, if the cycle counter
> is unavailable).

Right, I didn't mean to imply we should inject the "chain" attr 
automatically for all cycles events, just that we shouldn't be rejecting 
it if the user does explicitly set it (but then just ignore it if using 
the dedicated counter).

> So one option is to drop automatic promotion of the cycle counter to
> 64bit and do it only when it is requested by the user and use either the
> Cycle counter (preferred) or fall back to chaining. That way, the user
> has the control over the number of events he can count using the given
> set of counters.

Naively, there doesn't seem to be any inherent harm in always using 
64-bit arithmetic for the dedicated counter, but it would mean that with 
multiple (non-chained) cycles events, some would be taking an interrupt 
every few seconds while one would effectively never overflow. I guess 
the question is whether that matters or not.

Robin.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-21 14:00           ` Robin Murphy
@ 2018-05-21 14:41             ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-21 14:41 UTC (permalink / raw)
  To: Robin Murphy, linux-arm-kernel; +Cc: linux-kernel, mark.rutland, will.deacon

On 21/05/18 15:00, Robin Murphy wrote:
> On 21/05/18 14:42, Suzuki K Poulose wrote:
>> On 18/05/18 16:57, Suzuki K Poulose wrote:
>>> Hi Robin,
>>>
>>> On 18/05/18 14:49, Robin Murphy wrote:
>>>> On 18/05/18 11:22, Suzuki K Poulose wrote:
>>>>> Add support for chained event counters. PMUv3 allows chaining
>>>>> a pair of adjacent PMU counters (with the lower counter number
>>>>> being always "even"). The low counter is programmed to count
>>>>> the event of interest and the high counter(odd numbered) is
>>>>> programmed with a special event code (0x1e - Chain). Thus
>>>>> we need special allocation schemes to make the full use of
>>>>> available counters. So, we allocate the counters from either
>>>>> ends. i.e, chained counters are allocated from the lower
>>>>> end in pairs of two and the normal counters are allocated
>>>>> from the higher number. Also makes necessary changes to
>>>>> handle the chained events as a single event with 2 counters.
>>>>>
>>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>>>
>>>>>   /*
>>>>> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>>>>>                          &armv8_pmuv3_perf_cache_map,
>>>>>                          ARMV8_PMU_EVTYPE_EVENT);
>>>>> -    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
>>>>> +    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>>>>> +        /* Prevent chaining for cycle counter */
>>>>
>>>> Why? Sure, we want to avoid executing the chaining logic if we're scheduling a cycles event in the dedicated counter (which is perhaps what the comment above wanted to say), but if one ends up allocated into a regular counter (e.g. if the user asks for multiple cycle counts with different filters), then I don't see any reason to forbid that being chained.
>>>
>>> Ah, I didn't think about that case. I was under the assumption that the
>>> cycles are *only* placed on the cycle counter. I will take care of that.
>>> Thanks for the review.
>>
>> Robin, Mark, Will
>>
>> One potential problem I see with allowing chaining of the cycle counter
>> *and* the promotion of cycle event to 64bit by default is when the user
>> may actually be able to count 1 less event (due to the promotion of
>> cycle event to 64bit and thus forcing to use chain, if the cycle counter
>> is unavailable).
> 
> Right, I didn't mean to imply we should inject the "chain" attr automatically for all cycles events, just that we shouldn't be rejecting it if the user does explicitly set it (but then just ignore it if using the dedicated counter).

Right, I was not talking about the automatic "chain" attribute for cycles
events. The problem is that we don't know whether we will get the cycle
counter for a given event until it is "added", by which point we have
already decided whether the event is 32bit or 64bit, so we cannot really
delay the decision until then. That's where this comes up. Given a cycles
event (without an explicit chain request), do we treat it as a 64bit event
or not? If we do, we could:

1) get the cycle counter, in which case all is fine, or

2) fall back to chaining, in which case the user loses a counter.

> 
>> So one option is to drop automatic promotion of the cycle counter to
>> 64bit and do it only when it is requested by the user and use either the
>> Cycle counter (preferred) or fall back to chaining. That way, the user
>> has the control over the number of events he can count using the given
>> set of counters.
> 
> Naively, there doesn't seem to be any inherent harm in always using 64-bit arithmetic for the dedicated counter, but it would mean that with multiple (non-chained) cycles events, some would be taking an interrupt every few seconds while one would effectively never overflow. I guess the question is whether that matters or not.
> 

The problem is that we can't have a per-counter mask, as we don't know
where the event will be placed until it is added. Alternatively, we would
have to delay the period initialisation/update until after get_event_idx().

Suzuki

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/6] arm64: perf: Add support for chaining counters
  2018-05-21 14:41             ` Suzuki K Poulose
@ 2018-05-21 15:29               ` Robin Murphy
  -1 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-21 15:29 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon

On 21/05/18 15:41, Suzuki K Poulose wrote:
> On 21/05/18 15:00, Robin Murphy wrote:
>> On 21/05/18 14:42, Suzuki K Poulose wrote:
>>> On 18/05/18 16:57, Suzuki K Poulose wrote:
>>>> Hi Robin,
>>>>
>>>> On 18/05/18 14:49, Robin Murphy wrote:
>>>>> On 18/05/18 11:22, Suzuki K Poulose wrote:
>>>>>> Add support for chained event counters. PMUv3 allows chaining
>>>>>> a pair of adjacent PMU counters (with the lower counter number
>>>>>> being always "even"). The low counter is programmed to count
>>>>>> the event of interest and the high counter(odd numbered) is
>>>>>> programmed with a special event code (0x1e - Chain). Thus
>>>>>> we need special allocation schemes to make the full use of
>>>>>> available counters. So, we allocate the counters from either
>>>>>> ends. i.e, chained counters are allocated from the lower
>>>>>> end in pairs of two and the normal counters are allocated
>>>>>> from the higher number. Also makes necessary changes to
>>>>>> handle the chained events as a single event with 2 counters.
>>>>>>
>>>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>
>>>>
>>>>>>   /*
>>>>>> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct 
>>>>>> perf_event *event,
>>>>>>                          &armv8_pmuv3_perf_cache_map,
>>>>>>                          ARMV8_PMU_EVTYPE_EVENT);
>>>>>> -    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
>>>>>> +    if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>>>>>> +        /* Prevent chaining for cycle counter */
>>>>>
>>>>> Why? Sure, we want to avoid executing the chaining logic if we're 
>>>>> scheduling a cycles event in the dedicated counter (which is 
>>>>> perhaps what the comment above wanted to say), but if one ends up 
>>>>> allocated into a regular counter (e.g. if the user asks for 
>>>>> multiple cycle counts with different filters), then I don't see any 
>>>>> reason to forbid that being chained.
>>>>
>>>> Ah, I didn't think about that case. I was under the assumption that the
>>>> cycles are *only* placed on the cycle counter. I will take care of 
>>>> that.
>>>> Thanks for the review.
>>>
>>> Robin, Mark, Will
>>>
>>> One potential problem I see with allowing chaining of the cycle counter
>>> *and* the promotion of cycle event to 64bit by default is when the user
>>> may actually be able to count 1 less event (due to the promotion of
>>> cycle event to 64bit and thus forcing to use chain, if the cycle counter
>>> is unavailable).
>>
>> Right, I didn't mean to imply we should inject the "chain" attr 
>> automatically for all cycles events, just that we shouldn't be 
>> rejecting it if the user does explicitly set it (but then just ignore 
>> it if using the dedicated counter).
> 
> Right, I was not talking about the automatic "chain" for cycles events. The
> problem is we don't know if we would get the "cycle" counter for the given
> event until it is "added", at which point we have already decided
> whether the event is 32bit or 64bit. So, we cannot really delay the 
> decision
> until that. Thats where this comes up. Given a cycle event (without an 
> explicit
> chain request), do we treat it as a 64bit event or not ? If we do, we could
> 
> 1) get the Cycle counter, all is fine.
> 
> 2) If not, fallback to Chaining. The user looses a counter.

Ah, I think I see where our wires might be getting crossed here - taking 
a second look at patch #4 I see you're inherently associating 
ARMPMU_EVT_LONG with the ARMV8_PMUV3_PERFCTR_CPU_CYCLES event ID. What 
I'm thinking of is that we would only set that flag if and when we've 
allocated the cycle counter or a chained pair of regular counters (i.e. 
in get_event_idx() or later). In other words, it becomes a property of 
the counter(s) backing the event, rather than of the hardware event 
itself, which I think makes logical sense.
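
e.g. something like this in get_event_idx() (sketch, untested):

	static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
					  struct perf_event *event)
	{
		struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
		struct hw_perf_event *hwc = &event->hw;
		unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
		int idx;

		/* Always prefer to place a cycle counter into the cycle counter. */
		if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES &&
		    !test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask)) {
			/* The dedicated counter is naturally 64bit */
			hwc->flags |= ARMPMU_EVT_LONG;
			return ARMV8_IDX_CYCLE_COUNTER;
		}

		/* Otherwise use the event counters */
		if (armv8pmu_event_is_chained(event)) {
			idx = armv8pmu_get_chain_idx(cpuc, cpu_pmu);
			if (idx >= 0)
				hwc->flags |= ARMPMU_EVT_LONG;
			return idx;
		}

		return armv8pmu_get_single_idx(cpuc, cpu_pmu);
	}

That way a cycles event which doesn't get the dedicated counter simply
stays 32bit (or gets chained if the user explicitly asked for it), and
nobody loses a counter behind their back.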

>>> So one option is to drop automatic promotion of the cycle counter to
>>> 64bit and do it only when it is requested by the user and use either the
>>> Cycle counter (preferred) or fall back to chaining. That way, the user
>>> has the control over the number of events he can count using the given
>>> set of counters.
>>
>> Naively, there doesn't seem to be any inherent harm in always using 
>> 64-bit arithmetic for the dedicated counter, but it would mean that 
>> with multiple (non-chained) cycles events, some would be taking an 
>> interrupt every few seconds while one would effectively never 
>> overflow. I guess the question is whether that matters or not.
>>
> 
> The problem is we can't have a mask per counter as we don't know where
> the event would be placed until it is added. Or we should delay the
> period initialisation/update to post get_event_idx().

...but does indeed mean that we can't initialise period stuff until we 
know where the event has been placed. I reckon that is a reasonable 
thing to do, but let's see what Mark and Will think.

(I guess there's also an ugly compromise in which we could re-run the 
sample_period setup logic as a special case when allocating the cycle 
counter, but even I don't like the idea of that)

Robin.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 6/6] arm64: perf: Add support for chaining counters
@ 2018-05-21 15:29               ` Robin Murphy
  0 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2018-05-21 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 21/05/18 15:41, Suzuki K Poulose wrote:
> On 21/05/18 15:00, Robin Murphy wrote:
>> On 21/05/18 14:42, Suzuki K Poulose wrote:
>>> On 18/05/18 16:57, Suzuki K Poulose wrote:
>>>> Hi Robin,
>>>>
>>>> On 18/05/18 14:49, Robin Murphy wrote:
>>>>> On 18/05/18 11:22, Suzuki K Poulose wrote:
>>>>>> Add support for chained event counters. PMUv3 allows chaining
>>>>>> a pair of adjacent PMU counters (with the lower counter number
>>>>>> being always "even"). The low counter is programmed to count
>>>>>> the event of interest and the high counter(odd numbered) is
>>>>>> programmed with a special event code (0x1e - Chain). Thus
>>>>>> we need special allocation schemes to make the full use of
>>>>>> available counters. So, we allocate the counters from either
>>>>>> ends. i.e, chained counters are allocated from the lower
>>>>>> end in pairs of two and the normal counters are allocated
>>>>>> from the higher number. Also makes necessary changes to
>>>>>> handle the chained events as a single event with 2 counters.
>>>>>>
>>>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>
>>>>
>>>>>> ? /*
>>>>>> @@ -845,8 +1016,14 @@ static int __armv8_pmuv3_map_event(struct 
>>>>>> perf_event *event,
>>>>>> ???????????????????????? &armv8_pmuv3_perf_cache_map,
>>>>>> ???????????????????????? ARMV8_PMU_EVTYPE_EVENT);
>>>>>> -??? if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
>>>>>> +??? if (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
>>>>>> +??????? /* Prevent chaining for cycle counter */
>>>>>
>>>>> Why? Sure, we want to avoid executing the chaining logic if we're 
>>>>> scheduling a cycles event in the dedicated counter (which is 
>>>>> perhaps what the comment above wanted to say), but if one ends up 
>>>>> allocated into a regular counter (e.g. if the user asks for 
>>>>> multiple cycle counts with different filters), then I don't see any 
>>>>> reason to forbid that being chained.
>>>>
>>>> Ah, I didn't think about that case. I was under the assumption that the
>>>> cycles are *only* placed on the cycle counter. I will take care of 
>>>> that.
>>>> Thanks for the review.
>>>
>>> Robin, Mark, Will
>>>
>>> One potential problem I see with allowing chaining of the cycle counter
>>> *and* the promotion of cycle event to 64bit by default is when the user
>>> may actually be able to count 1 less event (due to the promotion of
>>> cycle event to 64bit and thus forcing to use chain, if the cycle counter
>>> is unavailable).
>>
>> Right, I didn't mean to imply we should inject the "chain" attr 
>> automatically for all cycles events, just that we shouldn't be 
>> rejecting it if the user does explicitly set it (but then just ignore 
>> it if using the dedicated counter).
> 
> Right, I was not talking about the automatic "chain" for cycles events. The
> problem is we don't know if we would get the "cycle" counter for the given
> event until it is "added", at which point we have already decided
> whether the event is 32bit or 64bit. So, we cannot really delay the 
> decision
> until that. Thats where this comes up. Given a cycle event (without an 
> explicit
> chain request), do we treat it as a 64bit event or not ? If we do, we could
> 
> 1) get the Cycle counter, all is fine.
> 
> 2) If not, fallback to Chaining. The user looses a counter.

Ah, I think I see where our wires might be getting crossed here - taking 
a second look at patch #4 I see you're inherently associating 
ARMPMU_EVT_LONG with the ARMV8_PMUV3_PERFCTR_CPU_CYCLES event ID. What 
I'm thinking of is that we would only set that flag if and when we've 
allocated the cycle counter or a chained pair of regular counters (i.e. 
in get_event_idx() or later). In other words, it becomes a property of 
the counter(s) backing the event, rather than of the hardware event 
itself, which I think makes logical sense.

>>> So one option is to drop automatic promotion of the cycle counter to
>>> 64bit and do it only when it is requested by the user and use either the
>>> Cycle counter (preferred) or fall back to chaining. That way, the user
>>> has the control over the number of events he can count using the given
>>> set of counters.
>>
>> Naively, there doesn't seem to be any inherent harm in always using 
>> 64-bit arithmetic for the dedicated counter, but it would mean that 
>> with multiple (non-chained) cycles events, some would be taking an 
>> interrupt every few seconds while one would effectively never 
>> overflow. I guess the question is whether that matters or not.
>>
> 
> The problem is we can't have a mask per counter, as we don't know where
> the event will be placed until it is added. Otherwise we would have to
> delay the period initialisation/update until after get_event_idx().

...which does indeed mean that we can't initialise the period until we 
know where the event has been placed. I reckon that is a reasonable 
thing to do, but let's see what Mark and Will think.

(I guess there's also an ugly compromise in which we could re-run the 
sample_period setup logic as a special case when allocating the cycle 
counter, but even I don't like the idea of that)
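
A minimal sketch of the deferred-period idea, assuming event init leaves
the period unset for non-sampling events and that an
arm_pmu_event_max_period() helper can report the effective width of the
allocated counter(s) (both assumptions, not code from the series):

static int armpmu_add(struct perf_event *event, int flags)
{
	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
	struct hw_perf_event *hwc = &event->hw;
	int idx;

	idx = armpmu->get_event_idx(hw_events, event);
	if (idx < 0)
		return idx;

	event->hw.idx = idx;
	hw_events->events[idx] = event;

	/*
	 * Only now do we know whether the event is backed by the 64bit
	 * cycle counter, a chained pair, or a single 32bit counter, so
	 * this is the earliest point the default period can be chosen.
	 */
	if (!is_sampling_event(event)) {
		hwc->sample_period = arm_pmu_event_max_period(event) >> 1;
		hwc->last_period   = hwc->sample_period;
		local64_set(&hwc->period_left, hwc->sample_period);
	}

	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
	if (flags & PERF_EF_START)
		armpmu_start(event, PERF_EF_RELOAD);

	perf_event_update_userpage(event);

	return 0;
}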

Robin.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/6] arm_pmu: Change API to support 64bit counter values
  2018-05-18 10:22   ` Suzuki K Poulose
@ 2018-05-21 23:30     ` kbuild test robot
  -1 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2018-05-21 23:30 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: kbuild-all, linux-arm-kernel, linux-kernel, mark.rutland,
	will.deacon, robin.murphy, Suzuki K Poulose

[-- Attachment #1: Type: text/plain, Size: 3219 bytes --]

Hi Suzuki,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.17-rc6 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Suzuki-K-Poulose/arm64-perf-Support-for-chaining-event-counters/20180521-102117
config: arm-corgi_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   arch/arm/kernel/perf_event_xscale.c: In function 'xscale2pmu_init':
>> arch/arm/kernel/perf_event_xscale.c:741:24: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
     cpu_pmu->read_counter = xscale2pmu_read_counter;
                           ^
   arch/arm/kernel/perf_event_xscale.c:742:25: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
     cpu_pmu->write_counter = xscale2pmu_write_counter;
                            ^
   cc1: some warnings being treated as errors

vim +741 arch/arm/kernel/perf_event_xscale.c

43eab878 Will Deacon        2010-11-13  734  
351a102d Greg Kroah-Hartman 2012-12-21  735  static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
43eab878 Will Deacon        2010-11-13  736  {
3d1ff755 Mark Rutland       2012-12-19  737  	cpu_pmu->name		= "armv5_xscale2";
513c99ce Sudeep Holla       2012-07-31  738  	cpu_pmu->handle_irq	= xscale2pmu_handle_irq;
513c99ce Sudeep Holla       2012-07-31  739  	cpu_pmu->enable		= xscale2pmu_enable_event;
513c99ce Sudeep Holla       2012-07-31  740  	cpu_pmu->disable	= xscale2pmu_disable_event;
513c99ce Sudeep Holla       2012-07-31 @741  	cpu_pmu->read_counter	= xscale2pmu_read_counter;
513c99ce Sudeep Holla       2012-07-31  742  	cpu_pmu->write_counter	= xscale2pmu_write_counter;
513c99ce Sudeep Holla       2012-07-31  743  	cpu_pmu->get_event_idx	= xscale2pmu_get_event_idx;
513c99ce Sudeep Holla       2012-07-31  744  	cpu_pmu->start		= xscale2pmu_start;
513c99ce Sudeep Holla       2012-07-31  745  	cpu_pmu->stop		= xscale2pmu_stop;
513c99ce Sudeep Holla       2012-07-31  746  	cpu_pmu->map_event	= xscale_map_event;
513c99ce Sudeep Holla       2012-07-31  747  	cpu_pmu->num_events	= 5;
547faa3c Suzuki K Poulose   2018-05-18  748  	cpu_pmu->counter_width	= 32;
513c99ce Sudeep Holla       2012-07-31  749  
513c99ce Sudeep Holla       2012-07-31  750  	return 0;
43eab878 Will Deacon        2010-11-13  751  }
a12c72cc Mark Rutland       2015-05-26  752  

:::::: The code at line 741 was first introduced by commit
:::::: 513c99ce4e64245be1f83f56039ec4891b451955 ARM: perf: allocate CPU PMU dynamically at probe time

:::::: TO: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com>
:::::: CC: Will Deacon <will.deacon@arm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 18856 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/6] arm_pmu: Change API to support 64bit counter values
  2018-05-21 23:30     ` kbuild test robot
@ 2018-05-22  9:42       ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-22  9:42 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-arm-kernel, linux-kernel, mark.rutland,
	will.deacon, robin.murphy

On 22/05/18 00:30, kbuild test robot wrote:
> Hi Suzuki,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on tip/perf/core]
> [also build test ERROR on v4.17-rc6 next-20180517]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Suzuki-K-Poulose/arm64-perf-Support-for-chaining-event-counters/20180521-102117
> config: arm-corgi_defconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # save the attached .config to linux build tree
>          make.cross ARCH=arm
> 
> All errors (new ones prefixed by >>):
> 
>     arch/arm/kernel/perf_event_xscale.c: In function 'xscale2pmu_init':
>>> arch/arm/kernel/perf_event_xscale.c:741:24: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
>       cpu_pmu->read_counter = xscale2pmu_read_counter;
>                             ^
>     arch/arm/kernel/perf_event_xscale.c:742:25: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
>       cpu_pmu->write_counter = xscale2pmu_write_counter;
>                              ^
>     cc1: some warnings being treated as errors


Thanks for the report, I have fixed this for the next version.

Suzuki
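
For reference, the error above is simply the xscale2 callbacks keeping
their old 32bit prototypes after the arm_pmu API moved to u64 counter
values; the eventual fix presumably amounts to widening those prototypes,
roughly as below (a sketch only; the *_hw32() helpers stand in for the
existing, unchanged register-access bodies):

static u64 xscale2pmu_read_counter(struct perf_event *event)
{
	/* The hardware counter is still 32bit; only the API widens. */
	u32 val = xscale2pmu_read_counter_hw32(event);	/* assumed helper */

	return val;			/* zero-extended into the u64 API */
}

static void xscale2pmu_write_counter(struct perf_event *event, u64 val)
{
	/* Only the low 32 bits are meaningful to the hardware. */
	xscale2pmu_write_counter_hw32(event, (u32)val);	/* assumed helper */
}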

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-05-22  9:42 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-18 10:22 [PATCH 0/6] arm64: perf: Support for chaining event counters Suzuki K Poulose
2018-05-18 10:22 ` Suzuki K Poulose
2018-05-18 10:22 ` [PATCH 1/6] arm_pmu: Refactor maximum period handling Suzuki K Poulose
2018-05-18 10:22   ` Suzuki K Poulose
2018-05-18 13:10   ` Robin Murphy
2018-05-18 13:10     ` Robin Murphy
2018-05-18 10:22 ` [PATCH 2/6] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
2018-05-18 10:22   ` Suzuki K Poulose
2018-05-21 23:30   ` kbuild test robot
2018-05-21 23:30     ` kbuild test robot
2018-05-22  9:42     ` Suzuki K Poulose
2018-05-22  9:42       ` Suzuki K Poulose
2018-05-18 10:22 ` [PATCH 3/6] arm_pmu: Add support for long event counters Suzuki K Poulose
2018-05-18 10:22   ` Suzuki K Poulose
2018-05-18 13:22   ` Robin Murphy
2018-05-18 13:22     ` Robin Murphy
2018-05-18 10:22 ` [PATCH 4/6] arm64: perf: Make the cycle counter 64bit by default Suzuki K Poulose
2018-05-18 10:22   ` Suzuki K Poulose
2018-05-18 10:22 ` [PATCH 5/6] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
2018-05-18 10:22   ` Suzuki K Poulose
2018-05-18 10:22 ` [PATCH 6/6] arm64: perf: Add support for chaining counters Suzuki K Poulose
2018-05-18 10:22   ` Suzuki K Poulose
2018-05-18 13:49   ` Robin Murphy
2018-05-18 13:49     ` Robin Murphy
2018-05-18 15:57     ` Suzuki K Poulose
2018-05-18 15:57       ` Suzuki K Poulose
2018-05-21 13:42       ` Suzuki K Poulose
2018-05-21 13:42         ` Suzuki K Poulose
2018-05-21 14:00         ` Robin Murphy
2018-05-21 14:00           ` Robin Murphy
2018-05-21 14:41           ` Suzuki K Poulose
2018-05-21 14:41             ` Suzuki K Poulose
2018-05-21 15:29             ` Robin Murphy
2018-05-21 15:29               ` Robin Murphy
2018-05-18 14:57   ` Robin Murphy
2018-05-18 14:57     ` Robin Murphy
2018-05-21 10:49     ` Suzuki K Poulose
2018-05-21 10:49       ` Suzuki K Poulose
