* [PATCH v2 0/5] arm64: perf: Support for chained counters
@ 2018-05-29 10:55 Suzuki K Poulose
  2018-05-29 10:55 ` [PATCH v2 1/5] arm_pmu: Clean up maximum period handling Suzuki K Poulose
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-05-29 10:55 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, robin.murphy, Suzuki K Poulose

This series adds support for counting PMU events using 64bit counters
for arm64 PMU.

The Arm v8 PMUv3 supports combining two adjacent 32bit counters
(low even and high odd counters) to count a given "event" in 64bit mode.
This series adds the support for 64bit events in the core arm_pmu driver
infrastructure and adds the support for armv8 64bit kernel PMU to use
chained counters to count in 64bit mode. For CPU cycles, we use the cycle
counter in 64bit mode, when requested. If the cycle counter is not available,
we fall back to chaining the counters.

Tested on Juno, Fast models. Applies on 4.17-rc4
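
For reference, a 64bit (chained) count can be requested from userspace
via the new "bits64" format field (config1:0). Below is a minimal
perf_event_open() sketch; the raw event number 0x11 (CPU_CYCLES) and the
use of PERF_TYPE_RAW are purely illustrative and may need adjusting for
a given system:

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	struct perf_event_attr attr = {
		.type		= PERF_TYPE_RAW,
		.size		= sizeof(attr),
		.config		= 0x11,	/* CPU_CYCLES, illustrative only */
		.config1	= 1,	/* bits64: request a 64bit counter */
		.disabled	= 1,
	};
	uint64_t count;
	int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

	if (fd < 0)
		return 1;

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	/* ... workload under measurement ... */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("count: %llu\n", (unsigned long long)count);

	close(fd);
	return 0;
}

With the perf tool this corresponds to something like
"perf stat -e armv8_pmuv3/event=0x11,bits64=1/" (the exact PMU name may
differ per platform).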

Changes since v1:
 - Remove unnecessary isb()s in chain counter reads/writes
 - Fix event programming order for counters
 - Tighten chain counter event read sequence
 - Set chain event to count in all ELs
 - Cleanup helpers to be consistent
 - Fix build break on xscale PMU (reported by kbuild-robot)
 - Remove the explicit counter width field from pmu backends and default
   to 32bit.
 - Rename flag ARMPMU_EVT_LONG => ARMPMU_EVT_64BIT and
   the format string "chain" => "bits64". (Unfortunately we can't use "64bit",
   as the generated attribute identifier cannot begin with a digit; I am
   open to suggestions for a better name)
 - Rename armpmu_get_event_max_period() => armpmu_event_max_period()
 - For 64bit CPU cycles events, allow chaining if cycle counter is
   not available.

Suzuki K Poulose (5):
  arm_pmu: Clean up maximum period handling
  arm_pmu: Change API to support 64bit counter values
  arm_pmu: Add support for 64bit event counters
  arm_pmu: Tidy up clear_event_idx call backs
  arm64: perf: Add support for chaining event counters

 arch/arm/kernel/perf_event_v6.c     |   6 +-
 arch/arm/kernel/perf_event_v7.c     |   7 +-
 arch/arm/kernel/perf_event_xscale.c |  10 +-
 arch/arm64/kernel/perf_event.c      | 234 +++++++++++++++++++++++++++++++-----
 drivers/perf/arm_pmu.c              |  41 +++++--
 include/linux/perf/arm_pmu.h        |  11 +-
 6 files changed, 254 insertions(+), 55 deletions(-)

-- 
2.7.4


* [PATCH v2 1/5] arm_pmu: Clean up maximum period handling
  2018-05-29 10:55 [PATCH v2 0/5] arm64: perf: Support for chained counters Suzuki K Poulose
@ 2018-05-29 10:55 ` Suzuki K Poulose
  2018-05-29 10:55 ` [PATCH v2 2/5] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-05-29 10:55 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, robin.murphy, Suzuki K Poulose

Each PMU backend defines its own max_period as the maximum value
that its counters can count. Since all the PMU backends support
32bit counters by default, the field is redundant; remove it and
move the 32bit maximum into the core arm_pmu driver.

No functional changes.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
 - Remove max_period field.
---
 arch/arm/kernel/perf_event_v6.c     |  2 --
 arch/arm/kernel/perf_event_v7.c     |  1 -
 arch/arm/kernel/perf_event_xscale.c |  2 --
 arch/arm64/kernel/perf_event.c      |  1 -
 drivers/perf/arm_pmu.c              | 16 ++++++++++++----
 include/linux/perf/arm_pmu.h        |  1 -
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index 1d7061a..89d6f35 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -497,7 +497,6 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 }
 
 static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
@@ -548,7 +547,6 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 870b66c..18c038e 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1171,7 +1171,6 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start		= armv7pmu_start;
 	cpu_pmu->stop		= armv7pmu_stop;
 	cpu_pmu->reset		= armv7pmu_reset;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 };
 
 static void armv7_read_num_pmnc_events(void *info)
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index fcf218d..8ba0e14 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -375,7 +375,6 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale1pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
@@ -745,7 +744,6 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale2pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 5;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 85a251b..2c42f88 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -961,7 +961,6 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
-	cpu_pmu->max_period		= (1LLU << 32) - 1,
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
 
 	return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 1a0d340..8962d26 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,6 +28,11 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
+static inline u64 arm_pmu_max_period(void)
+{
+	return (1ULL << 32) - 1;
+}
+
 static int
 armpmu_map_cache_event(const unsigned (*cache_map)
 				      [PERF_COUNT_HW_CACHE_MAX]
@@ -114,8 +119,10 @@ int armpmu_event_set_period(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	s64 left = local64_read(&hwc->period_left);
 	s64 period = hwc->sample_period;
+	u64 max_period;
 	int ret = 0;
 
+	max_period = arm_pmu_max_period();
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -136,8 +143,8 @@ int armpmu_event_set_period(struct perf_event *event)
 	 * effect we are reducing max_period to account for
 	 * interrupt latency (and we are being very conservative).
 	 */
-	if (left > (armpmu->max_period >> 1))
-		left = armpmu->max_period >> 1;
+	if (left > (max_period >> 1))
+		left = (max_period >> 1);
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
@@ -153,6 +160,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
+	u64 max_period = arm_pmu_max_period();
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -162,7 +170,7 @@ u64 armpmu_event_update(struct perf_event *event)
 			     new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
+	delta = (new_raw_count - prev_raw_count) & max_period;
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -402,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = armpmu->max_period >> 1;
+		hwc->sample_period  = arm_pmu_max_period() >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 40036a5..2ee62bc 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -94,7 +94,6 @@ struct arm_pmu {
 	void		(*reset)(void *);
 	int		(*map_event)(struct perf_event *event);
 	int		num_events;
-	u64		max_period;
 	bool		secure_access; /* 32-bit ARM only */
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
 	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
-- 
2.7.4


* [PATCH v2 2/5] arm_pmu: Change API to support 64bit counter values
  2018-05-29 10:55 [PATCH v2 0/5] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-05-29 10:55 ` [PATCH v2 1/5] arm_pmu: Clean up maximum period handling Suzuki K Poulose
@ 2018-05-29 10:55 ` Suzuki K Poulose
  2018-05-29 10:55 ` [PATCH v2 3/5] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-05-29 10:55 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, robin.murphy, Suzuki K Poulose

Convert the {read,write}_counter APIs to handle 64bit values, in
preparation for supporting chained event counters.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
 - Fix build break on xscale (Thanks kbuild-robot)
---
 arch/arm/kernel/perf_event_v6.c     | 4 ++--
 arch/arm/kernel/perf_event_v7.c     | 4 ++--
 arch/arm/kernel/perf_event_xscale.c | 8 ++++----
 arch/arm64/kernel/perf_event.c      | 9 ++++-----
 include/linux/perf/arm_pmu.h        | 4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index 89d6f35..24eb94f 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -233,7 +233,7 @@ armv6_pmcr_counter_has_overflowed(unsigned long pmcr,
 	return ret;
 }
 
-static inline u32 armv6pmu_read_counter(struct perf_event *event)
+static inline u64 armv6pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -251,7 +251,7 @@ static inline u32 armv6pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv6pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv6pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 18c038e..5f342fc 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -743,7 +743,7 @@ static inline void armv7_pmnc_select_counter(int idx)
 	isb();
 }
 
-static inline u32 armv7pmu_read_counter(struct perf_event *event)
+static inline u64 armv7pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -763,7 +763,7 @@ static inline u32 armv7pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv7pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv7pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 8ba0e14..b9b9376 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -317,7 +317,7 @@ static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale1pmu_read_counter(struct perf_event *event)
+static inline u64 xscale1pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -338,7 +338,7 @@ static inline u32 xscale1pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale1pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale1pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -680,7 +680,7 @@ static void xscale2pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale2pmu_read_counter(struct perf_event *event)
+static inline u64 xscale2pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -707,7 +707,7 @@ static inline u32 xscale2pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale2pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale2pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 2c42f88..74d30d9 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -512,7 +512,7 @@ static inline int armv8pmu_select_counter(int idx)
 	return idx;
 }
 
-static inline u32 armv8pmu_read_counter(struct perf_event *event)
+static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -530,7 +530,7 @@ static inline u32 armv8pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -545,9 +545,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 		 * count using the lower 32bits and we want an interrupt when
 		 * it overflows.
 		 */
-		u64 value64 = 0xffffffff00000000ULL | value;
-
-		write_sysreg(value64, pmccntr_el0);
+		value |= 0xffffffff00000000ULL;
+		write_sysreg(value, pmccntr_el0);
 	} else if (armv8pmu_select_counter(idx) == idx)
 		write_sysreg(value, pmxevcntr_el0);
 }
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 2ee62bc..f1cfe12 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -87,8 +87,8 @@ struct arm_pmu {
 					 struct perf_event *event);
 	int		(*set_event_filter)(struct hw_perf_event *evt,
 					    struct perf_event_attr *attr);
-	u32		(*read_counter)(struct perf_event *event);
-	void		(*write_counter)(struct perf_event *event, u32 val);
+	u64		(*read_counter)(struct perf_event *event);
+	void		(*write_counter)(struct perf_event *event, u64 val);
 	void		(*start)(struct arm_pmu *);
 	void		(*stop)(struct arm_pmu *);
 	void		(*reset)(void *);
-- 
2.7.4


* [PATCH v2 3/5] arm_pmu: Add support for 64bit event counters
  2018-05-29 10:55 [PATCH v2 0/5] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-05-29 10:55 ` [PATCH v2 1/5] arm_pmu: Clean up maximum period handling Suzuki K Poulose
  2018-05-29 10:55 ` [PATCH v2 2/5] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
@ 2018-05-29 10:55 ` Suzuki K Poulose
  2018-06-06 16:48   ` Mark Rutland
  2018-05-29 10:55 ` [PATCH v2 4/5] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Suzuki K Poulose @ 2018-05-29 10:55 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, robin.murphy, Suzuki K Poulose

Each PMU has a set of 32bit event counters. But in some
special cases, the events could be counted using counters
which are effectively 64bit wide.

e.g, Arm V8 PMUv3 has a 64 bit cycle counter which can count
only the CPU cycles. Also, the PMU can chain the event counters
to effectively count as a 64bit counter.

Add support for tracking the events that use 64bit counters.
This only affects the periods set for each counter in the core
driver.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
 - Rename ARMPMU_EVT_LONG => ARMPMU_EVT_64BIT
---
 drivers/perf/arm_pmu.c       | 14 ++++++++------
 include/linux/perf/arm_pmu.h |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8962d26..ff858e6 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,9 +28,10 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
-static inline u64 arm_pmu_max_period(void)
+static inline u64 arm_pmu_event_max_period(struct perf_event *event)
 {
-	return (1ULL << 32) - 1;
+	return (event->hw.flags & ARMPMU_EVT_64BIT) ?
+		~0ULL : (1ULL << 32) - 1;
 }
 
 static int
@@ -122,7 +123,7 @@ int armpmu_event_set_period(struct perf_event *event)
 	u64 max_period;
 	int ret = 0;
 
-	max_period = arm_pmu_max_period();
+	max_period = arm_pmu_event_max_period(event);
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -148,7 +149,7 @@ int armpmu_event_set_period(struct perf_event *event)
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
-	armpmu->write_counter(event, (u64)(-left) & 0xffffffff);
+	armpmu->write_counter(event, (u64)(-left) & max_period);
 
 	perf_event_update_userpage(event);
 
@@ -160,7 +161,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
-	u64 max_period = arm_pmu_max_period();
+	u64 max_period = arm_pmu_event_max_period(event);
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -368,6 +369,7 @@ __hw_perf_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int mapping;
 
+	hwc->flags = 0;
 	mapping = armpmu->map_event(event);
 
 	if (mapping < 0) {
@@ -410,7 +412,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = arm_pmu_max_period() >> 1;
+		hwc->sample_period  = arm_pmu_event_max_period(event) >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index f1cfe12..1e26b14 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -25,6 +25,12 @@
  */
 #define ARMPMU_MAX_HWEVENTS		32
 
+/*
+ * ARM PMU hw_event flags
+ */
+/* Event uses a 64bit counter */
+#define ARMPMU_EVT_64BIT		1
+
 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
 #define CACHE_OP_UNSUPPORTED		0xFFFF
-- 
2.7.4


* [PATCH v2 4/5] arm_pmu: Tidy up clear_event_idx call backs
  2018-05-29 10:55 [PATCH v2 0/5] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2018-05-29 10:55 ` [PATCH v2 3/5] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
@ 2018-05-29 10:55 ` Suzuki K Poulose
  2018-05-29 10:55 ` [PATCH v2 5/5] arm64: perf: Add support for chaining event counters Suzuki K Poulose
  2018-06-05 15:00 ` [PATCH v2 0/5] arm64: perf: Support for chained counters Julien Thierry
  5 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-05-29 10:55 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, robin.murphy, Suzuki K Poulose

The arm_pmu core uses the get_event_idx callback to allocate an
event counter for a given event and marks the selected counter as
"used". However, when the event is deleted, the core clears the
"used" bit itself and then invokes the "clear_event_idx" callback,
which splits the bookkeeping between the core code and the backend.
Tidy this up by relying on clear_event_idx to do the bookkeeping
when it is provided; otherwise, let the core driver do the default
"clear bit" operation. This will be useful for adding the chained
event support, where we leave the event idx maintenance to the
backend.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kernel/perf_event_v7.c |  2 ++
 drivers/perf/arm_pmu.c          | 15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 5f342fc..5e78faa 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1638,6 +1638,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool krait_event = EVENT_CPU(hwc->config_base);
 
+	clear_bit(hwc->idx, cpuc->used_mask);
 	if (venum_event || krait_event) {
 		bit = krait_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
@@ -1967,6 +1968,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool scorpion_event = EVENT_CPU(hwc->config_base);
 
+	clear_bit(hwc->idx, cpuc->used_mask);
 	if (venum_event || scorpion_event) {
 		bit = scorpion_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index ff858e6..9ae7e68 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -226,6 +226,16 @@ static void armpmu_start(struct perf_event *event, int flags)
 	armpmu->enable(event);
 }
 
+static void armpmu_clear_event_idx(struct arm_pmu *armpmu,
+				   struct pmu_hw_events *hw_events,
+				   struct perf_event *event)
+{
+	if (armpmu->clear_event_idx)
+		armpmu->clear_event_idx(hw_events, event);
+	else
+		clear_bit(event->hw.idx, hw_events->used_mask);
+}
+
 static void
 armpmu_del(struct perf_event *event, int flags)
 {
@@ -236,10 +246,7 @@ armpmu_del(struct perf_event *event, int flags)
 
 	armpmu_stop(event, PERF_EF_UPDATE);
 	hw_events->events[idx] = NULL;
-	clear_bit(idx, hw_events->used_mask);
-	if (armpmu->clear_event_idx)
-		armpmu->clear_event_idx(hw_events, event);
-
+	armpmu_clear_event_idx(armpmu, hw_events, event);
 	perf_event_update_userpage(event);
 }
 
-- 
2.7.4


* [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-05-29 10:55 [PATCH v2 0/5] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2018-05-29 10:55 ` [PATCH v2 4/5] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
@ 2018-05-29 10:55 ` Suzuki K Poulose
  2018-06-06 18:01   ` Mark Rutland
  2018-06-05 15:00 ` [PATCH v2 0/5] arm64: perf: Support for chained counters Julien Thierry
  5 siblings, 1 reply; 16+ messages in thread
From: Suzuki K Poulose @ 2018-05-29 10:55 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, robin.murphy, Suzuki K Poulose

Add support for 64bit event by using chained event counters
and 64bit cycle counters.

Arm v8 PMUv3 allows chaining a pair of adjacent PMU counters
(with the lower counter number being always "even"). The low
counter is programmed to count the event of interest and the
high counter(odd numbered) is programmed with a special event
code (0x1e - Chain). Thus we need special allocation schemes
to make the full use of available counters. So, we allocate the
counters from either ends. i.e, chained counters are allocated
from the lower end in pairs of two and the normal counters are
allocated from the higher number. Also makes necessary changes to
handle the chained events as a single event with 2 counters.

For CPU cycles, when 64bit mode is requested, the cycle counter
is used in 64bit mode. If the cycle counter is not available,
we fall back to chaining.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V1:
 - Remove unnecessary isb()s
 - Fix event programming order for counters
 - Tighten chain counter event read sequence
 - Set chain event to count in all ELs
 - Cleanup helpers to be consistent
 - Rename format to chain => bits64
 - Fix a counter allocation for single counters
 - Allow chaining on CPU cycle counters and do not promote
   the events to 64bit by default.
---
 arch/arm64/kernel/perf_event.c | 226 ++++++++++++++++++++++++++++++++++++-----
 drivers/perf/arm_pmu.c         |   6 ++
 2 files changed, 207 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 74d30d9..4f193e0 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
 };
 
 PMU_FORMAT_ATTR(event, "config:0-15");
+PMU_FORMAT_ATTR(bits64, "config1:0");
+
+static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
+{
+	return event->attr.config1 & 0x1;
+}
 
 static struct attribute *armv8_pmuv3_format_attrs[] = {
 	&format_attr_event.attr,
+	&format_attr_bits64.attr,
 	NULL,
 };
 
@@ -466,6 +473,16 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
 	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
 /*
+ * Use chained counter for a 64bit event, if we could not allocate
+ * the 64bit cycle counter.
+ */
+static inline bool armv8pmu_event_is_chained(struct perf_event *event)
+{
+	return armv8pmu_event_is_64bit(event) &&
+	       (event->hw.idx != ARMV8_IDX_CYCLE_COUNTER);
+}
+
+/*
  * ARMv8 low level PMU access
  */
 
@@ -512,24 +529,78 @@ static inline int armv8pmu_select_counter(int idx)
 	return idx;
 }
 
+static inline u32 armv8pmu_read_evcntr(int idx)
+{
+	return (armv8pmu_select_counter(idx) == idx) ?
+	       read_sysreg(pmxevcntr_el0) : 0;
+}
+
+static inline u64 armv8pmu_read_chain_counter(int idx)
+{
+	u64 prev_hi, hi, lo;
+
+	hi = armv8pmu_read_evcntr(idx);
+	do {
+		prev_hi = hi;
+		isb();
+		lo = armv8pmu_read_evcntr(idx - 1);
+		isb();
+		hi = armv8pmu_read_evcntr(idx);
+	} while (prev_hi != hi);
+
+	return (hi << 32) | lo;
+}
+
+static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	return armv8pmu_event_is_chained(event) ?
+	       armv8pmu_read_chain_counter(idx) : armv8pmu_read_evcntr(idx);
+}
+
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
-	u32 value = 0;
+	u64 value = 0;
 
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u reading wrong counter %d\n",
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
-	else if (armv8pmu_select_counter(idx) == idx)
-		value = read_sysreg(pmxevcntr_el0);
+	else
+		value = armv8pmu_read_hw_counter(event);
 
 	return value;
 }
 
+static inline void armv8pmu_write_evcntr(int idx, u32 value)
+{
+	if (armv8pmu_select_counter(idx) == idx)
+		write_sysreg(value, pmxevcntr_el0);
+}
+
+static inline void armv8pmu_write_chain_counter(int idx, u64 value)
+{
+	armv8pmu_write_evcntr(idx, value >> 32);
+	isb();
+	armv8pmu_write_evcntr(idx - 1, value);
+}
+
+static inline void armv8pmu_write_hw_counter(struct perf_event *event,
+					     u64 value)
+{
+	int idx = event->hw.idx;
+
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_write_chain_counter(idx, value);
+	else
+		armv8pmu_write_evcntr(idx, value);
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -541,14 +612,14 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
 		/*
-		 * Set the upper 32bits as this is a 64bit counter but we only
-		 * count using the lower 32bits and we want an interrupt when
-		 * it overflows.
+		 * Set the upper 32bits if we are counting this in
+		 * 32bit mode, as this is a 64bit counter.
 		 */
-		value |= 0xffffffff00000000ULL;
+		if (!armv8pmu_event_is_64bit(event))
+			value |= 0xffffffff00000000ULL;
 		write_sysreg(value, pmccntr_el0);
-	} else if (armv8pmu_select_counter(idx) == idx)
-		write_sysreg(value, pmxevcntr_el0);
+	} else
+		armv8pmu_write_hw_counter(event, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
@@ -559,6 +630,27 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
 	}
 }
 
+static inline void armv8pmu_write_event_type(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	/*
+	 * For chained events, write the the low counter event type
+	 * followed by the high counter. The high counter is programmed
+	 * with CHAIN event code with filters set to count at all ELs.
+	 */
+	if (armv8pmu_event_is_chained(event)) {
+		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
+				ARMV8_PMU_INCLUDE_EL2;
+
+		armv8pmu_write_evtype(idx - 1, hwc->config_base);
+		isb();
+		armv8pmu_write_evtype(idx, chain_evt);
+	} else
+		armv8pmu_write_evtype(idx, hwc->config_base);
+}
+
 static inline int armv8pmu_enable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -566,6 +658,21 @@ static inline int armv8pmu_enable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_enable_event_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	/*
+	 * For chained events, we enable the high counter followed by
+	 * the low counter.
+	 */
+	armv8pmu_enable_counter(idx);
+	if (armv8pmu_event_is_chained(event)) {
+		isb();
+		armv8pmu_enable_counter(idx - 1);
+	}
+}
+
 static inline int armv8pmu_disable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -573,6 +680,23 @@ static inline int armv8pmu_disable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_disable_event_counter(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	/*
+	 * Disable the low counter followed by the high counter
+	 * for chained events.
+	 */
+	if (armv8pmu_event_is_chained(event)) {
+		armv8pmu_disable_counter(idx - 1);
+		isb();
+	}
+
+	armv8pmu_disable_counter(idx);
+}
+
 static inline int armv8pmu_enable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -580,6 +704,12 @@ static inline int armv8pmu_enable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_enable_event_irq(struct perf_event *event)
+{
+	/* For chained events, enable the interrupt for only the high counter */
+	return armv8pmu_enable_intens(event->hw.idx);
+}
+
 static inline int armv8pmu_disable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -592,6 +722,11 @@ static inline int armv8pmu_disable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_disable_event_irq(struct perf_event *event)
+{
+	return armv8pmu_disable_intens(event->hw.idx);
+}
+
 static inline u32 armv8pmu_getreset_flags(void)
 {
 	u32 value;
@@ -609,10 +744,8 @@ static inline u32 armv8pmu_getreset_flags(void)
 static void armv8pmu_enable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Enable counter and interrupt, and set the counter to count
@@ -623,22 +756,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Set event (if destined for PMNx counters).
 	 */
-	armv8pmu_write_evtype(idx, hwc->config_base);
+	armv8pmu_write_event_type(event);
 
 	/*
 	 * Enable interrupt for this counter
 	 */
-	armv8pmu_enable_intens(idx);
+	armv8pmu_enable_event_irq(event);
 
 	/*
 	 * Enable counter
 	 */
-	armv8pmu_enable_counter(idx);
+	armv8pmu_enable_event_counter(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -646,10 +779,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
 static void armv8pmu_disable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Disable counter and interrupt
@@ -659,12 +790,12 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Disable interrupt for this counter
 	 */
-	armv8pmu_disable_intens(idx);
+	armv8pmu_disable_event_irq(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -753,6 +884,39 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
+static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
+				    struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	for (idx = cpu_pmu->num_events - 1; idx >= ARMV8_IDX_COUNTER0; --idx)
+		if (!test_and_set_bit(idx, cpuc->used_mask))
+			return idx;
+	return -EAGAIN;
+}
+
+static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
+				   struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	/*
+	 * Chaining requires two consecutive event counters, where
+	 * the lower idx must be even. We allocate chain events
+	 * from the lower index (i.e, counter0) and the single events
+	 * from the higher end to maximise the utilisation.
+	 */
+	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2)
+		if (!test_and_set_bit(idx, cpuc->used_mask)) {
+			/* Check if the preceding even counter is available */
+			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
+				return idx;
+			/* Release the Odd counter */
+			clear_bit(idx, cpuc->used_mask);
+		}
+	return -EAGAIN;
+}
+
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
@@ -770,13 +934,21 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	/*
 	 * Otherwise use events counters
 	 */
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
+	idx = armv8pmu_event_is_chained(event) ?
+		armv8pmu_get_chain_idx(cpuc, cpu_pmu) :
+		armv8pmu_get_single_idx(cpuc, cpu_pmu);
 
-	/* The counters are all in use. */
-	return -EAGAIN;
+	return idx;
+}
+
+static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	clear_bit(idx, cpuc->used_mask);
+	if (armv8pmu_event_is_chained(event))
+		clear_bit(idx - 1, cpuc->used_mask);
 }
 
 /*
@@ -851,6 +1023,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 				       &armv8_pmuv3_perf_cache_map,
 				       ARMV8_PMU_EVTYPE_EVENT);
 
+	if (armv8pmu_event_is_64bit(event))
+		event->hw.flags |= ARMPMU_EVT_64BIT;
+
 	/* Onl expose micro/arch events supported by this PMU */
 	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
 	    && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
@@ -957,6 +1132,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter		= armv8pmu_read_counter,
 	cpu_pmu->write_counter		= armv8pmu_write_counter,
 	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
+	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 9ae7e68..b55382e 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -679,6 +679,12 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
 			continue;
 
 		event = hw_events->events[idx];
+		/*
+		 * If there is no event at this idx (e.g, an idx used
+		 * by a chained event in Arm v8 PMUv3), skip it.
+		 */
+		if (!event)
+			continue;
 
 		switch (cmd) {
 		case CPU_PM_ENTER:
-- 
2.7.4


* Re: [PATCH v2 0/5] arm64: perf: Support for chained counters
  2018-05-29 10:55 [PATCH v2 0/5] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2018-05-29 10:55 ` [PATCH v2 5/5] arm64: perf: Add support for chaining event counters Suzuki K Poulose
@ 2018-06-05 15:00 ` Julien Thierry
  5 siblings, 0 replies; 16+ messages in thread
From: Julien Thierry @ 2018-06-05 15:00 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, robin.murphy



On 29/05/18 11:55, Suzuki K Poulose wrote:
> This series adds support for counting PMU events using 64bit counters
> for arm64 PMU.
> 
> The Arm v8 PMUv3 supports combining two adjacent 32bit counters
> (low even and high odd counters) to count a given "event" in 64bit mode.
> This series adds the support for 64bit events in the core arm_pmu driver
> infrastructure and adds the support for armv8 64bit kernel PMU to use
> chained counters to count in 64bit mode. For CPU cycles, we use the cycle
> counter in 64bit mode, when requested. If the cycle counter is not available,
> we fall back to chaining the counters.
> 
> Tested on Juno, Fast models. Applies on 4.17-rc4
> 

For the series:

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

-- 
Julien Thierry


* Re: [PATCH v2 3/5] arm_pmu: Add support for 64bit event counters
  2018-05-29 10:55 ` [PATCH v2 3/5] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
@ 2018-06-06 16:48   ` Mark Rutland
  2018-06-07  7:34     ` Suzuki K Poulose
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2018-06-06 16:48 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On Tue, May 29, 2018 at 11:55:54AM +0100, Suzuki K Poulose wrote:
> Each PMU has a set of 32bit event counters. But in some
> special cases, the events could be counted using counters
> which are effectively 64bit wide.
> 
> e.g, Arm V8 PMUv3 has a 64 bit cycle counter which can count
> only the CPU cycles. Also, the PMU can chain the event counters
> to effectively count as a 64bit counter.
> 
> Add support for tracking the events that use 64bit counters.
> This only affects the periods set for each counter in the core
> driver.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v1:
>  - Rename ARMPMU_EVT_LONG => ARMPMU_EVT_64BIT
> ---
>  drivers/perf/arm_pmu.c       | 14 ++++++++------
>  include/linux/perf/arm_pmu.h |  6 ++++++
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8962d26..ff858e6 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -28,9 +28,10 @@
>  static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
>  static DEFINE_PER_CPU(int, cpu_irq);
>  
> -static inline u64 arm_pmu_max_period(void)
> +static inline u64 arm_pmu_event_max_period(struct perf_event *event)
>  {
> -	return (1ULL << 32) - 1;
> +	return (event->hw.flags & ARMPMU_EVT_64BIT) ?
> +		~0ULL : (1ULL << 32) - 1;
>  }

Could we please have:

static inline u64 arm_pmu_event_max_period(struct perf_event *event)
{
	if (event->hw.flags & ARMPMU_EVT_64BIT)
		return GENMASK_ULL(63, 0);
	else
		return GENMASK_ULL(31, 0);
}

... since that's obviously balanced, with both values generated in the
same way.

Otherwise this looks good to me.

Thanks,
Mark.


* Re: [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-05-29 10:55 ` [PATCH v2 5/5] arm64: perf: Add support for chaining event counters Suzuki K Poulose
@ 2018-06-06 18:01   ` Mark Rutland
  2018-06-08 14:46     ` Suzuki K Poulose
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2018-06-06 18:01 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On Tue, May 29, 2018 at 11:55:56AM +0100, Suzuki K Poulose wrote:
> Add support for 64bit event by using chained event counters
> and 64bit cycle counters.
> 
> Arm v8 PMUv3 allows chaining a pair of adjacent PMU counters
> (with the lower counter number being always "even"). The low
> counter is programmed to count the event of interest and the
> high counter(odd numbered) is programmed with a special event
> code (0x1e - Chain).

I found this a little difficult to read. How about:

PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
forming a 64-bit counter. The low/even counter is programmed to count
the event of interest, and the high/odd counter is programmed to count
the CHAIN event, taken when the low/even counter overflows.

> Thus we need special allocation schemes
> to make the full use of available counters. So, we allocate the
> counters from either ends. i.e, chained counters are allocated
> from the lower end in pairs of two and the normal counters are
> allocated from the higher number. Also makes necessary changes to
> handle the chained events as a single event with 2 counters.

Why do we need to allocate in this way? 

I can see this might make allocating a pair of counters more likely in
some cases, but there are still others where it wouldn't be possible
(and we'd rely on the rotation logic to repack the counters for us).

[...]

> +static inline u32 armv8pmu_read_evcntr(int idx)
> +{
> +	return (armv8pmu_select_counter(idx) == idx) ?
> +	       read_sysreg(pmxevcntr_el0) : 0;
> +}

Given armv8pmu_select_counter() always returns the idx passed to it, I
don't think we need to check anything here. We can clean that up to be
void, and remove the existing checks.

[...]

> +static inline u64 armv8pmu_read_chain_counter(int idx)
> +{
> +	u64 prev_hi, hi, lo;
> +
> +	hi = armv8pmu_read_evcntr(idx);
> +	do {
> +		prev_hi = hi;
> +		isb();
> +		lo = armv8pmu_read_evcntr(idx - 1);
> +		isb();
> +		hi = armv8pmu_read_evcntr(idx);
> +	} while (prev_hi != hi);
> +
> +	return (hi << 32) | lo;
> +}

> +static inline void armv8pmu_write_chain_counter(int idx, u64 value)
> +{
> +	armv8pmu_write_evcntr(idx, value >> 32);
> +	isb();
> +	armv8pmu_write_evcntr(idx - 1, value);
> +}

Can we use upper_32_bits() and lower_32_bits() here?
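
i.e. something like the following (an untested sketch, keeping the
existing isb() for now):

static inline void armv8pmu_write_chain_counter(int idx, u64 value)
{
	/* High/odd counter gets the upper half, low/even the lower half */
	armv8pmu_write_evcntr(idx, upper_32_bits(value));
	isb();
	armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
}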

As a more general thing, I think we need to clean up the way we
read/write counters, because I don't think that it's right that we poke
them while they're running -- that means you get some arbitrary skew on
counter groups.

It looks like the only case we do that is the IRQ handler, so we should
be able to stop/start the PMU there.
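
Roughly, the handler could bracket its work like this (an untested
sketch; the overflow scanning and the IRQ_NONE/IRQ_HANDLED bookkeeping
are elided):

static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
{
	struct arm_pmu *cpu_pmu = (struct arm_pmu *)dev;
	u32 pmovsr;

	/* Stop all counters so events in a group stay in sync */
	armv8pmu_stop(cpu_pmu);

	/* Read and clear the overflow flags */
	pmovsr = armv8pmu_getreset_flags();

	/*
	 * ... walk the overflowed counters indicated by pmovsr,
	 * update the events and set their next periods here ...
	 */

	/* Restart all counters together */
	armv8pmu_start(cpu_pmu);

	return IRQ_HANDLED;
}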

With that, we can get rid of the ISB here, and likewise in
read_chain_counter, which wouldn't need to be a loop.

> +
> +static inline void armv8pmu_write_hw_counter(struct perf_event *event,
> +					     u64 value)
> +{
> +	int idx = event->hw.idx;
> +
> +	if (armv8pmu_event_is_chained(event))
> +		armv8pmu_write_chain_counter(idx, value);
> +	else
> +		armv8pmu_write_evcntr(idx, value);
> +}
> +
>  static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -541,14 +612,14 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  			smp_processor_id(), idx);
>  	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
>  		/*
> -		 * Set the upper 32bits as this is a 64bit counter but we only
> -		 * count using the lower 32bits and we want an interrupt when
> -		 * it overflows.
> +		 * Set the upper 32bits if we are counting this in
> +		 * 32bit mode, as this is a 64bit counter.
>  		 */

It would be good to keep the explaination as to why.

> -		value |= 0xffffffff00000000ULL;
> +		if (!armv8pmu_event_is_64bit(event))
> +			value |= 0xffffffff00000000ULL;
>  		write_sysreg(value, pmccntr_el0);
> -	} else if (armv8pmu_select_counter(idx) == idx)
> -		write_sysreg(value, pmxevcntr_el0);
> +	} else
> +		armv8pmu_write_hw_counter(event, value);
>  }

> +static inline void armv8pmu_write_event_type(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	/*
> +	 * For chained events, write the the low counter event type
> +	 * followed by the high counter. The high counter is programmed
> +	 * with CHAIN event code with filters set to count at all ELs.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
> +				ARMV8_PMU_INCLUDE_EL2;
> +
> +		armv8pmu_write_evtype(idx - 1, hwc->config_base);
> +		isb();
> +		armv8pmu_write_evtype(idx, chain_evt);

The ISB isn't necessary here, AFAICT. We only do this while the PMU is
disabled; no?

> +	} else
> +		armv8pmu_write_evtype(idx, hwc->config_base);
> +}

[...]

> +static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	/*
> +	 * For chained events, we enable the high counter followed by
> +	 * the low counter.
> +	 */
> +	armv8pmu_enable_counter(idx);
> +	if (armv8pmu_event_is_chained(event)) {
> +		isb();
> +		armv8pmu_enable_counter(idx - 1);
> +	}
> +}

If we fix up the IRQ handler, this would only happen with the PMU as a
whole disabled, and we wouldn't care about ordering of enable/disable of
each event.

[...]

> +static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	/*
> +	 * Disable the low counter followed by the high counter
> +	 * for chained events.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		armv8pmu_disable_counter(idx - 1);
> +		isb();
> +	}
> +
> +	armv8pmu_disable_counter(idx);
> +}

... likewise.

> @@ -679,6 +679,12 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>  			continue;
>  
>  		event = hw_events->events[idx];
> +		/*
> +		 * If there is no event at this idx (e.g, an idx used
> +		 * by a chained event in Arm v8 PMUv3), skip it.
> +		 */
> +		if (!event)
> +			continue;

We may as well lose the used_mask test if we're looking at the event
regardless.

Thanks,
Mark.


* Re: [PATCH v2 3/5] arm_pmu: Add support for 64bit event counters
  2018-06-06 16:48   ` Mark Rutland
@ 2018-06-07  7:34     ` Suzuki K Poulose
  0 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-06-07  7:34 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On 06/06/2018 05:48 PM, Mark Rutland wrote:
> On Tue, May 29, 2018 at 11:55:54AM +0100, Suzuki K Poulose wrote:
>> Each PMU has a set of 32bit event counters. But in some
>> special cases, the events could be counted using counters
>> which are effectively 64bit wide.
>>
>> e.g, Arm V8 PMUv3 has a 64 bit cycle counter which can count
>> only the CPU cycles. Also, the PMU can chain the event counters
>> to effectively count as a 64bit counter.
>>
>> Add support for tracking the events that use 64bit counters.
>> This only affects the periods set for each counter in the core
>> driver.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v1:
>>   - Rename ARMPMU_EVT_LONG => ARMPMU_EVT_64BIT
>> ---
>>   drivers/perf/arm_pmu.c       | 14 ++++++++------
>>   include/linux/perf/arm_pmu.h |  6 ++++++
>>   2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 8962d26..ff858e6 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -28,9 +28,10 @@
>>   static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
>>   static DEFINE_PER_CPU(int, cpu_irq);
>>   
>> -static inline u64 arm_pmu_max_period(void)
>> +static inline u64 arm_pmu_event_max_period(struct perf_event *event)
>>   {
>> -	return (1ULL << 32) - 1;
>> +	return (event->hw.flags & ARMPMU_EVT_64BIT) ?
>> +		~0ULL : (1ULL << 32) - 1;
>>   }
> 
> Could we please have:
> 
> static inline u64 arm_pmu_event_max_period(struct perf_event *event)
> {
> 	if (event->hw.flags & ARMPMU_EVT_64BIT)
> 		return GENMASK_ULL(63, 0);
> 	else
> 		return GENMASK_ULL(31, 0);
> }
> 
> ... since that's obviously balanced, with both values generated in the
> same way.
> 

Sure, will do

Suzuki


* Re: [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-06-06 18:01   ` Mark Rutland
@ 2018-06-08 14:46     ` Suzuki K Poulose
  2018-06-08 15:24       ` Mark Rutland
  2018-06-11 13:54       ` Suzuki K Poulose
  0 siblings, 2 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-06-08 14:46 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

Hi Mark,

On 06/06/2018 07:01 PM, Mark Rutland wrote:
> On Tue, May 29, 2018 at 11:55:56AM +0100, Suzuki K Poulose wrote:
>> Add support for 64bit event by using chained event counters
>> and 64bit cycle counters.
>>
>> Arm v8 PMUv3 allows chaining a pair of adjacent PMU counters
>> (with the lower counter number being always "even"). The low
>> counter is programmed to count the event of interest and the
>> high counter(odd numbered) is programmed with a special event
>> code (0x1e - Chain).
> 
> I found this a little difficult to read. How about:
> 
> PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
> forming a 64-bit counter. The low/even counter is programmed to count
> the event of interest, and the high/odd counter is programmed to count
> the CHAIN event, taken when the low/even counter overflows.
> 

Sure, that looks better.

>> Thus we need special allocation schemes
>> to make the full use of available counters. So, we allocate the
>> counters from either ends. i.e, chained counters are allocated
>> from the lower end in pairs of two and the normal counters are
>> allocated from the higher number. Also makes necessary changes to
>> handle the chained events as a single event with 2 counters.
> 
> Why do we need to allocate in this way?
> 
> I can see this might make allocating a pair of counters more likely in
> some cases, but there are still others where it wouldn't be possible
> (and we'd rely on the rotation logic to repack the counters for us).

It makes the efficient use of the counters in all cases and allows
counting maximum number of events with any given set, keeping the 
precedence on the order of their "inserts".
e.g, if the number of counters happened to be "odd" (not sure if it is 
even possible).

> 
> [...]
> 
>> +static inline u32 armv8pmu_read_evcntr(int idx)
>> +{
>> +	return (armv8pmu_select_counter(idx) == idx) ?
>> +	       read_sysreg(pmxevcntr_el0) : 0;
>> +}
> 
> Given armv8pmu_select_counter() always returns the idx passed to it, I
> don't think we need to check anything here. We can clean that up to be
> void, and remove the existing checks.
> 
> [...]

OK.

> 
>> +static inline u64 armv8pmu_read_chain_counter(int idx)
>> +{
>> +	u64 prev_hi, hi, lo;
>> +
>> +	hi = armv8pmu_read_evcntr(idx);
>> +	do {
>> +		prev_hi = hi;
>> +		isb();
>> +		lo = armv8pmu_read_evcntr(idx - 1);
>> +		isb();
>> +		hi = armv8pmu_read_evcntr(idx);
>> +	} while (prev_hi != hi);
>> +
>> +	return (hi << 32) | lo;
>> +}
> 
>> +static inline void armv8pmu_write_chain_counter(int idx, u64 value)
>> +{
>> +	armv8pmu_write_evcntr(idx, value >> 32);
>> +	isb();
>> +	armv8pmu_write_evcntr(idx - 1, value);
>> +}
> 
> Can we use upper_32_bits() and lower_32_bits() here?
> 
> As a more general thing, I think we need to clean up the way we
> read/write counters, because I don't think that it's right that we poke
> them while they're running -- that means you get some arbitrary skew on
> counter groups.
> 
> It looks like the only case we do that is the IRQ handler, so we should
> be able to stop/start the PMU there.

Since we don't stop the "counting" of events usually when an IRQ is
triggered, the skew will be finally balanced when the events are stopped
in the group. So, I think, stopping the PMU may not be really a good
thing after all. Just my thought.

> 
> With that, we can get rid of the ISB here, and likewise in
> read_chain_counter, which wouldn't need to be a loop.
> 
>> +
>> +static inline void armv8pmu_write_hw_counter(struct perf_event *event,
>> +					     u64 value)
>> +{
>> +	int idx = event->hw.idx;
>> +
>> +	if (armv8pmu_event_is_chained(event))
>> +		armv8pmu_write_chain_counter(idx, value);
>> +	else
>> +		armv8pmu_write_evcntr(idx, value);
>> +}
>> +
>>   static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>>   {
>>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>> @@ -541,14 +612,14 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>>   			smp_processor_id(), idx);
>>   	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
>>   		/*
>> -		 * Set the upper 32bits as this is a 64bit counter but we only
>> -		 * count using the lower 32bits and we want an interrupt when
>> -		 * it overflows.
>> +		 * Set the upper 32bits if we are counting this in
>> +		 * 32bit mode, as this is a 64bit counter.
>>   		 */
> 
> It would be good to keep the explaination as to why.
> 

Sure

>> -		value |= 0xffffffff00000000ULL;
>> +		if (!armv8pmu_event_is_64bit(event))
>> +			value |= 0xffffffff00000000ULL;
>>   		write_sysreg(value, pmccntr_el0);
>> -	} else if (armv8pmu_select_counter(idx) == idx)
>> -		write_sysreg(value, pmxevcntr_el0);
>> +	} else
>> +		armv8pmu_write_hw_counter(event, value);
>>   }
> 
>> +static inline void armv8pmu_write_event_type(struct perf_event *event)
>> +{
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	int idx = hwc->idx;
>> +
>> +	/*
>> +	 * For chained events, write the the low counter event type
>> +	 * followed by the high counter. The high counter is programmed
>> +	 * with CHAIN event code with filters set to count at all ELs.
>> +	 */
>> +	if (armv8pmu_event_is_chained(event)) {
>> +		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
>> +				ARMV8_PMU_INCLUDE_EL2;
>> +
>> +		armv8pmu_write_evtype(idx - 1, hwc->config_base);
>> +		isb();
>> +		armv8pmu_write_evtype(idx, chain_evt);
> 
> The ISB isn't necessary here, AFAICT. We only do this while the PMU is
> disabled; no?

You're right. I was just following the ARM ARM.

> 
>> +	} else
>> +		armv8pmu_write_evtype(idx, hwc->config_base);
>> +}
> 
> [...]
> 
>> @@ -679,6 +679,12 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>>   			continue;
>>   
>>   		event = hw_events->events[idx];
>> +		/*
>> +		 * If there is no event at this idx (e.g, an idx used
>> +		 * by a chained event in Arm v8 PMUv3), skip it.
>> +		 */
>> +		if (!event)
>> +			continue;
> 
> We may as well lose the used_mask test if we're looking at the event
> regardless.

Ok

Suzuki


* Re: [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-06-08 14:46     ` Suzuki K Poulose
@ 2018-06-08 15:24       ` Mark Rutland
  2018-06-08 16:05         ` Suzuki K Poulose
  2018-06-11 13:54       ` Suzuki K Poulose
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2018-06-08 15:24 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On Fri, Jun 08, 2018 at 03:46:57PM +0100, Suzuki K Poulose wrote:
> On 06/06/2018 07:01 PM, Mark Rutland wrote:
> > > Thus we need a special allocation scheme
> > > to make full use of the available counters. So, we allocate the
> > > counters from either end: chained counters are allocated
> > > in pairs from the lower end, and the normal counters are
> > > allocated from the higher end. Also make the necessary changes to
> > > handle the chained events as a single event with two counters.
> > 
> > Why do we need to allocate in this way?
> > 
> > I can see this might make allocating a pair of counters more likely in
> > some cases, but there are still others where it wouldn't be possible
> > (and we'd rely on the rotation logic to repack the counters for us).
> 
> It makes efficient use of the counters in all cases and allows counting
> the maximum number of events with any given set, keeping the precedence
> on the order of their "inserts",
> e.g. if the number of counters happened to be "odd" (not sure if that is
> even possible).

Unfortunately, that doesn't always work.

Say you have a system with 8 counters, and you open 8 (32-bit) events.

Then you close the events in counters 0, 2, 4, and 6. The live events
aren't moved, and stay where they are, in counters 1, 3, 5, and 7.

Now, say you open a 64-bit event. When we try to add it, we try to
allocate an index for two consecutive counters, and find that we can't,
despite 4 counters being free.

We return -EAGAIN to the core perf code, whereupon it removes any other
events in that group (none in this case).

Then we wait for the rotation logic to pull all of the events out and
schedule them back in, re-packing them, which should (eventually) work
regardless of how we allocate counters.

... we might need to re-pack events to solve that. :/

[...]

> > > +static inline void armv8pmu_write_chain_counter(int idx, u64 value)
> > > +{
> > > +	armv8pmu_write_evcntr(idx, value >> 32);
> > > +	isb();
> > > +	armv8pmu_write_evcntr(idx - 1, value);
> > > +}
> > 
> > Can we use upper_32_bits() and lower_32_bits() here?
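
For reference, that helper rewritten with the suggested macros (both live
in <linux/kernel.h>) would look like this sketch; the isb() is kept as in
the hunk above, although it becomes unnecessary once the PMU is stopped
around counter writes as discussed below:

	static inline void armv8pmu_write_chain_counter(int idx, u64 value)
	{
		armv8pmu_write_evcntr(idx, upper_32_bits(value));
		isb();
		armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
	}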
> > 
> > As a more general thing, I think we need to clean up the way we
> > read/write counters, because I don't think that it's right that we poke
> > them while they're running -- that means you get some arbitrary skew on
> > counter groups.
> > 
> > It looks like the only case we do that is the IRQ handler, so we should
> > be able to stop/start the PMU there.
> 
> Since we don't stop the "counting" of events usually when an IRQ is
> triggered, the skew will be finally balanced when the events are stopped
> in the group. So, I think, stopping the PMU may not really be a good
> thing after all. Just my thought.

That's not quite right -- if one event in a group overflows, we'll
reprogram it *while* other events are counting, losing some events in
the process.

Stopping the PMU for the duration of the IRQ handler ensures that events
in a group are always in-sync with one another.
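
A minimal sketch of that shape, assuming the driver's existing
armv8pmu_start()/armv8pmu_stop() helpers; armv8pmu_handle_overflows() is a
made-up stand-in for the existing overflow-handling loop:

	static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
	{
		struct arm_pmu *cpu_pmu = dev;
		irqreturn_t ret;

		/*
		 * Stop all counters while overflowed events are read and
		 * reprogrammed, so that events in the same group cannot
		 * drift relative to one another.
		 */
		armv8pmu_stop(cpu_pmu);
		ret = armv8pmu_handle_overflows(cpu_pmu);
		armv8pmu_start(cpu_pmu);

		return ret;
	}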

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-06-08 15:24       ` Mark Rutland
@ 2018-06-08 16:05         ` Suzuki K Poulose
  0 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-06-08 16:05 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On 08/06/18 16:24, Mark Rutland wrote:
> On Fri, Jun 08, 2018 at 03:46:57PM +0100, Suzuki K Poulose wrote:
>> On 06/06/2018 07:01 PM, Mark Rutland wrote:
>>>> Thus we need a special allocation scheme
>>>> to make full use of the available counters. So, we allocate the
>>>> counters from either end: chained counters are allocated
>>>> in pairs from the lower end, and the normal counters are
>>>> allocated from the higher end. Also make the necessary changes to
>>>> handle the chained events as a single event with two counters.
>>>
>>> Why do we need to allocate in this way?
>>>
>>> I can see this might make allocating a pair of counters more likely in
>>> some cases, but there are still others where it wouldn't be possible
>>> (and we'd rely on the rotation logic to repack the counters for us).
>>
>> It makes efficient use of the counters in all cases and allows counting
>> the maximum number of events with any given set, keeping the precedence
>> on the order of their "inserts",
>> e.g. if the number of counters happened to be "odd" (not sure if that is
>> even possible).
> 
> Unfortunately, that doesn't always work.
> 
> Say you have a system with 8 counters, and you open 8 (32-bit) events.

I was talking about the following (imaginary) case:

Say we have 7 counters, and you open 5 32bit events and 1 64bit event.
Without the above scheme, you would place the first 5 events on the first
5 counters and then you can't place the 64bit event, as you are now
left with a low/odd counter and a high/even counter.
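
Purely as an illustration of that policy (this is not the patch), a
chained pair could be claimed from the low end of the counter bank along
these lines; raw hardware counter indices and the used_mask bitmap are
used directly, and the helper name is made up:

	#include <linux/bitops.h>
	#include <linux/errno.h>

	/* Claim a (low even, high odd) pair from the bottom of the bank. */
	static int chain_pair_get_idx(unsigned long *used_mask, int nr_counters)
	{
		int low;

		for (low = 0; low + 1 < nr_counters; low += 2) {
			if (test_and_set_bit(low, used_mask))
				continue;
			if (!test_and_set_bit(low + 1, used_mask))
				return low + 1;	/* report the high counter */
			/* The odd counter was taken; release the even one. */
			clear_bit(low, used_mask);
		}
		return -EAGAIN;
	}

Single 32bit events would then be allocated by scanning for free counters
from the top of the bank downwards, so leftover odd counters collect at
the high end instead of splitting up even/odd pairs at the bottom.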

> 
> Then you close the events in counters 0, 2, 4, and 6. The live events
> aren't moved, and stay where they are, in counters 1, 3, 5, and 7.
> 
> Now, say you open a 64-bit event. When we try to add it, we try to
> allocate an index for two consecutive counters, and find that we can't,
> despite 4 counters being free.
> 
> We return -EAGAIN to the core perf code, whereupon it removes any other
> events in that group (none in this case).
> 
> Then we wait for the rotation logic to pull all of the events out and
> schedule them back in, re-packing them, which should (eventually) work
> regardless of how we allocate counters.
> 
> ... we might need to re-pack events to solve that. :/

I agree, removing and putting them back in is not going to work unless
we re-pack or delay allocating the counters until we start the PMU.

> 
> [...]
> 
>>>> +static inline void armv8pmu_write_chain_counter(int idx, u64 value)
>>>> +{
>>>> +	armv8pmu_write_evcntr(idx, value >> 32);
>>>> +	isb();
>>>> +	armv8pmu_write_evcntr(idx - 1, value);
>>>> +}
>>>
>>> Can we use upper_32_bits() and lower_32_bits() here?
>>>
>>> As a more general thing, I think we need to clean up the way we
>>> read/write counters, because I don't think that it's right that we poke
>>> them while they're running -- that means you get some arbitrary skew on
>>> counter groups.
>>>
>>> It looks like the only case we do that is the IRQ handler, so we should
>>> be able to stop/start the PMU there.
>>
>> Since we don't stop the "counting" of events usually when an IRQ is
>> triggered, the skew will be finally balanced when the events are stopped
>> in the group. So, I think, stopping the PMU may not really be a good
>> thing after all. Just my thought.
> 
> That's not quite right -- if one event in a group overflows, we'll
> reprogram it *while* other events are counting, losing some events in
> the process.

Oh yes, you're right. I can fix it.


> 
> Stopping the PMU for the duration of the IRQ handler ensures that events
> in a group are always in-sync with one another.

Suzuki

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-06-08 14:46     ` Suzuki K Poulose
  2018-06-08 15:24       ` Mark Rutland
@ 2018-06-11 13:54       ` Suzuki K Poulose
  2018-06-11 14:24         ` Mark Rutland
  1 sibling, 1 reply; 16+ messages in thread
From: Suzuki K Poulose @ 2018-06-11 13:54 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On 08/06/18 15:46, Suzuki K Poulose wrote:
> Hi Mark,
> 
> On 06/06/2018 07:01 PM, Mark Rutland wrote:
>> On Tue, May 29, 2018 at 11:55:56AM +0100, Suzuki K Poulose wrote:

> 
>>> -        value |= 0xffffffff00000000ULL;
>>> +        if (!armv8pmu_event_is_64bit(event))
>>> +            value |= 0xffffffff00000000ULL;
>>>           write_sysreg(value, pmccntr_el0);
>>> -    } else if (armv8pmu_select_counter(idx) == idx)
>>> -        write_sysreg(value, pmxevcntr_el0);
>>> +    } else
>>> +        armv8pmu_write_hw_counter(event, value);
>>>   }
>>
>>> +static inline void armv8pmu_write_event_type(struct perf_event *event)
>>> +{
>>> +    struct hw_perf_event *hwc = &event->hw;
>>> +    int idx = hwc->idx;
>>> +
>>> +    /*
>>> +     * For chained events, write the low counter event type
>>> +     * followed by the high counter. The high counter is programmed
>>> +     * with CHAIN event code with filters set to count at all ELs.
>>> +     */
>>> +    if (armv8pmu_event_is_chained(event)) {
>>> +        u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
>>> +                ARMV8_PMU_INCLUDE_EL2;
>>> +
>>> +        armv8pmu_write_evtype(idx - 1, hwc->config_base);
>>> +        isb();
>>> +        armv8pmu_write_evtype(idx, chain_evt);
>>
>> The ISB isn't necessary here, AFAICT. We only do this while the PMU is
>> disabled; no?
> 
> You're right. I was just following the ARM ARM.

Taking another look, the semantics of the "pmu->enable()" and "pmu->disable()"
callbacks are not clear to me. I don't see any reference to them in the perf core
driver anymore. The perf core uses add()/del() instead, with the PMU
turned off. Do you have any idea about the enable()/disable() callbacks?
Am I missing something?

Suzuki

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-06-11 13:54       ` Suzuki K Poulose
@ 2018-06-11 14:24         ` Mark Rutland
  2018-06-11 16:18           ` Suzuki K Poulose
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Rutland @ 2018-06-11 14:24 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On Mon, Jun 11, 2018 at 02:54:16PM +0100, Suzuki K Poulose wrote:
> On 08/06/18 15:46, Suzuki K Poulose wrote:
> > On 06/06/2018 07:01 PM, Mark Rutland wrote:
> > > On Tue, May 29, 2018 at 11:55:56AM +0100, Suzuki K Poulose wrote:
> 
> > > > -        value |= 0xffffffff00000000ULL;
> > > > +        if (!armv8pmu_event_is_64bit(event))
> > > > +            value |= 0xffffffff00000000ULL;
> > > >           write_sysreg(value, pmccntr_el0);
> > > > -    } else if (armv8pmu_select_counter(idx) == idx)
> > > > -        write_sysreg(value, pmxevcntr_el0);
> > > > +    } else
> > > > +        armv8pmu_write_hw_counter(event, value);
> > > >   }
> > > 
> > > > +static inline void armv8pmu_write_event_type(struct perf_event *event)
> > > > +{
> > > > +    struct hw_perf_event *hwc = &event->hw;
> > > > +    int idx = hwc->idx;
> > > > +
> > > > +    /*
> > > > +     * For chained events, write the low counter event type
> > > > +     * followed by the high counter. The high counter is programmed
> > > > +     * with CHAIN event code with filters set to count at all ELs.
> > > > +     */
> > > > +    if (armv8pmu_event_is_chained(event)) {
> > > > +        u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
> > > > +                ARMV8_PMU_INCLUDE_EL2;
> > > > +
> > > > +        armv8pmu_write_evtype(idx - 1, hwc->config_base);
> > > > +        isb();
> > > > +        armv8pmu_write_evtype(idx, chain_evt);
> > > 
> > > The ISB isn't necessary here, AFAICT. We only do this while the PMU is
> > > disabled; no?
> > 
> > You're right. I was just following the ARM ARM.
> 
> Taking another look, the semantics of the "pmu->enable()" and "pmu->disable()"
> callbacks are not clear to me.

I was talking about pmu::{pmu_disable,pmu_enable}(), so I'm not sure I
follow how arm_pmu::{enable,disable}() are relevant here.

The arm_pmu::{enable,disable}() callbacks enable or disable individual
counters. For example, leaving unused counters disabled may save power,
even if the PMU as a whole is enabled.
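
For reference, the two levels being contrasted here, abridged to the
relevant callbacks (from include/linux/perf_event.h and
include/linux/perf/arm_pmu.h of this era):

	struct pmu {
		/* ... */
		/* Disable/enable the PMU as a whole; the core wraps these
		 * around add()/del() and group scheduling. */
		void (*pmu_enable)(struct pmu *pmu);
		void (*pmu_disable)(struct pmu *pmu);

		int  (*add)(struct perf_event *event, int flags);
		void (*del)(struct perf_event *event, int flags);
		/* ... */
	};

	struct arm_pmu {
		/* ... */
		/* Enable/disable the individual counter backing an event. */
		void (*enable)(struct perf_event *event);
		void (*disable)(struct perf_event *event);
		/* ... */
	};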

> I don't see any reference to them in the perf core
> driver anymore. The perf core uses add()/del() instead, with the PMU
> turned off. Do you have any idea about the enable()/disable() callbacks?

I'm not sure I understand what you're asking here.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] arm64: perf: Add support for chaining event counters
  2018-06-11 14:24         ` Mark Rutland
@ 2018-06-11 16:18           ` Suzuki K Poulose
  0 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2018-06-11 16:18 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy

On 11/06/18 15:24, Mark Rutland wrote:
> On Mon, Jun 11, 2018 at 02:54:16PM +0100, Suzuki K Poulose wrote:
>> On 08/06/18 15:46, Suzuki K Poulose wrote:
>>> On 06/06/2018 07:01 PM, Mark Rutland wrote:
>>>> On Tue, May 29, 2018 at 11:55:56AM +0100, Suzuki K Poulose wrote:
>>
>>>>> -        value |= 0xffffffff00000000ULL;
>>>>> +        if (!armv8pmu_event_is_64bit(event))
>>>>> +            value |= 0xffffffff00000000ULL;
>>>>>            write_sysreg(value, pmccntr_el0);
>>>>> -    } else if (armv8pmu_select_counter(idx) == idx)
>>>>> -        write_sysreg(value, pmxevcntr_el0);
>>>>> +    } else
>>>>> +        armv8pmu_write_hw_counter(event, value);
>>>>>    }
>>>>
>>>>> +static inline void armv8pmu_write_event_type(struct perf_event *event)
>>>>> +{
>>>>> +    struct hw_perf_event *hwc = &event->hw;
>>>>> +    int idx = hwc->idx;
>>>>> +
>>>>> +    /*
>>>>> +     * For chained events, write the low counter event type
>>>>> +     * followed by the high counter. The high counter is programmed
>>>>> +     * with CHAIN event code with filters set to count at all ELs.
>>>>> +     */
>>>>> +    if (armv8pmu_event_is_chained(event)) {
>>>>> +        u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
>>>>> +                ARMV8_PMU_INCLUDE_EL2;
>>>>> +
>>>>> +        armv8pmu_write_evtype(idx - 1, hwc->config_base);
>>>>> +        isb();
>>>>> +        armv8pmu_write_evtype(idx, chain_evt);
>>>>
>>>> The ISB isn't necessary here, AFAICT. We only do this while the PMU is
>>>> disabled; no?
>>>
>>> You're right. I was just following the ARM ARM.
>>
>> Taking another look, the semantics of the "pmu->enable()" and "pmu->disable()"
>> callbacks are not clear to me.
> 
> I was talking about pmu::{pmu_disable,pmu_enable}(), so I'm not sure I
> follow how arm_pmu::{enable,disable}() are relevant here.
> 
> The arm_pmu::{enable,disable}() callbacks enable or disable individual
> counters. For example, leaving unused counters disabled may save power,
> even if the PMU as a whole is enabled.

Ah, I mistook cpu_pmu->enable/disable for the core pmu ops. My bad.
Sorry about the noise.

Suzuki

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-06-11 16:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-29 10:55 [PATCH v2 0/5] arm64: perf: Support for chained counters Suzuki K Poulose
2018-05-29 10:55 ` [PATCH v2 1/5] arm_pmu: Clean up maximum period handling Suzuki K Poulose
2018-05-29 10:55 ` [PATCH v2 2/5] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
2018-05-29 10:55 ` [PATCH v2 3/5] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
2018-06-06 16:48   ` Mark Rutland
2018-06-07  7:34     ` Suzuki K Poulose
2018-05-29 10:55 ` [PATCH v2 4/5] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
2018-05-29 10:55 ` [PATCH v2 5/5] arm64: perf: Add support for chaining event counters Suzuki K Poulose
2018-06-06 18:01   ` Mark Rutland
2018-06-08 14:46     ` Suzuki K Poulose
2018-06-08 15:24       ` Mark Rutland
2018-06-08 16:05         ` Suzuki K Poulose
2018-06-11 13:54       ` Suzuki K Poulose
2018-06-11 14:24         ` Mark Rutland
2018-06-11 16:18           ` Suzuki K Poulose
2018-06-05 15:00 ` [PATCH v2 0/5] arm64: perf: Support for chained counters Julien Thierry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).