* [PATCH v3 0/7] arm64: perf: Support for chained counters
@ 2018-06-19 10:15 Suzuki K Poulose
  2018-06-19 10:15 ` [PATCH v3 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

This series adds support for counting PMU events using 64bit counters
on the arm64 PMU.

The Arm v8 PMUv3 supports combining two adjacent 32bit counters
(a low even counter and a high odd counter) to count a given "event"
in 64bit mode. This series adds support for 64bit events to the core
arm_pmu driver infrastructure and teaches the arm64 armv8 PMU driver
to use chained counters to count in 64bit mode. For CPU cycles, we
use the cycle counter in 64bit mode only when requested; if the cycle
counter is not available, we fall back to chaining the counters.
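
As an illustration of the resulting interface (the PMU instance name
and event number below are only examples), a 64bit count can be
requested from userspace via the new "bits64" format field exposed by
the armv8 PMU driver:

  perf stat -e armv8_pmuv3_0/event=0x08,bits64=1/ sleep 1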

Tested on Juno and Fast Models. Applies on top of 4.18-rc1.

Changes since v2:
 - Clean up select counter routine. (Mark Rutland)
 - Stop PMU while processing overflows (Mark Rutland)
 - Drop special allocation algorithm for chain indices
 - Since we access the counters when the PMU is stopped,
   get rid of the unnecessary barriers.
 - Ensure a counter is allocated when checking for chained event

Changes since v1:
 - Remove unnecessary isb()s in chain counter reads/writes
 - Fix event programming order for counters
 - Tighten chain counter event read sequence
 - Set chain event to count in all ELs
 - Cleanup helpers to be consistent
 - Fix build break on xscale PMU (reported by kbuild-robot)
 - Remove the explicit counter width field from pmu backends and default
   to 32bit.
 - Rename flag ARMPMU_EVT_LONG => ARMPMU_EVT_64BIT and
   the format string "chain" => "bits64". (Unfortunately we can't use "64bit",
   and I am open to suggestions for a better name.)
 - Rename armpmu_get_event_max_period() => armpmu_event_max_period()
 - For 64bit CPU cycle events, allow chaining if the cycle counter is
   not available.

Suzuki K Poulose (7):
  arm_pmu: Clean up maximum period handling
  arm_pmu: Change API to support 64bit counter values
  arm_pmu: Add support for 64bit event counters
  arm_pmu: Tidy up clear_event_idx callbacks
  arm64: perf: Clean up armv8pmu_select_counter
  arm64: perf: Disable PMU while processing counter overflows
  arm64: perf: Add support for chaining event counters

 arch/arm/kernel/perf_event_v6.c     |   6 +-
 arch/arm/kernel/perf_event_v7.c     |   7 +-
 arch/arm/kernel/perf_event_xscale.c |  10 +-
 arch/arm64/kernel/perf_event.c      | 249 +++++++++++++++++++++++++++++-------
 drivers/perf/arm_pmu.c              |  51 +++++---
 include/linux/perf/arm_pmu.h        |  11 +-
 6 files changed, 255 insertions(+), 79 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 1/7] arm_pmu: Clean up maximum period handling
  2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
@ 2018-06-19 10:15 ` Suzuki K Poulose
  2018-06-19 10:45   ` Mark Rutland
  2018-06-19 10:15 ` [PATCH v3 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

Each PMU defines its max_period as the maximum value that its
counters can count to. Since all the PMU backends support 32bit
counters by default, let us remove the redundant field.

No functional changes.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
No changes since v2
---
 arch/arm/kernel/perf_event_v6.c     |  2 --
 arch/arm/kernel/perf_event_v7.c     |  1 -
 arch/arm/kernel/perf_event_xscale.c |  2 --
 arch/arm64/kernel/perf_event.c      |  1 -
 drivers/perf/arm_pmu.c              | 16 ++++++++++++----
 include/linux/perf/arm_pmu.h        |  1 -
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index be42c4f..f64a6bf 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -495,7 +495,6 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 }
 
 static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
@@ -546,7 +545,6 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 57f01e0..ecca4cd 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1170,7 +1170,6 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start		= armv7pmu_start;
 	cpu_pmu->stop		= armv7pmu_stop;
 	cpu_pmu->reset		= armv7pmu_reset;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 };
 
 static void armv7_read_num_pmnc_events(void *info)
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 88d1a76..c4f0294 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -374,7 +374,6 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale1pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
@@ -743,7 +742,6 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale2pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 5;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aa..678ecff 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -960,7 +960,6 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
-	cpu_pmu->max_period		= (1LLU << 32) - 1,
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
 
 	return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index a6347d4..6ddc00d 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,6 +28,11 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
+static inline u64 arm_pmu_max_period(void)
+{
+	return (1ULL << 32) - 1;
+}
+
 static int
 armpmu_map_cache_event(const unsigned (*cache_map)
 				      [PERF_COUNT_HW_CACHE_MAX]
@@ -114,8 +119,10 @@ int armpmu_event_set_period(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	s64 left = local64_read(&hwc->period_left);
 	s64 period = hwc->sample_period;
+	u64 max_period;
 	int ret = 0;
 
+	max_period = arm_pmu_max_period();
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -136,8 +143,8 @@ int armpmu_event_set_period(struct perf_event *event)
 	 * effect we are reducing max_period to account for
 	 * interrupt latency (and we are being very conservative).
 	 */
-	if (left > (armpmu->max_period >> 1))
-		left = armpmu->max_period >> 1;
+	if (left > (max_period >> 1))
+		left = (max_period >> 1);
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
@@ -153,6 +160,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
+	u64 max_period = arm_pmu_max_period();
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -162,7 +170,7 @@ u64 armpmu_event_update(struct perf_event *event)
 			     new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
+	delta = (new_raw_count - prev_raw_count) & max_period;
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -402,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = armpmu->max_period >> 1;
+		hwc->sample_period  = arm_pmu_max_period() >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index ad54444..12c30a2 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -94,7 +94,6 @@ struct arm_pmu {
 	void		(*reset)(void *);
 	int		(*map_event)(struct perf_event *event);
 	int		num_events;
-	u64		max_period;
 	bool		secure_access; /* 32-bit ARM only */
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
 	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 2/7] arm_pmu: Change API to support 64bit counter values
  2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-06-19 10:15 ` [PATCH v3 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
@ 2018-06-19 10:15 ` Suzuki K Poulose
  2018-06-19 10:52   ` Mark Rutland
  2018-06-19 10:15 ` [PATCH v3 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

Convert the {read/write}_counter APIs to handle 64bit values
to enable support for chained event counters.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 - No changes since v2
---
 arch/arm/kernel/perf_event_v6.c     | 4 ++--
 arch/arm/kernel/perf_event_v7.c     | 4 ++--
 arch/arm/kernel/perf_event_xscale.c | 8 ++++----
 arch/arm64/kernel/perf_event.c      | 9 ++++-----
 include/linux/perf/arm_pmu.h        | 4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index f64a6bf..0729f98 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -233,7 +233,7 @@ armv6_pmcr_counter_has_overflowed(unsigned long pmcr,
 	return ret;
 }
 
-static inline u32 armv6pmu_read_counter(struct perf_event *event)
+static inline u64 armv6pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -251,7 +251,7 @@ static inline u32 armv6pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv6pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv6pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index ecca4cd..fd7ce01 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -743,7 +743,7 @@ static inline void armv7_pmnc_select_counter(int idx)
 	isb();
 }
 
-static inline u32 armv7pmu_read_counter(struct perf_event *event)
+static inline u64 armv7pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -763,7 +763,7 @@ static inline u32 armv7pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv7pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv7pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index c4f0294..942230f 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -316,7 +316,7 @@ static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale1pmu_read_counter(struct perf_event *event)
+static inline u64 xscale1pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -337,7 +337,7 @@ static inline u32 xscale1pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale1pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale1pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -678,7 +678,7 @@ static void xscale2pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale2pmu_read_counter(struct perf_event *event)
+static inline u64 xscale2pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -705,7 +705,7 @@ static inline u32 xscale2pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale2pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale2pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 678ecff..66a2ffd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -512,7 +512,7 @@ static inline int armv8pmu_select_counter(int idx)
 	return idx;
 }
 
-static inline u32 armv8pmu_read_counter(struct perf_event *event)
+static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -530,7 +530,7 @@ static inline u32 armv8pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -545,9 +545,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 		 * count using the lower 32bits and we want an interrupt when
 		 * it overflows.
 		 */
-		u64 value64 = 0xffffffff00000000ULL | value;
-
-		write_sysreg(value64, pmccntr_el0);
+		value |= 0xffffffff00000000ULL;
+		write_sysreg(value, pmccntr_el0);
 	} else if (armv8pmu_select_counter(idx) == idx)
 		write_sysreg(value, pmxevcntr_el0);
 }
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 12c30a2..f7126a2 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -87,8 +87,8 @@ struct arm_pmu {
 					 struct perf_event *event);
 	int		(*set_event_filter)(struct hw_perf_event *evt,
 					    struct perf_event_attr *attr);
-	u32		(*read_counter)(struct perf_event *event);
-	void		(*write_counter)(struct perf_event *event, u32 val);
+	u64		(*read_counter)(struct perf_event *event);
+	void		(*write_counter)(struct perf_event *event, u64 val);
 	void		(*start)(struct arm_pmu *);
 	void		(*stop)(struct arm_pmu *);
 	void		(*reset)(void *);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 3/7] arm_pmu: Add support for 64bit event counters
  2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-06-19 10:15 ` [PATCH v3 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
  2018-06-19 10:15 ` [PATCH v3 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
@ 2018-06-19 10:15 ` Suzuki K Poulose
  2018-06-19 10:57   ` Mark Rutland
  2018-06-19 10:15 ` [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx callbacks Suzuki K Poulose
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

Each PMU has a set of 32bit event counters. But in some
special cases, the events could be counted using counters
which are effectively 64bit wide.

e.g., Arm v8 PMUv3 has a 64bit cycle counter which can count
only CPU cycles. Also, the PMU can chain a pair of event
counters to effectively form a 64bit counter.

Add support for tracking events that use 64bit counters.
This only affects the periods set for each counter in the
core driver.
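
For reference, a condensed sketch of the idea, taken from the hunks
below (the flag is set by a PMU backend, e.g. the armv8 driver later
in this series):

	/* core driver: pick the period mask by the event's counter width */
	static inline u64 arm_pmu_event_max_period(struct perf_event *event)
	{
		return (event->hw.flags & ARMPMU_EVT_64BIT) ?
			GENMASK_ULL(63, 0) :
			GENMASK_ULL(31, 0);
	}

	/* backend: mark an event that must use a 64bit counter */
	event->hw.flags |= ARMPMU_EVT_64BIT;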

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v2:
 - None
---
 drivers/perf/arm_pmu.c       | 15 +++++++++------
 include/linux/perf/arm_pmu.h |  6 ++++++
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 6ddc00d..e3766a8 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,9 +28,11 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
-static inline u64 arm_pmu_max_period(void)
+static inline u64 arm_pmu_event_max_period(struct perf_event *event)
 {
-	return (1ULL << 32) - 1;
+	return (event->hw.flags & ARMPMU_EVT_64BIT) ?
+		GENMASK_ULL(63, 0) :
+		GENMASK_ULL(31, 0);
 }
 
 static int
@@ -122,7 +124,7 @@ int armpmu_event_set_period(struct perf_event *event)
 	u64 max_period;
 	int ret = 0;
 
-	max_period = arm_pmu_max_period();
+	max_period = arm_pmu_event_max_period(event);
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -148,7 +150,7 @@ int armpmu_event_set_period(struct perf_event *event)
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
-	armpmu->write_counter(event, (u64)(-left) & 0xffffffff);
+	armpmu->write_counter(event, (u64)(-left) & max_period);
 
 	perf_event_update_userpage(event);
 
@@ -160,7 +162,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
-	u64 max_period = arm_pmu_max_period();
+	u64 max_period = arm_pmu_event_max_period(event);
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -368,6 +370,7 @@ __hw_perf_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int mapping;
 
+	hwc->flags = 0;
 	mapping = armpmu->map_event(event);
 
 	if (mapping < 0) {
@@ -410,7 +413,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = arm_pmu_max_period() >> 1;
+		hwc->sample_period  = arm_pmu_event_max_period(event) >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index f7126a2..10f92e1 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -25,6 +25,12 @@
  */
 #define ARMPMU_MAX_HWEVENTS		32
 
+/*
+ * ARM PMU hw_event flags
+ */
+/* Event uses a 64bit counter */
+#define ARMPMU_EVT_64BIT		1
+
 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
 #define CACHE_OP_UNSUPPORTED		0xFFFF
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx callbacks
  2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2018-06-19 10:15 ` [PATCH v3 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
@ 2018-06-19 10:15 ` Suzuki K Poulose
  2018-06-29 13:27   ` Mark Rutland
  2018-06-29 13:40   ` Mark Rutland
  2018-06-19 10:15 ` [PATCH v3 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

The arm_pmu core uses the get_event_idx callback to allocate an
event counter for a given event, which marks the selected counter
as "used". However, when we delete the counter, the core clears
the "used" bit itself and then invokes the "clear_event_idx"
callback, which splits the job between the core code and the
backend. Tidy this up by relying on the clear_event_idx callback
to do the bookkeeping, if available. Otherwise, let the core
driver do the default "clear bit" operation. This will be useful
for adding the chained event support, where we leave the event
idx maintenance to the backend.

Also, when an event is removed from the PMU, reset hw.idx to
indicate that no counter is allocated for this event, to help
the backends do better checks. This will also be used for the
chained counter support.
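
For reference, the resulting removal path in the core driver looks
roughly like this (condensed from the diff below):

	/* Let the backend do the bookkeeping, if it provides a callback */
	if (armpmu->clear_event_idx)
		armpmu->clear_event_idx(hw_events, event);
	else
		clear_bit(event->hw.idx, hw_events->used_mask);

	/* The event no longer owns a counter */
	event->hw.idx = -1;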

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v2:
 - Reset the event counter after an event is removed.
---
 arch/arm/kernel/perf_event_v7.c |  2 ++
 drivers/perf/arm_pmu.c          | 17 +++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index fd7ce01..765d265 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1637,6 +1637,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool krait_event = EVENT_CPU(hwc->config_base);
 
+	clear_bit(hwc->idx, cpuc->used_mask);
 	if (venum_event || krait_event) {
 		bit = krait_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
@@ -1966,6 +1967,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool scorpion_event = EVENT_CPU(hwc->config_base);
 
+	clear_bit(hwc->idx, cpuc->used_mask);
 	if (venum_event || scorpion_event) {
 		bit = scorpion_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index e3766a8..6e10e8c 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -227,6 +227,16 @@ static void armpmu_start(struct perf_event *event, int flags)
 	armpmu->enable(event);
 }
 
+static void armpmu_clear_event_idx(struct arm_pmu *armpmu,
+				   struct pmu_hw_events *hw_events,
+				   struct perf_event *event)
+{
+	if (armpmu->clear_event_idx)
+		armpmu->clear_event_idx(hw_events, event);
+	else
+		clear_bit(event->hw.idx, hw_events->used_mask);
+}
+
 static void
 armpmu_del(struct perf_event *event, int flags)
 {
@@ -237,11 +247,10 @@ armpmu_del(struct perf_event *event, int flags)
 
 	armpmu_stop(event, PERF_EF_UPDATE);
 	hw_events->events[idx] = NULL;
-	clear_bit(idx, hw_events->used_mask);
-	if (armpmu->clear_event_idx)
-		armpmu->clear_event_idx(hw_events, event);
-
+	armpmu_clear_event_idx(armpmu, hw_events, event);
 	perf_event_update_userpage(event);
+	/* Clear the allocated counter */
+	hwc->idx = -1;
 }
 
 static int
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 5/7] arm64: perf: Clean up armv8pmu_select_counter
  2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2018-06-19 10:15 ` [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx callbacks Suzuki K Poulose
@ 2018-06-19 10:15 ` Suzuki K Poulose
  2018-06-29 13:29   ` Mark Rutland
  2018-06-19 10:15 ` [PATCH v3 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
  2018-06-19 10:15 ` [PATCH v3 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
  6 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

armv8pmu_select_counter always returns the idx passed to it, so
let us make it return void and get rid of the pointless checks.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 66a2ffd..9ce3729 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -503,13 +503,17 @@ static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
 	return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
 }
 
-static inline int armv8pmu_select_counter(int idx)
+static inline void armv8pmu_select_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
 	write_sysreg(counter, pmselr_el0);
 	isb();
+}
 
-	return idx;
+static inline u32 armv8pmu_read_evcntr(int idx)
+{
+	armv8pmu_select_counter(idx);
+	return read_sysreg(pmxevcntr_el0);
 }
 
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
@@ -524,12 +528,18 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
-	else if (armv8pmu_select_counter(idx) == idx)
-		value = read_sysreg(pmxevcntr_el0);
+	else
+		value = armv8pmu_read_evcntr(idx);
 
 	return value;
 }
 
+static inline void armv8pmu_write_evcntr(int idx, u32 value)
+{
+	armv8pmu_select_counter(idx);
+	write_sysreg(value, pmxevcntr_el0);
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -547,16 +557,15 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 		 */
 		value |= 0xffffffff00000000ULL;
 		write_sysreg(value, pmccntr_el0);
-	} else if (armv8pmu_select_counter(idx) == idx)
-		write_sysreg(value, pmxevcntr_el0);
+	} else
+		armv8pmu_write_evcntr(idx, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
 {
-	if (armv8pmu_select_counter(idx) == idx) {
-		val &= ARMV8_PMU_EVTYPE_MASK;
-		write_sysreg(val, pmxevtyper_el0);
-	}
+	armv8pmu_select_counter(idx);
+	val &= ARMV8_PMU_EVTYPE_MASK;
+	write_sysreg(val, pmxevtyper_el0);
 }
 
 static inline int armv8pmu_enable_counter(int idx)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 6/7] arm64: perf: Disable PMU while processing counter overflows
  2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2018-06-19 10:15 ` [PATCH v3 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
@ 2018-06-19 10:15 ` Suzuki K Poulose
  2018-06-19 10:43   ` Mark Rutland
  2018-06-19 10:15 ` [PATCH v3 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
  6 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

The arm64 PMU updates the event counters and reprograms the
counters in the overflow IRQ handler without disabling the
PMU. This could potentially cause skew for group counters,
where the overflowed counters may lose some event counts
while they are reprogrammed. To prevent this, disable
the PMU while we process the counter overflows and enable it
right back when we are done.

This patch also moves the PMU stop/start routines to avoid a
forward declaration.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 50 +++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 9ce3729..eebc635 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -678,6 +678,28 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
+static void armv8pmu_start(struct arm_pmu *cpu_pmu)
+{
+	unsigned long flags;
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	raw_spin_lock_irqsave(&events->pmu_lock, flags);
+	/* Enable all counters */
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
+	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
+{
+	unsigned long flags;
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	raw_spin_lock_irqsave(&events->pmu_lock, flags);
+	/* Disable all counters */
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
+	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
 static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 {
 	u32 pmovsr;
@@ -702,6 +724,11 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	 */
 	regs = get_irq_regs();
 
+	/*
+	 * Stop the PMU while processing the counter overflows
+	 * to prevent skews in group events.
+	 */
+	armv8pmu_stop(cpu_pmu);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
 		struct hw_perf_event *hwc;
@@ -726,6 +753,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 		if (perf_event_overflow(event, &data, regs))
 			cpu_pmu->disable(event);
 	}
+	armv8pmu_start(cpu_pmu);
 
 	/*
 	 * Handle the pending perf events.
@@ -739,28 +767,6 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	return IRQ_HANDLED;
 }
 
-static void armv8pmu_start(struct arm_pmu *cpu_pmu)
-{
-	unsigned long flags;
-	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
-	raw_spin_lock_irqsave(&events->pmu_lock, flags);
-	/* Enable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
-	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
-}
-
-static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
-{
-	unsigned long flags;
-	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
-	raw_spin_lock_irqsave(&events->pmu_lock, flags);
-	/* Disable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
-	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
-}
-
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 7/7] arm64: perf: Add support for chaining event counters
  2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (5 preceding siblings ...)
  2018-06-19 10:15 ` [PATCH v3 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
@ 2018-06-19 10:15 ` Suzuki K Poulose
  2018-06-29 14:01   ` Mark Rutland
  6 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-19 10:15 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, linux-kernel, will.deacon, robin.murphy,
	julien.thierry, Suzuki K Poulose

Add support for 64bit events by using chained event counters
and the 64bit cycle counter.

PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
forming a 64-bit counter. The low/even counter is programmed to count
the event of interest, and the high/odd counter is programmed to count
the CHAIN event, taken when the low/even counter overflows.

For CPU cycles, when 64bit mode is requested, the cycle counter
is used in 64bit mode. If the cycle counter is not available, we
fall back to chaining the counters.
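
In a nutshell, the scheme implemented by the diff below looks like
this (idx is the high/odd counter of the chained pair):

	/* program: low/even counts the event, high/odd counts CHAIN */
	armv8pmu_write_evtype(idx - 1, hwc->config_base);
	armv8pmu_write_evtype(idx, ARMV8_PMUV3_PERFCTR_CHAIN |
				   ARMV8_PMU_INCLUDE_EL2);

	/* read: combine the pair into one 64bit value (PMU stopped) */
	val = armv8pmu_read_evcntr(idx);
	val = (val << 32) | armv8pmu_read_evcntr(idx - 1);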

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v2:
 - Drop special allocation algorithm for chain indices
 - Since we access the counters when the PMU is stopped,
   get rid of the unnecessary barriers.
 - Ensure a counter is allocated when checking for chained event
---
 arch/arm64/kernel/perf_event.c | 184 ++++++++++++++++++++++++++++++++++++-----
 drivers/perf/arm_pmu.c         |  13 ++-
 2 files changed, 169 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index eebc635..a03def7 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
 };
 
 PMU_FORMAT_ATTR(event, "config:0-15");
+PMU_FORMAT_ATTR(bits64, "config1:0");
+
+static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
+{
+	return event->attr.config1 & 0x1;
+}
 
 static struct attribute *armv8_pmuv3_format_attrs[] = {
 	&format_attr_event.attr,
+	&format_attr_bits64.attr,
 	NULL,
 };
 
@@ -466,6 +473,20 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
 	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
 /*
+ * Use chained counter for a 64bit event, if we could not allocate
+ * the 64bit cycle counter. This must be called after a counter
+ * was allocated.
+ */
+static inline bool armv8pmu_event_is_chained(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	return !WARN_ON(idx < 0) &&
+	       armv8pmu_event_is_64bit(event) &&
+	       (event->hw.idx != ARMV8_IDX_CYCLE_COUNTER);
+}
+
+/*
  * ARMv8 low level PMU access
  */
 
@@ -516,12 +537,28 @@ static inline u32 armv8pmu_read_evcntr(int idx)
 	return read_sysreg(pmxevcntr_el0);
 }
 
+static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	u64 val = 0;
+
+	val = armv8pmu_read_evcntr(idx);
+	/*
+	 * We always read the counter with the PMU turned off.
+	 * So we don't need special care for reading chained
+	 * counters.
+	 */
+	if (armv8pmu_event_is_chained(event))
+		val = (val << 32) | armv8pmu_read_evcntr(idx - 1);
+	return val;
+}
+
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
-	u32 value = 0;
+	u64 value = 0;
 
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u reading wrong counter %d\n",
@@ -529,7 +566,7 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
 	else
-		value = armv8pmu_read_evcntr(idx);
+		value = armv8pmu_read_hw_counter(event);
 
 	return value;
 }
@@ -540,6 +577,19 @@ static inline void armv8pmu_write_evcntr(int idx, u32 value)
 	write_sysreg(value, pmxevcntr_el0);
 }
 
+static inline void armv8pmu_write_hw_counter(struct perf_event *event,
+					     u64 value)
+{
+	int idx = event->hw.idx;
+
+	if (armv8pmu_event_is_chained(event)) {
+		armv8pmu_write_evcntr(idx, upper_32_bits(value));
+		armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
+	} else {
+		armv8pmu_write_evcntr(idx, value);
+	}
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -551,14 +601,15 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
 		/*
-		 * Set the upper 32bits as this is a 64bit counter but we only
+		 * Set the upper 32bits as this is a 64bit counter, if we only
 		 * count using the lower 32bits and we want an interrupt when
 		 * it overflows.
 		 */
-		value |= 0xffffffff00000000ULL;
+		if (!armv8pmu_event_is_64bit(event))
+			value |= 0xffffffff00000000ULL;
 		write_sysreg(value, pmccntr_el0);
 	} else
-		armv8pmu_write_evcntr(idx, value);
+		armv8pmu_write_hw_counter(event, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
@@ -568,6 +619,27 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
 	write_sysreg(val, pmxevtyper_el0);
 }
 
+static inline void armv8pmu_write_event_type(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	/*
+	 * For chained events, the low counter is programmed to count
+	 * the event of interest and the high counter is programmed
+	 * with CHAIN event code with filters set to count at all ELs.
+	 */
+	if (armv8pmu_event_is_chained(event)) {
+		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
+				ARMV8_PMU_INCLUDE_EL2;
+
+		armv8pmu_write_evtype(idx - 1, hwc->config_base);
+		armv8pmu_write_evtype(idx, chain_evt);
+	} else {
+		armv8pmu_write_evtype(idx, hwc->config_base);
+	}
+}
+
 static inline int armv8pmu_enable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -575,6 +647,16 @@ static inline int armv8pmu_enable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_enable_event_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	armv8pmu_enable_counter(idx);
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_enable_counter(idx - 1);
+	isb();
+}
+
 static inline int armv8pmu_disable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -582,6 +664,16 @@ static inline int armv8pmu_disable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_disable_event_counter(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_disable_counter(idx - 1);
+	armv8pmu_disable_counter(idx);
+}
+
 static inline int armv8pmu_enable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -589,6 +681,12 @@ static inline int armv8pmu_enable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_enable_event_irq(struct perf_event *event)
+{
+	/* For chained events, enable the interrupt for only the high counter */
+	return armv8pmu_enable_intens(event->hw.idx);
+}
+
 static inline int armv8pmu_disable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -601,6 +699,11 @@ static inline int armv8pmu_disable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_disable_event_irq(struct perf_event *event)
+{
+	return armv8pmu_disable_intens(event->hw.idx);
+}
+
 static inline u32 armv8pmu_getreset_flags(void)
 {
 	u32 value;
@@ -618,10 +721,8 @@ static inline u32 armv8pmu_getreset_flags(void)
 static void armv8pmu_enable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Enable counter and interrupt, and set the counter to count
@@ -632,22 +733,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Set event (if destined for PMNx counters).
 	 */
-	armv8pmu_write_evtype(idx, hwc->config_base);
+	armv8pmu_write_event_type(event);
 
 	/*
 	 * Enable interrupt for this counter
 	 */
-	armv8pmu_enable_intens(idx);
+	armv8pmu_enable_event_irq(event);
 
 	/*
 	 * Enable counter
 	 */
-	armv8pmu_enable_counter(idx);
+	armv8pmu_enable_event_counter(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -655,10 +756,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
 static void armv8pmu_disable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Disable counter and interrupt
@@ -668,12 +767,12 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Disable interrupt for this counter
 	 */
-	armv8pmu_disable_intens(idx);
+	armv8pmu_disable_event_irq(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -767,6 +866,37 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	return IRQ_HANDLED;
 }
 
+static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
+				    struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++)
+		if (!test_and_set_bit(idx, cpuc->used_mask))
+			return idx;
+	return -EAGAIN;
+}
+
+static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
+				   struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	/*
+	 * Chaining requires two consecutive event counters, where
+	 * the lower idx must be even.
+	 */
+	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2)
+		if (!test_and_set_bit(idx, cpuc->used_mask)) {
+			/* Check if the preceding even counter is available */
+			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
+				return idx;
+			/* Release the Odd counter */
+			clear_bit(idx, cpuc->used_mask);
+		}
+	return -EAGAIN;
+}
+
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
@@ -784,13 +914,21 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	/*
 	 * Otherwise use events counters
 	 */
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
+	idx = armv8pmu_event_is_64bit(event) ?
+		armv8pmu_get_chain_idx(cpuc, cpu_pmu) :
+		armv8pmu_get_single_idx(cpuc, cpu_pmu);
 
-	/* The counters are all in use. */
-	return -EAGAIN;
+	return idx;
+}
+
+static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	clear_bit(idx, cpuc->used_mask);
+	if (armv8pmu_event_is_chained(event))
+		clear_bit(idx - 1, cpuc->used_mask);
 }
 
 /*
@@ -865,6 +1003,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 				       &armv8_pmuv3_perf_cache_map,
 				       ARMV8_PMU_EVTYPE_EVENT);
 
+	if (armv8pmu_event_is_64bit(event))
+		event->hw.flags |= ARMPMU_EVT_64BIT;
+
 	/* Only expose micro/arch events supported by this PMU */
 	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
 	    && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
@@ -971,6 +1112,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter		= armv8pmu_read_counter,
 	cpu_pmu->write_counter		= armv8pmu_write_counter,
 	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
+	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 6e10e8c..a4675e4 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -674,14 +674,13 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
 	int idx;
 
 	for (idx = 0; idx < armpmu->num_events; idx++) {
-		/*
-		 * If the counter is not used skip it, there is no
-		 * need of stopping/restarting it.
-		 */
-		if (!test_bit(idx, hw_events->used_mask))
-			continue;
-
 		event = hw_events->events[idx];
+		/*
+		 * If there is no event at this idx (e.g, an idx used
+		 * by a chained event in Arm v8 PMUv3), skip it.
+		 */
+		if (!event)
+			continue;
 
 		switch (cmd) {
 		case CPU_PM_ENTER:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 6/7] arm64: perf: Disable PMU while processing counter overflows
  2018-06-19 10:15 ` [PATCH v3 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
@ 2018-06-19 10:43   ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-19 10:43 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:41AM +0100, Suzuki K Poulose wrote:
> The arm64 PMU updates the event counters and reprograms the
> counters in the overflow IRQ handler without disabling the
> PMU. This could potentially cause skew for group counters,
> where the overflowed counters may lose some event counts
> while they are reprogrammed. To prevent this, disable
> the PMU while we process the counter overflows and enable it
> right back when we are done.
> 
> This patch also moves the PMU stop/start routines to avoid a
> forward declaration.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

This makes me realise that we could remove the pmu_lock, but that's not
a new problem, and we can address that separately.

Thanks,
Mark.

> ---
>  arch/arm64/kernel/perf_event.c | 50 +++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 9ce3729..eebc635 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -678,6 +678,28 @@ static void armv8pmu_disable_event(struct perf_event *event)
>  	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>  }
>  
> +static void armv8pmu_start(struct arm_pmu *cpu_pmu)
> +{
> +	unsigned long flags;
> +	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> +
> +	raw_spin_lock_irqsave(&events->pmu_lock, flags);
> +	/* Enable all counters */
> +	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
> +	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
> +}
> +
> +static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
> +{
> +	unsigned long flags;
> +	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> +
> +	raw_spin_lock_irqsave(&events->pmu_lock, flags);
> +	/* Disable all counters */
> +	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
> +	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
> +}
> +
>  static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>  {
>  	u32 pmovsr;
> @@ -702,6 +724,11 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>  	 */
>  	regs = get_irq_regs();
>  
> +	/*
> +	 * Stop the PMU while processing the counter overflows
> +	 * to prevent skews in group events.
> +	 */
> +	armv8pmu_stop(cpu_pmu);
>  	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
>  		struct perf_event *event = cpuc->events[idx];
>  		struct hw_perf_event *hwc;
> @@ -726,6 +753,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>  		if (perf_event_overflow(event, &data, regs))
>  			cpu_pmu->disable(event);
>  	}
> +	armv8pmu_start(cpu_pmu);
>  
>  	/*
>  	 * Handle the pending perf events.
> @@ -739,28 +767,6 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>  	return IRQ_HANDLED;
>  }
>  
> -static void armv8pmu_start(struct arm_pmu *cpu_pmu)
> -{
> -	unsigned long flags;
> -	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -
> -	raw_spin_lock_irqsave(&events->pmu_lock, flags);
> -	/* Enable all counters */
> -	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
> -	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
> -}
> -
> -static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
> -{
> -	unsigned long flags;
> -	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -
> -	raw_spin_lock_irqsave(&events->pmu_lock, flags);
> -	/* Disable all counters */
> -	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
> -	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
> -}
> -
>  static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  				  struct perf_event *event)
>  {
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/7] arm_pmu: Clean up maximum period handling
  2018-06-19 10:15 ` [PATCH v3 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
@ 2018-06-19 10:45   ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-19 10:45 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:36AM +0100, Suzuki K Poulose wrote:
> Each PMU defines its max_period as the maximum value that its
> counters can count to. Since all the PMU backends support 32bit
> counters by default, let us remove the redundant field.
> 
> No functional changes.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
> No changes since v2
> ---
>  arch/arm/kernel/perf_event_v6.c     |  2 --
>  arch/arm/kernel/perf_event_v7.c     |  1 -
>  arch/arm/kernel/perf_event_xscale.c |  2 --
>  arch/arm64/kernel/perf_event.c      |  1 -
>  drivers/perf/arm_pmu.c              | 16 ++++++++++++----
>  include/linux/perf/arm_pmu.h        |  1 -
>  6 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
> index be42c4f..f64a6bf 100644
> --- a/arch/arm/kernel/perf_event_v6.c
> +++ b/arch/arm/kernel/perf_event_v6.c
> @@ -495,7 +495,6 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->stop		= armv6pmu_stop;
>  	cpu_pmu->map_event	= armv6_map_event;
>  	cpu_pmu->num_events	= 3;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
>  }
>  
>  static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
> @@ -546,7 +545,6 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->stop		= armv6pmu_stop;
>  	cpu_pmu->map_event	= armv6mpcore_map_event;
>  	cpu_pmu->num_events	= 3;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
>  
>  	return 0;
>  }
> diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
> index 57f01e0..ecca4cd 100644
> --- a/arch/arm/kernel/perf_event_v7.c
> +++ b/arch/arm/kernel/perf_event_v7.c
> @@ -1170,7 +1170,6 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->start		= armv7pmu_start;
>  	cpu_pmu->stop		= armv7pmu_stop;
>  	cpu_pmu->reset		= armv7pmu_reset;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
>  };
>  
>  static void armv7_read_num_pmnc_events(void *info)
> diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
> index 88d1a76..c4f0294 100644
> --- a/arch/arm/kernel/perf_event_xscale.c
> +++ b/arch/arm/kernel/perf_event_xscale.c
> @@ -374,7 +374,6 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->stop		= xscale1pmu_stop;
>  	cpu_pmu->map_event	= xscale_map_event;
>  	cpu_pmu->num_events	= 3;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
>  
>  	return 0;
>  }
> @@ -743,7 +742,6 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->stop		= xscale2pmu_stop;
>  	cpu_pmu->map_event	= xscale_map_event;
>  	cpu_pmu->num_events	= 5;
> -	cpu_pmu->max_period	= (1LLU << 32) - 1;
>  
>  	return 0;
>  }
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 33147aa..678ecff 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -960,7 +960,6 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->start			= armv8pmu_start,
>  	cpu_pmu->stop			= armv8pmu_stop,
>  	cpu_pmu->reset			= armv8pmu_reset,
> -	cpu_pmu->max_period		= (1LLU << 32) - 1,
>  	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
>  
>  	return 0;
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index a6347d4..6ddc00d 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -28,6 +28,11 @@
>  static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
>  static DEFINE_PER_CPU(int, cpu_irq);
>  
> +static inline u64 arm_pmu_max_period(void)
> +{
> +	return (1ULL << 32) - 1;
> +}
> +
>  static int
>  armpmu_map_cache_event(const unsigned (*cache_map)
>  				      [PERF_COUNT_HW_CACHE_MAX]
> @@ -114,8 +119,10 @@ int armpmu_event_set_period(struct perf_event *event)
>  	struct hw_perf_event *hwc = &event->hw;
>  	s64 left = local64_read(&hwc->period_left);
>  	s64 period = hwc->sample_period;
> +	u64 max_period;
>  	int ret = 0;
>  
> +	max_period = arm_pmu_max_period();
>  	if (unlikely(left <= -period)) {
>  		left = period;
>  		local64_set(&hwc->period_left, left);
> @@ -136,8 +143,8 @@ int armpmu_event_set_period(struct perf_event *event)
>  	 * effect we are reducing max_period to account for
>  	 * interrupt latency (and we are being very conservative).
>  	 */
> -	if (left > (armpmu->max_period >> 1))
> -		left = armpmu->max_period >> 1;
> +	if (left > (max_period >> 1))
> +		left = (max_period >> 1);
>  
>  	local64_set(&hwc->prev_count, (u64)-left);
>  
> @@ -153,6 +160,7 @@ u64 armpmu_event_update(struct perf_event *event)
>  	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
>  	u64 delta, prev_raw_count, new_raw_count;
> +	u64 max_period = arm_pmu_max_period();
>  
>  again:
>  	prev_raw_count = local64_read(&hwc->prev_count);
> @@ -162,7 +170,7 @@ u64 armpmu_event_update(struct perf_event *event)
>  			     new_raw_count) != prev_raw_count)
>  		goto again;
>  
> -	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
> +	delta = (new_raw_count - prev_raw_count) & max_period;
>  
>  	local64_add(delta, &event->count);
>  	local64_sub(delta, &hwc->period_left);
> @@ -402,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
>  		 * is far less likely to overtake the previous one unless
>  		 * you have some serious IRQ latency issues.
>  		 */
> -		hwc->sample_period  = armpmu->max_period >> 1;
> +		hwc->sample_period  = arm_pmu_max_period() >> 1;
>  		hwc->last_period    = hwc->sample_period;
>  		local64_set(&hwc->period_left, hwc->sample_period);
>  	}
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index ad54444..12c30a2 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -94,7 +94,6 @@ struct arm_pmu {
>  	void		(*reset)(void *);
>  	int		(*map_event)(struct perf_event *event);
>  	int		num_events;
> -	u64		max_period;
>  	bool		secure_access; /* 32-bit ARM only */
>  #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
>  	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/7] arm_pmu: Change API to support 64bit counter values
  2018-06-19 10:15 ` [PATCH v3 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
@ 2018-06-19 10:52   ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-19 10:52 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:37AM +0100, Suzuki K Poulose wrote:
> Convert the {read/write}_counter APIs to handle 64bit values
> to enable support for chained event counters.

It might be worth a note that the underlying helpers will still only
write 32-bit values, and we'll only pass those 32-bit values, so this
shouldn't cause any functional change.

> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  - No changes since v2
> ---
>  arch/arm/kernel/perf_event_v6.c     | 4 ++--
>  arch/arm/kernel/perf_event_v7.c     | 4 ++--
>  arch/arm/kernel/perf_event_xscale.c | 8 ++++----
>  arch/arm64/kernel/perf_event.c      | 9 ++++-----
>  include/linux/perf/arm_pmu.h        | 4 ++--
>  5 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
> index f64a6bf..0729f98 100644
> --- a/arch/arm/kernel/perf_event_v6.c
> +++ b/arch/arm/kernel/perf_event_v6.c
> @@ -233,7 +233,7 @@ armv6_pmcr_counter_has_overflowed(unsigned long pmcr,
>  	return ret;
>  }
>  
> -static inline u32 armv6pmu_read_counter(struct perf_event *event)
> +static inline u64 armv6pmu_read_counter(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int counter = hwc->idx;
> @@ -251,7 +251,7 @@ static inline u32 armv6pmu_read_counter(struct perf_event *event)
>  	return value;
>  }
>  
> -static inline void armv6pmu_write_counter(struct perf_event *event, u32 value)
> +static inline void armv6pmu_write_counter(struct perf_event *event, u64 value)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int counter = hwc->idx;
> diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
> index ecca4cd..fd7ce01 100644
> --- a/arch/arm/kernel/perf_event_v7.c
> +++ b/arch/arm/kernel/perf_event_v7.c
> @@ -743,7 +743,7 @@ static inline void armv7_pmnc_select_counter(int idx)
>  	isb();
>  }
>  
> -static inline u32 armv7pmu_read_counter(struct perf_event *event)
> +static inline u64 armv7pmu_read_counter(struct perf_event *event)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
> @@ -763,7 +763,7 @@ static inline u32 armv7pmu_read_counter(struct perf_event *event)
>  	return value;
>  }
>  
> -static inline void armv7pmu_write_counter(struct perf_event *event, u32 value)
> +static inline void armv7pmu_write_counter(struct perf_event *event, u64 value)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
> diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
> index c4f0294..942230f 100644
> --- a/arch/arm/kernel/perf_event_xscale.c
> +++ b/arch/arm/kernel/perf_event_xscale.c
> @@ -316,7 +316,7 @@ static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
>  	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>  }
>  
> -static inline u32 xscale1pmu_read_counter(struct perf_event *event)
> +static inline u64 xscale1pmu_read_counter(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int counter = hwc->idx;
> @@ -337,7 +337,7 @@ static inline u32 xscale1pmu_read_counter(struct perf_event *event)
>  	return val;
>  }
>  
> -static inline void xscale1pmu_write_counter(struct perf_event *event, u32 val)
> +static inline void xscale1pmu_write_counter(struct perf_event *event, u64 val)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int counter = hwc->idx;
> @@ -678,7 +678,7 @@ static void xscale2pmu_stop(struct arm_pmu *cpu_pmu)
>  	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>  }
>  
> -static inline u32 xscale2pmu_read_counter(struct perf_event *event)
> +static inline u64 xscale2pmu_read_counter(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int counter = hwc->idx;
> @@ -705,7 +705,7 @@ static inline u32 xscale2pmu_read_counter(struct perf_event *event)
>  	return val;
>  }
>  
> -static inline void xscale2pmu_write_counter(struct perf_event *event, u32 val)
> +static inline void xscale2pmu_write_counter(struct perf_event *event, u64 val)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int counter = hwc->idx;
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 678ecff..66a2ffd 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -512,7 +512,7 @@ static inline int armv8pmu_select_counter(int idx)
>  	return idx;
>  }
>  
> -static inline u32 armv8pmu_read_counter(struct perf_event *event)
> +static inline u64 armv8pmu_read_counter(struct perf_event *event)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
> @@ -530,7 +530,7 @@ static inline u32 armv8pmu_read_counter(struct perf_event *event)
>  	return value;
>  }
>  
> -static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
> +static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
> @@ -545,9 +545,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
>  		 * count using the lower 32bits and we want an interrupt when
>  		 * it overflows.
>  		 */
> -		u64 value64 = 0xffffffff00000000ULL | value;
> -
> -		write_sysreg(value64, pmccntr_el0);
> +		value |= 0xffffffff00000000ULL;
> +		write_sysreg(value, pmccntr_el0);
>  	} else if (armv8pmu_select_counter(idx) == idx)
>  		write_sysreg(value, pmxevcntr_el0);
>  }
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index 12c30a2..f7126a2 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -87,8 +87,8 @@ struct arm_pmu {
>  					 struct perf_event *event);
>  	int		(*set_event_filter)(struct hw_perf_event *evt,
>  					    struct perf_event_attr *attr);
> -	u32		(*read_counter)(struct perf_event *event);
> -	void		(*write_counter)(struct perf_event *event, u32 val);
> +	u64		(*read_counter)(struct perf_event *event);
> +	void		(*write_counter)(struct perf_event *event, u64 val);
>  	void		(*start)(struct arm_pmu *);
>  	void		(*stop)(struct arm_pmu *);
>  	void		(*reset)(void *);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/7] arm_pmu: Add support for 64bit event counters
  2018-06-19 10:15 ` [PATCH v3 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
@ 2018-06-19 10:57   ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-19 10:57 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:38AM +0100, Suzuki K Poulose wrote:
> Each PMU has a set of 32bit event counters. But in some
> special cases, the events could be counted using counters
> which are effectively 64bit wide.
> 
> e.g., Arm v8 PMUv3 has a 64bit cycle counter which can count
> only the CPU cycles. Also, the PMU can chain the event counters
> to effectively count as a 64bit counter.
> 
> Add support for tracking the events that use 64bit counters.
> This only affects the periods set for each counter in the core
> driver.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v2:
>  - None
> ---
>  drivers/perf/arm_pmu.c       | 15 +++++++++------
>  include/linux/perf/arm_pmu.h |  6 ++++++
>  2 files changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 6ddc00d..e3766a8 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -28,9 +28,11 @@
>  static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
>  static DEFINE_PER_CPU(int, cpu_irq);
>  
> -static inline u64 arm_pmu_max_period(void)
> +static inline u64 arm_pmu_event_max_period(struct perf_event *event)
>  {
> -	return (1ULL << 32) - 1;
> +	return (event->hw.flags & ARMPMU_EVT_64BIT) ?
> +		GENMASK_ULL(63, 0) :
> +		GENMASK_ULL(31, 0);

Please make this:

	if (event->hw.flags & ARMPMU_EVT_64BIT)
		return GENMASK_ULL(63, 0);
	else
		return GENMASK_ULL(31, 0);

... which I think is more legible than a ternary spread across multiple
lines.

The rest looks fine, so with that:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

>  }
>  
>  static int
> @@ -122,7 +124,7 @@ int armpmu_event_set_period(struct perf_event *event)
>  	u64 max_period;
>  	int ret = 0;
>  
> -	max_period = arm_pmu_max_period();
> +	max_period = arm_pmu_event_max_period(event);
>  	if (unlikely(left <= -period)) {
>  		left = period;
>  		local64_set(&hwc->period_left, left);
> @@ -148,7 +150,7 @@ int armpmu_event_set_period(struct perf_event *event)
>  
>  	local64_set(&hwc->prev_count, (u64)-left);
>  
> -	armpmu->write_counter(event, (u64)(-left) & 0xffffffff);
> +	armpmu->write_counter(event, (u64)(-left) & max_period);
>  
>  	perf_event_update_userpage(event);
>  
> @@ -160,7 +162,7 @@ u64 armpmu_event_update(struct perf_event *event)
>  	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
>  	u64 delta, prev_raw_count, new_raw_count;
> -	u64 max_period = arm_pmu_max_period();
> +	u64 max_period = arm_pmu_event_max_period(event);
>  
>  again:
>  	prev_raw_count = local64_read(&hwc->prev_count);
> @@ -368,6 +370,7 @@ __hw_perf_event_init(struct perf_event *event)
>  	struct hw_perf_event *hwc = &event->hw;
>  	int mapping;
>  
> +	hwc->flags = 0;
>  	mapping = armpmu->map_event(event);
>  
>  	if (mapping < 0) {
> @@ -410,7 +413,7 @@ __hw_perf_event_init(struct perf_event *event)
>  		 * is far less likely to overtake the previous one unless
>  		 * you have some serious IRQ latency issues.
>  		 */
> -		hwc->sample_period  = arm_pmu_max_period() >> 1;
> +		hwc->sample_period  = arm_pmu_event_max_period(event) >> 1;
>  		hwc->last_period    = hwc->sample_period;
>  		local64_set(&hwc->period_left, hwc->sample_period);
>  	}
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index f7126a2..10f92e1 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -25,6 +25,12 @@
>   */
>  #define ARMPMU_MAX_HWEVENTS		32
>  
> +/*
> + * ARM PMU hw_event flags
> + */
> +/* Event uses a 64bit counter */
> +#define ARMPMU_EVT_64BIT		1
> +
>  #define HW_OP_UNSUPPORTED		0xFFFF
>  #define C(_x)				PERF_COUNT_HW_CACHE_##_x
>  #define CACHE_OP_UNSUPPORTED		0xFFFF
> -- 
> 2.7.4
> 
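
To make the wrap-around arithmetic in the quoted armpmu_event_update() hunk
concrete, here is a minimal standalone userspace sketch (not part of the
patch; the counter values and the max_period() helper are invented for
illustration):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/*
 * Stand-in for arm_pmu_event_max_period(): an all-ones mask matching the
 * counter width, 32bit by default and 64bit for chained/long events.
 */
static uint64_t max_period(int is_64bit)
{
	return is_64bit ? UINT64_MAX : UINT32_MAX;
}

int main(void)
{
	/* A 32bit counter sampled just before and just after it wrapped. */
	uint64_t prev = 0xfffffff0ULL;
	uint64_t new  = 0x00000010ULL;

	/* Same expression as armpmu_event_update(): masking with the
	 * maximum period folds the wrap-around into the true delta, 0x20. */
	uint64_t delta = (new - prev) & max_period(0);

	printf("delta = 0x%" PRIx64 "\n", delta);
	return 0;
}

With a 64bit event the expression is unchanged; the wider mask simply covers
the full counter width.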

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-06-19 10:15 ` [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
@ 2018-06-29 13:27   ` Mark Rutland
  2018-06-29 13:40   ` Mark Rutland
  1 sibling, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-29 13:27 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:39AM +0100, Suzuki K Poulose wrote:
> The armpmu uses get_event_idx callback to allocate an event
> counter for a given event, which marks the selected counter
> as "used". Now, when we delete the counter, the arm_pmu goes
> ahead and clears the "used" bit and then invokes the "clear_event_idx"
> call back, which kind of splits the job between the core code
> and the backend. Tidy this up by relying on the clear_event_idx
> to do the book keeping, if available. Otherwise, let the core
> driver do the default "clear" bit operation. This will be useful
> for adding the chained event support, where we leave the event
> idx maintenance to the backend.
> 
> Also, when an event is removed from the PMU, reset the hw.idx
> to indicate that a counter is not allocated for this event,
> to help the backends do better checks. This will be also used
> for the chain counter support.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

TBH, I think it might be better to make clear_event_idx() mandatory,
such that each backend provides both clear / set, next to each other in
the code.

It's a bit more code, but it would restore orthogonality to the API.

Mark.

> ---
> Changes since v2:
>  - Reset the event counter after an event is removed.
> ---
>  arch/arm/kernel/perf_event_v7.c |  2 ++
>  drivers/perf/arm_pmu.c          | 17 +++++++++++++----
>  2 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
> index fd7ce01..765d265 100644
> --- a/arch/arm/kernel/perf_event_v7.c
> +++ b/arch/arm/kernel/perf_event_v7.c
> @@ -1637,6 +1637,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>  	bool venum_event = EVENT_VENUM(hwc->config_base);
>  	bool krait_event = EVENT_CPU(hwc->config_base);
>  
> +	clear_bit(hwc->idx, cpuc->used_mask);
>  	if (venum_event || krait_event) {
>  		bit = krait_event_to_bit(event, region, group);
>  		clear_bit(bit, cpuc->used_mask);
> @@ -1966,6 +1967,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>  	bool venum_event = EVENT_VENUM(hwc->config_base);
>  	bool scorpion_event = EVENT_CPU(hwc->config_base);
>  
> +	clear_bit(hwc->idx, cpuc->used_mask);
>  	if (venum_event || scorpion_event) {
>  		bit = scorpion_event_to_bit(event, region, group);
>  		clear_bit(bit, cpuc->used_mask);
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index e3766a8..6e10e8c 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -227,6 +227,16 @@ static void armpmu_start(struct perf_event *event, int flags)
>  	armpmu->enable(event);
>  }
>  
> +static void armpmu_clear_event_idx(struct arm_pmu *armpmu,
> +				   struct pmu_hw_events *hw_events,
> +				   struct perf_event *event)
> +{
> +	if (armpmu->clear_event_idx)
> +		armpmu->clear_event_idx(hw_events, event);
> +	else
> +		clear_bit(event->hw.idx, hw_events->used_mask);
> +}
> +
>  static void
>  armpmu_del(struct perf_event *event, int flags)
>  {
> @@ -237,11 +247,10 @@ armpmu_del(struct perf_event *event, int flags)
>  
>  	armpmu_stop(event, PERF_EF_UPDATE);
>  	hw_events->events[idx] = NULL;
> -	clear_bit(idx, hw_events->used_mask);
> -	if (armpmu->clear_event_idx)
> -		armpmu->clear_event_idx(hw_events, event);
> -
> +	armpmu_clear_event_idx(armpmu, hw_events, event);
>  	perf_event_update_userpage(event);
> +	/* Clear the allocated counter */
> +	hwc->idx = -1;
>  }
>  
>  static int
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 5/7] arm64: perf: Clean up armv8pmu_select_counter
  2018-06-19 10:15 ` [PATCH v3 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
@ 2018-06-29 13:29   ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-29 13:29 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:40AM +0100, Suzuki K Poulose wrote:
> armv8pmu_select_counter always returns the passed idx. So
> let us make that void and get rid of the pointless checks.
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/kernel/perf_event.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 66a2ffd..9ce3729 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -503,13 +503,17 @@ static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
>  	return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
>  }
>  
> -static inline int armv8pmu_select_counter(int idx)
> +static inline void armv8pmu_select_counter(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
>  	write_sysreg(counter, pmselr_el0);
>  	isb();
> +}
>  
> -	return idx;
> +static inline u32 armv8pmu_read_evcntr(int idx)
> +{
> +	armv8pmu_select_counter(idx);
> +	return read_sysreg(pmxevcntr_el0);
>  }
>  
>  static inline u64 armv8pmu_read_counter(struct perf_event *event)
> @@ -524,12 +528,18 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
>  			smp_processor_id(), idx);
>  	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>  		value = read_sysreg(pmccntr_el0);
> -	else if (armv8pmu_select_counter(idx) == idx)
> -		value = read_sysreg(pmxevcntr_el0);
> +	else
> +		value = armv8pmu_read_evcntr(idx);
>  
>  	return value;
>  }
>  
> +static inline void armv8pmu_write_evcntr(int idx, u32 value)
> +{
> +	armv8pmu_select_counter(idx);
> +	write_sysreg(value, pmxevcntr_el0);
> +}
> +
>  static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -547,16 +557,15 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  		 */
>  		value |= 0xffffffff00000000ULL;
>  		write_sysreg(value, pmccntr_el0);
> -	} else if (armv8pmu_select_counter(idx) == idx)
> -		write_sysreg(value, pmxevcntr_el0);
> +	} else
> +		armv8pmu_write_evcntr(idx, value);
>  }
>  
>  static inline void armv8pmu_write_evtype(int idx, u32 val)
>  {
> -	if (armv8pmu_select_counter(idx) == idx) {
> -		val &= ARMV8_PMU_EVTYPE_MASK;
> -		write_sysreg(val, pmxevtyper_el0);
> -	}
> +	armv8pmu_select_counter(idx);
> +	val &= ARMV8_PMU_EVTYPE_MASK;
> +	write_sysreg(val, pmxevtyper_el0);
>  }
>  
>  static inline int armv8pmu_enable_counter(int idx)
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-06-19 10:15 ` [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
  2018-06-29 13:27   ` Mark Rutland
@ 2018-06-29 13:40   ` Mark Rutland
  2018-06-29 14:18     ` Suzuki K Poulose
  1 sibling, 1 reply; 20+ messages in thread
From: Mark Rutland @ 2018-06-29 13:40 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:39AM +0100, Suzuki K Poulose wrote:
> The armpmu uses get_event_idx callback to allocate an event
> counter for a given event, which marks the selected counter
> as "used". Now, when we delete the counter, the arm_pmu goes
> ahead and clears the "used" bit and then invokes the "clear_event_idx"
> call back, which kind of splits the job between the core code
> and the backend. Tidy this up by relying on the clear_event_idx
> to do the book keeping, if available. Otherwise, let the core
> driver do the default "clear" bit operation. This will be useful
> for adding the chained event support, where we leave the event
> idx maintenance to the backend.
> 
> Also, when an event is removed from the PMU, reset the hw.idx
> to indicate that a counter is not allocated for this event,
> to help the backends do better checks. This will be also used
> for the chain counter support.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v2:
>  - Reset the event counter after an event is removed.
> ---
>  arch/arm/kernel/perf_event_v7.c |  2 ++
>  drivers/perf/arm_pmu.c          | 17 +++++++++++++----
>  2 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
> index fd7ce01..765d265 100644
> --- a/arch/arm/kernel/perf_event_v7.c
> +++ b/arch/arm/kernel/perf_event_v7.c
> @@ -1637,6 +1637,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>  	bool venum_event = EVENT_VENUM(hwc->config_base);
>  	bool krait_event = EVENT_CPU(hwc->config_base);
>  
> +	clear_bit(hwc->idx, cpuc->used_mask);
>  	if (venum_event || krait_event) {
>  		bit = krait_event_to_bit(event, region, group);
>  		clear_bit(bit, cpuc->used_mask);
> @@ -1966,6 +1967,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>  	bool venum_event = EVENT_VENUM(hwc->config_base);
>  	bool scorpion_event = EVENT_CPU(hwc->config_base);
>  
> +	clear_bit(hwc->idx, cpuc->used_mask);
>  	if (venum_event || scorpion_event) {
>  		bit = scorpion_event_to_bit(event, region, group);
>  		clear_bit(bit, cpuc->used_mask);

As an aside, I think there's an existing problem with krait and
cpu_pm_pmu_setup(), and we'll end up with the same issue when chained
counters use multiple counters for one event.

The krait code sets multiple bits in the PMU's pmu_hw_events::used_mask,
but only one of these will have a corresponding (non-NULL) entry in
pmu_hw_events::events[].

In cpu_pm_pmu_setup(), when we find the auxiliary counter associated
with an event, its bit will be set in used_mask, but the pointer will be
NULL, and that will blow up in start/stop.

We can't just set multiple slots to point at the same event, or we'd
try to start/stop that event multiple times, which would also be bad.

I guess the best thing to do would be to avoid the test_bit(), and just
skip an idx if hw_events->events[idx] is NULL.

Would you mind spinning a patch to that effect?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 7/7] arm64: perf: Add support for chaining event counters
  2018-06-19 10:15 ` [PATCH v3 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
@ 2018-06-29 14:01   ` Mark Rutland
  2018-06-29 14:29     ` Suzuki K Poulose
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Rutland @ 2018-06-29 14:01 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Tue, Jun 19, 2018 at 11:15:42AM +0100, Suzuki K Poulose wrote:
> Add support for 64bit events by using chained event counters
> and 64bit cycle counters.
> 
> PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
> forming a 64-bit counter. The low/even counter is programmed to count
> the event of interest, and the high/odd counter is programmed to count
> the CHAIN event, taken when the low/even counter overflows.
> 
> For CPU cycles, when 64bit mode is requested, the cycle counter
> is used in 64bit mode. If the cycle counter is not available,
> we fall back to chaining.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v2:
>  - Drop special allocation algorithm for chain indices
>  - Since we access the counters when the PMU is stopped,
>    get rid of the unnecessary barriers.
>  - Ensure a counter is allocated when checking for chained event
> ---
>  arch/arm64/kernel/perf_event.c | 184 ++++++++++++++++++++++++++++++++++++-----
>  drivers/perf/arm_pmu.c         |  13 ++-
>  2 files changed, 169 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index eebc635..a03def7 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
>  };
>  
>  PMU_FORMAT_ATTR(event, "config:0-15");
> +PMU_FORMAT_ATTR(bits64, "config1:0");

I'm not too keen on the "bits64" name here -- it's a little unusual.
Perhaps "long"?

> +
> +static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
> +{
> +	return event->attr.config1 & 0x1;
> +}
>  
>  static struct attribute *armv8_pmuv3_format_attrs[] = {
>  	&format_attr_event.attr,
> +	&format_attr_bits64.attr,
>  	NULL,
>  };
>  
> @@ -466,6 +473,20 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
>  	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
>  
>  /*
> + * Use chained counter for a 64bit event, if we could not allocate
> + * the 64bit cycle counter. This must be called after a counter
> + * was allocated.
> + */
> +static inline bool armv8pmu_event_is_chained(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	return !WARN_ON(idx < 0) &&
> +	       armv8pmu_event_is_64bit(event) &&
> +	       (event->hw.idx != ARMV8_IDX_CYCLE_COUNTER);
> +}

It took me a moment to parse this. Perhaps:

	/*
	 * The dedicated cycle counter doesn't require chaining, but
	 * when this is already in use, we must chain two programmable
	 * counters to form a 64-bit cycle counter.
	 */

> +
> +/*
>   * ARMv8 low level PMU access
>   */
>  
> @@ -516,12 +537,28 @@ static inline u32 armv8pmu_read_evcntr(int idx)
>  	return read_sysreg(pmxevcntr_el0);
>  }
>  
> +static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +	u64 val = 0;
> +
> +	val = armv8pmu_read_evcntr(idx);
> +	/*
> +	 * We always read the counter with the PMU turned off.
> +	 * So we don't need special care for reading chained
> +	 * counters.
> +	 */

I think this comment can go.

> +	if (armv8pmu_event_is_chained(event))
> +		val = (val << 32) | armv8pmu_read_evcntr(idx - 1);
> +	return val;
> +}
> +
>  static inline u64 armv8pmu_read_counter(struct perf_event *event)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
>  	int idx = hwc->idx;
> -	u32 value = 0;
> +	u64 value = 0;
>  
>  	if (!armv8pmu_counter_valid(cpu_pmu, idx))
>  		pr_err("CPU%u reading wrong counter %d\n",
> @@ -529,7 +566,7 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
>  	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>  		value = read_sysreg(pmccntr_el0);
>  	else
> -		value = armv8pmu_read_evcntr(idx);
> +		value = armv8pmu_read_hw_counter(event);
>  
>  	return value;
>  }
> @@ -540,6 +577,19 @@ static inline void armv8pmu_write_evcntr(int idx, u32 value)
>  	write_sysreg(value, pmxevcntr_el0);
>  }
>  
> +static inline void armv8pmu_write_hw_counter(struct perf_event *event,
> +					     u64 value)
> +{
> +	int idx = event->hw.idx;
> +
> +	if (armv8pmu_event_is_chained(event)) {
> +		armv8pmu_write_evcntr(idx, upper_32_bits(value));
> +		armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
> +	} else {
> +		armv8pmu_write_evcntr(idx, value);
> +	}
> +}
> +
>  static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -551,14 +601,15 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  			smp_processor_id(), idx);
>  	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
>  		/*
> -		 * Set the upper 32bits as this is a 64bit counter but we only
> +		 * Set the upper 32bits as this is a 64bit counter, if we only
>  		 * count using the lower 32bits and we want an interrupt when
>  		 * it overflows.
>  		 */

Let's reword this as:

		/*
		 * The cycles counter is really a 64-bit counter.
		 * When treating it as a 32-bit counter, we only count
		 * the lower 32 bits, and set the upper 32-bits so that
		 * we get an interrupt upon 32-bit overflow.
		 */

> -		value |= 0xffffffff00000000ULL;
> +		if (!armv8pmu_event_is_64bit(event))
> +			value |= 0xffffffff00000000ULL;
>  		write_sysreg(value, pmccntr_el0);
>  	} else
> -		armv8pmu_write_evcntr(idx, value);
> +		armv8pmu_write_hw_counter(event, value);
>  }
>  
>  static inline void armv8pmu_write_evtype(int idx, u32 val)
> @@ -568,6 +619,27 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
>  	write_sysreg(val, pmxevtyper_el0);
>  }
>  
> +static inline void armv8pmu_write_event_type(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	/*
> +	 * For chained events, the low counter is programmed to count
> +	 * the event of interest and the high counter is programmed
> +	 * with CHAIN event code with filters set to count at all ELs.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
> +				ARMV8_PMU_INCLUDE_EL2;
> +
> +		armv8pmu_write_evtype(idx - 1, hwc->config_base);
> +		armv8pmu_write_evtype(idx, chain_evt);
> +	} else {
> +		armv8pmu_write_evtype(idx, hwc->config_base);
> +	}
> +}
> +
>  static inline int armv8pmu_enable_counter(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -575,6 +647,16 @@ static inline int armv8pmu_enable_counter(int idx)
>  	return idx;
>  }
>  
> +static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	armv8pmu_enable_counter(idx);
> +	if (armv8pmu_event_is_chained(event))
> +		armv8pmu_enable_counter(idx - 1);
> +	isb();
> +}
> +
>  static inline int armv8pmu_disable_counter(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -582,6 +664,16 @@ static inline int armv8pmu_disable_counter(int idx)
>  	return idx;
>  }
>  
> +static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	if (armv8pmu_event_is_chained(event))
> +		armv8pmu_disable_counter(idx - 1);
> +	armv8pmu_disable_counter(idx);
> +}
> +
>  static inline int armv8pmu_enable_intens(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -589,6 +681,12 @@ static inline int armv8pmu_enable_intens(int idx)
>  	return idx;
>  }
>  
> +static inline int armv8pmu_enable_event_irq(struct perf_event *event)
> +{
> +	/* For chained events, enable the interrupt for only the high counter */
> +	return armv8pmu_enable_intens(event->hw.idx);
> +}

I think the comment can go -- we don't have something corresponding in
the disable path.

> +
>  static inline int armv8pmu_disable_intens(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -601,6 +699,11 @@ static inline int armv8pmu_disable_intens(int idx)
>  	return idx;
>  }
>  
> +static inline int armv8pmu_disable_event_irq(struct perf_event *event)
> +{
> +	return armv8pmu_disable_intens(event->hw.idx);
> +}
> +
>  static inline u32 armv8pmu_getreset_flags(void)
>  {
>  	u32 value;
> @@ -618,10 +721,8 @@ static inline u32 armv8pmu_getreset_flags(void)
>  static void armv8pmu_enable_event(struct perf_event *event)
>  {
>  	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>  
>  	/*
>  	 * Enable counter and interrupt, and set the counter to count
> @@ -632,22 +733,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
>  	/*
>  	 * Disable counter
>  	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>  
>  	/*
>  	 * Set event (if destined for PMNx counters).
>  	 */
> -	armv8pmu_write_evtype(idx, hwc->config_base);
> +	armv8pmu_write_event_type(event);
>  
>  	/*
>  	 * Enable interrupt for this counter
>  	 */
> -	armv8pmu_enable_intens(idx);
> +	armv8pmu_enable_event_irq(event);
>  
>  	/*
>  	 * Enable counter
>  	 */
> -	armv8pmu_enable_counter(idx);
> +	armv8pmu_enable_event_counter(event);
>  
>  	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>  }
> @@ -655,10 +756,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
>  static void armv8pmu_disable_event(struct perf_event *event)
>  {
>  	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>  
>  	/*
>  	 * Disable counter and interrupt
> @@ -668,12 +767,12 @@ static void armv8pmu_disable_event(struct perf_event *event)
>  	/*
>  	 * Disable counter
>  	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>  
>  	/*
>  	 * Disable interrupt for this counter
>  	 */
> -	armv8pmu_disable_intens(idx);
> +	armv8pmu_disable_event_irq(event);
>  
>  	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>  }
> @@ -767,6 +866,37 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>  	return IRQ_HANDLED;
>  }
>  
> +static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
> +				    struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++)
> +		if (!test_and_set_bit(idx, cpuc->used_mask))
> +			return idx;

Please add braces to the for loop, given the loop body covers multiple
lines.

> +	return -EAGAIN;
> +}
> +
> +static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> +				   struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	/*
> +	 * Chaining requires two consecutive event counters, where
> +	 * the lower idx must be even.
> +	 */
> +	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2)
> +		if (!test_and_set_bit(idx, cpuc->used_mask)) {
> +			/* Check if the preceding even counter is available */
> +			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
> +				return idx;
> +			/* Release the Odd counter */
> +			clear_bit(idx, cpuc->used_mask);
> +		}

Please add braces to the for loop, given the loop body covers multiple
lines.

> +	return -EAGAIN;
> +}
> +
>  static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  				  struct perf_event *event)
>  {
> @@ -784,13 +914,21 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  	/*
>  	 * Otherwise use events counters
>  	 */
> -	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
> -		if (!test_and_set_bit(idx, cpuc->used_mask))
> -			return idx;
> -	}
> +	idx = armv8pmu_event_is_64bit(event) ?
> +		armv8pmu_get_chain_idx(cpuc, cpu_pmu) :
> +		armv8pmu_get_single_idx(cpuc, cpu_pmu);

Please use an if-else rather than a ternary here. You can return
directly in either case.

>  
> -	/* The counters are all in use. */
> -	return -EAGAIN;
> +	return idx;
> +}
> +
> +static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> +				     struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	clear_bit(idx, cpuc->used_mask);
> +	if (armv8pmu_event_is_chained(event))
> +		clear_bit(idx - 1, cpuc->used_mask);
>  }
>  
>  /*
> @@ -865,6 +1003,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>  				       &armv8_pmuv3_perf_cache_map,
>  				       ARMV8_PMU_EVTYPE_EVENT);
>  
> +	if (armv8pmu_event_is_64bit(event))
> +		event->hw.flags |= ARMPMU_EVT_64BIT;
> +
>  	/* Onl expose micro/arch events supported by this PMU */
>  	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
>  	    && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
> @@ -971,6 +1112,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->read_counter		= armv8pmu_read_counter,
>  	cpu_pmu->write_counter		= armv8pmu_write_counter,
>  	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
> +	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
>  	cpu_pmu->start			= armv8pmu_start,
>  	cpu_pmu->stop			= armv8pmu_stop,
>  	cpu_pmu->reset			= armv8pmu_reset,
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 6e10e8c..a4675e4 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -674,14 +674,13 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>  	int idx;
>  
>  	for (idx = 0; idx < armpmu->num_events; idx++) {
> -		/*
> -		 * If the counter is not used skip it, there is no
> -		 * need of stopping/restarting it.
> -		 */
> -		if (!test_bit(idx, hw_events->used_mask))
> -			continue;
> -
>  		event = hw_events->events[idx];
> +		/*
> +		 * If there is no event at this idx (e.g, an idx used
> +		 * by a chained event in Arm v8 PMUv3), skip it.
> +		 */
> +		if (!event)
> +			continue;
>  
>  		switch (cmd) {
>  		case CPU_PM_ENTER:

Please make this change an individual patch earlier in the series. It's
a bug fix needed for krait, too.

Thanks,
Mark.
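
To illustrate the chaining scheme reviewed above, here is a standalone toy
model (not kernel code) of how a 64bit value is reassembled from an even/odd
counter pair, in the spirit of armv8pmu_read_hw_counter(); the evcntr[] array
and the index values are made up, standing in for PMEVCNTR<n> reads done with
the PMU stopped:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Pretend bank of 32bit event counters (PMEVCNTR<n> stand-in). */
static uint32_t evcntr[8];

/*
 * idx names the high/odd counter, which counts CHAIN events (overflows of
 * its even partner); idx - 1 is the low/even counter programmed with the
 * event of interest.
 */
static uint64_t read_chained(int idx)
{
	uint64_t val = evcntr[idx];		/* high 32 bits */

	return (val << 32) | evcntr[idx - 1];	/* low 32 bits */
}

int main(void)
{
	evcntr[2] = 0x00000003;	/* even/low counter: current count */
	evcntr[3] = 0x00000001;	/* odd/high counter: one overflow seen */

	/* Expect 0x100000003. */
	printf("64bit count = 0x%" PRIx64 "\n", read_chained(3));
	return 0;
}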

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-06-29 13:40   ` Mark Rutland
@ 2018-06-29 14:18     ` Suzuki K Poulose
  2018-06-29 14:29       ` Mark Rutland
  0 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-29 14:18 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry


Hi Mark,

On 29/06/18 14:40, Mark Rutland wrote:
> On Tue, Jun 19, 2018 at 11:15:39AM +0100, Suzuki K Poulose wrote:
>> The armpmu uses get_event_idx callback to allocate an event
>> counter for a given event, which marks the selected counter
>> as "used". Now, when we delete the counter, the arm_pmu goes
>> ahead and clears the "used" bit and then invokes the "clear_event_idx"
>> call back, which kind of splits the job between the core code
>> and the backend. Tidy this up by relying on the clear_event_idx
>> to do the book keeping, if available. Otherwise, let the core
>> driver do the default "clear" bit operation. This will be useful
>> for adding the chained event support, where we leave the event
>> idx maintenance to the backend.
>>
>> Also, when an event is removed from the PMU, reset the hw.idx
>> to indicate that a counter is not allocated for this event,
>> to help the backends do better checks. This will be also used
>> for the chain counter support.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v2:
>>   - Reset the event counter after an event is removed.
>> ---
>>   arch/arm/kernel/perf_event_v7.c |  2 ++
>>   drivers/perf/arm_pmu.c          | 17 +++++++++++++----
>>   2 files changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
>> index fd7ce01..765d265 100644
>> --- a/arch/arm/kernel/perf_event_v7.c
>> +++ b/arch/arm/kernel/perf_event_v7.c
>> @@ -1637,6 +1637,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>>   	bool venum_event = EVENT_VENUM(hwc->config_base);
>>   	bool krait_event = EVENT_CPU(hwc->config_base);
>>   
>> +	clear_bit(hwc->idx, cpuc->used_mask);
>>   	if (venum_event || krait_event) {
>>   		bit = krait_event_to_bit(event, region, group);
>>   		clear_bit(bit, cpuc->used_mask);
>> @@ -1966,6 +1967,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>>   	bool venum_event = EVENT_VENUM(hwc->config_base);
>>   	bool scorpion_event = EVENT_CPU(hwc->config_base);
>>   
>> +	clear_bit(hwc->idx, cpuc->used_mask);
>>   	if (venum_event || scorpion_event) {
>>   		bit = scorpion_event_to_bit(event, region, group);
>>   		clear_bit(bit, cpuc->used_mask);
> 
> As an aside, I think there's an existing problem with krait and
> cpu_pm_pmu_setup(), and we'll end up with the same issue when chained
> counters use multiple counters for one event.
> 
> The krait code sets multiple bits in the PMU's pmu_hw_events::used_mask,
> but only one of these will have a corresponding (non-NULL) entry in
> pmu_hw_events::events[].

The Krait pmu allocates the "krait" specific event bit beyond the armpmu->num_events.
See krait_event_to_bit(). And we don't go beyond armpmu->num_events for the "events"
check. So we should be safe. And it passes on the idx within the num_events back
to armpmu. So whatever krait pmu does is invisible to the generic driver.

> 
> In cpu_pm_pmu_setup(), when we find the auxiliary counter associated
> with an event, its bit will be set in used_mask, but the pointer will be
> NULL, and that will blow up in start/stop.

> 
> We can't just set multiple slots to point at the same event, or we'd
> try to start/stop that event multiple times, which would also be bad.

No, we don't, since we cap the loops at armpmu->num_events.

> 
> I guess the best thing to do would be to avoid the test_bit(), and just
> skip an idx if hw_events->events[idx] is NULL.
> 
> Would you mind spinning a patch to that effect?

Either way, I could split that change out into a new patch.


Cheers
Suzuki
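
A small standalone model of the point made above, with invented numbers (the
real bit placement lives in krait_event_to_bit()): because the backend-private
bits sit at positions at or above num_events, a walk bounded by num_events
never sees a used bit that lacks an events[] entry.

#include <stdio.h>

int main(void)
{
	const int num_events = 5;	/* hardware counter slots 0..4 */
	void *events[5] = { 0 };
	unsigned long used_mask = 0;
	int dummy_event = 1;

	/* A normal allocation: used_mask bit and events[] slot go together. */
	used_mask |= 1UL << 2;
	events[2] = &dummy_event;

	/* A backend-private resource bit, placed beyond num_events. */
	used_mask |= 1UL << 9;

	/* A cpu_pm_pmu_setup()-style walk stops before bit 9, so it never
	 * finds a used bit whose events[] entry is NULL. */
	for (int idx = 0; idx < num_events; idx++) {
		if (used_mask & (1UL << idx))
			printf("idx %d -> event %p\n", idx, events[idx]);
	}
	return 0;
}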

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 7/7] arm64: perf: Add support for chaining event counters
  2018-06-29 14:01   ` Mark Rutland
@ 2018-06-29 14:29     ` Suzuki K Poulose
  2018-06-29 14:39       ` Mark Rutland
  0 siblings, 1 reply; 20+ messages in thread
From: Suzuki K Poulose @ 2018-06-29 14:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On 29/06/18 15:01, Mark Rutland wrote:
> On Tue, Jun 19, 2018 at 11:15:42AM +0100, Suzuki K Poulose wrote:
>> Add support for 64bit events by using chained event counters
>> and 64bit cycle counters.
>>
>> PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
>> forming a 64-bit counter. The low/even counter is programmed to count
>> the event of interest, and the high/odd counter is programmed to count
>> the CHAIN event, taken when the low/even counter overflows.
>>
>> For CPU cycles, when 64bit mode is requested, the cycle counter
>> is used in 64bit mode. If the cycle counter is not available,
>> we fall back to chaining.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v2:
>>   - Drop special allocation algorithm for chain indices
>>   - Since we access the counters when the PMU is stopped,
>>     get rid of the unnecessary barriers.
>>   - Ensure a counter is allocated when checking for chained event
>> ---
>>   arch/arm64/kernel/perf_event.c | 184 ++++++++++++++++++++++++++++++++++++-----
>>   drivers/perf/arm_pmu.c         |  13 ++-
>>   2 files changed, 169 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
>> index eebc635..a03def7 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
>>   };
>>   
>>   PMU_FORMAT_ATTR(event, "config:0-15");
>> +PMU_FORMAT_ATTR(bits64, "config1:0");
> 
> I'm not too keen on the "bits64" name here -- it's a little unusual.
> Perhaps "long"?

I wasn't either. The other option was _64bit, but that is easier to misspell.
The only reason for not sticking to "long" is that it gives the impression that
the user always gets "double" the normal counter width, which may break
if we ever get 64bit PMU counters by default without chaining.

> 
>> +
>> +static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
>> +{
>> +	return event->attr.config1 & 0x1;
>> +}
>>   
>>   static struct attribute *armv8_pmuv3_format_attrs[] = {
>>   	&format_attr_event.attr,
>> +	&format_attr_bits64.attr,
>>   	NULL,
>>   };
>>   
>> @@ -466,6 +473,20 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
>>   	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
>>   
>>   /*
>> + * Use chained counter for a 64bit event, if we could not allocate
>> + * the 64bit cycle counter. This must be called after a counter
>> + * was allocated.
>> + */
>> +static inline bool armv8pmu_event_is_chained(struct perf_event *event)
>> +{
>> +	int idx = event->hw.idx;
>> +
>> +	return !WARN_ON(idx < 0) &&
>> +	       armv8pmu_event_is_64bit(event) &&
>> +	       (event->hw.idx != ARMV8_IDX_CYCLE_COUNTER);
>> +}
> 
> It took me a moment to parse this. Perhaps:

Yes, it does look a bit weird.

> 
> 	/*
> 	 * The dedicated cycle counter doesn't require chaining, but
> 	 * when this is already in use, we must chain two programmable
> 	 * counters to form a 64-bit cycle counter.
> 	 */

That sounds like we check only for cycle events, which is not true. How
about:

	/*
	 * We must chain two programmable counters for 64 bit events,
	 * except when we have allocated the 64bit cycle counter (for the CPU
	 * cycles event).
	 */


> 
>> +
>> +/*
>>    * ARMv8 low level PMU access
>>    */
>>   
>> @@ -516,12 +537,28 @@ static inline u32 armv8pmu_read_evcntr(int idx)
>>   	return read_sysreg(pmxevcntr_el0);
>>   }
>>   
>> +static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
>> +{
>> +	int idx = event->hw.idx;
>> +	u64 val = 0;
>> +
>> +	val = armv8pmu_read_evcntr(idx);
>> +	/*
>> +	 * We always read the counter with the PMU turned off.
>> +	 * So we don't need special care for reading chained
>> +	 * counters.
>> +	 */
> 
> I think this comment can go.

OK

>> +	if (armv8pmu_event_is_chained(event))
>> +		val = (val << 32) | armv8pmu_read_evcntr(idx - 1);
>> +	return val;
>> +}
>> +
>>   static inline u64 armv8pmu_read_counter(struct perf_event *event)
>>   {
>>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>>   	struct hw_perf_event *hwc = &event->hw;
>>   	int idx = hwc->idx;
>> -	u32 value = 0;
>> +	u64 value = 0;
>>   
>>   	if (!armv8pmu_counter_valid(cpu_pmu, idx))
>>   		pr_err("CPU%u reading wrong counter %d\n",
>> @@ -529,7 +566,7 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
>>   	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>>   		value = read_sysreg(pmccntr_el0);
>>   	else
>> -		value = armv8pmu_read_evcntr(idx);
>> +		value = armv8pmu_read_hw_counter(event);
>>   
>>   	return value;
>>   }
>> @@ -540,6 +577,19 @@ static inline void armv8pmu_write_evcntr(int idx, u32 value)
>>   	write_sysreg(value, pmxevcntr_el0);
>>   }
>>   
>> +static inline void armv8pmu_write_hw_counter(struct perf_event *event,
>> +					     u64 value)
>> +{
>> +	int idx = event->hw.idx;
>> +
>> +	if (armv8pmu_event_is_chained(event)) {
>> +		armv8pmu_write_evcntr(idx, upper_32_bits(value));
>> +		armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
>> +	} else {
>> +		armv8pmu_write_evcntr(idx, value);
>> +	}
>> +}
>> +
>>   static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>>   {
>>   	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>> @@ -551,14 +601,15 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>>   			smp_processor_id(), idx);
>>   	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
>>   		/*
>> -		 * Set the upper 32bits as this is a 64bit counter but we only
>> +		 * Set the upper 32bits as this is a 64bit counter, if we only
>>   		 * count using the lower 32bits and we want an interrupt when
>>   		 * it overflows.
>>   		 */
> 
> Let's reword this as:
> 
> 		/*
> 		 * The cycles counter is really a 64-bit counter.
> 		 * When treating it as a 32-bit counter, we only count
> 		 * the lower 32 bits, and set the upper 32-bits so that
> 		 * we get an interrupt upon 32-bit overflow.
> 		 */
> 

OK

...

>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 6e10e8c..a4675e4 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -674,14 +674,13 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>>   	int idx;
>>   
>>   	for (idx = 0; idx < armpmu->num_events; idx++) {
>> -		/*
>> -		 * If the counter is not used skip it, there is no
>> -		 * need of stopping/restarting it.
>> -		 */
>> -		if (!test_bit(idx, hw_events->used_mask))
>> -			continue;
>> -
>>   		event = hw_events->events[idx];
>> +		/*
>> +		 * If there is no event at this idx (e.g, an idx used
>> +		 * by a chained event in Arm v8 PMUv3), skip it.
>> +		 */
>> +		if (!event)
>> +			continue;
>>   
>>   		switch (cmd) {
>>   		case CPU_PM_ENTER:
> 
> Please make this change an individual patch earlier in the series. It's

I can do that. But I don't think the krait needs this, as explained in the
other patch.

> a bug fix needed for krait, too.

Thanks for the review.

Suzuki
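
As an aside on the comment being reworded above, the effect of setting the
upper 32 bits of PMCCNTR_EL0 for a 32bit event can be seen with plain integer
arithmetic (a standalone sketch with a made-up period, no hardware access):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
	uint32_t period = 0x10;		/* want an overflow after 0x10 events */

	/* Program the 32bit start value, roughly as the core driver's
	 * set_period does... */
	uint64_t pmccntr = (uint32_t)-period;

	/* ...and set the upper half to all-ones, as in the patch, so that a
	 * 32bit overflow is also a 64bit overflow and raises the interrupt. */
	pmccntr |= 0xffffffff00000000ULL;

	printf("start:      0x%016" PRIx64 "\n", pmccntr);
	printf("after 0x10: 0x%016" PRIx64 "\n", pmccntr + period);
	return 0;
}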

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-06-29 14:18     ` Suzuki K Poulose
@ 2018-06-29 14:29       ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-29 14:29 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Fri, Jun 29, 2018 at 03:18:23PM +0100, Suzuki K Poulose wrote:
> 
> Hi Mark,
> 
> On 29/06/18 14:40, Mark Rutland wrote:
> > On Tue, Jun 19, 2018 at 11:15:39AM +0100, Suzuki K Poulose wrote:
> > > The armpmu uses get_event_idx callback to allocate an event
> > > counter for a given event, which marks the selected counter
> > > as "used". Now, when we delete the counter, the arm_pmu goes
> > > ahead and clears the "used" bit and then invokes the "clear_event_idx"
> > > call back, which kind of splits the job between the core code
> > > and the backend. Tidy this up by relying on the clear_event_idx
> > > to do the book keeping, if available. Otherwise, let the core
> > > driver do the default "clear" bit operation. This will be useful
> > > for adding the chained event support, where we leave the event
> > > idx maintenance to the backend.
> > > 
> > > Also, when an event is removed from the PMU, reset the hw.idx
> > > to indicate that a counter is not allocated for this event,
> > > to help the backends do better checks. This will be also used
> > > for the chain counter support.
> > > 
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Will Deacon <will.deacon@arm.com>
> > > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > ---
> > > Changes since v2:
> > >   - Reset the event counter after an event is removed.
> > > ---
> > >   arch/arm/kernel/perf_event_v7.c |  2 ++
> > >   drivers/perf/arm_pmu.c          | 17 +++++++++++++----
> > >   2 files changed, 15 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
> > > index fd7ce01..765d265 100644
> > > --- a/arch/arm/kernel/perf_event_v7.c
> > > +++ b/arch/arm/kernel/perf_event_v7.c
> > > @@ -1637,6 +1637,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> > >   	bool venum_event = EVENT_VENUM(hwc->config_base);
> > >   	bool krait_event = EVENT_CPU(hwc->config_base);
> > > +	clear_bit(hwc->idx, cpuc->used_mask);
> > >   	if (venum_event || krait_event) {
> > >   		bit = krait_event_to_bit(event, region, group);
> > >   		clear_bit(bit, cpuc->used_mask);
> > > @@ -1966,6 +1967,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> > >   	bool venum_event = EVENT_VENUM(hwc->config_base);
> > >   	bool scorpion_event = EVENT_CPU(hwc->config_base);
> > > +	clear_bit(hwc->idx, cpuc->used_mask);
> > >   	if (venum_event || scorpion_event) {
> > >   		bit = scorpion_event_to_bit(event, region, group);
> > >   		clear_bit(bit, cpuc->used_mask);
> > 
> > As an aside, I think there's an existing problem with krait and
> > cpu_pm_pmu_setup(), and we'll end up with the same issue when chained
> > counters use multiple counters for one event.
> > 
> > The krait code sets multiple bits in the PMU's pmu_hw_events::used_mask,
> > but only one of these will have a corresponding (non-NULL) entry in
> > pmu_hw_events::events[].
> 
> The Krait pmu allocates the "krait" specific event bit beyond the armpmu->num_events.
> See krait_event_to_bit(). And we don't go beyond armpmu->num_events for the "events"
> check. So we should be safe. And it passes on the idx within the num_events back
> to armpmu. So whatever krait pmu does is invisible to the generic driver.

Ah; I had missed that, sorry for the noise.

> > I guess the best thing to do would be to avoid the test_bit(), and just
> > skip an idx if hw_events->events[idx] is NULL.
> > 
> > Would you mind spinning a patch to that effect?
> 
> Either way, I could split that change out into a new patch.

No need -- sorry for the noise.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 7/7] arm64: perf: Add support for chaining event counters
  2018-06-29 14:29     ` Suzuki K Poulose
@ 2018-06-29 14:39       ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2018-06-29 14:39 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, robin.murphy,
	julien.thierry

On Fri, Jun 29, 2018 at 03:29:12PM +0100, Suzuki K Poulose wrote:
> On 29/06/18 15:01, Mark Rutland wrote:
> > On Tue, Jun 19, 2018 at 11:15:42AM +0100, Suzuki K Poulose wrote:
> > > Add support for 64bit events by using chained event counters
> > > and 64bit cycle counters.
> > > 
> > > PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
> > > forming a 64-bit counter. The low/even counter is programmed to count
> > > the event of interest, and the high/odd counter is programmed to count
> > > the CHAIN event, taken when the low/even counter overflows.
> > > 
> > > For CPU cycles, when 64bit mode is requested, the cycle counter
> > > is used in 64bit mode. If the cycle counter is not available,
> > > we fall back to chaining.
> > > 
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Will Deacon <will.deacon@arm.com>
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > ---
> > > Changes since v2:
> > >   - Drop special allocation algorithm for chain indices
> > >   - Since we access the counters when the PMU is stopped,
> > >     get rid of the unnecessary barriers.
> > >   - Ensure a counter is allocated when checking for chained event
> > > ---
> > >   arch/arm64/kernel/perf_event.c | 184 ++++++++++++++++++++++++++++++++++++-----
> > >   drivers/perf/arm_pmu.c         |  13 ++-
> > >   2 files changed, 169 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > > index eebc635..a03def7 100644
> > > --- a/arch/arm64/kernel/perf_event.c
> > > +++ b/arch/arm64/kernel/perf_event.c
> > > @@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
> > >   };
> > >   PMU_FORMAT_ATTR(event, "config:0-15");
> > > +PMU_FORMAT_ATTR(bits64, "config1:0");
> > 
> > I'm not too keen on the "bits64" name here -- it's a little unusual.
> > Perhaps "long"?
> 
> I wasn't either. The other option was _64bit, but that is easier to misspell.
> The only reason for not sticking to "long" is that it gives the impression that
> the user always gets "double" the normal counter width, which may break
> if we ever get 64bit PMU counters by default without chaining.

I'm happy to take that risk with "long".

We can probably document that under Documentation/perf if we're
particularly worried.

[...]

> > > @@ -466,6 +473,20 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
> > >   	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
> > >   /*
> > > + * Use chained counter for a 64bit event, if we could not allocate
> > > + * the 64bit cycle counter. This must be called after a counter
> > > + * was allocated.
> > > + */
> > > +static inline bool armv8pmu_event_is_chained(struct perf_event *event)
> > > +{
> > > +	int idx = event->hw.idx;
> > > +
> > > +	return !WARN_ON(idx < 0) &&
> > > +	       armv8pmu_event_is_64bit(event) &&
> > > +	       (event->hw.idx != ARMV8_IDX_CYCLE_COUNTER);
> > > +}
> > 
> > It took me a moment to parse this. Perhaps:
> 
> Yes, it does look a bit weird.
> 
> > 
> > 	/*
> > 	 * The dedicated cycle counter doesn't require chaining, but
> > 	 * when this is already in use, we must chain two programmable
> > 	 * counters to form a 64-bit cycle counter.
> > 	 */
> 
> That sounds like we check only for cycle events, which is not true. How
> about:
> 
> 	/*
> 	 * We must chain two programmable counters for 64 bit events,
> 	 * except when we have allocated the 64bit cycle counter (for the CPU
> 	 * cycles event).
> 	 */

That sounds much better, yes please!

> > > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> > > index 6e10e8c..a4675e4 100644
> > > --- a/drivers/perf/arm_pmu.c
> > > +++ b/drivers/perf/arm_pmu.c
> > > @@ -674,14 +674,13 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
> > >   	int idx;
> > >   	for (idx = 0; idx < armpmu->num_events; idx++) {
> > > -		/*
> > > -		 * If the counter is not used skip it, there is no
> > > -		 * need of stopping/restarting it.
> > > -		 */
> > > -		if (!test_bit(idx, hw_events->used_mask))
> > > -			continue;
> > > -
> > >   		event = hw_events->events[idx];
> > > +		/*
> > > +		 * If there is no event at this idx (e.g, an idx used
> > > +		 * by a chained event in Arm v8 PMUv3), skip it.
> > > +		 */
> > > +		if (!event)
> > > +			continue;
> > >   		switch (cmd) {
> > >   		case CPU_PM_ENTER:
> > 
> > Please make this change an individual patch earlier in the series. It's
> 
> I can do that. But I don't think the krait needs this, as explained in the
> other patch.

No need. I'm fine with it as-is; sorry for the noise!

Thanks,
Mark.
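
For completeness, the even/odd pair allocation added by patch 7/7 can be
modelled in isolation. The sketch below works on raw counter numbers and a
plain bitmask, ignoring the dedicated cycle counter slot and the kernel's
index offset, so it only illustrates the search order, not the real driver:

#include <stdio.h>

static unsigned long used_mask;

static int test_and_set(int bit)
{
	int was_set = !!(used_mask & (1UL << bit));

	used_mask |= 1UL << bit;
	return was_set;
}

/*
 * Walk the odd (high) counters and keep one only if its even (low)
 * partner is free as well, mirroring armv8pmu_get_chain_idx().
 */
static int get_chain_idx(int num_counters)
{
	for (int idx = 1; idx < num_counters; idx += 2) {
		if (!test_and_set(idx)) {
			if (!test_and_set(idx - 1))
				return idx;
			used_mask &= ~(1UL << idx);	/* release the odd one */
		}
	}
	return -1;
}

int main(void)
{
	used_mask = 1UL << 0;	/* counter 0 already taken by another event */

	/* Pair 0/1 is blocked, so expect idx 3 (the 2/3 pair). */
	printf("chain idx = %d\n", get_chain_idx(6));
	return 0;
}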

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-06-29 14:40 UTC | newest]

Thread overview: 20+ messages
2018-06-19 10:15 [PATCH v3 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
2018-06-19 10:15 ` [PATCH v3 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
2018-06-19 10:45   ` Mark Rutland
2018-06-19 10:15 ` [PATCH v3 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
2018-06-19 10:52   ` Mark Rutland
2018-06-19 10:15 ` [PATCH v3 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
2018-06-19 10:57   ` Mark Rutland
2018-06-19 10:15 ` [PATCH v3 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
2018-06-29 13:27   ` Mark Rutland
2018-06-29 13:40   ` Mark Rutland
2018-06-29 14:18     ` Suzuki K Poulose
2018-06-29 14:29       ` Mark Rutland
2018-06-19 10:15 ` [PATCH v3 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
2018-06-29 13:29   ` Mark Rutland
2018-06-19 10:15 ` [PATCH v3 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
2018-06-19 10:43   ` Mark Rutland
2018-06-19 10:15 ` [PATCH v3 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
2018-06-29 14:01   ` Mark Rutland
2018-06-29 14:29     ` Suzuki K Poulose
2018-06-29 14:39       ` Mark Rutland
