* [PATCH v5 0/7] arm64: perf: Support for chained counters
@ 2018-07-10  8:57 Suzuki K Poulose
  2018-07-10  8:57 ` [PATCH v5 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose


This series adds support for counting PMU events using 64bit counters
for the arm64 PMU.

The Arm v8 PMUv3 supports combining two adjacent 32bit counters
(low even and high odd counters) to count a given "event" in 64bit mode.
This series adds support for 64bit events in the core arm_pmu driver
infrastructure and adds support for the armv8 64bit kernel PMU to use
chained counters to count in 64bit mode. For CPU cycles, we use the cycle
counter in 64bit mode only when requested. If the cycle counter is not
available, we fall back to chaining the counters.
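
To make the scheme concrete, here is a minimal sketch of how a chained
pair is read back as a single 64bit value (the function name below is
illustrative only; it assumes the armv8pmu_read_evcntr() helper added
later in the series, and patch 7/7 implements this as
armv8pmu_read_hw_counter()):

static u64 read_chained_counter_sketch(int idx)
{
	/*
	 * The high/odd counter (idx) counts the CHAIN event and holds
	 * the upper 32 bits; the low/even counter (idx - 1) counts the
	 * event of interest and holds the lower 32 bits.
	 */
	u64 hi = armv8pmu_read_evcntr(idx);
	u64 lo = armv8pmu_read_evcntr(idx - 1);

	return (hi << 32) | lo;
}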

Tested on Juno, Fast models. Applies on 4.18-rc4

Changes since v4:
 - Add missing clear_event_idx call back for xscale PMU
 - Drop comment in cpu_pm_pmu_setup()

Changes since v3:
 - Rename format string "bits64" => "long"
 - Add clear_event_idx call back for all PMU backends
 - Address style comments
 - Add Acked-by's

Changes since v2:
 - Clean up select counter routine. (Mark Rutland)
 - Stop PMU while processing overflows (Mark Rutland)
 - Drop special allocation algorithm for chain indices
 - Since we access the counters when the PMU is stopped,
   get rid of the unnecessary barriers.
 - Ensure a counter is allocated when checking for chained event

Changes since v1:
 - Remove unnecessary isb()s in chain counter reads/writes
 - Fix event programming order for counters
 - Tighten chain counter event read sequence
 - Set chain event to count in all ELs
 - Cleanup helpers to be consistent
 - Fix build break on xscale PMU (reported by kbuild-robot)
 - Remove the explicit counter width field from pmu backends and default
   to 32bit.
 - Rename flag ARMPMU_EVT_LONG => ARMPMU_EVT_64BIT and
   the format string "chain" => "bits64". (Unfortunately we can't use "64bit"
   and I am open to suggestions for a better name)
 - Rename armpmu_get_event_max_period() => armpmu_event_max_period()
 - For 64bit CPU cycles events, allow chaining if cycle counter is
   not available.


Suzuki K Poulose (7):
  arm_pmu: Clean up maximum period handling
  arm_pmu: Change API to support 64bit counter values
  arm_pmu: Add support for 64bit event counters
  arm_pmu: Tidy up clear_event_idx call backs
  arm64: perf: Clean up armv8pmu_select_counter
  arm64: perf: Disable PMU while processing counter overflows
  arm64: perf: Add support for chaining event counters

 arch/arm/kernel/perf_event_v6.c     |  14 +-
 arch/arm/kernel/perf_event_v7.c     |  14 +-
 arch/arm/kernel/perf_event_xscale.c |  18 ++-
 arch/arm64/kernel/perf_event.c      | 251 +++++++++++++++++++++++++++++-------
 drivers/perf/arm_pmu.c              |  38 +++---
 include/linux/perf/arm_pmu.h        |  11 +-
 6 files changed, 264 insertions(+), 82 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v5 1/7] arm_pmu: Clean up maximum period handling
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
@ 2018-07-10  8:57 ` Suzuki K Poulose
  2018-07-10  8:57 ` [PATCH v5 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Each PMU defines its max_period as the maximum value that its
counters can count. Since all the PMU backends support 32bit
counters by default, let us remove the redundant field.

No functional changes.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
No changes since v2
---
 arch/arm/kernel/perf_event_v6.c     |  2 --
 arch/arm/kernel/perf_event_v7.c     |  1 -
 arch/arm/kernel/perf_event_xscale.c |  2 --
 arch/arm64/kernel/perf_event.c      |  1 -
 drivers/perf/arm_pmu.c              | 16 ++++++++++++----
 include/linux/perf/arm_pmu.h        |  1 -
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index be42c4f..f64a6bf 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -495,7 +495,6 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 }
 
 static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
@@ -546,7 +545,6 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 57f01e0..ecca4cd 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1170,7 +1170,6 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start		= armv7pmu_start;
 	cpu_pmu->stop		= armv7pmu_stop;
 	cpu_pmu->reset		= armv7pmu_reset;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 };
 
 static void armv7_read_num_pmnc_events(void *info)
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 88d1a76..c4f0294 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -374,7 +374,6 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale1pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
@@ -743,7 +742,6 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale2pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 5;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aa..678ecff 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -960,7 +960,6 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
-	cpu_pmu->max_period		= (1LLU << 32) - 1,
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
 
 	return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index a6347d4..6ddc00d 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,6 +28,11 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
+static inline u64 arm_pmu_max_period(void)
+{
+	return (1ULL << 32) - 1;
+}
+
 static int
 armpmu_map_cache_event(const unsigned (*cache_map)
 				      [PERF_COUNT_HW_CACHE_MAX]
@@ -114,8 +119,10 @@ int armpmu_event_set_period(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	s64 left = local64_read(&hwc->period_left);
 	s64 period = hwc->sample_period;
+	u64 max_period;
 	int ret = 0;
 
+	max_period = arm_pmu_max_period();
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -136,8 +143,8 @@ int armpmu_event_set_period(struct perf_event *event)
 	 * effect we are reducing max_period to account for
 	 * interrupt latency (and we are being very conservative).
 	 */
-	if (left > (armpmu->max_period >> 1))
-		left = armpmu->max_period >> 1;
+	if (left > (max_period >> 1))
+		left = (max_period >> 1);
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
@@ -153,6 +160,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
+	u64 max_period = arm_pmu_max_period();
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -162,7 +170,7 @@ u64 armpmu_event_update(struct perf_event *event)
 			     new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
+	delta = (new_raw_count - prev_raw_count) & max_period;
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -402,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = armpmu->max_period >> 1;
+		hwc->sample_period  = arm_pmu_max_period() >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index ad54444..12c30a2 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -94,7 +94,6 @@ struct arm_pmu {
 	void		(*reset)(void *);
 	int		(*map_event)(struct perf_event *event);
 	int		num_events;
-	u64		max_period;
 	bool		secure_access; /* 32-bit ARM only */
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
 	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 2/7] arm_pmu: Change API to support 64bit counter values
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-07-10  8:57 ` [PATCH v5 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
@ 2018-07-10  8:57 ` Suzuki K Poulose
  2018-07-10  8:58 ` [PATCH v5 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Convert the {read/write}_counter APIs to handle 64bit values
to enable support for chained event counters. The backends
still operate on 32bit values and we only pass them 32bit
values, so in effect there are no functional changes.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 - No changes since v2
---
 arch/arm/kernel/perf_event_v6.c     | 4 ++--
 arch/arm/kernel/perf_event_v7.c     | 4 ++--
 arch/arm/kernel/perf_event_xscale.c | 8 ++++----
 arch/arm64/kernel/perf_event.c      | 9 ++++-----
 include/linux/perf/arm_pmu.h        | 4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index f64a6bf..0729f98 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -233,7 +233,7 @@ armv6_pmcr_counter_has_overflowed(unsigned long pmcr,
 	return ret;
 }
 
-static inline u32 armv6pmu_read_counter(struct perf_event *event)
+static inline u64 armv6pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -251,7 +251,7 @@ static inline u32 armv6pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv6pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv6pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index ecca4cd..fd7ce01 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -743,7 +743,7 @@ static inline void armv7_pmnc_select_counter(int idx)
 	isb();
 }
 
-static inline u32 armv7pmu_read_counter(struct perf_event *event)
+static inline u64 armv7pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -763,7 +763,7 @@ static inline u32 armv7pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv7pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv7pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index c4f0294..942230f 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -316,7 +316,7 @@ static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale1pmu_read_counter(struct perf_event *event)
+static inline u64 xscale1pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -337,7 +337,7 @@ static inline u32 xscale1pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale1pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale1pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -678,7 +678,7 @@ static void xscale2pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale2pmu_read_counter(struct perf_event *event)
+static inline u64 xscale2pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -705,7 +705,7 @@ static inline u32 xscale2pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale2pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale2pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 678ecff..66a2ffd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -512,7 +512,7 @@ static inline int armv8pmu_select_counter(int idx)
 	return idx;
 }
 
-static inline u32 armv8pmu_read_counter(struct perf_event *event)
+static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -530,7 +530,7 @@ static inline u32 armv8pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -545,9 +545,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 		 * count using the lower 32bits and we want an interrupt when
 		 * it overflows.
 		 */
-		u64 value64 = 0xffffffff00000000ULL | value;
-
-		write_sysreg(value64, pmccntr_el0);
+		value |= 0xffffffff00000000ULL;
+		write_sysreg(value, pmccntr_el0);
 	} else if (armv8pmu_select_counter(idx) == idx)
 		write_sysreg(value, pmxevcntr_el0);
 }
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 12c30a2..f7126a2 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -87,8 +87,8 @@ struct arm_pmu {
 					 struct perf_event *event);
 	int		(*set_event_filter)(struct hw_perf_event *evt,
 					    struct perf_event_attr *attr);
-	u32		(*read_counter)(struct perf_event *event);
-	void		(*write_counter)(struct perf_event *event, u32 val);
+	u64		(*read_counter)(struct perf_event *event);
+	void		(*write_counter)(struct perf_event *event, u64 val);
 	void		(*start)(struct arm_pmu *);
 	void		(*stop)(struct arm_pmu *);
 	void		(*reset)(void *);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 3/7] arm_pmu: Add support for 64bit event counters
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-07-10  8:57 ` [PATCH v5 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
  2018-07-10  8:57 ` [PATCH v5 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
@ 2018-07-10  8:58 ` Suzuki K Poulose
  2018-07-10  8:58 ` [PATCH v5 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Each PMU has a set of 32bit event counters. But in some
special cases, the events could be counted using counters
which are effectively 64bit wide.

e.g., Arm v8 PMUv3 has a 64bit cycle counter which can count
only CPU cycles. Also, the PMU can chain pairs of event
counters to count effectively as a 64bit counter.

Add support for tracking the events that use 64bit counters.
This only affects the periods set for each counter in the core
driver.
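
For illustration, a backend opts an event into 64bit counting by
setting the new flag from its map_event() callback. A minimal sketch
for a hypothetical backend (the function name and event id below are
made up; the armv8 driver does the equivalent in patch 7/7, keyed off
a config1 bit):

static int my_pmu_map_event(struct perf_event *event)
{
	int hw_event_id = 0x11;		/* hypothetical hardware event id */

	/* Let the user request a 64bit counter via config1 bit 0 */
	if (event->attr.config1 & 0x1)
		event->hw.flags |= ARMPMU_EVT_64BIT;

	return hw_event_id;
}

arm_pmu_event_max_period() then returns a 64bit maximum period for
such an event and the default 32bit maximum otherwise.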

Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v3:
 - Replace ternary operator with if..else for max_period
 - Added Acked-by from Mark R
---
 drivers/perf/arm_pmu.c       | 16 ++++++++++------
 include/linux/perf/arm_pmu.h |  6 ++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 6ddc00d..8cad6b5 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,9 +28,12 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
-static inline u64 arm_pmu_max_period(void)
+static inline u64 arm_pmu_event_max_period(struct perf_event *event)
 {
-	return (1ULL << 32) - 1;
+	if (event->hw.flags & ARMPMU_EVT_64BIT)
+		return GENMASK_ULL(63, 0);
+	else
+		return GENMASK_ULL(31, 0);
 }
 
 static int
@@ -122,7 +125,7 @@ int armpmu_event_set_period(struct perf_event *event)
 	u64 max_period;
 	int ret = 0;
 
-	max_period = arm_pmu_max_period();
+	max_period = arm_pmu_event_max_period(event);
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -148,7 +151,7 @@ int armpmu_event_set_period(struct perf_event *event)
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
-	armpmu->write_counter(event, (u64)(-left) & 0xffffffff);
+	armpmu->write_counter(event, (u64)(-left) & max_period);
 
 	perf_event_update_userpage(event);
 
@@ -160,7 +163,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
-	u64 max_period = arm_pmu_max_period();
+	u64 max_period = arm_pmu_event_max_period(event);
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -368,6 +371,7 @@ __hw_perf_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int mapping;
 
+	hwc->flags = 0;
 	mapping = armpmu->map_event(event);
 
 	if (mapping < 0) {
@@ -410,7 +414,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = arm_pmu_max_period() >> 1;
+		hwc->sample_period  = arm_pmu_event_max_period(event) >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index f7126a2..10f92e1 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -25,6 +25,12 @@
  */
 #define ARMPMU_MAX_HWEVENTS		32
 
+/*
+ * ARM PMU hw_event flags
+ */
+/* Event uses a 64bit counter */
+#define ARMPMU_EVT_64BIT		1
+
 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
 #define CACHE_OP_UNSUPPORTED		0xFFFF
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2018-07-10  8:58 ` [PATCH v5 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
@ 2018-07-10  8:58 ` Suzuki K Poulose
  2018-07-10  8:58 ` [PATCH v5 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

The arm_pmu core uses the get_event_idx callback to allocate an
event counter for a given event, and the backend marks the selected
counter as "used". However, when we delete an event, the core code
clears the "used" bit itself and then invokes the "clear_event_idx"
callback, which splits the job between the core code and the
backend. To keep things tidy, mandate the implementation of
clear_event_idx() and add it for existing backends. This will be
useful for adding the chained event support, where we leave the
event idx maintenance to the backend.

Also, when an event is removed from the PMU, reset the hw.idx
to indicate that a counter is not allocated for this event,
to help the backends do better checks. This will also be used
for the chain counter support.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v4:
 - Add missing clear_event_idx call back for xscale PMU
---
 arch/arm/kernel/perf_event_v6.c     | 8 ++++++++
 arch/arm/kernel/perf_event_v7.c     | 9 +++++++++
 arch/arm/kernel/perf_event_xscale.c | 8 ++++++++
 arch/arm64/kernel/perf_event.c      | 7 +++++++
 drivers/perf/arm_pmu.c              | 7 +++----
 5 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index 0729f98..1ae99de 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -411,6 +411,12 @@ armv6pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	}
 }
 
+static void armv6pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
 static void armv6pmu_disable_event(struct perf_event *event)
 {
 	unsigned long val, mask, evt, flags;
@@ -491,6 +497,7 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= armv6pmu_read_counter;
 	cpu_pmu->write_counter	= armv6pmu_write_counter;
 	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
 	cpu_pmu->start		= armv6pmu_start;
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6_map_event;
@@ -541,6 +548,7 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= armv6pmu_read_counter;
 	cpu_pmu->write_counter	= armv6pmu_write_counter;
 	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
 	cpu_pmu->start		= armv6pmu_start;
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index fd7ce01..acb49a9 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1058,6 +1058,12 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	return -EAGAIN;
 }
 
+static void armv7pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
 /*
  * Add an event filter to a given event. This will only work for PMUv2 PMUs.
  */
@@ -1167,6 +1173,7 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= armv7pmu_read_counter;
 	cpu_pmu->write_counter	= armv7pmu_write_counter;
 	cpu_pmu->get_event_idx	= armv7pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = armv7pmu_clear_event_idx;
 	cpu_pmu->start		= armv7pmu_start;
 	cpu_pmu->stop		= armv7pmu_stop;
 	cpu_pmu->reset		= armv7pmu_reset;
@@ -1637,6 +1644,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool krait_event = EVENT_CPU(hwc->config_base);
 
+	armv7pmu_clear_event_idx(cpuc, event);
 	if (venum_event || krait_event) {
 		bit = krait_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
@@ -1966,6 +1974,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool scorpion_event = EVENT_CPU(hwc->config_base);
 
+	armv7pmu_clear_event_idx(cpuc, event);
 	if (venum_event || scorpion_event) {
 		bit = scorpion_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 942230f..f6cdcac 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -292,6 +292,12 @@ xscale1pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	}
 }
 
+static void xscalepmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
 static void xscale1pmu_start(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags, val;
@@ -370,6 +376,7 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= xscale1pmu_read_counter;
 	cpu_pmu->write_counter	= xscale1pmu_write_counter;
 	cpu_pmu->get_event_idx	= xscale1pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = xscalepmu_clear_event_idx;
 	cpu_pmu->start		= xscale1pmu_start;
 	cpu_pmu->stop		= xscale1pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
@@ -738,6 +745,7 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= xscale2pmu_read_counter;
 	cpu_pmu->write_counter	= xscale2pmu_write_counter;
 	cpu_pmu->get_event_idx	= xscale2pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = xscalepmu_clear_event_idx;
 	cpu_pmu->start		= xscale2pmu_start;
 	cpu_pmu->stop		= xscale2pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 66a2ffd..ac66851 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -778,6 +778,12 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	return -EAGAIN;
 }
 
+static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				  struct perf_event *event)
+{
+	clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
 /*
  * Add an event filter to a given event. This will only work for PMUv2 PMUs.
  */
@@ -956,6 +962,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter		= armv8pmu_read_counter,
 	cpu_pmu->write_counter		= armv8pmu_write_counter,
 	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
+	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8cad6b5..a288810 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -238,11 +238,10 @@ armpmu_del(struct perf_event *event, int flags)
 
 	armpmu_stop(event, PERF_EF_UPDATE);
 	hw_events->events[idx] = NULL;
-	clear_bit(idx, hw_events->used_mask);
-	if (armpmu->clear_event_idx)
-		armpmu->clear_event_idx(hw_events, event);
-
+	armpmu->clear_event_idx(hw_events, event);
 	perf_event_update_userpage(event);
+	/* Clear the allocated counter */
+	hwc->idx = -1;
 }
 
 static int
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 5/7] arm64: perf: Clean up armv8pmu_select_counter
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2018-07-10  8:58 ` [PATCH v5 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
@ 2018-07-10  8:58 ` Suzuki K Poulose
  2018-07-10  8:58 ` [PATCH v5 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

armv8pmu_select_counter always returns the idx passed to it. So
let us make it return void and get rid of the pointless checks.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index ac66851..bc014af 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -503,13 +503,17 @@ static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
 	return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
 }
 
-static inline int armv8pmu_select_counter(int idx)
+static inline void armv8pmu_select_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
 	write_sysreg(counter, pmselr_el0);
 	isb();
+}
 
-	return idx;
+static inline u32 armv8pmu_read_evcntr(int idx)
+{
+	armv8pmu_select_counter(idx);
+	return read_sysreg(pmxevcntr_el0);
 }
 
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
@@ -524,12 +528,18 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
-	else if (armv8pmu_select_counter(idx) == idx)
-		value = read_sysreg(pmxevcntr_el0);
+	else
+		value = armv8pmu_read_evcntr(idx);
 
 	return value;
 }
 
+static inline void armv8pmu_write_evcntr(int idx, u32 value)
+{
+	armv8pmu_select_counter(idx);
+	write_sysreg(value, pmxevcntr_el0);
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -547,16 +557,15 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 		 */
 		value |= 0xffffffff00000000ULL;
 		write_sysreg(value, pmccntr_el0);
-	} else if (armv8pmu_select_counter(idx) == idx)
-		write_sysreg(value, pmxevcntr_el0);
+	} else
+		armv8pmu_write_evcntr(idx, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
 {
-	if (armv8pmu_select_counter(idx) == idx) {
-		val &= ARMV8_PMU_EVTYPE_MASK;
-		write_sysreg(val, pmxevtyper_el0);
-	}
+	armv8pmu_select_counter(idx);
+	val &= ARMV8_PMU_EVTYPE_MASK;
+	write_sysreg(val, pmxevtyper_el0);
 }
 
 static inline int armv8pmu_enable_counter(int idx)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 6/7] arm64: perf: Disable PMU while processing counter overflows
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2018-07-10  8:58 ` [PATCH v5 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
@ 2018-07-10  8:58 ` Suzuki K Poulose
  2018-07-10  8:58 ` [PATCH v5 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
  2018-07-10 14:31 ` [PATCH v5 0/7] arm64: perf: Support for chained counters Will Deacon
  7 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

The arm64 PMU updates the event counters and reprograms the
counters in the overflow IRQ handler without disabling the
PMU. This could potentially cause skews for group counters,
where the overflowed counters may lose some event counts while
they are being reprogrammed. To prevent this, disable the PMU
while we process the counter overflows and enable it right
back when we are done.

This patch also moves the PMU stop/start routines to avoid a
forward declaration.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 50 +++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index bc014af..b414d81 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -678,6 +678,28 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
+static void armv8pmu_start(struct arm_pmu *cpu_pmu)
+{
+	unsigned long flags;
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	raw_spin_lock_irqsave(&events->pmu_lock, flags);
+	/* Enable all counters */
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
+	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
+{
+	unsigned long flags;
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	raw_spin_lock_irqsave(&events->pmu_lock, flags);
+	/* Disable all counters */
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
+	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
 static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 {
 	u32 pmovsr;
@@ -702,6 +724,11 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	 */
 	regs = get_irq_regs();
 
+	/*
+	 * Stop the PMU while processing the counter overflows
+	 * to prevent skews in group events.
+	 */
+	armv8pmu_stop(cpu_pmu);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
 		struct hw_perf_event *hwc;
@@ -726,6 +753,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 		if (perf_event_overflow(event, &data, regs))
 			cpu_pmu->disable(event);
 	}
+	armv8pmu_start(cpu_pmu);
 
 	/*
 	 * Handle the pending perf events.
@@ -739,28 +767,6 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	return IRQ_HANDLED;
 }
 
-static void armv8pmu_start(struct arm_pmu *cpu_pmu)
-{
-	unsigned long flags;
-	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
-	raw_spin_lock_irqsave(&events->pmu_lock, flags);
-	/* Enable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
-	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
-}
-
-static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
-{
-	unsigned long flags;
-	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
-	raw_spin_lock_irqsave(&events->pmu_lock, flags);
-	/* Disable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
-	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
-}
-
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 7/7] arm64: perf: Add support for chaining event counters
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (5 preceding siblings ...)
  2018-07-10  8:58 ` [PATCH v5 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
@ 2018-07-10  8:58 ` Suzuki K Poulose
  2018-07-10 17:18   ` Mark Rutland
  2018-07-10 14:31 ` [PATCH v5 0/7] arm64: perf: Support for chained counters Will Deacon
  7 siblings, 1 reply; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10  8:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Add support for 64bit events by using chained event counters
and the 64bit cycle counter.

PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
forming a 64-bit counter. The low/even counter is programmed to count
the event of interest, and the high/odd counter is programmed to count
the CHAIN event, taken when the low/even counter overflows.

For CPU cycles, when 64bit mode is requested, the cycle counter
is used in 64bit mode. If the cycle counter is not available, we
fall back to chaining.
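
A usage sketch: the new "long" format bit (config1:0) requests 64bit
counting. The PMU instance name below is illustrative and may differ
per system; 0x08 is the PMUv3 INST_RETIRED event:

  # Count retired instructions with a 64bit (chained) counter
  perf stat -e armv8_pmuv3_0/event=0x08,long/ -- sleep 1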

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v4:
 - Drop comment in cpu_pm_pmu_setup
Changes since v3:
 - Rename format name from "bits64" to "long"
 - Address other comments on style.
---
 arch/arm64/kernel/perf_event.c | 185 +++++++++++++++++++++++++++++++++++------
 drivers/perf/arm_pmu.c         |   9 +-
 2 files changed, 160 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index b414d81..dfff5ed 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
 };
 
 PMU_FORMAT_ATTR(event, "config:0-15");
+PMU_FORMAT_ATTR(long, "config1:0");
+
+static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
+{
+	return event->attr.config1 & 0x1;
+}
 
 static struct attribute *armv8_pmuv3_format_attrs[] = {
 	&format_attr_event.attr,
+	&format_attr_long.attr,
 	NULL,
 };
 
@@ -466,6 +473,21 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
 	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
 /*
+ * We must chain two programmable counters for 64 bit events,
+ * except when we have allocated the 64bit cycle counter (for CPU
+ * cycles event). This must be called only when the event has
+ * a counter allocated.
+ */
+static inline bool armv8pmu_event_is_chained(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	return !WARN_ON(idx < 0) &&
+	       armv8pmu_event_is_64bit(event) &&
+	       (idx != ARMV8_IDX_CYCLE_COUNTER);
+}
+
+/*
  * ARMv8 low level PMU access
  */
 
@@ -516,12 +538,23 @@ static inline u32 armv8pmu_read_evcntr(int idx)
 	return read_sysreg(pmxevcntr_el0);
 }
 
+static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	u64 val = 0;
+
+	val = armv8pmu_read_evcntr(idx);
+	if (armv8pmu_event_is_chained(event))
+		val = (val << 32) | armv8pmu_read_evcntr(idx - 1);
+	return val;
+}
+
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
-	u32 value = 0;
+	u64 value = 0;
 
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u reading wrong counter %d\n",
@@ -529,7 +562,7 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
 	else
-		value = armv8pmu_read_evcntr(idx);
+		value = armv8pmu_read_hw_counter(event);
 
 	return value;
 }
@@ -540,6 +573,19 @@ static inline void armv8pmu_write_evcntr(int idx, u32 value)
 	write_sysreg(value, pmxevcntr_el0);
 }
 
+static inline void armv8pmu_write_hw_counter(struct perf_event *event,
+					     u64 value)
+{
+	int idx = event->hw.idx;
+
+	if (armv8pmu_event_is_chained(event)) {
+		armv8pmu_write_evcntr(idx, upper_32_bits(value));
+		armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
+	} else {
+		armv8pmu_write_evcntr(idx, value);
+	}
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -551,14 +597,16 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
 		/*
-		 * Set the upper 32bits as this is a 64bit counter but we only
-		 * count using the lower 32bits and we want an interrupt when
-		 * it overflows.
+		 * The cycles counter is really a 64-bit counter.
+		 * When treating it as a 32-bit counter, we only count
+		 * the lower 32 bits, and set the upper 32-bits so that
+		 * we get an interrupt upon 32-bit overflow.
 		 */
-		value |= 0xffffffff00000000ULL;
+		if (!armv8pmu_event_is_64bit(event))
+			value |= 0xffffffff00000000ULL;
 		write_sysreg(value, pmccntr_el0);
 	} else
-		armv8pmu_write_evcntr(idx, value);
+		armv8pmu_write_hw_counter(event, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
@@ -568,6 +616,27 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
 	write_sysreg(val, pmxevtyper_el0);
 }
 
+static inline void armv8pmu_write_event_type(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	/*
+	 * For chained events, the low counter is programmed to count
+	 * the event of interest and the high counter is programmed
+	 * with CHAIN event code with filters set to count at all ELs.
+	 */
+	if (armv8pmu_event_is_chained(event)) {
+		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
+				ARMV8_PMU_INCLUDE_EL2;
+
+		armv8pmu_write_evtype(idx - 1, hwc->config_base);
+		armv8pmu_write_evtype(idx, chain_evt);
+	} else {
+		armv8pmu_write_evtype(idx, hwc->config_base);
+	}
+}
+
 static inline int armv8pmu_enable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -575,6 +644,16 @@ static inline int armv8pmu_enable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_enable_event_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	armv8pmu_enable_counter(idx);
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_enable_counter(idx - 1);
+	isb();
+}
+
 static inline int armv8pmu_disable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -582,6 +661,16 @@ static inline int armv8pmu_disable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_disable_event_counter(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_disable_counter(idx - 1);
+	armv8pmu_disable_counter(idx);
+}
+
 static inline int armv8pmu_enable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -589,6 +678,11 @@ static inline int armv8pmu_enable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_enable_event_irq(struct perf_event *event)
+{
+	return armv8pmu_enable_intens(event->hw.idx);
+}
+
 static inline int armv8pmu_disable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -601,6 +695,11 @@ static inline int armv8pmu_disable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_disable_event_irq(struct perf_event *event)
+{
+	return armv8pmu_disable_intens(event->hw.idx);
+}
+
 static inline u32 armv8pmu_getreset_flags(void)
 {
 	u32 value;
@@ -618,10 +717,8 @@ static inline u32 armv8pmu_getreset_flags(void)
 static void armv8pmu_enable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Enable counter and interrupt, and set the counter to count
@@ -632,22 +729,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Set event (if destined for PMNx counters).
 	 */
-	armv8pmu_write_evtype(idx, hwc->config_base);
+	armv8pmu_write_event_type(event);
 
 	/*
 	 * Enable interrupt for this counter
 	 */
-	armv8pmu_enable_intens(idx);
+	armv8pmu_enable_event_irq(event);
 
 	/*
 	 * Enable counter
 	 */
-	armv8pmu_enable_counter(idx);
+	armv8pmu_enable_event_counter(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -655,10 +752,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
 static void armv8pmu_disable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Disable counter and interrupt
@@ -668,12 +763,12 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Disable interrupt for this counter
 	 */
-	armv8pmu_disable_intens(idx);
+	armv8pmu_disable_event_irq(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -767,10 +862,42 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	return IRQ_HANDLED;
 }
 
+static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
+				    struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx ++) {
+		if (!test_and_set_bit(idx, cpuc->used_mask))
+			return idx;
+	}
+	return -EAGAIN;
+}
+
+static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
+				   struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	/*
+	 * Chaining requires two consecutive event counters, where
+	 * the lower idx must be even.
+	 */
+	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
+		if (!test_and_set_bit(idx, cpuc->used_mask)) {
+			/* Check if the preceding even counter is available */
+			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
+				return idx;
+			/* Release the Odd counter */
+			clear_bit(idx, cpuc->used_mask);
+		}
+	}
+	return -EAGAIN;
+}
+
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
-	int idx;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
@@ -784,19 +911,20 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	/*
 	 * Otherwise use events counters
 	 */
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EAGAIN;
+	if (armv8pmu_event_is_64bit(event))
+		return	armv8pmu_get_chain_idx(cpuc, cpu_pmu);
+	else
+		return armv8pmu_get_single_idx(cpuc, cpu_pmu);
 }
 
 static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
-				  struct perf_event *event)
+				     struct perf_event *event)
 {
-	clear_bit(event->hw.idx, cpuc->used_mask);
+	int idx = event->hw.idx;
+
+	clear_bit(idx, cpuc->used_mask);
+	if (armv8pmu_event_is_chained(event))
+		clear_bit(idx - 1, cpuc->used_mask);
 }
 
 /*
@@ -871,6 +999,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 				       &armv8_pmuv3_perf_cache_map,
 				       ARMV8_PMU_EVTYPE_EVENT);
 
+	if (armv8pmu_event_is_64bit(event))
+		event->hw.flags |= ARMPMU_EVT_64BIT;
+
 	/* Only expose micro/arch events supported by this PMU */
 	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
 	    && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index a288810..7f01f6f 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -665,14 +665,9 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
 	int idx;
 
 	for (idx = 0; idx < armpmu->num_events; idx++) {
-		/*
-		 * If the counter is not used skip it, there is no
-		 * need of stopping/restarting it.
-		 */
-		if (!test_bit(idx, hw_events->used_mask))
-			continue;
-
 		event = hw_events->events[idx];
+		if (!event)
+			continue;
 
 		switch (cmd) {
 		case CPU_PM_ENTER:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 0/7] arm64: perf: Support for chained counters
  2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (6 preceding siblings ...)
  2018-07-10  8:58 ` [PATCH v5 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
@ 2018-07-10 14:31 ` Will Deacon
  2018-07-10 15:01   ` Suzuki K Poulose
  7 siblings, 1 reply; 11+ messages in thread
From: Will Deacon @ 2018-07-10 14:31 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, julien.thierry,
	robin.murphy

Hi Suzuki,

On Tue, Jul 10, 2018 at 09:57:57AM +0100, Suzuki K Poulose wrote:
> This series adds support for counting PMU events using 64bit counters
> for the arm64 PMU.
> 
> The Arm v8 PMUv3 supports combining two adjacent 32bit counters
> (low even and high odd counters) to count a given "event" in 64bit mode.
> This series adds support for 64bit events in the core arm_pmu driver
> infrastructure and adds support for the armv8 64bit kernel PMU to use
> chained counters to count in 64bit mode. For CPU cycles, we use the cycle
> counter in 64bit mode only when requested. If the cycle counter is not
> available, we fall back to chaining the counters.
> 
> Tested on Juno, Fast models. Applies on 4.18-rc4

Thanks, this looks pretty good to me. How far did you get with the perf
fuzzer?

Will

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 0/7] arm64: perf: Support for chained counters
  2018-07-10 14:31 ` [PATCH v5 0/7] arm64: perf: Support for chained counters Will Deacon
@ 2018-07-10 15:01   ` Suzuki K Poulose
  0 siblings, 0 replies; 11+ messages in thread
From: Suzuki K Poulose @ 2018-07-10 15:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, julien.thierry,
	robin.murphy

On 07/10/2018 03:31 PM, Will Deacon wrote:
> Hi Suzuki,
> 
> On Tue, Jul 10, 2018 at 09:57:57AM +0100, Suzuki K Poulose wrote:
>> This series adds support for counting PMU events using 64bit counters
>> for the arm64 PMU.
>>
>> The Arm v8 PMUv3 supports combining two adjacent 32bit counters
>> (low even and high odd counters) to count a given "event" in 64bit mode.
>> This series adds support for 64bit events in the core arm_pmu driver
>> infrastructure and adds support for the armv8 64bit kernel PMU to use
>> chained counters to count in 64bit mode. For CPU cycles, we use the cycle
>> counter in 64bit mode only when requested. If the cycle counter is not
>> available, we fall back to chaining the counters.
>>
>> Tested on Juno, Fast models. Applies on 4.18-rc4
> 
> Thanks, this looks pretty good to me. How far did you get with the perf
> fuzzer?

Hi Will,

I ran perf_fuzzer on Juno and the fuzzer hung with RCU stalls.
However, this was reproducible without the series. I am yet to
investigate the RCU stall.

Log below (from 4.18-rc3, without the series).

[445885.968489] perf_fuzzer     R  running task        0 11468   4397 0x00000003
[445885.975568] Call trace:
[445885.978084]  ret_from_fork+0x0/0x1c
[445963.841606] INFO: rcu_preempt self-detected stall on CPU
[445963.845814] INFO: rcu_preempt detected stalls on CPUs/tasks:
[445963.846957] 	2-....: (1 GPs behind) idle=376/1/4611686018427387908 softirq=57145/57145 fqs=54519307
[445963.852649] 	
[445963.861784] 	2-....: (1 GPs behind) idle=376/1/4611686018427387908 softirq=57145/57145 fqs=54519307
[445963.861787]  (t=109254007 jiffies g=61092 c=61091 q=1517144)
[445963.878244] 	
[445963.878245] Task dump for CPU 2:
[445963.878250] perf_fuzzer     R
[445963.879897] (detected by 0, t=109254007 jiffies, g=61092, c=61091, q=1517144)
[445963.883167]   running task        0 11468   4397 0x00000003
[445963.898932] Call trace:
[445963.901439]  dump_backtrace+0x0/0x1e0
[445963.905149]  show_stack+0x24/0x30
[445963.908516]  sched_show_task+0x220/0x2c0
[445963.912484]  dump_cpu_task+0x48/0x58
[445963.916109]  rcu_dump_cpu_stacks+0xa0/0xe0
[445963.920250]  rcu_check_callbacks+0xa40/0xc20
[445963.924564]  update_process_times+0x34/0x60
[445963.928791]  tick_sched_handle.isra.5+0x44/0x68
[445963.933361]  tick_sched_timer+0x50/0xa0
[445963.937244]  __hrtimer_run_queues+0x19c/0x6d0
[445963.941644]  hrtimer_interrupt+0xec/0x248
[445963.945699]  arch_timer_handler_phys+0x3c/0x50
[445963.950184]  handle_percpu_devid_irq+0xcc/0x4a0
[445963.954754]  generic_handle_irq+0x34/0x50
[445963.958808]  __handle_domain_irq+0x68/0xc0
[445963.962948]  gic_handle_irq+0x60/0xb8
[445963.966657]  el1_irq+0xb4/0x130
[445963.969851]  __do_softirq+0xcc/0x5c4
[445963.973475]  irq_exit+0x12c/0x138
[445963.976841]  __handle_domain_irq+0x6c/0xc0
[445963.980981]  gic_handle_irq+0x60/0xb8
[445963.984691]  el0_irq_naked+0x50/0x5c
[445963.988316] Task dump for CPU 2:
[445963.991601] perf_fuzzer     R  running task        0 11468   4397 0x00000003
[445963.998679] Call trace:
[445964.001194]  ret_from_fork+0x0/0x1c
root@localhost:~# uname -a
Linux localhost.localdomain 4.18.0-rc3 #117 SMP PREEMPT Thu Jul 5 11:47:12 BST 2018 aarch64 aarch64 aarch64 GNU/Linux

Suzuki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 7/7] arm64: perf: Add support for chaining event counters
  2018-07-10  8:58 ` [PATCH v5 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
@ 2018-07-10 17:18   ` Mark Rutland
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Rutland @ 2018-07-10 17:18 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, julien.thierry,
	robin.murphy

On Tue, Jul 10, 2018 at 09:58:04AM +0100, Suzuki K Poulose wrote:
> Add support for 64bit events by using chained event counters
> and the 64bit cycle counter.
> 
> PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
> forming a 64-bit counter. The low/even counter is programmed to count
> the event of interest, and the high/odd counter is programmed to count
> the CHAIN event, taken when the low/even counter overflows.
> 
> For CPU cycles, when 64bit mode is requested, the cycle counter
> is used in 64bit mode. If the cycle counter is not available, we
> fall back to chaining.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
> Changes since v4:
>  - Drop comment in cpu_pm_pmu_setup
> Changes since v3:
>  - Rename format name from "bits64" to "long"
>  - Address other comments on style.
> ---
>  arch/arm64/kernel/perf_event.c | 185 +++++++++++++++++++++++++++++++++++------
>  drivers/perf/arm_pmu.c         |   9 +-
>  2 files changed, 160 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index b414d81..dfff5ed 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
>  };
>  
>  PMU_FORMAT_ATTR(event, "config:0-15");
> +PMU_FORMAT_ATTR(long, "config1:0");
> +
> +static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
> +{
> +	return event->attr.config1 & 0x1;
> +}
>  
>  static struct attribute *armv8_pmuv3_format_attrs[] = {
>  	&format_attr_event.attr,
> +	&format_attr_long.attr,
>  	NULL,
>  };
>  
> @@ -466,6 +473,21 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
>  	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
>  
>  /*
> + * We must chain two programmable counters for 64 bit events,
> + * except when we have allocated the 64bit cycle counter (for CPU
> + * cycles event). This must be called only when the event has
> + * a counter allocated.
> + */
> +static inline bool armv8pmu_event_is_chained(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	return !WARN_ON(idx < 0) &&
> +	       armv8pmu_event_is_64bit(event) &&
> +	       (idx != ARMV8_IDX_CYCLE_COUNTER);
> +}
> +
> +/*
>   * ARMv8 low level PMU access
>   */
>  
> @@ -516,12 +538,23 @@ static inline u32 armv8pmu_read_evcntr(int idx)
>  	return read_sysreg(pmxevcntr_el0);
>  }
>  
> +static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +	u64 val = 0;
> +
> +	val = armv8pmu_read_evcntr(idx);
> +	if (armv8pmu_event_is_chained(event))
> +		val = (val << 32) | armv8pmu_read_evcntr(idx - 1);
> +	return val;
> +}
> +
>  static inline u64 armv8pmu_read_counter(struct perf_event *event)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
>  	int idx = hwc->idx;
> -	u32 value = 0;
> +	u64 value = 0;
>  
>  	if (!armv8pmu_counter_valid(cpu_pmu, idx))
>  		pr_err("CPU%u reading wrong counter %d\n",
> @@ -529,7 +562,7 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
>  	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
>  		value = read_sysreg(pmccntr_el0);
>  	else
> -		value = armv8pmu_read_evcntr(idx);
> +		value = armv8pmu_read_hw_counter(event);
>  
>  	return value;
>  }
> @@ -540,6 +573,19 @@ static inline void armv8pmu_write_evcntr(int idx, u32 value)
>  	write_sysreg(value, pmxevcntr_el0);
>  }
>  
> +static inline void armv8pmu_write_hw_counter(struct perf_event *event,
> +					     u64 value)
> +{
> +	int idx = event->hw.idx;
> +
> +	if (armv8pmu_event_is_chained(event)) {
> +		armv8pmu_write_evcntr(idx, upper_32_bits(value));
> +		armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
> +	} else {
> +		armv8pmu_write_evcntr(idx, value);
> +	}
> +}
> +
>  static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  {
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> @@ -551,14 +597,16 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
>  			smp_processor_id(), idx);
>  	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
>  		/*
> -		 * Set the upper 32bits as this is a 64bit counter but we only
> -		 * count using the lower 32bits and we want an interrupt when
> -		 * it overflows.
> +		 * The cycles counter is really a 64-bit counter.
> +		 * When treating it as a 32-bit counter, we only count
> +		 * the lower 32 bits, and set the upper 32-bits so that
> +		 * we get an interrupt upon 32-bit overflow.
>  		 */
> -		value |= 0xffffffff00000000ULL;
> +		if (!armv8pmu_event_is_64bit(event))
> +			value |= 0xffffffff00000000ULL;
>  		write_sysreg(value, pmccntr_el0);
>  	} else
> -		armv8pmu_write_evcntr(idx, value);
> +		armv8pmu_write_hw_counter(event, value);
>  }
>  
>  static inline void armv8pmu_write_evtype(int idx, u32 val)
> @@ -568,6 +616,27 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
>  	write_sysreg(val, pmxevtyper_el0);
>  }
>  
> +static inline void armv8pmu_write_event_type(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	/*
> +	 * For chained events, the low counter is programmed to count
> +	 * the event of interest and the high counter is programmed
> +	 * with CHAIN event code with filters set to count at all ELs.
> +	 */
> +	if (armv8pmu_event_is_chained(event)) {
> +		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
> +				ARMV8_PMU_INCLUDE_EL2;
> +
> +		armv8pmu_write_evtype(idx - 1, hwc->config_base);
> +		armv8pmu_write_evtype(idx, chain_evt);
> +	} else {
> +		armv8pmu_write_evtype(idx, hwc->config_base);
> +	}
> +}
> +
>  static inline int armv8pmu_enable_counter(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -575,6 +644,16 @@ static inline int armv8pmu_enable_counter(int idx)
>  	return idx;
>  }
>  
> +static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> +{
> +	int idx = event->hw.idx;
> +
> +	armv8pmu_enable_counter(idx);
> +	if (armv8pmu_event_is_chained(event))
> +		armv8pmu_enable_counter(idx - 1);
> +	isb();
> +}
> +
>  static inline int armv8pmu_disable_counter(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -582,6 +661,16 @@ static inline int armv8pmu_disable_counter(int idx)
>  	return idx;
>  }
>  
> +static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	if (armv8pmu_event_is_chained(event))
> +		armv8pmu_disable_counter(idx - 1);
> +	armv8pmu_disable_counter(idx);
> +}
> +
>  static inline int armv8pmu_enable_intens(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -589,6 +678,11 @@ static inline int armv8pmu_enable_intens(int idx)
>  	return idx;
>  }
>  
> +static inline int armv8pmu_enable_event_irq(struct perf_event *event)
> +{
> +	return armv8pmu_enable_intens(event->hw.idx);
> +}
> +
>  static inline int armv8pmu_disable_intens(int idx)
>  {
>  	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> @@ -601,6 +695,11 @@ static inline int armv8pmu_disable_intens(int idx)
>  	return idx;
>  }
>  
> +static inline int armv8pmu_disable_event_irq(struct perf_event *event)
> +{
> +	return armv8pmu_disable_intens(event->hw.idx);
> +}
> +
>  static inline u32 armv8pmu_getreset_flags(void)
>  {
>  	u32 value;
> @@ -618,10 +717,8 @@ static inline u32 armv8pmu_getreset_flags(void)
>  static void armv8pmu_enable_event(struct perf_event *event)
>  {
>  	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>  
>  	/*
>  	 * Enable counter and interrupt, and set the counter to count
> @@ -632,22 +729,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
>  	/*
>  	 * Disable counter
>  	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>  
>  	/*
>  	 * Set event (if destined for PMNx counters).
>  	 */
> -	armv8pmu_write_evtype(idx, hwc->config_base);
> +	armv8pmu_write_event_type(event);
>  
>  	/*
>  	 * Enable interrupt for this counter
>  	 */
> -	armv8pmu_enable_intens(idx);
> +	armv8pmu_enable_event_irq(event);
>  
>  	/*
>  	 * Enable counter
>  	 */
> -	armv8pmu_enable_counter(idx);
> +	armv8pmu_enable_event_counter(event);
>  
>  	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>  }
> @@ -655,10 +752,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
>  static void armv8pmu_disable_event(struct perf_event *event)
>  {
>  	unsigned long flags;
> -	struct hw_perf_event *hwc = &event->hw;
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
> -	int idx = hwc->idx;
>  
>  	/*
>  	 * Disable counter and interrupt
> @@ -668,12 +763,12 @@ static void armv8pmu_disable_event(struct perf_event *event)
>  	/*
>  	 * Disable counter
>  	 */
> -	armv8pmu_disable_counter(idx);
> +	armv8pmu_disable_event_counter(event);
>  
>  	/*
>  	 * Disable interrupt for this counter
>  	 */
> -	armv8pmu_disable_intens(idx);
> +	armv8pmu_disable_event_irq(event);
>  
>  	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
>  }
> @@ -767,10 +862,42 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
>  	return IRQ_HANDLED;
>  }
>  
> +static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
> +				    struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx ++) {
> +		if (!test_and_set_bit(idx, cpuc->used_mask))
> +			return idx;
> +	}
> +	return -EAGAIN;
> +}
> +
> +static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> +				   struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	/*
> +	 * Chaining requires two consecutive event counters, where
> +	 * the lower idx must be even.
> +	 */
> +	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
> +		if (!test_and_set_bit(idx, cpuc->used_mask)) {
> +			/* Check if the preceding even counter is available */
> +			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
> +				return idx;
> +			/* Release the Odd counter */
> +			clear_bit(idx, cpuc->used_mask);
> +		}
> +	}
> +	return -EAGAIN;
> +}
> +
>  static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  				  struct perf_event *event)
>  {
> -	int idx;
>  	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
>  	struct hw_perf_event *hwc = &event->hw;
>  	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
> @@ -784,19 +911,20 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  	/*
>  	 * Otherwise use events counters
>  	 */
> -	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
> -		if (!test_and_set_bit(idx, cpuc->used_mask))
> -			return idx;
> -	}
> -
> -	/* The counters are all in use. */
> -	return -EAGAIN;
> +	if (armv8pmu_event_is_64bit(event))
> +		return	armv8pmu_get_chain_idx(cpuc, cpu_pmu);
> +	else
> +		return armv8pmu_get_single_idx(cpuc, cpu_pmu);
>  }
>  
>  static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> -				  struct perf_event *event)
> +				     struct perf_event *event)
>  {
> -	clear_bit(event->hw.idx, cpuc->used_mask);
> +	int idx = event->hw.idx;
> +
> +	clear_bit(idx, cpuc->used_mask);
> +	if (armv8pmu_event_is_chained(event))
> +		clear_bit(idx - 1, cpuc->used_mask);
>  }
>  
>  /*
> @@ -871,6 +999,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
>  				       &armv8_pmuv3_perf_cache_map,
>  				       ARMV8_PMU_EVTYPE_EVENT);
>  
> +	if (armv8pmu_event_is_64bit(event))
> +		event->hw.flags |= ARMPMU_EVT_64BIT;
> +
>  	/* Only expose micro/arch events supported by this PMU */
>  	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
>  	    && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index a288810..7f01f6f 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -665,14 +665,9 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>  	int idx;
>  
>  	for (idx = 0; idx < armpmu->num_events; idx++) {
> -		/*
> -		 * If the counter is not used skip it, there is no
> -		 * need of stopping/restarting it.
> -		 */
> -		if (!test_bit(idx, hw_events->used_mask))
> -			continue;
> -
>  		event = hw_events->events[idx];
> +		if (!event)
> +			continue;
>  
>  		switch (cmd) {
>  		case CPU_PM_ENTER:
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-07-10 18:58 UTC | newest]

Thread overview: 11+ messages
2018-07-10  8:57 [PATCH v5 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
2018-07-10  8:57 ` [PATCH v5 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
2018-07-10  8:57 ` [PATCH v5 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
2018-07-10  8:58 ` [PATCH v5 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
2018-07-10  8:58 ` [PATCH v5 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
2018-07-10  8:58 ` [PATCH v5 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
2018-07-10  8:58 ` [PATCH v5 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
2018-07-10  8:58 ` [PATCH v5 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
2018-07-10 17:18   ` Mark Rutland
2018-07-10 14:31 ` [PATCH v5 0/7] arm64: perf: Support for chained counters Will Deacon
2018-07-10 15:01   ` Suzuki K Poulose
