linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/7] arm64: perf: Support for chained counters
@ 2018-07-02 21:59 Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose


This series adds support for counting PMU events using 64bit counters
on the arm64 PMU.

The Arm v8 PMUv3 supports combining two adjacent 32bit counters
(a low even counter and a high odd counter) to count a given "event"
in 64bit mode. This series adds support for 64bit events to the core
arm_pmu driver infrastructure and teaches the arm64 (armv8) PMU driver
to use chained counters to count in 64bit mode. For CPU cycles, we use
the cycle counter in 64bit mode, only when requested. If the cycle
counter is not available, we fall back to chaining the counters.
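
As an illustration (not part of the series itself), a minimal userspace
sketch that requests a 64bit event via the new "long" format bit
(config1:0) could look like the following. The raw event number (0x11,
CPU cycles) and the direct perf_event_open() call are assumptions made
for the example only:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	long long count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	attr.config = 0x11;	/* raw event number; 0x11 is CPU_CYCLES on ARMv8 */
	attr.config1 = 1;	/* "long" format bit: ask for a 64bit counter */

	/* Count the calling thread on any CPU */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return 1;

	sleep(1);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("count: %lld\n", count);
	close(fd);
	return 0;
}

With the series applied, the driver maps config1 bit 0 to
ARMPMU_EVT_64BIT and either uses the 64bit cycle counter or allocates a
chained counter pair for the event.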

Tested on Juno and Fast Models. Applies on top of 4.18-rc3.

Changes since v3:
 - Rename format string "bits64" => "long"
 - Add clear_event_idx call back for all PMU backends
 - Address style comments
 - Add Acked-by's

Changes since v2:
 - Clean up select counter routine. (Mark Rutland)
 - Stop PMU while processing overflows (Mark Rutland)
 - Drop special allocation algorithm for chain indices
 - Since we access the counters when the PMU is stopped,
   get rid of the unnecessary barriers.
 - Ensure a counter is allocated when checking for chained event

Changes since v1:
 - Remove unnecessary isb()s in chain counter reads/writes
 - Fix event programming order for counters
 - Tighten chain counter event read sequence
 - Set chain event to count in all ELs
 - Cleanup helpers to be consistent
 - Fix build break on xscale PMU (reported by kbuild-robot)
 - Remove the explicit counter width field from pmu backends and default
   to 32bit.
 - Rename flag ARMPMU_EVT_LONG => ARMPMU_EVT_64BIT and
   the format string "chain" => "bits64". (Unfortunately we can't use "64bit"
   and I am open to suggestions for a better name)
 - Rename armpmu_get_event_max_period() => armpmu_event_max_period()
 - For 64bit CPU cycles events, allow chaining if cycle counter is
   not available.


Suzuki K Poulose (7):
  arm_pmu: Clean up maximum period handling
  arm_pmu: Change API to support 64bit counter values
  arm_pmu: Add support for 64bit event counters
  arm_pmu: Tidy up clear_event_idx call backs
  arm64: perf: Clean up armv8pmu_select_counter
  arm64: perf: Disable PMU while processing counter overflows
  arm64: perf: Add support for chaining event counters

 arch/arm/kernel/perf_event_v6.c     |  14 +-
 arch/arm/kernel/perf_event_v7.c     |  14 +-
 arch/arm/kernel/perf_event_xscale.c |  10 +-
 arch/arm64/kernel/perf_event.c      | 251 +++++++++++++++++++++++++++++-------
 drivers/perf/arm_pmu.c              |  42 +++---
 include/linux/perf/arm_pmu.h        |  11 +-
 6 files changed, 260 insertions(+), 82 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/7] arm_pmu: Clean up maximum period handling
  2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
@ 2018-07-02 21:59 ` Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Each PMU defines its max_period as the maximum value that its
counters can count. Since all the PMU backends support 32bit
counters by default, let us remove the redundant field.

No functional changes.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
No changes since v2
---
 arch/arm/kernel/perf_event_v6.c     |  2 --
 arch/arm/kernel/perf_event_v7.c     |  1 -
 arch/arm/kernel/perf_event_xscale.c |  2 --
 arch/arm64/kernel/perf_event.c      |  1 -
 drivers/perf/arm_pmu.c              | 16 ++++++++++++----
 include/linux/perf/arm_pmu.h        |  1 -
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index be42c4f..f64a6bf 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -495,7 +495,6 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 }
 
 static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
@@ -546,7 +545,6 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 57f01e0..ecca4cd 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1170,7 +1170,6 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start		= armv7pmu_start;
 	cpu_pmu->stop		= armv7pmu_stop;
 	cpu_pmu->reset		= armv7pmu_reset;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 };
 
 static void armv7_read_num_pmnc_events(void *info)
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 88d1a76..c4f0294 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -374,7 +374,6 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale1pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 3;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
@@ -743,7 +742,6 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->stop		= xscale2pmu_stop;
 	cpu_pmu->map_event	= xscale_map_event;
 	cpu_pmu->num_events	= 5;
-	cpu_pmu->max_period	= (1LLU << 32) - 1;
 
 	return 0;
 }
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aa..678ecff 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -960,7 +960,6 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
-	cpu_pmu->max_period		= (1LLU << 32) - 1,
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
 
 	return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index a6347d4..6ddc00d 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,6 +28,11 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
+static inline u64 arm_pmu_max_period(void)
+{
+	return (1ULL << 32) - 1;
+}
+
 static int
 armpmu_map_cache_event(const unsigned (*cache_map)
 				      [PERF_COUNT_HW_CACHE_MAX]
@@ -114,8 +119,10 @@ int armpmu_event_set_period(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	s64 left = local64_read(&hwc->period_left);
 	s64 period = hwc->sample_period;
+	u64 max_period;
 	int ret = 0;
 
+	max_period = arm_pmu_max_period();
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -136,8 +143,8 @@ int armpmu_event_set_period(struct perf_event *event)
 	 * effect we are reducing max_period to account for
 	 * interrupt latency (and we are being very conservative).
 	 */
-	if (left > (armpmu->max_period >> 1))
-		left = armpmu->max_period >> 1;
+	if (left > (max_period >> 1))
+		left = (max_period >> 1);
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
@@ -153,6 +160,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
+	u64 max_period = arm_pmu_max_period();
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -162,7 +170,7 @@ u64 armpmu_event_update(struct perf_event *event)
 			     new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
+	delta = (new_raw_count - prev_raw_count) & max_period;
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -402,7 +410,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = armpmu->max_period >> 1;
+		hwc->sample_period  = arm_pmu_max_period() >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index ad54444..12c30a2 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -94,7 +94,6 @@ struct arm_pmu {
 	void		(*reset)(void *);
 	int		(*map_event)(struct perf_event *event);
 	int		num_events;
-	u64		max_period;
 	bool		secure_access; /* 32-bit ARM only */
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
 	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 2/7] arm_pmu: Change API to support 64bit counter values
  2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
@ 2018-07-02 21:59 ` Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Convert the {read,write}_counter APIs to handle 64bit values,
to enable supporting chained event counters. The backends still
use 32bit values, and we only pass them 32bit values. So in effect
there are no functional changes.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 - No changes since v2
---
 arch/arm/kernel/perf_event_v6.c     | 4 ++--
 arch/arm/kernel/perf_event_v7.c     | 4 ++--
 arch/arm/kernel/perf_event_xscale.c | 8 ++++----
 arch/arm64/kernel/perf_event.c      | 9 ++++-----
 include/linux/perf/arm_pmu.h        | 4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index f64a6bf..0729f98 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -233,7 +233,7 @@ armv6_pmcr_counter_has_overflowed(unsigned long pmcr,
 	return ret;
 }
 
-static inline u32 armv6pmu_read_counter(struct perf_event *event)
+static inline u64 armv6pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -251,7 +251,7 @@ static inline u32 armv6pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv6pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv6pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index ecca4cd..fd7ce01 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -743,7 +743,7 @@ static inline void armv7_pmnc_select_counter(int idx)
 	isb();
 }
 
-static inline u32 armv7pmu_read_counter(struct perf_event *event)
+static inline u64 armv7pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -763,7 +763,7 @@ static inline u32 armv7pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv7pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv7pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index c4f0294..942230f 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -316,7 +316,7 @@ static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale1pmu_read_counter(struct perf_event *event)
+static inline u64 xscale1pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -337,7 +337,7 @@ static inline u32 xscale1pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale1pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale1pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -678,7 +678,7 @@ static void xscale2pmu_stop(struct arm_pmu *cpu_pmu)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
-static inline u32 xscale2pmu_read_counter(struct perf_event *event)
+static inline u64 xscale2pmu_read_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
@@ -705,7 +705,7 @@ static inline u32 xscale2pmu_read_counter(struct perf_event *event)
 	return val;
 }
 
-static inline void xscale2pmu_write_counter(struct perf_event *event, u32 val)
+static inline void xscale2pmu_write_counter(struct perf_event *event, u64 val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int counter = hwc->idx;
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 678ecff..66a2ffd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -512,7 +512,7 @@ static inline int armv8pmu_select_counter(int idx)
 	return idx;
 }
 
-static inline u32 armv8pmu_read_counter(struct perf_event *event)
+static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -530,7 +530,7 @@ static inline u32 armv8pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
-static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
+static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
@@ -545,9 +545,8 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 		 * count using the lower 32bits and we want an interrupt when
 		 * it overflows.
 		 */
-		u64 value64 = 0xffffffff00000000ULL | value;
-
-		write_sysreg(value64, pmccntr_el0);
+		value |= 0xffffffff00000000ULL;
+		write_sysreg(value, pmccntr_el0);
 	} else if (armv8pmu_select_counter(idx) == idx)
 		write_sysreg(value, pmxevcntr_el0);
 }
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 12c30a2..f7126a2 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -87,8 +87,8 @@ struct arm_pmu {
 					 struct perf_event *event);
 	int		(*set_event_filter)(struct hw_perf_event *evt,
 					    struct perf_event_attr *attr);
-	u32		(*read_counter)(struct perf_event *event);
-	void		(*write_counter)(struct perf_event *event, u32 val);
+	u64		(*read_counter)(struct perf_event *event);
+	void		(*write_counter)(struct perf_event *event, u64 val);
 	void		(*start)(struct arm_pmu *);
 	void		(*stop)(struct arm_pmu *);
 	void		(*reset)(void *);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 3/7] arm_pmu: Add support for 64bit event counters
  2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
@ 2018-07-02 21:59 ` Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Each PMU has a set of 32bit event counters. But in some
special cases, the events could be counted using counters
which are effectively 64bit wide.

e.g., Arm v8 PMUv3 has a 64bit cycle counter which can count
only CPU cycles. Also, the PMU can chain pairs of event counters
to effectively count as a 64bit counter.

Add support for tracking the events that use 64bit counters.
This only affects the periods set for each counter in the core
driver.

Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v3:
 - Replace ternary operator with if..else for max_period
 - Added Acked-by from Mark R
---
 drivers/perf/arm_pmu.c       | 16 ++++++++++------
 include/linux/perf/arm_pmu.h |  6 ++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 6ddc00d..8cad6b5 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -28,9 +28,12 @@
 static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 
-static inline u64 arm_pmu_max_period(void)
+static inline u64 arm_pmu_event_max_period(struct perf_event *event)
 {
-	return (1ULL << 32) - 1;
+	if (event->hw.flags & ARMPMU_EVT_64BIT)
+		return GENMASK_ULL(63, 0);
+	else
+		return GENMASK_ULL(31, 0);
 }
 
 static int
@@ -122,7 +125,7 @@ int armpmu_event_set_period(struct perf_event *event)
 	u64 max_period;
 	int ret = 0;
 
-	max_period = arm_pmu_max_period();
+	max_period = arm_pmu_event_max_period(event);
 	if (unlikely(left <= -period)) {
 		left = period;
 		local64_set(&hwc->period_left, left);
@@ -148,7 +151,7 @@ int armpmu_event_set_period(struct perf_event *event)
 
 	local64_set(&hwc->prev_count, (u64)-left);
 
-	armpmu->write_counter(event, (u64)(-left) & 0xffffffff);
+	armpmu->write_counter(event, (u64)(-left) & max_period);
 
 	perf_event_update_userpage(event);
 
@@ -160,7 +163,7 @@ u64 armpmu_event_update(struct perf_event *event)
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	u64 delta, prev_raw_count, new_raw_count;
-	u64 max_period = arm_pmu_max_period();
+	u64 max_period = arm_pmu_event_max_period(event);
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
@@ -368,6 +371,7 @@ __hw_perf_event_init(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int mapping;
 
+	hwc->flags = 0;
 	mapping = armpmu->map_event(event);
 
 	if (mapping < 0) {
@@ -410,7 +414,7 @@ __hw_perf_event_init(struct perf_event *event)
 		 * is far less likely to overtake the previous one unless
 		 * you have some serious IRQ latency issues.
 		 */
-		hwc->sample_period  = arm_pmu_max_period() >> 1;
+		hwc->sample_period  = arm_pmu_event_max_period(event) >> 1;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index f7126a2..10f92e1 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -25,6 +25,12 @@
  */
 #define ARMPMU_MAX_HWEVENTS		32
 
+/*
+ * ARM PMU hw_event flags
+ */
+/* Event uses a 64bit counter */
+#define ARMPMU_EVT_64BIT		1
+
 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
 #define CACHE_OP_UNSUPPORTED		0xFFFF
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2018-07-02 21:59 ` [PATCH v4 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
@ 2018-07-02 21:59 ` Suzuki K Poulose
  2018-07-03 12:49   ` Mark Rutland
  2018-07-02 21:59 ` [PATCH v4 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

The arm_pmu core uses the get_event_idx callback to allocate an
event counter for a given event, which marks the selected counter
as "used". Now, when we delete the event, the arm_pmu core
clears the "used" bit and then invokes the "clear_event_idx"
callback, which splits the job between the core code
and the backend. To keep things tidy, mandate the implementation
of clear_event_idx() and add it for existing backends.
This will be useful for adding the chained event support, where
we leave the event idx maintenance to the backend.

Also, when an event is removed from the PMU, reset the hw.idx
to indicate that a counter is not allocated for this event,
to help the backends do better checks. This will also be used
for the chain counter support.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v3:
 - Add clear_event_idx call back for backends.
---
 arch/arm/kernel/perf_event_v6.c | 8 ++++++++
 arch/arm/kernel/perf_event_v7.c | 9 +++++++++
 arch/arm64/kernel/perf_event.c  | 7 +++++++
 drivers/perf/arm_pmu.c          | 7 +++----
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index 0729f98..1ae99de 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -411,6 +411,12 @@ armv6pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	}
 }
 
+static void armv6pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
 static void armv6pmu_disable_event(struct perf_event *event)
 {
 	unsigned long val, mask, evt, flags;
@@ -491,6 +497,7 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= armv6pmu_read_counter;
 	cpu_pmu->write_counter	= armv6pmu_write_counter;
 	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
 	cpu_pmu->start		= armv6pmu_start;
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6_map_event;
@@ -541,6 +548,7 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= armv6pmu_read_counter;
 	cpu_pmu->write_counter	= armv6pmu_write_counter;
 	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
 	cpu_pmu->start		= armv6pmu_start;
 	cpu_pmu->stop		= armv6pmu_stop;
 	cpu_pmu->map_event	= armv6mpcore_map_event;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index fd7ce01..acb49a9 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1058,6 +1058,12 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	return -EAGAIN;
 }
 
+static void armv7pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
 /*
  * Add an event filter to a given event. This will only work for PMUv2 PMUs.
  */
@@ -1167,6 +1173,7 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter	= armv7pmu_read_counter;
 	cpu_pmu->write_counter	= armv7pmu_write_counter;
 	cpu_pmu->get_event_idx	= armv7pmu_get_event_idx;
+	cpu_pmu->clear_event_idx = armv7pmu_clear_event_idx;
 	cpu_pmu->start		= armv7pmu_start;
 	cpu_pmu->stop		= armv7pmu_stop;
 	cpu_pmu->reset		= armv7pmu_reset;
@@ -1637,6 +1644,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool krait_event = EVENT_CPU(hwc->config_base);
 
+	armv7pmu_clear_event_idx(cpuc, event);
 	if (venum_event || krait_event) {
 		bit = krait_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
@@ -1966,6 +1974,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	bool venum_event = EVENT_VENUM(hwc->config_base);
 	bool scorpion_event = EVENT_CPU(hwc->config_base);
 
+	armv7pmu_clear_event_idx(cpuc, event);
 	if (venum_event || scorpion_event) {
 		bit = scorpion_event_to_bit(event, region, group);
 		clear_bit(bit, cpuc->used_mask);
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 66a2ffd..ac66851 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -778,6 +778,12 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	return -EAGAIN;
 }
 
+static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+				  struct perf_event *event)
+{
+	clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
 /*
  * Add an event filter to a given event. This will only work for PMUv2 PMUs.
  */
@@ -956,6 +962,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->read_counter		= armv8pmu_read_counter,
 	cpu_pmu->write_counter		= armv8pmu_write_counter,
 	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
+	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
 	cpu_pmu->start			= armv8pmu_start,
 	cpu_pmu->stop			= armv8pmu_stop,
 	cpu_pmu->reset			= armv8pmu_reset,
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8cad6b5..a288810 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -238,11 +238,10 @@ armpmu_del(struct perf_event *event, int flags)
 
 	armpmu_stop(event, PERF_EF_UPDATE);
 	hw_events->events[idx] = NULL;
-	clear_bit(idx, hw_events->used_mask);
-	if (armpmu->clear_event_idx)
-		armpmu->clear_event_idx(hw_events, event);
-
+	armpmu->clear_event_idx(hw_events, event);
 	perf_event_update_userpage(event);
+	/* Clear the allocated counter */
+	hwc->idx = -1;
 }
 
 static int
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 5/7] arm64: perf: Clean up armv8pmu_select_counter
  2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2018-07-02 21:59 ` [PATCH v4 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
@ 2018-07-02 21:59 ` Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
  6 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

armv8pmu_select_counter always returns the passed idx. So
let us make it return void and get rid of the pointless checks
of its return value.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index ac66851..bc014af 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -503,13 +503,17 @@ static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
 	return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx));
 }
 
-static inline int armv8pmu_select_counter(int idx)
+static inline void armv8pmu_select_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
 	write_sysreg(counter, pmselr_el0);
 	isb();
+}
 
-	return idx;
+static inline u32 armv8pmu_read_evcntr(int idx)
+{
+	armv8pmu_select_counter(idx);
+	return read_sysreg(pmxevcntr_el0);
 }
 
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
@@ -524,12 +528,18 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
-	else if (armv8pmu_select_counter(idx) == idx)
-		value = read_sysreg(pmxevcntr_el0);
+	else
+		value = armv8pmu_read_evcntr(idx);
 
 	return value;
 }
 
+static inline void armv8pmu_write_evcntr(int idx, u32 value)
+{
+	armv8pmu_select_counter(idx);
+	write_sysreg(value, pmxevcntr_el0);
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -547,16 +557,15 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 		 */
 		value |= 0xffffffff00000000ULL;
 		write_sysreg(value, pmccntr_el0);
-	} else if (armv8pmu_select_counter(idx) == idx)
-		write_sysreg(value, pmxevcntr_el0);
+	} else
+		armv8pmu_write_evcntr(idx, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
 {
-	if (armv8pmu_select_counter(idx) == idx) {
-		val &= ARMV8_PMU_EVTYPE_MASK;
-		write_sysreg(val, pmxevtyper_el0);
-	}
+	armv8pmu_select_counter(idx);
+	val &= ARMV8_PMU_EVTYPE_MASK;
+	write_sysreg(val, pmxevtyper_el0);
 }
 
 static inline int armv8pmu_enable_counter(int idx)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 6/7] arm64: perf: Disable PMU while processing counter overflows
  2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2018-07-02 21:59 ` [PATCH v4 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
@ 2018-07-02 21:59 ` Suzuki K Poulose
  2018-07-02 21:59 ` [PATCH v4 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
  6 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

The arm64 PMU updates the event counters and reprograms the
counters in the overflow IRQ handler without disabling the
PMU. This could potentially cause skews for group counters,
where the overflowed counters may lose some event counts
while they are reprogrammed. To prevent this, disable
the PMU while we process the counter overflows and enable it
right back when we are done.

This patch also moves the PMU stop/start routines to avoid a
forward declaration.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/perf_event.c | 50 +++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index bc014af..b414d81 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -678,6 +678,28 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
 
+static void armv8pmu_start(struct arm_pmu *cpu_pmu)
+{
+	unsigned long flags;
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	raw_spin_lock_irqsave(&events->pmu_lock, flags);
+	/* Enable all counters */
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
+	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
+{
+	unsigned long flags;
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	raw_spin_lock_irqsave(&events->pmu_lock, flags);
+	/* Disable all counters */
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
+	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
 static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 {
 	u32 pmovsr;
@@ -702,6 +724,11 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	 */
 	regs = get_irq_regs();
 
+	/*
+	 * Stop the PMU while processing the counter overflows
+	 * to prevent skews in group events.
+	 */
+	armv8pmu_stop(cpu_pmu);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
 		struct hw_perf_event *hwc;
@@ -726,6 +753,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 		if (perf_event_overflow(event, &data, regs))
 			cpu_pmu->disable(event);
 	}
+	armv8pmu_start(cpu_pmu);
 
 	/*
 	 * Handle the pending perf events.
@@ -739,28 +767,6 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	return IRQ_HANDLED;
 }
 
-static void armv8pmu_start(struct arm_pmu *cpu_pmu)
-{
-	unsigned long flags;
-	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
-	raw_spin_lock_irqsave(&events->pmu_lock, flags);
-	/* Enable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
-	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
-}
-
-static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
-{
-	unsigned long flags;
-	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
-	raw_spin_lock_irqsave(&events->pmu_lock, flags);
-	/* Disable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
-	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
-}
-
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 7/7] arm64: perf: Add support for chaining event counters
  2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
                   ` (5 preceding siblings ...)
  2018-07-02 21:59 ` [PATCH v4 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
@ 2018-07-02 21:59 ` Suzuki K Poulose
  2018-07-03 13:00   ` Mark Rutland
  6 siblings, 1 reply; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-02 21:59 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, will.deacon, julien.thierry,
	robin.murphy, Suzuki K Poulose

Add support for 64bit events by using chained event counters
and the 64bit cycle counter.

PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
forming a 64-bit counter. The low/even counter is programmed to count
the event of interest, and the high/odd counter is programmed to count
the CHAIN event, taken when the low/even counter overflows.

For CPU cycles, when 64bit mode is requested, the cycle counter
is used in 64bit mode. If the cycle counter is not available,
we fall back to chaining.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v3:
 - Rename format string "bits64" => "long"
 - Address other comments on style.
---
 arch/arm64/kernel/perf_event.c | 185 +++++++++++++++++++++++++++++++++++------
 drivers/perf/arm_pmu.c         |  13 ++-
 2 files changed, 164 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index b414d81..dfff5ed 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -446,9 +446,16 @@ static struct attribute_group armv8_pmuv3_events_attr_group = {
 };
 
 PMU_FORMAT_ATTR(event, "config:0-15");
+PMU_FORMAT_ATTR(long, "config1:0");
+
+static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
+{
+	return event->attr.config1 & 0x1;
+}
 
 static struct attribute *armv8_pmuv3_format_attrs[] = {
 	&format_attr_event.attr,
+	&format_attr_long.attr,
 	NULL,
 };
 
@@ -466,6 +473,21 @@ static struct attribute_group armv8_pmuv3_format_attr_group = {
 	(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
 /*
+ * We must chain two programmable counters for 64 bit events,
+ * except when we have allocated the 64bit cycle counter (for CPU
+ * cycles event). This must be called only when the event has
+ * a counter allocated.
+ */
+static inline bool armv8pmu_event_is_chained(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	return !WARN_ON(idx < 0) &&
+	       armv8pmu_event_is_64bit(event) &&
+	       (idx != ARMV8_IDX_CYCLE_COUNTER);
+}
+
+/*
  * ARMv8 low level PMU access
  */
 
@@ -516,12 +538,23 @@ static inline u32 armv8pmu_read_evcntr(int idx)
 	return read_sysreg(pmxevcntr_el0);
 }
 
+static inline u64 armv8pmu_read_hw_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	u64 val = 0;
+
+	val = armv8pmu_read_evcntr(idx);
+	if (armv8pmu_event_is_chained(event))
+		val = (val << 32) | armv8pmu_read_evcntr(idx - 1);
+	return val;
+}
+
 static inline u64 armv8pmu_read_counter(struct perf_event *event)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
-	u32 value = 0;
+	u64 value = 0;
 
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u reading wrong counter %d\n",
@@ -529,7 +562,7 @@ static inline u64 armv8pmu_read_counter(struct perf_event *event)
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
 		value = read_sysreg(pmccntr_el0);
 	else
-		value = armv8pmu_read_evcntr(idx);
+		value = armv8pmu_read_hw_counter(event);
 
 	return value;
 }
@@ -540,6 +573,19 @@ static inline void armv8pmu_write_evcntr(int idx, u32 value)
 	write_sysreg(value, pmxevcntr_el0);
 }
 
+static inline void armv8pmu_write_hw_counter(struct perf_event *event,
+					     u64 value)
+{
+	int idx = event->hw.idx;
+
+	if (armv8pmu_event_is_chained(event)) {
+		armv8pmu_write_evcntr(idx, upper_32_bits(value));
+		armv8pmu_write_evcntr(idx - 1, lower_32_bits(value));
+	} else {
+		armv8pmu_write_evcntr(idx, value);
+	}
+}
+
 static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
@@ -551,14 +597,16 @@ static inline void armv8pmu_write_counter(struct perf_event *event, u64 value)
 			smp_processor_id(), idx);
 	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
 		/*
-		 * Set the upper 32bits as this is a 64bit counter but we only
-		 * count using the lower 32bits and we want an interrupt when
-		 * it overflows.
+		 * The cycles counter is really a 64-bit counter.
+		 * When treating it as a 32-bit counter, we only count
+		 * the lower 32 bits, and set the upper 32-bits so that
+		 * we get an interrupt upon 32-bit overflow.
 		 */
-		value |= 0xffffffff00000000ULL;
+		if (!armv8pmu_event_is_64bit(event))
+			value |= 0xffffffff00000000ULL;
 		write_sysreg(value, pmccntr_el0);
 	} else
-		armv8pmu_write_evcntr(idx, value);
+		armv8pmu_write_hw_counter(event, value);
 }
 
 static inline void armv8pmu_write_evtype(int idx, u32 val)
@@ -568,6 +616,27 @@ static inline void armv8pmu_write_evtype(int idx, u32 val)
 	write_sysreg(val, pmxevtyper_el0);
 }
 
+static inline void armv8pmu_write_event_type(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	/*
+	 * For chained events, the low counter is programmed to count
+	 * the event of interest and the high counter is programmed
+	 * with CHAIN event code with filters set to count at all ELs.
+	 */
+	if (armv8pmu_event_is_chained(event)) {
+		u32 chain_evt = ARMV8_PMUV3_PERFCTR_CHAIN |
+				ARMV8_PMU_INCLUDE_EL2;
+
+		armv8pmu_write_evtype(idx - 1, hwc->config_base);
+		armv8pmu_write_evtype(idx, chain_evt);
+	} else {
+		armv8pmu_write_evtype(idx, hwc->config_base);
+	}
+}
+
 static inline int armv8pmu_enable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -575,6 +644,16 @@ static inline int armv8pmu_enable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_enable_event_counter(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+
+	armv8pmu_enable_counter(idx);
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_enable_counter(idx - 1);
+	isb();
+}
+
 static inline int armv8pmu_disable_counter(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -582,6 +661,16 @@ static inline int armv8pmu_disable_counter(int idx)
 	return idx;
 }
 
+static inline void armv8pmu_disable_event_counter(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	if (armv8pmu_event_is_chained(event))
+		armv8pmu_disable_counter(idx - 1);
+	armv8pmu_disable_counter(idx);
+}
+
 static inline int armv8pmu_enable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -589,6 +678,11 @@ static inline int armv8pmu_enable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_enable_event_irq(struct perf_event *event)
+{
+	return armv8pmu_enable_intens(event->hw.idx);
+}
+
 static inline int armv8pmu_disable_intens(int idx)
 {
 	u32 counter = ARMV8_IDX_TO_COUNTER(idx);
@@ -601,6 +695,11 @@ static inline int armv8pmu_disable_intens(int idx)
 	return idx;
 }
 
+static inline int armv8pmu_disable_event_irq(struct perf_event *event)
+{
+	return armv8pmu_disable_intens(event->hw.idx);
+}
+
 static inline u32 armv8pmu_getreset_flags(void)
 {
 	u32 value;
@@ -618,10 +717,8 @@ static inline u32 armv8pmu_getreset_flags(void)
 static void armv8pmu_enable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Enable counter and interrupt, and set the counter to count
@@ -632,22 +729,22 @@ static void armv8pmu_enable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Set event (if destined for PMNx counters).
 	 */
-	armv8pmu_write_evtype(idx, hwc->config_base);
+	armv8pmu_write_event_type(event);
 
 	/*
 	 * Enable interrupt for this counter
 	 */
-	armv8pmu_enable_intens(idx);
+	armv8pmu_enable_event_irq(event);
 
 	/*
 	 * Enable counter
 	 */
-	armv8pmu_enable_counter(idx);
+	armv8pmu_enable_event_counter(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -655,10 +752,8 @@ static void armv8pmu_enable_event(struct perf_event *event)
 static void armv8pmu_disable_event(struct perf_event *event)
 {
 	unsigned long flags;
-	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-	int idx = hwc->idx;
 
 	/*
 	 * Disable counter and interrupt
@@ -668,12 +763,12 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	/*
 	 * Disable counter
 	 */
-	armv8pmu_disable_counter(idx);
+	armv8pmu_disable_event_counter(event);
 
 	/*
 	 * Disable interrupt for this counter
 	 */
-	armv8pmu_disable_intens(idx);
+	armv8pmu_disable_event_irq(event);
 
 	raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
 }
@@ -767,10 +862,42 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	return IRQ_HANDLED;
 }
 
+static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
+				    struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx ++) {
+		if (!test_and_set_bit(idx, cpuc->used_mask))
+			return idx;
+	}
+	return -EAGAIN;
+}
+
+static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
+				   struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	/*
+	 * Chaining requires two consecutive event counters, where
+	 * the lower idx must be even.
+	 */
+	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
+		if (!test_and_set_bit(idx, cpuc->used_mask)) {
+			/* Check if the preceding even counter is available */
+			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
+				return idx;
+			/* Release the Odd counter */
+			clear_bit(idx, cpuc->used_mask);
+		}
+	}
+	return -EAGAIN;
+}
+
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
-	int idx;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
@@ -784,19 +911,20 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	/*
 	 * Otherwise use events counters
 	 */
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EAGAIN;
+	if (armv8pmu_event_is_64bit(event))
+		return	armv8pmu_get_chain_idx(cpuc, cpu_pmu);
+	else
+		return armv8pmu_get_single_idx(cpuc, cpu_pmu);
 }
 
 static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
-				  struct perf_event *event)
+				     struct perf_event *event)
 {
-	clear_bit(event->hw.idx, cpuc->used_mask);
+	int idx = event->hw.idx;
+
+	clear_bit(idx, cpuc->used_mask);
+	if (armv8pmu_event_is_chained(event))
+		clear_bit(idx - 1, cpuc->used_mask);
 }
 
 /*
@@ -871,6 +999,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 				       &armv8_pmuv3_perf_cache_map,
 				       ARMV8_PMU_EVTYPE_EVENT);
 
+	if (armv8pmu_event_is_64bit(event))
+		event->hw.flags |= ARMPMU_EVT_64BIT;
+
 	/* Onl expose micro/arch events supported by this PMU */
 	if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
 	    && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index a288810..85758b8 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -665,14 +665,13 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
 	int idx;
 
 	for (idx = 0; idx < armpmu->num_events; idx++) {
-		/*
-		 * If the counter is not used skip it, there is no
-		 * need of stopping/restarting it.
-		 */
-		if (!test_bit(idx, hw_events->used_mask))
-			continue;
-
 		event = hw_events->events[idx];
+		/*
+		 * If there is no event at this idx (e.g, an idx used
+		 * by a chained event in Arm v8 PMUv3), skip it.
+		 */
+		if (!event)
+			continue;
 
 		switch (cmd) {
 		case CPU_PM_ENTER:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-07-02 21:59 ` [PATCH v4 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
@ 2018-07-03 12:49   ` Mark Rutland
  2018-07-03 13:13     ` Suzuki K Poulose
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2018-07-03 12:49 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, julien.thierry,
	robin.murphy

On Mon, Jul 02, 2018 at 10:59:45PM +0100, Suzuki K Poulose wrote:
> The armpmu uses get_event_idx callback to allocate an event
> counter for a given event, which marks the selected counter
> as "used". Now, when we delete the counter, the arm_pmu goes
> ahead and clears the "used" bit and then invokes the "clear_event_idx"
> call back, which kind of splits the job between the core code
> and the backend. To keep things tidy, mandate the implementation
> of clear_event_idx() and add it for exisiting backends.
> This will be useful for adding the chained event support, where
> we leave the event idx maintenance to the backend.
> 
> Also, when an event is removed from the PMU, reset the hw.idx
> to indicate that a counter is not allocated for this event,
> to help the backends do better checks. This will be also used
> for the chain counter support.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v3:
>  - Add clear_event_idx call back for backends.
> ---
>  arch/arm/kernel/perf_event_v6.c | 8 ++++++++
>  arch/arm/kernel/perf_event_v7.c | 9 +++++++++
>  arch/arm64/kernel/perf_event.c  | 7 +++++++
>  drivers/perf/arm_pmu.c          | 7 +++----
>  4 files changed, 27 insertions(+), 4 deletions(-)

I believe this is missing the xscale PMU code.

Otherwise, this looks good to me.

Thanks,
Mark.

> diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
> index 0729f98..1ae99de 100644
> --- a/arch/arm/kernel/perf_event_v6.c
> +++ b/arch/arm/kernel/perf_event_v6.c
> @@ -411,6 +411,12 @@ armv6pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  	}
>  }
>  
> +static void armv6pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> +				     struct perf_event *event)
> +{
> +	clear_bit(event->hw.idx, cpuc->used_mask);
> +}
> +
>  static void armv6pmu_disable_event(struct perf_event *event)
>  {
>  	unsigned long val, mask, evt, flags;
> @@ -491,6 +497,7 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->read_counter	= armv6pmu_read_counter;
>  	cpu_pmu->write_counter	= armv6pmu_write_counter;
>  	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
> +	cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
>  	cpu_pmu->start		= armv6pmu_start;
>  	cpu_pmu->stop		= armv6pmu_stop;
>  	cpu_pmu->map_event	= armv6_map_event;
> @@ -541,6 +548,7 @@ static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->read_counter	= armv6pmu_read_counter;
>  	cpu_pmu->write_counter	= armv6pmu_write_counter;
>  	cpu_pmu->get_event_idx	= armv6pmu_get_event_idx;
> +	cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
>  	cpu_pmu->start		= armv6pmu_start;
>  	cpu_pmu->stop		= armv6pmu_stop;
>  	cpu_pmu->map_event	= armv6mpcore_map_event;
> diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
> index fd7ce01..acb49a9 100644
> --- a/arch/arm/kernel/perf_event_v7.c
> +++ b/arch/arm/kernel/perf_event_v7.c
> @@ -1058,6 +1058,12 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  	return -EAGAIN;
>  }
>  
> +static void armv7pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> +				     struct perf_event *event)
> +{
> +	clear_bit(event->hw.idx, cpuc->used_mask);
> +}
> +
>  /*
>   * Add an event filter to a given event. This will only work for PMUv2 PMUs.
>   */
> @@ -1167,6 +1173,7 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->read_counter	= armv7pmu_read_counter;
>  	cpu_pmu->write_counter	= armv7pmu_write_counter;
>  	cpu_pmu->get_event_idx	= armv7pmu_get_event_idx;
> +	cpu_pmu->clear_event_idx = armv7pmu_clear_event_idx;
>  	cpu_pmu->start		= armv7pmu_start;
>  	cpu_pmu->stop		= armv7pmu_stop;
>  	cpu_pmu->reset		= armv7pmu_reset;
> @@ -1637,6 +1644,7 @@ static void krait_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>  	bool venum_event = EVENT_VENUM(hwc->config_base);
>  	bool krait_event = EVENT_CPU(hwc->config_base);
>  
> +	armv7pmu_clear_event_idx(cpuc, event);
>  	if (venum_event || krait_event) {
>  		bit = krait_event_to_bit(event, region, group);
>  		clear_bit(bit, cpuc->used_mask);
> @@ -1966,6 +1974,7 @@ static void scorpion_pmu_clear_event_idx(struct pmu_hw_events *cpuc,
>  	bool venum_event = EVENT_VENUM(hwc->config_base);
>  	bool scorpion_event = EVENT_CPU(hwc->config_base);
>  
> +	armv7pmu_clear_event_idx(cpuc, event);
>  	if (venum_event || scorpion_event) {
>  		bit = scorpion_event_to_bit(event, region, group);
>  		clear_bit(bit, cpuc->used_mask);
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 66a2ffd..ac66851 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -778,6 +778,12 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  	return -EAGAIN;
>  }
>  
> +static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
> +				  struct perf_event *event)
> +{
> +	clear_bit(event->hw.idx, cpuc->used_mask);
> +}
> +
>  /*
>   * Add an event filter to a given event. This will only work for PMUv2 PMUs.
>   */
> @@ -956,6 +962,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
>  	cpu_pmu->read_counter		= armv8pmu_read_counter,
>  	cpu_pmu->write_counter		= armv8pmu_write_counter,
>  	cpu_pmu->get_event_idx		= armv8pmu_get_event_idx,
> +	cpu_pmu->clear_event_idx	= armv8pmu_clear_event_idx,
>  	cpu_pmu->start			= armv8pmu_start,
>  	cpu_pmu->stop			= armv8pmu_stop,
>  	cpu_pmu->reset			= armv8pmu_reset,
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8cad6b5..a288810 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -238,11 +238,10 @@ armpmu_del(struct perf_event *event, int flags)
>  
>  	armpmu_stop(event, PERF_EF_UPDATE);
>  	hw_events->events[idx] = NULL;
> -	clear_bit(idx, hw_events->used_mask);
> -	if (armpmu->clear_event_idx)
> -		armpmu->clear_event_idx(hw_events, event);
> -
> +	armpmu->clear_event_idx(hw_events, event);
>  	perf_event_update_userpage(event);
> +	/* Clear the allocated counter */
> +	hwc->idx = -1;
>  }
>  
>  static int
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 7/7] arm64: perf: Add support for chaining event counters
  2018-07-02 21:59 ` [PATCH v4 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
@ 2018-07-03 13:00   ` Mark Rutland
  2018-07-03 13:12     ` Suzuki K Poulose
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2018-07-03 13:00 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, will.deacon, julien.thierry,
	robin.murphy

On Mon, Jul 02, 2018 at 10:59:48PM +0100, Suzuki K Poulose wrote:
> Add support for 64bit event by using chained event counters
> and 64bit cycle counters.
> 
> PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
> forming a 64-bit counter. The low/even counter is programmed to count
> the event of interest, and the high/odd counter is programmed to count
> the CHAIN event, taken when the low/even counter overflows.
> 
> For CPU cycles, when 64bit mode is requested, the cycle counter
> is used in 64bit mode. If the cycle counter is not available,
> falls back to chaining.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v3:
>  - Rename format name from "bits64 => long"
>  - Address other comments on style.
> ---
>  arch/arm64/kernel/perf_event.c | 185 +++++++++++++++++++++++++++++++++++------
>  drivers/perf/arm_pmu.c         |  13 ++-
>  2 files changed, 164 insertions(+), 34 deletions(-)

> +static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
> +				   struct arm_pmu *cpu_pmu)
> +{
> +	int idx;
> +
> +	/*
> +	 * Chaining requires two consecutive event counters, where
> +	 * the lower idx must be even.
> +	 */
> +	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
> +		if (!test_and_set_bit(idx, cpuc->used_mask)) {
> +			/* Check if the preceding even counter is available */
> +			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
> +				return idx;
> +			/* Release the Odd counter */
> +			clear_bit(idx, cpuc->used_mask);
> +		}
> +	}
> +	return -EAGAIN;
> +}

This means that we'll sometimes fail to pack events, but I guess that
most of the time the rotation logic will save us.

We might need to defer counter allocation in future if that's a real
problem.

> @@ -665,14 +665,13 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>  	int idx;
>  
>  	for (idx = 0; idx < armpmu->num_events; idx++) {
> -		/*
> -		 * If the counter is not used skip it, there is no
> -		 * need of stopping/restarting it.
> -		 */
> -		if (!test_bit(idx, hw_events->used_mask))
> -			continue;
> -
>  		event = hw_events->events[idx];
> +		/*
> +		 * If there is no event at this idx (e.g, an idx used
> +		 * by a chained event in Arm v8 PMUv3), skip it.
> +		 */
> +		if (!event)
> +			continue;

I think we can drop the comment here.

Other than the above and the xscale fixup, this looks good to me.

Have you thrown the perf fuzzer at this?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 7/7] arm64: perf: Add support for chaining event counters
  2018-07-03 13:00   ` Mark Rutland
@ 2018-07-03 13:12     ` Suzuki K Poulose
  0 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-03 13:12 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, will.deacon, julien.thierry,
	robin.murphy

On 03/07/18 14:00, Mark Rutland wrote:
> On Mon, Jul 02, 2018 at 10:59:48PM +0100, Suzuki K Poulose wrote:
>> Add support for 64bit event by using chained event counters
>> and 64bit cycle counters.
>>
>> PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
>> forming a 64-bit counter. The low/even counter is programmed to count
>> the event of interest, and the high/odd counter is programmed to count
>> the CHAIN event, taken when the low/even counter overflows.
>>
>> For CPU cycles, when 64bit mode is requested, the cycle counter
>> is used in 64bit mode. If the cycle counter is not available,
>> falls back to chaining.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v3:
>>   - Rename format name from "bits64 => long"
>>   - Address other comments on style.
>> ---
>>   arch/arm64/kernel/perf_event.c | 185 +++++++++++++++++++++++++++++++++++------
>>   drivers/perf/arm_pmu.c         |  13 ++-
>>   2 files changed, 164 insertions(+), 34 deletions(-)
> 
>> +static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
>> +				   struct arm_pmu *cpu_pmu)
>> +{
>> +	int idx;
>> +
>> +	/*
>> +	 * Chaining requires two consecutive event counters, where
>> +	 * the lower idx must be even.
>> +	 */
>> +	for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) {
>> +		if (!test_and_set_bit(idx, cpuc->used_mask)) {
>> +			/* Check if the preceding even counter is available */
>> +			if (!test_and_set_bit(idx - 1, cpuc->used_mask))
>> +				return idx;
>> +			/* Release the Odd counter */
>> +			clear_bit(idx, cpuc->used_mask);
>> +		}
>> +	}
>> +	return -EAGAIN;
>> +}
> 
> This means that we'll sometimes fail to pack events, but I guess that
> most of the time the rotation logic will save us.
> 
> We might need to defer counter allocation in future if that's a real
> problem.

Ok.

> 
>> @@ -665,14 +665,13 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
>>   	int idx;
>>   
>>   	for (idx = 0; idx < armpmu->num_events; idx++) {
>> -		/*
>> -		 * If the counter is not used skip it, there is no
>> -		 * need of stopping/restarting it.
>> -		 */
>> -		if (!test_bit(idx, hw_events->used_mask))
>> -			continue;
>> -
>>   		event = hw_events->events[idx];
>> +		/*
>> +		 * If there is no event at this idx (e.g, an idx used
>> +		 * by a chained event in Arm v8 PMUv3), skip it.
>> +		 */
>> +		if (!event)
>> +			continue;
> 
> I think we can drop the comment here.
> 
> Other than the above and the xscale fixup, this looks good to me.

Thanks, I will fix it up.

> 
> Have you thrown the perf fuzzer at this?

I tried the fuzzer on an earlier version, but the fuzzer itself crashes
due to its own bug (even without this series). I vaguely remember that it
gets a SIGSEGV due to some operation on an fd (which was a tty).
I will re-run it on the latest series with 4.18-rc3.


Thanks
Suzuki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 4/7] arm_pmu: Tidy up clear_event_idx call backs
  2018-07-03 12:49   ` Mark Rutland
@ 2018-07-03 13:13     ` Suzuki K Poulose
  0 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2018-07-03 13:13 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, will.deacon, julien.thierry,
	robin.murphy

On 03/07/18 13:49, Mark Rutland wrote:
> On Mon, Jul 02, 2018 at 10:59:45PM +0100, Suzuki K Poulose wrote:
>> The armpmu uses get_event_idx callback to allocate an event
>> counter for a given event, which marks the selected counter
>> as "used". Now, when we delete the counter, the arm_pmu goes
>> ahead and clears the "used" bit and then invokes the "clear_event_idx"
>> call back, which kind of splits the job between the core code
>> and the backend. To keep things tidy, mandate the implementation
>> of clear_event_idx() and add it for exisiting backends.
>> This will be useful for adding the chained event support, where
>> we leave the event idx maintenance to the backend.
>>
>> Also, when an event is removed from the PMU, reset the hw.idx
>> to indicate that a counter is not allocated for this event,
>> to help the backends do better checks. This will be also used
>> for the chain counter support.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v3:
>>   - Add clear_event_idx call back for backends.
>> ---
>>   arch/arm/kernel/perf_event_v6.c | 8 ++++++++
>>   arch/arm/kernel/perf_event_v7.c | 9 +++++++++
>>   arch/arm64/kernel/perf_event.c  | 7 +++++++
>>   drivers/perf/arm_pmu.c          | 7 +++----
>>   4 files changed, 27 insertions(+), 4 deletions(-)
> 
> I beleive this is missing the xscale PMU code.

Ah! Thanks for spotting. I will fix that.
> 
> Otherwiwse, this looks good to me.

Thanks
Suzuki

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-07-03 13:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-02 21:59 [PATCH v4 0/7] arm64: perf: Support for chained counters Suzuki K Poulose
2018-07-02 21:59 ` [PATCH v4 1/7] arm_pmu: Clean up maximum period handling Suzuki K Poulose
2018-07-02 21:59 ` [PATCH v4 2/7] arm_pmu: Change API to support 64bit counter values Suzuki K Poulose
2018-07-02 21:59 ` [PATCH v4 3/7] arm_pmu: Add support for 64bit event counters Suzuki K Poulose
2018-07-02 21:59 ` [PATCH v4 4/7] arm_pmu: Tidy up clear_event_idx call backs Suzuki K Poulose
2018-07-03 12:49   ` Mark Rutland
2018-07-03 13:13     ` Suzuki K Poulose
2018-07-02 21:59 ` [PATCH v4 5/7] arm64: perf: Clean up armv8pmu_select_counter Suzuki K Poulose
2018-07-02 21:59 ` [PATCH v4 6/7] arm64: perf: Disable PMU while processing counter overflows Suzuki K Poulose
2018-07-02 21:59 ` [PATCH v4 7/7] arm64: perf: Add support for chaining event counters Suzuki K Poulose
2018-07-03 13:00   ` Mark Rutland
2018-07-03 13:12     ` Suzuki K Poulose
