linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/11] arm-cci: PMU updates
@ 2016-01-04 11:54 Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 01/11] arm-cci: Define CCI counter period Suzuki K. Poulose
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

This series includes:

 - Workaround for writing to the PMU counters on CCI-500/550 (CCI-550
   support is introduced later in the series) (patches 1-9)
 - Support for the CCI-550 PMU (patches 10-11), with Acked-bys.

Since all of these are related, I am combining them into one series
so that they are easier to carry around (and possibly merge).

The CCI PMU driver sets each event counter to half of the maximum value
it can count (2^31) via pmu_event_set_period() before we start the
counters. This gives us the best chance of handling the overflow
interrupt even under extreme interrupt latency.
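
As a rough illustration (not part of the series), the half-period
arithmetic can be sketched in plain C. All names below are hypothetical
stand-ins for the driver's macros, and the 32-bit counter width is taken
from the description above.

```c
#include <stdint.h>

/* Hypothetical names; they only mirror the idea, not the driver's symbols. */
#define PS_CNTR_MASK	0xffffffffULL	/* 32-bit hardware counter */
#define PS_CNTR_PERIOD	(1ULL << 31)	/* half of the 2^32 range */

/* Events counted between two raw readings, tolerating one wrap-around. */
static uint64_t ps_delta(uint64_t prev_raw, uint64_t new_raw)
{
	return (new_raw - prev_raw) & PS_CNTR_MASK;
}

/* Events that must occur before a counter starting at 'start' overflows. */
static uint64_t ps_events_to_overflow(uint64_t start)
{
	return (PS_CNTR_MASK + 1) - (start & PS_CNTR_MASK);
}
```

Starting at 2^31, the overflow interrupt fires after 2^31 events, and the
handler then has a further 2^31 events of slack before the counter passes
its starting value and the masked delta becomes ambiguous.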

However, CCI-500 comes with advanced power saving schemes, which disable
the clock to the event counters unless the counters are enabled to count
(PMCR.CEN). This prevents the driver from writing the period to the
counters before starting them. Also, there is no way to reset an
individual event counter to 0 (PMCR.RST resets all the counters, losing
their current readings). However, the value of a counter is preserved and
can be read back when the counters are not enabled.

So we cannot reliably use the counters and compute the number of events
generated during the sampling period since we don't have the value of the
counter at start.

Here are the possible solutions:

 1) Disable clock gating on CCI-500 by setting Control_Override_Reg[bit3].
    - The Control_Override_Reg is secure (and hence not programmable from
      Linux), and also has an impact on power consumption.

 2) Change the order of operations
	i.e.,
	a) Program and enable individual counters
	b) Enable counting on all the counters by setting PMCR.CEN
	c) Write the period to the individual counters
	d) Disable the counters
    - This could cause unnecessary noise in the other counters and is
      costly (we would have to repeat this for all enabled counters).

 3) Don't set the counter value; instead, use the current count as the
    starting count and compute the delta at the end of sampling.

 4) Modified version of 2, which disables all the other counters except
    the target counter, with the target counter programmed with an invalid
    event code (which guarantees that the counter won't change during the
    operation).

This series implements option 4 for CCI-500 (and CCI-550). CCI-400 behavior
remains unchanged.

The tree including [1] on top of 4.4-rc8 is available at:

    git://linux-arm.org/linux-skp.git   cci-updates/4.4-rc8

Changes since V4:
 - Drop transaction hooks. Instead, group and delay the writes to pmu_enable().
 - Rebased to 4.4-rc8

Changes since V3:
 - Added transaction hooks to batch the writes to PMU counters for
   group events.
 - Pulled ARM CCI 550 PMU support patches

Changes since V2:
 - Rebased to 4.4-rc1 + Mark's patch to simplify PMU sysfs attributes [1]
 - Addressed comments on v2.
 - Split the introduction of write_counter hook to a separate patch

Changes since V1:
 - Chose option 4 instead of option 3 above, as suggested by Mark Rutland

 [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/373129.html

Suzuki K. Poulose (11):
  arm-cci: Define CCI counter period
  arm-cci: Refactor pmu_write_counter
  arm-cci: Group writes to counter
  arm-cci: Refactor CCI PMU enable/disable methods
  arm-cci PMU: Delay counter writes to pmu_enable
  arm-cci: Get the status of a counter
  arm-cci: Add routines to save/restore all counters
  arm-cci: Provide hook for writing to PMU counters
  arm-cci: CCI-500: Work around PMU counter writes
  arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU
  arm-cci: CoreLink CCI-550 PMU driver

 Documentation/devicetree/bindings/arm/cci.txt |    2 +
 drivers/bus/Kconfig                           |   10 +-
 drivers/bus/arm-cci.c                         |  524 +++++++++++++++++++------
 3 files changed, 408 insertions(+), 128 deletions(-)

-- 
1.7.9.5



* [PATCH v5 01/11] arm-cci: Define CCI counter period
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 18:27   ` Mark Rutland
  2016-01-04 11:54 ` [PATCH v5 02/11] arm-cci: Refactor pmu_write_counter Suzuki K. Poulose
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Instead of hard coding the period we program on the PMU
counters, define a symbol.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index ee47e6b..3786879 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -85,6 +85,14 @@ static const struct of_device_id arm_cci_matches[] = {
 #define CCI_PMU_CNTR_MASK		((1ULL << 32) -1)
 #define CCI_PMU_CNTR_LAST(cci_pmu)	(cci_pmu->num_cntrs - 1)
 
+/*
+ * The CCI PMU counters have a period of 2^32. To account for the
+ * possiblity of extreme interrupt latency we program for a period of
+ * half that. Hopefully we can handle the interrupt before another 2^31
+ * events occur and the counter overtakes its previous value.
+ */
+#define CCI_CNTR_PERIOD		(1UL << 31)
+
 #define CCI_PMU_MAX_HW_CNTRS(model) \
 	((model)->num_hw_cntrs + (model)->fixed_hw_cntrs)
 
@@ -797,15 +805,8 @@ static void pmu_read(struct perf_event *event)
 void pmu_event_set_period(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	/*
-	 * The CCI PMU counters have a period of 2^32. To account for the
-	 * possiblity of extreme interrupt latency we program for a period of
-	 * half that. Hopefully we can handle the interrupt before another 2^31
-	 * events occur and the counter overtakes its previous value.
-	 */
-	u64 val = 1ULL << 31;
-	local64_set(&hwc->prev_count, val);
-	pmu_write_counter(event, val);
+	local64_set(&hwc->prev_count, CCI_CNTR_PERIOD);
+	pmu_write_counter(event, CCI_CNTR_PERIOD);
 }
 
 static irqreturn_t pmu_handle_irq(int irq_num, void *dev)
-- 
1.7.9.5



* [PATCH v5 02/11] arm-cci: Refactor pmu_write_counter
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 01/11] arm-cci: Define CCI counter period Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 19:01   ` Mark Rutland
  2016-01-04 11:54 ` [PATCH v5 03/11] arm-cci: Group writes to counter Suzuki K. Poulose
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Refactor pmu_write_counter() to add __pmu_write_counter(), which
will actually write to the counter once the event is validated.
This can be used by hooks specific to a CCI PMU model to program
the counter, where the event is already validated.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 3786879..ce0d3ef 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -767,16 +767,22 @@ static u32 pmu_read_counter(struct perf_event *event)
 	return value;
 }
 
+static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
+{
+	pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
+}
+
 static void pmu_write_counter(struct perf_event *event, u32 value)
 {
 	struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
 	struct hw_perf_event *hw_counter = &event->hw;
 	int idx = hw_counter->idx;
 
-	if (unlikely(!pmu_is_valid_counter(cci_pmu, idx)))
+	if (unlikely(!pmu_is_valid_counter(cci_pmu, idx))) {
 		dev_err(&cci_pmu->plat_device->dev, "Invalid CCI PMU counter %d\n", idx);
-	else
-		pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
+		return;
+	}
+	__pmu_write_counter(cci_pmu, value, idx);
 }
 
 static u64 pmu_event_update(struct perf_event *event)
-- 
1.7.9.5



* [PATCH v5 03/11] arm-cci: Group writes to counter
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 01/11] arm-cci: Define CCI counter period Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 02/11] arm-cci: Refactor pmu_write_counter Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 19:03   ` Mark Rutland
  2016-01-04 11:54 ` [PATCH v5 04/11] arm-cci: Refactor CCI PMU enable/disable methods Suzuki K. Poulose
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Add a helper to group the writes to PMU counters. This will be
used to delay setting the event period until pmu::pmu_enable().
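
For readers outside the kernel tree, the grouped write can be pictured
with the small userspace sketch below. It is only an approximation: the
struct, the fixed counter count and the single-word mask are assumptions
made for illustration, whereas the driver uses a bitmap and
for_each_set_bit().

```c
#include <stdint.h>

#define GW_NUM_CNTRS 8	/* hypothetical number of counters */

/* Hypothetical stand-in for the PMU: just an array of counter values. */
struct gw_pmu {
	uint32_t cntr[GW_NUM_CNTRS];
};

/* Write 'value' to every counter whose bit is set in 'mask'. */
static void gw_write_counters(struct gw_pmu *pmu, unsigned long mask,
			      uint32_t value)
{
	int i;

	for (i = 0; i < GW_NUM_CNTRS; i++)
		if (mask & (1UL << i))
			pmu->cntr[i] = value;
}
```

Passing a mask rather than a single index lets one call cover both the
single-counter and the group case.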

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index ce0d3ef..f6b8717 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -785,6 +785,21 @@ static void pmu_write_counter(struct perf_event *event, u32 value)
 	__pmu_write_counter(cci_pmu, value, idx);
 }
 
+/* Write a value to a given set of counters */
+static void __pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
+{
+	int i;
+
+	for_each_set_bit(i, mask, cci_pmu->num_cntrs)
+		__pmu_write_counter(cci_pmu, value, i);
+}
+
+static void __maybe_unused
+pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
+{
+	__pmu_write_counters(cci_pmu, mask, value);
+}
+
 static u64 pmu_event_update(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-- 
1.7.9.5



* [PATCH v5 04/11] arm-cci: Refactor CCI PMU enable/disable methods
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (2 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 03/11] arm-cci: Group writes to counter Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable Suzuki K. Poulose
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

This patch refactors the CCI PMU driver code a little bit to
make it easier to share the code for enabling/disabling the CCI
PMU. This will be used by the hooks to work around the special cases
where writing to a counter is not always that easy (e.g., CCI-500).

No functional changes.

Cc: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index f6b8717..0189f3a 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -675,6 +675,26 @@ static u32 pmu_get_max_counters(void)
 		CCI_PMCR_NCNT_MASK) >> CCI_PMCR_NCNT_SHIFT;
 }
 
+/* Should be called with cci_pmu->hw_events->pmu_lock held */
+static void __cci_pmu_enable(void)
+{
+	u32 val;
+
+	/* Enable all the PMU counters. */
+	val = readl_relaxed(cci_ctrl_base + CCI_PMCR) | CCI_PMCR_CEN;
+	writel(val, cci_ctrl_base + CCI_PMCR);
+}
+
+/* Should be called with cci_pmu->hw_events->pmu_lock held */
+static void __cci_pmu_disable(void)
+{
+	u32 val;
+
+	/* Disable all the PMU counters. */
+	val = readl_relaxed(cci_ctrl_base + CCI_PMCR) & ~CCI_PMCR_CEN;
+	writel(val, cci_ctrl_base + CCI_PMCR);
+}
+
 static int pmu_get_event_idx(struct cci_pmu_hw_events *hw, struct perf_event *event)
 {
 	struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
@@ -902,16 +922,12 @@ static void cci_pmu_enable(struct pmu *pmu)
 	struct cci_pmu_hw_events *hw_events = &cci_pmu->hw_events;
 	int enabled = bitmap_weight(hw_events->used_mask, cci_pmu->num_cntrs);
 	unsigned long flags;
-	u32 val;
 
 	if (!enabled)
 		return;
 
 	raw_spin_lock_irqsave(&hw_events->pmu_lock, flags);
-
-	/* Enable all the PMU counters. */
-	val = readl_relaxed(cci_ctrl_base + CCI_PMCR) | CCI_PMCR_CEN;
-	writel(val, cci_ctrl_base + CCI_PMCR);
+	__cci_pmu_enable();
 	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
 
 }
@@ -921,13 +937,9 @@ static void cci_pmu_disable(struct pmu *pmu)
 	struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
 	struct cci_pmu_hw_events *hw_events = &cci_pmu->hw_events;
 	unsigned long flags;
-	u32 val;
 
 	raw_spin_lock_irqsave(&hw_events->pmu_lock, flags);
-
-	/* Disable all the PMU counters. */
-	val = readl_relaxed(cci_ctrl_base + CCI_PMCR) & ~CCI_PMCR_CEN;
-	writel(val, cci_ctrl_base + CCI_PMCR);
+	__cci_pmu_disable();
 	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
 }
 
-- 
1.7.9.5



* [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (3 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 04/11] arm-cci: Refactor CCI PMU enable/disable methods Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 19:24   ` Mark Rutland
  2016-01-04 11:54 ` [PATCH v5 06/11] arm-cci: Get the status of a counter Suzuki K. Poulose
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Delay setting the event periods for enabled events to pmu::pmu_enable().
We mark event->hw.state with PERF_HES_ARCH for the events that we know
have their counts recorded and have been started. Since we reprogram the
counters every time before counting, we can set the counters for all the
events which are !STOPPED && ARCH.

Grouping the writes to counters can amortise the cost of the operation
on PMUs where it is expensive (e.g., CCI-500).
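
The deferral can be modelled outside the kernel as follows. This is a
sketch under stated assumptions, not the driver code: the flag values,
the in_use field (standing in for used_mask) and the group_writes
counter are all invented for the example.

```c
#include <stdint.h>

#define DW_NUM_CNTRS	4		/* hypothetical counter count */
#define DW_STOPPED	0x1u		/* stand-in for PERF_HES_STOPPED */
#define DW_ARCH		0x4u		/* stand-in for PERF_HES_ARCH */
#define DW_PERIOD	(1u << 31)

struct dw_event {
	unsigned int state;
	uint64_t prev_count;
	int in_use;			/* stand-in for a bit in used_mask */
};

struct dw_pmu {
	struct dw_event ev[DW_NUM_CNTRS];
	uint32_t cntr[DW_NUM_CNTRS];
	int group_writes;		/* grouped write operations issued */
};

/* start(): only mark the counter; the period is written later. */
static void dw_start(struct dw_pmu *pmu, int idx)
{
	pmu->ev[idx].in_use = 1;
	pmu->ev[idx].state = DW_ARCH;
}

/* enable(): collect every marked, running event and write them at once. */
static void dw_enable(struct dw_pmu *pmu)
{
	unsigned long mask = 0;
	int i;

	for (i = 0; i < DW_NUM_CNTRS; i++) {
		struct dw_event *e = &pmu->ev[i];

		if (!e->in_use || (e->state & DW_STOPPED))
			continue;
		if (e->state & DW_ARCH) {
			mask |= 1UL << i;
			e->state &= ~DW_ARCH;
			e->prev_count = DW_PERIOD;
		}
	}
	if (!mask)
		return;
	for (i = 0; i < DW_NUM_CNTRS; i++)
		if (mask & (1UL << i))
			pmu->cntr[i] = DW_PERIOD;
	pmu->group_writes++;
}
```

Two events started back to back cost one grouped write at enable time,
and a second enable with nothing newly marked writes nothing.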

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 0189f3a..c768ee4 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -916,6 +916,40 @@ static void hw_perf_event_destroy(struct perf_event *event)
 	}
 }
 
+/*
+ * Program the CCI PMU counters which have PERF_HES_ARCH set
+ * with the event period and mark them ready before we enable
+ * PMU.
+ */
+void cci_pmu_update_counters(struct cci_pmu *cci_pmu)
+{
+	int i;
+	unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
+
+	memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
+
+	for_each_set_bit(i, cci_pmu->hw_events.used_mask, cci_pmu->num_cntrs) {
+		struct hw_perf_event *hwe;
+
+		if (!cci_pmu->hw_events.events[i]) {
+			WARN_ON(1);
+			continue;
+		}
+
+		hwe = &cci_pmu->hw_events.events[i]->hw;
+		/* Leave the events which are not counting */
+		if (hwe->state & PERF_HES_STOPPED)
+			continue;
+		if (hwe->state & PERF_HES_ARCH) {
+			set_bit(i, mask);
+			hwe->state &= ~PERF_HES_ARCH;
+			local64_set(&hwe->prev_count, CCI_CNTR_PERIOD);
+		}
+	}
+
+	pmu_write_counters(cci_pmu, mask, CCI_CNTR_PERIOD);
+}
+
 static void cci_pmu_enable(struct pmu *pmu)
 {
 	struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
@@ -927,6 +961,7 @@ static void cci_pmu_enable(struct pmu *pmu)
 		return;
 
 	raw_spin_lock_irqsave(&hw_events->pmu_lock, flags);
+	cci_pmu_update_counters(cci_pmu);
 	__cci_pmu_enable();
 	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
 
@@ -980,8 +1015,11 @@ static void cci_pmu_start(struct perf_event *event, int pmu_flags)
 	/* Configure the counter unless you are counting a fixed event */
 	if (!pmu_fixed_hw_idx(cci_pmu, idx))
 		pmu_set_event(cci_pmu, idx, hwc->config_base);
-
-	pmu_event_set_period(event);
+	/*
+	 * Mark this counter, so that we can program the
+	 * counter with the event_period. see cci_pmu_enable()
+	 */
+	hwc->state = PERF_HES_ARCH;
 	pmu_enable_counter(cci_pmu, idx);
 
 	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
-- 
1.7.9.5



* [PATCH v5 06/11] arm-cci: Get the status of a counter
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (4 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 07/11] arm-cci: Add routines to save/restore all counters Suzuki K. Poulose
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Add helper routines to check if the counter is enabled or not.

Cc: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index c768ee4..a3938ef 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -660,6 +660,12 @@ static void pmu_enable_counter(struct cci_pmu *cci_pmu, int idx)
 	pmu_write_register(cci_pmu, 1, idx, CCI_PMU_CNTR_CTRL);
 }
 
+static bool __maybe_unused
+pmu_counter_is_enabled(struct cci_pmu *cci_pmu, int idx)
+{
+	return (pmu_read_register(cci_pmu, idx, CCI_PMU_CNTR_CTRL) & 0x1) != 0;
+}
+
 static void pmu_set_event(struct cci_pmu *cci_pmu, int idx, unsigned long event)
 {
 	pmu_write_register(cci_pmu, event, idx, CCI_PMU_EVT_SEL);
-- 
1.7.9.5



* [PATCH v5 07/11] arm-cci: Add routines to save/restore all counters
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (5 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 06/11] arm-cci: Get the status of a counter Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-11 10:50   ` Mark Rutland
  2016-01-04 11:54 ` [PATCH v5 08/11] arm-cci: Provide hook for writing to PMU counters Suzuki K. Poulose
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Add helper routines to disable the counter controls for
all the counters on the CCI PMU and to restore them later,
preserving the original state in a caller-provided mask.
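
The save/restore pairing can be sketched in userspace as below. The
struct and the single-word mask are assumptions for illustration only;
the driver reads the per-counter control register and uses a bitmap.

```c
#define SR_NUM_CNTRS 8	/* hypothetical number of counters */

/* Hypothetical stand-in for the per-counter control registers. */
struct sr_pmu {
	unsigned char enabled[SR_NUM_CNTRS];
};

/* Disable every enabled counter, recording it in the caller's mask. */
static void sr_save_counters(struct sr_pmu *pmu, unsigned long *mask)
{
	int i;

	for (i = 0; i < SR_NUM_CNTRS; i++) {
		if (pmu->enabled[i]) {
			*mask |= 1UL << i;
			pmu->enabled[i] = 0;
		}
	}
}

/* Re-enable exactly the counters recorded in the mask. */
static void sr_restore_counters(struct sr_pmu *pmu, unsigned long mask)
{
	int i;

	for (i = 0; i < SR_NUM_CNTRS; i++)
		if (mask & (1UL << i))
			pmu->enabled[i] = 1;
}
```

As with the helpers in the patch, the caller must start with a zeroed
mask so that only counters disabled by the save step are re-enabled.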

Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index a3938ef..2f1fcf0 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -672,6 +672,44 @@ static void pmu_set_event(struct cci_pmu *cci_pmu, int idx, unsigned long event)
 }
 
 /*
+ * For all counters on the CCI-PMU, disable any 'enabled' counters,
+ * saving the changed counters in the mask, so that we can restore
+ * it later using pmu_restore_counters. The mask is private to the
+ * caller. We cannot rely on the used_mask maintained by the CCI_PMU
+ * as it only tells us if the counter is assigned to perf_event or not.
+ * The state of the perf_event cannot be locked by the PMU layer, hence
+ * we check the individual counter status (which can be locked by
+ * cci_pmu->hw_events.pmu_lock).
+ *
+ * @mask should be initialised by the caller.
+ */
+static void __maybe_unused
+pmu_save_counters(struct cci_pmu *cci_pmu, unsigned long *mask)
+{
+	int i;
+
+	for (i = 0; i < cci_pmu->num_cntrs; i++) {
+		if (pmu_counter_is_enabled(cci_pmu, i)) {
+			set_bit(i, mask);
+			pmu_disable_counter(cci_pmu, i);
+		}
+	}
+}
+
+/*
+ * Restore the status of the counters. Reversal of the pmu_save_counters().
+ * For each counter set in the mask, enable the counter back.
+ */
+static void __maybe_unused
+pmu_restore_counters(struct cci_pmu *cci_pmu, unsigned long *mask)
+{
+	int i;
+
+	for_each_set_bit(i, mask, cci_pmu->num_cntrs)
+		pmu_enable_counter(cci_pmu, i);
+}
+
+/*
  * Returns the number of programmable counters actually implemented
  * by the cci
  */
-- 
1.7.9.5



* [PATCH v5 08/11] arm-cci: Provide hook for writing to PMU counters
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (6 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 07/11] arm-cci: Add routines to save/restore all counters Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-11 10:54   ` Mark Rutland
  2016-01-04 11:54 ` [PATCH v5 09/11] arm-cci: CCI-500: Work around PMU counter writes Suzuki K. Poulose
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Add a hook for writing to CCI PMU counters. This callback
can be used for CCI models which require some extra work
to program the PMU counter values. To accommodate group writes
and single counter writes, the callback accepts a bitmask
of the counter indices which need to be programmed with the
given value.
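
The dispatch pattern can be illustrated with the standalone sketch
below. Every name here is hypothetical and the hook_calls field exists
only to make the example observable; the point is the optional function
pointer with a fallback to the plain write.

```c
#include <stddef.h>
#include <stdint.h>

#define HK_NUM_CNTRS 4	/* hypothetical number of counters */

struct hk_pmu;

/* Hypothetical per-model ops: write_counters may be left NULL. */
struct hk_model {
	void (*write_counters)(struct hk_pmu *pmu, unsigned long mask,
			       uint32_t val);
};

struct hk_pmu {
	const struct hk_model *model;
	uint32_t cntr[HK_NUM_CNTRS];
	int hook_calls;		/* instrumentation for the example only */
};

/* Plain write path used when the model provides no hook. */
static void hk_default_write(struct hk_pmu *pmu, unsigned long mask,
			     uint32_t val)
{
	int i;

	for (i = 0; i < HK_NUM_CNTRS; i++)
		if (mask & (1UL << i))
			pmu->cntr[i] = val;
}

/* Example model-specific hook: does extra work, then writes. */
static void hk_special_write(struct hk_pmu *pmu, unsigned long mask,
			     uint32_t val)
{
	pmu->hook_calls++;
	hk_default_write(pmu, mask, val);
}

/* Group write: use the model hook if present, else the plain path. */
static void hk_write_counters(struct hk_pmu *pmu, unsigned long mask,
			      uint32_t val)
{
	if (pmu->model->write_counters)
		pmu->model->write_counters(pmu, mask, val);
	else
		hk_default_write(pmu, mask, val);
}

/* Single write expressed as a one-bit group write. */
static void hk_write_counter(struct hk_pmu *pmu, int idx, uint32_t val)
{
	hk_write_counters(pmu, 1UL << idx, val);
}
```

Routing the single-counter case through the same mask-based entry point
means a model only has to implement one callback.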

Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 2f1fcf0..47c9581 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -134,6 +134,7 @@ struct cci_pmu_model {
 	struct event_range event_ranges[CCI_IF_MAX];
 	int (*validate_hw_event)(struct cci_pmu *, unsigned long);
 	int (*get_event_idx)(struct cci_pmu *, struct cci_pmu_hw_events *, unsigned long);
+	void (*write_counters)(struct cci_pmu *, unsigned long *, u32 val);
 };
 
 static struct cci_pmu_model cci_pmu_models[];
@@ -846,7 +847,15 @@ static void pmu_write_counter(struct perf_event *event, u32 value)
 		dev_err(&cci_pmu->plat_device->dev, "Invalid CCI PMU counter %d\n", idx);
 		return;
 	}
-	__pmu_write_counter(cci_pmu, value, idx);
+
+	if (cci_pmu->model->write_counters) {
+		unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
+
+		memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
+		set_bit(idx, mask);
+		cci_pmu->model->write_counters(cci_pmu, mask, value);
+	} else
+		__pmu_write_counter(cci_pmu, value, idx);
 }
 
 /* Write a value to a given set of counters */
@@ -861,7 +870,10 @@ static void __pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u
 static void __maybe_unused
 pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
 {
-	__pmu_write_counters(cci_pmu, mask, value);
+	if (cci_pmu->model->write_counters)
+		cci_pmu->model->write_counters(cci_pmu, mask, value);
+	else
+		__pmu_write_counters(cci_pmu, mask, value);
 }
 
 static u64 pmu_event_update(struct perf_event *event)
-- 
1.7.9.5



* [PATCH v5 09/11] arm-cci: CCI-500: Work around PMU counter writes
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (7 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 08/11] arm-cci: Provide hook for writing to PMU counters Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 10/11] arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 11/11] arm-cci: CoreLink CCI-550 PMU driver Suzuki K. Poulose
  10 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

The CCI PMU driver sets each event counter to half of the maximum value
it can count (2^31) via pmu_event_set_period() before we start the
counters. This gives us the best chance of handling the overflow
interrupt even under extreme interrupt latency.

However, CCI-500 comes with advanced power saving schemes, which
disable the clock to the event counters unless the counters are enabled to
count (PMCR.CEN). This prevents the driver from writing the period to the
counters before starting them. Also, there is no way to reset an
individual event counter to 0 (PMCR.RST resets all the counters, losing
their current readings). However, the value of a counter is preserved and
can be read back when the counters are not enabled.

So we cannot reliably use the counters and compute the number of events
generated during the sampling period since we don't have the value of the
counter at start.

This patch works around this issue by performing writes to the counters
with the following steps.

 1) Disable all the counters (remembering any counters which were enabled)
 2) Enable the PMU, now that all the counters are disabled.

 For each counter to be programmed, repeat steps 3-7
 3) Save the current event and program the target counter to count an
    invalid event, which by spec is guaranteed not to generate any events.
 4) Enable the target counter.
 5) Write to the target counter.
 6) Disable the target counter.
 7) Restore the event back on the target counter.

 8) Disable the PMU.
 9) Restore the status of all the counters.
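
To make the sequence concrete, here is a userspace model of it. The
mock hardware is an assumption: it only captures the rule described
above, that a counter write lands when both the global enable and the
counter's own enable are set. The struct, the event values and all
names are invented for the example.

```c
#include <stdint.h>

#define WA_NUM_CNTRS		4	/* hypothetical counter count */
#define WA_INVALID_EVENT	0x1fu	/* placeholder "never fires" event */

/* Mock PMU state: global enable, per-counter enable, event, value. */
struct wa_pmu {
	int cen;
	int enabled[WA_NUM_CNTRS];
	uint32_t event[WA_NUM_CNTRS];
	uint32_t cntr[WA_NUM_CNTRS];
};

/* Mock of the gated hardware: the write is dropped if the clock is off. */
static void wa_hw_write(struct wa_pmu *pmu, int idx, uint32_t val)
{
	if (pmu->cen && pmu->enabled[idx])
		pmu->cntr[idx] = val;
}

static void wa_write_counters(struct wa_pmu *pmu, unsigned long mask,
			      uint32_t val)
{
	unsigned long saved = 0;
	int i;

	/* 1) Disable all counters, remembering which were enabled. */
	for (i = 0; i < WA_NUM_CNTRS; i++) {
		if (pmu->enabled[i]) {
			saved |= 1UL << i;
			pmu->enabled[i] = 0;
		}
	}
	/* 2) Enable the PMU; nothing can count at this point. */
	pmu->cen = 1;

	for (i = 0; i < WA_NUM_CNTRS; i++) {
		uint32_t event;

		if (!(mask & (1UL << i)))
			continue;
		event = pmu->event[i];
		pmu->event[i] = WA_INVALID_EVENT;	/* 3) park on invalid event */
		pmu->enabled[i] = 1;			/* 4) enable target */
		wa_hw_write(pmu, i, val);		/* 5) write the value */
		pmu->enabled[i] = 0;			/* 6) disable target */
		pmu->event[i] = event;			/* 7) restore the event */
	}

	pmu->cen = 0;					/* 8) disable the PMU */
	for (i = 0; i < WA_NUM_CNTRS; i++)		/* 9) restore enables */
		if (saved & (1UL << i))
			pmu->enabled[i] = 1;
}
```

In this model a direct write to a disabled counter is silently lost,
while the sequence lands the value and leaves the other counters'
values, events and enable bits as they were.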

Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 47c9581..cb2b468 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -837,6 +837,68 @@ static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
 	pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
 }
 
+#ifdef CONFIG_ARM_CCI500_PMU
+
+/*
+ * CCI-500 has advanced power saving policies, which could gate the
+ * clocks to the PMU counters, which makes the writes to them ineffective.
+ * The only way to write to those counters is when the global counters
+ * are enabled and the particular counter is enabled.
+ *
+ * So we do the following :
+ *
+ * 1) Disable all the PMU counters, saving their current state
+ * 2) Enable the global PMU profiling, now that all counters are
+ *    disabled.
+ *
+ * For each counter to be programmed, repeat steps 3-7:
+ *
+ * 3) Write an invalid event code to the event control register for the
+ *    counter, so that the counters are not modified.
+ * 4) Enable the counter control for the counter.
+ * 5) Set the counter value
+ * 6) Disable the counter
+ * 7) Restore the event in the target counter
+ *
+ * 8) Disable the global PMU.
+ * 9) Restore the status of the rest of the counters.
+ *
+ * We choose an event code which has very little chance of getting
+ * assigned a valid code for step(3). We use the highest possible
+ * event code (0x1f) for the master interface 0.
+ */
+#define CCI500_INVALID_EVENT	((CCI500_PORT_M0 << CCI500_PMU_EVENT_SOURCE_SHIFT) | \
+				 (CCI500_PMU_EVENT_CODE_MASK << CCI500_PMU_EVENT_CODE_SHIFT))
+static void cci500_pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
+{
+	int i;
+	unsigned long saved_mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
+
+	memset(saved_mask, 0,
+		BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
+
+	pmu_save_counters(cci_pmu, saved_mask);
+
+	/* Now that all the counters are disabled, we can safely turn the PMU on */
+	__cci_pmu_enable();
+
+	for_each_set_bit(i, mask, cci_pmu->num_cntrs) {
+		u32 event = cci_pmu->hw_events.events[i]->hw.config_base;
+
+		pmu_set_event(cci_pmu, i, CCI500_INVALID_EVENT);
+		pmu_enable_counter(cci_pmu, i);
+		__pmu_write_counter(cci_pmu, value, i);
+		pmu_disable_counter(cci_pmu, i);
+		pmu_set_event(cci_pmu, i, event);
+	}
+
+	__cci_pmu_disable();
+
+	pmu_restore_counters(cci_pmu, saved_mask);
+}
+
+#endif	/* CONFIG_ARM_CCI500_PMU */
+
 static void pmu_write_counter(struct perf_event *event, u32 value)
 {
 	struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
@@ -1478,6 +1540,7 @@ static struct cci_pmu_model cci_pmu_models[] = {
 			},
 		},
 		.validate_hw_event = cci500_validate_hw_event,
+		.write_counters	= cci500_pmu_write_counters,
 	},
 #endif
 };
-- 
1.7.9.5



* [PATCH v5 10/11] arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (8 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 09/11] arm-cci: CCI-500: Work around PMU counter writes Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  2016-01-04 11:54 ` [PATCH v5 11/11] arm-cci: CoreLink CCI-550 PMU driver Suzuki K. Poulose
  10 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

CCI-550 PMU shares most of the CCI-500 PMU attributes, including the
event format and PMU event codes. The only difference is an additional
master interface (MI6 - 0xe). Hence we share the driver code for both,
except for a model-specific event validate method.
This patch renames the common CCI500 symbols to CCI5xx, including the
Kconfig symbol.

No functional changes to the PMU driver.

Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/Kconfig   |    2 +-
 drivers/bus/arm-cci.c |  218 +++++++++++++++++++++++++------------------------
 2 files changed, 112 insertions(+), 108 deletions(-)

diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 116b363..3793f4e 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -34,7 +34,7 @@ config ARM_CCI400_PORT_CTRL
 	  Low level power management driver for CCI400 cache coherent
 	  interconnect for ARM platforms.
 
-config ARM_CCI500_PMU
+config ARM_CCI5xx_PMU
 	bool "ARM CCI500 PMU support"
 	depends on (ARM && CPU_V7) || ARM64
 	depends on PERF_EVENTS
diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index cb2b468..9180187 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -52,7 +52,7 @@ static const struct of_device_id arm_cci_matches[] = {
 #ifdef CONFIG_ARM_CCI400_COMMON
 	{.compatible = "arm,cci-400", .data = CCI400_PORTS_DATA },
 #endif
-#ifdef CONFIG_ARM_CCI500_PMU
+#ifdef CONFIG_ARM_CCI5xx_PMU
 	{ .compatible = "arm,cci-500", },
 #endif
 	{},
@@ -100,7 +100,7 @@ static const struct of_device_id arm_cci_matches[] = {
 enum {
 	CCI_IF_SLAVE,
 	CCI_IF_MASTER,
-#ifdef CONFIG_ARM_CCI500_PMU
+#ifdef CONFIG_ARM_CCI5xx_PMU
 	CCI_IF_GLOBAL,
 #endif
 	CCI_IF_MAX,
@@ -162,7 +162,7 @@ enum cci_models {
 	CCI400_R0,
 	CCI400_R1,
 #endif
-#ifdef CONFIG_ARM_CCI500_PMU
+#ifdef CONFIG_ARM_CCI5xx_PMU
 	CCI500_R0,
 #endif
 	CCI_MODEL_MAX
@@ -432,73 +432,67 @@ static inline struct cci_pmu_model *probe_cci_model(struct platform_device *pdev
 }
 #endif	/* CONFIG_ARM_CCI400_PMU */
 
-#ifdef CONFIG_ARM_CCI500_PMU
+#ifdef CONFIG_ARM_CCI5xx_PMU
 
 /*
- * CCI500 provides 8 independent event counters that can count
- * any of the events available.
- *
- * CCI500 PMU event id is an 9-bit value made of two parts.
+ * CCI5xx PMU event id is an 9-bit value made of two parts.
  *	 bits [8:5] - Source for the event
- *		      0x0-0x6 - Slave interfaces
- *		      0x8-0xD - Master interfaces
- *		      0xf     - Global Events
- *		      0x7,0xe - Reserved
- *
  *	 bits [4:0] - Event code (specific to type of interface)
+ *
+ *
  */
 
 /* Port ids */
-#define CCI500_PORT_S0			0x0
-#define CCI500_PORT_S1			0x1
-#define CCI500_PORT_S2			0x2
-#define CCI500_PORT_S3			0x3
-#define CCI500_PORT_S4			0x4
-#define CCI500_PORT_S5			0x5
-#define CCI500_PORT_S6			0x6
-
-#define CCI500_PORT_M0			0x8
-#define CCI500_PORT_M1			0x9
-#define CCI500_PORT_M2			0xa
-#define CCI500_PORT_M3			0xb
-#define CCI500_PORT_M4			0xc
-#define CCI500_PORT_M5			0xd
-
-#define CCI500_PORT_GLOBAL 		0xf
-
-#define CCI500_PMU_EVENT_MASK		0x1ffUL
-#define CCI500_PMU_EVENT_SOURCE_SHIFT	0x5
-#define CCI500_PMU_EVENT_SOURCE_MASK	0xf
-#define CCI500_PMU_EVENT_CODE_SHIFT	0x0
-#define CCI500_PMU_EVENT_CODE_MASK	0x1f
-
-#define CCI500_PMU_EVENT_SOURCE(event)	\
-	((event >> CCI500_PMU_EVENT_SOURCE_SHIFT) & CCI500_PMU_EVENT_SOURCE_MASK)
-#define CCI500_PMU_EVENT_CODE(event)	\
-	((event >> CCI500_PMU_EVENT_CODE_SHIFT) & CCI500_PMU_EVENT_CODE_MASK)
-
-#define CCI500_SLAVE_PORT_MIN_EV	0x00
-#define CCI500_SLAVE_PORT_MAX_EV	0x1f
-#define CCI500_MASTER_PORT_MIN_EV	0x00
-#define CCI500_MASTER_PORT_MAX_EV	0x06
-#define CCI500_GLOBAL_PORT_MIN_EV	0x00
-#define CCI500_GLOBAL_PORT_MAX_EV	0x0f
-
-
-#define CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(_name, _config) \
-	CCI_EXT_ATTR_ENTRY(_name, cci500_pmu_global_event_show, \
+#define CCI5xx_PORT_S0			0x0
+#define CCI5xx_PORT_S1			0x1
+#define CCI5xx_PORT_S2			0x2
+#define CCI5xx_PORT_S3			0x3
+#define CCI5xx_PORT_S4			0x4
+#define CCI5xx_PORT_S5			0x5
+#define CCI5xx_PORT_S6			0x6
+
+#define CCI5xx_PORT_M0			0x8
+#define CCI5xx_PORT_M1			0x9
+#define CCI5xx_PORT_M2			0xa
+#define CCI5xx_PORT_M3			0xb
+#define CCI5xx_PORT_M4			0xc
+#define CCI5xx_PORT_M5			0xd
+
+#define CCI5xx_PORT_GLOBAL		0xf
+
+#define CCI5xx_PMU_EVENT_MASK		0x1ffUL
+#define CCI5xx_PMU_EVENT_SOURCE_SHIFT	0x5
+#define CCI5xx_PMU_EVENT_SOURCE_MASK	0xf
+#define CCI5xx_PMU_EVENT_CODE_SHIFT	0x0
+#define CCI5xx_PMU_EVENT_CODE_MASK	0x1f
+
+#define CCI5xx_PMU_EVENT_SOURCE(event)	\
+	((event >> CCI5xx_PMU_EVENT_SOURCE_SHIFT) & CCI5xx_PMU_EVENT_SOURCE_MASK)
+#define CCI5xx_PMU_EVENT_CODE(event)	\
+	((event >> CCI5xx_PMU_EVENT_CODE_SHIFT) & CCI5xx_PMU_EVENT_CODE_MASK)
+
+#define CCI5xx_SLAVE_PORT_MIN_EV	0x00
+#define CCI5xx_SLAVE_PORT_MAX_EV	0x1f
+#define CCI5xx_MASTER_PORT_MIN_EV	0x00
+#define CCI5xx_MASTER_PORT_MAX_EV	0x06
+#define CCI5xx_GLOBAL_PORT_MIN_EV	0x00
+#define CCI5xx_GLOBAL_PORT_MAX_EV	0x0f
+
+
+#define CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(_name, _config) \
+	CCI_EXT_ATTR_ENTRY(_name, cci5xx_pmu_global_event_show, \
 					(unsigned long) _config)
 
-static ssize_t cci500_pmu_global_event_show(struct device *dev,
+static ssize_t cci5xx_pmu_global_event_show(struct device *dev,
 				struct device_attribute *attr, char *buf);
 
-static struct attribute *cci500_pmu_format_attrs[] = {
+static struct attribute *cci5xx_pmu_format_attrs[] = {
 	CCI_FORMAT_EXT_ATTR_ENTRY(event, "config:0-4"),
 	CCI_FORMAT_EXT_ATTR_ENTRY(source, "config:5-8"),
 	NULL,
 };
 
-static struct attribute *cci500_pmu_event_attrs[] = {
+static struct attribute *cci5xx_pmu_event_attrs[] = {
 	/* Slave events */
 	CCI_EVENT_EXT_ATTR_ENTRY(si_rrq_hs_arvalid, 0x0),
 	CCI_EVENT_EXT_ATTR_ENTRY(si_rrq_dev, 0x1),
@@ -543,64 +537,73 @@ static struct attribute *cci500_pmu_event_attrs[] = {
 	CCI_EVENT_EXT_ATTR_ENTRY(mi_w_resp_stall, 0x6),
 
 	/* Global events */
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_0_1, 0x0),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_2_3, 0x1),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_4_5, 0x2),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_6_7, 0x3),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_0_1, 0x4),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_2_3, 0x5),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_4_5, 0x6),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_6_7, 0x7),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_back_invalidation, 0x8),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_stall_alloc_busy, 0x9),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_stall_tt_full, 0xA),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_wrq, 0xB),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_cd_hs, 0xC),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_rq_stall_addr_hazard, 0xD),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snopp_rq_stall_tt_full, 0xE),
-	CCI500_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_rq_tzmp1_prot, 0xF),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_0_1, 0x0),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_2_3, 0x1),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_4_5, 0x2),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_filter_bank_6_7, 0x3),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_0_1, 0x4),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_2_3, 0x5),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_4_5, 0x6),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_access_miss_filter_bank_6_7, 0x7),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_back_invalidation, 0x8),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_stall_alloc_busy, 0x9),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_stall_tt_full, 0xA),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_wrq, 0xB),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_cd_hs, 0xC),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_rq_stall_addr_hazard, 0xD),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snopp_rq_stall_tt_full, 0xE),
+	CCI5xx_GLOBAL_EVENT_EXT_ATTR_ENTRY(cci_snoop_rq_tzmp1_prot, 0xF),
 	NULL
 };
 
-static ssize_t cci500_pmu_global_event_show(struct device *dev,
+static ssize_t cci5xx_pmu_global_event_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
 	struct dev_ext_attribute *eattr = container_of(attr,
 					struct dev_ext_attribute, attr);
 	/* Global events have single fixed source code */
 	return snprintf(buf, PAGE_SIZE, "event=0x%lx,source=0x%x\n",
-				(unsigned long)eattr->var, CCI500_PORT_GLOBAL);
+				(unsigned long)eattr->var, CCI5xx_PORT_GLOBAL);
 }
 
+/*
+ * CCI500 provides 8 independent event counters that can count
+ * any of the events available.
+ * CCI500 PMU event source ids
+ *	0x0-0x6 - Slave interfaces
+ *	0x8-0xD - Master interfaces
+ *	0xf     - Global Events
+ *	0x7,0xe - Reserved
+ */
 static int cci500_validate_hw_event(struct cci_pmu *cci_pmu,
 					unsigned long hw_event)
 {
-	u32 ev_source = CCI500_PMU_EVENT_SOURCE(hw_event);
-	u32 ev_code = CCI500_PMU_EVENT_CODE(hw_event);
+	u32 ev_source = CCI5xx_PMU_EVENT_SOURCE(hw_event);
+	u32 ev_code = CCI5xx_PMU_EVENT_CODE(hw_event);
 	int if_type;
 
-	if (hw_event & ~CCI500_PMU_EVENT_MASK)
+	if (hw_event & ~CCI5xx_PMU_EVENT_MASK)
 		return -ENOENT;
 
 	switch (ev_source) {
-	case CCI500_PORT_S0:
-	case CCI500_PORT_S1:
-	case CCI500_PORT_S2:
-	case CCI500_PORT_S3:
-	case CCI500_PORT_S4:
-	case CCI500_PORT_S5:
-	case CCI500_PORT_S6:
+	case CCI5xx_PORT_S0:
+	case CCI5xx_PORT_S1:
+	case CCI5xx_PORT_S2:
+	case CCI5xx_PORT_S3:
+	case CCI5xx_PORT_S4:
+	case CCI5xx_PORT_S5:
+	case CCI5xx_PORT_S6:
 		if_type = CCI_IF_SLAVE;
 		break;
-	case CCI500_PORT_M0:
-	case CCI500_PORT_M1:
-	case CCI500_PORT_M2:
-	case CCI500_PORT_M3:
-	case CCI500_PORT_M4:
-	case CCI500_PORT_M5:
+	case CCI5xx_PORT_M0:
+	case CCI5xx_PORT_M1:
+	case CCI5xx_PORT_M2:
+	case CCI5xx_PORT_M3:
+	case CCI5xx_PORT_M4:
+	case CCI5xx_PORT_M5:
 		if_type = CCI_IF_MASTER;
 		break;
-	case CCI500_PORT_GLOBAL:
+	case CCI5xx_PORT_GLOBAL:
 		if_type = CCI_IF_GLOBAL;
 		break;
 	default:
@@ -613,7 +616,8 @@ static int cci500_validate_hw_event(struct cci_pmu *cci_pmu,
 
 	return -ENOENT;
 }
-#endif	/* CONFIG_ARM_CCI500_PMU */
+
+#endif	/* CONFIG_ARM_CCI5xx_PMU */
 
 static ssize_t cci_pmu_format_show(struct device *dev,
 			struct device_attribute *attr, char *buf)
@@ -837,7 +841,7 @@ static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
 	pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
 }
 
-#ifdef CONFIG_ARM_CCI500_PMU
+#ifdef CONFIG_ARM_CCI5xx_PMU
 
 /*
  * CCI-500 has advanced power saving policies, which could gate the
@@ -867,9 +871,9 @@ static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
  * assigned a valid code for step(2). We use the highest possible
  * event code (0x1f) for the master interface 0.
  */
-#define CCI500_INVALID_EVENT	((CCI500_PORT_M0 << CCI500_PMU_EVENT_SOURCE_SHIFT) | \
-				 (CCI500_PMU_EVENT_CODE_MASK << CCI500_PMU_EVENT_CODE_SHIFT))
-static void cci500_pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
+#define CCI5xx_INVALID_EVENT	((CCI5xx_PORT_M0 << CCI5xx_PMU_EVENT_SOURCE_SHIFT) | \
+				 (CCI5xx_PMU_EVENT_CODE_MASK << CCI5xx_PMU_EVENT_CODE_SHIFT))
+static void cci5xx_pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
 {
 	int i;
 	unsigned long saved_mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
@@ -885,7 +889,7 @@ static void cci500_pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *ma
 	for_each_set_bit(i, mask, cci_pmu->num_cntrs) {
 		u32 event = cci_pmu->hw_events.events[i]->hw.config_base;
 
-		pmu_set_event(cci_pmu, i, CCI500_INVALID_EVENT);
+		pmu_set_event(cci_pmu, i, CCI5xx_INVALID_EVENT);
 		pmu_enable_counter(cci_pmu, i);
 		__pmu_write_counter(cci_pmu, value, i);
 		pmu_disable_counter(cci_pmu, i);
@@ -897,7 +901,7 @@ static void cci500_pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *ma
 	pmu_restore_counters(cci_pmu, saved_mask);
 }
 
-#endif	/* CONFIG_ARM_CCI500_PMU */
+#endif	/* CONFIG_ARM_CCI5xx_PMU */
 
 static void pmu_write_counter(struct perf_event *event, u32 value)
 {
@@ -1517,30 +1521,30 @@ static struct cci_pmu_model cci_pmu_models[] = {
 		.get_event_idx = cci400_get_event_idx,
 	},
 #endif
-#ifdef CONFIG_ARM_CCI500_PMU
+#ifdef CONFIG_ARM_CCI5xx_PMU
 	[CCI500_R0] = {
 		.name = "CCI_500",
 		.fixed_hw_cntrs = 0,
 		.num_hw_cntrs = 8,
 		.cntr_size = SZ_64K,
-		.format_attrs = cci500_pmu_format_attrs,
-		.event_attrs = cci500_pmu_event_attrs,
+		.format_attrs = cci5xx_pmu_format_attrs,
+		.event_attrs = cci5xx_pmu_event_attrs,
 		.event_ranges = {
 			[CCI_IF_SLAVE] = {
-				CCI500_SLAVE_PORT_MIN_EV,
-				CCI500_SLAVE_PORT_MAX_EV,
+				CCI5xx_SLAVE_PORT_MIN_EV,
+				CCI5xx_SLAVE_PORT_MAX_EV,
 			},
 			[CCI_IF_MASTER] = {
-				CCI500_MASTER_PORT_MIN_EV,
-				CCI500_MASTER_PORT_MAX_EV,
+				CCI5xx_MASTER_PORT_MIN_EV,
+				CCI5xx_MASTER_PORT_MAX_EV,
 			},
 			[CCI_IF_GLOBAL] = {
-				CCI500_GLOBAL_PORT_MIN_EV,
-				CCI500_GLOBAL_PORT_MAX_EV,
+				CCI5xx_GLOBAL_PORT_MIN_EV,
+				CCI5xx_GLOBAL_PORT_MAX_EV,
 			},
 		},
 		.validate_hw_event = cci500_validate_hw_event,
-		.write_counters	= cci500_pmu_write_counters,
+		.write_counters	= cci5xx_pmu_write_counters,
 	},
 #endif
 };
@@ -1560,7 +1564,7 @@ static const struct of_device_id arm_cci_pmu_matches[] = {
 		.data	= &cci_pmu_models[CCI400_R1],
 	},
 #endif
-#ifdef CONFIG_ARM_CCI500_PMU
+#ifdef CONFIG_ARM_CCI5xx_PMU
 	{
 		.compatible = "arm,cci-500-pmu,r0",
 		.data = &cci_pmu_models[CCI500_R0],
-- 
1.7.9.5
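
The event id layout that this patch renames (bits [8:5] for the source,
bits [4:0] for the event code) can be illustrated with a small standalone
sketch. This is not driver code: the EV_* macros and the ev_*() helpers
are hypothetical names chosen for illustration, and only the shifts and
masks are taken from the CCI5xx_PMU_EVENT_* definitions in the patch.

```c
#include <stdint.h>

/* Field layout as defined in the patch: source in [8:5], code in [4:0]. */
#define EV_SOURCE_SHIFT	5
#define EV_SOURCE_MASK	0xfu
#define EV_CODE_MASK	0x1fu

/* Hypothetical helper: extract the 4-bit source (port id) field. */
static inline uint32_t ev_source(uint32_t hw_event)
{
	return (hw_event >> EV_SOURCE_SHIFT) & EV_SOURCE_MASK;
}

/* Hypothetical helper: extract the 5-bit event code field. */
static inline uint32_t ev_code(uint32_t hw_event)
{
	return hw_event & EV_CODE_MASK;
}

/* Hypothetical helper: build a 9-bit event id from source and code. */
static inline uint32_t ev_make(uint32_t source, uint32_t code)
{
	return ((source & EV_SOURCE_MASK) << EV_SOURCE_SHIFT) |
	       (code & EV_CODE_MASK);
}
```

With these helpers, a global event (source 0xf) with code 0x8 encodes
to 0x1e8, and the CCI5xx_INVALID_EVENT value used by the counter write
workaround (master interface 0, code 0x1f) encodes to 0x11f.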



* [PATCH v5 11/11] arm-cci: CoreLink CCI-550 PMU driver
  2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
                   ` (9 preceding siblings ...)
  2016-01-04 11:54 ` [PATCH v5 10/11] arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU Suzuki K. Poulose
@ 2016-01-04 11:54 ` Suzuki K. Poulose
  10 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-04 11:54 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, arm, mark.rutland, punit.agrawal, peterz,
	Suzuki K. Poulose

Add PMU driver support for the ARM CoreLink CCI-550 cache coherent
interconnect. The CCI-550 PMU shares all the attributes of the CCI-500
PMU, except for an additional master interface (MI6 - 0xe).
CCI-550 requires the same workaround as CCI-500 to write to the
PMU counters.

Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 Documentation/devicetree/bindings/arm/cci.txt |    2 +
 drivers/bus/Kconfig                           |    8 +--
 drivers/bus/arm-cci.c                         |   85 ++++++++++++++++++++++++-
 3 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/cci.txt b/Documentation/devicetree/bindings/arm/cci.txt
index aef1d20..a1a5a7e 100644
--- a/Documentation/devicetree/bindings/arm/cci.txt
+++ b/Documentation/devicetree/bindings/arm/cci.txt
@@ -34,6 +34,7 @@ specific to ARM.
 		Definition: must contain one of the following:
 			    "arm,cci-400"
 			    "arm,cci-500"
+			    "arm,cci-550"
 
 	- reg
 		Usage: required
@@ -101,6 +102,7 @@ specific to ARM.
 				 "arm,cci-400-pmu"  - DEPRECATED, permitted only where OS has
 						      secure acces to CCI registers
 				 "arm,cci-500-pmu,r0"
+				 "arm,cci-550-pmu,r0"
 		- reg:
 			Usage: required
 			Value type: Integer cells. A register entry, expressed
diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 3793f4e..54c030b 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -35,14 +35,14 @@ config ARM_CCI400_PORT_CTRL
 	  interconnect for ARM platforms.
 
 config ARM_CCI5xx_PMU
-	bool "ARM CCI500 PMU support"
+	bool "ARM CCI-500/CCI-550 PMU support"
 	depends on (ARM && CPU_V7) || ARM64
 	depends on PERF_EVENTS
 	select ARM_CCI_PMU
 	help
-	  Support for PMU events monitoring on the ARM CCI-500 cache coherent
-	  interconnect. CCI-500 provides 8 independent event counters, which
-	  can count events pertaining to the slave/master interfaces as well
+	  Support for PMU events monitoring on the ARM CCI-500/CCI-550 cache
+	  coherent interconnects. Both of them provide 8 independent event counters,
+	  which can count events pertaining to the slave/master interfaces as well
 	  as the internal events to the CCI.
 
 	  If unsure, say Y
diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 9180187..cfe2afe8 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -54,6 +54,7 @@ static const struct of_device_id arm_cci_matches[] = {
 #endif
 #ifdef CONFIG_ARM_CCI5xx_PMU
 	{ .compatible = "arm,cci-500", },
+	{ .compatible = "arm,cci-550", },
 #endif
 	{},
 };
@@ -164,6 +165,7 @@ enum cci_models {
 #endif
 #ifdef CONFIG_ARM_CCI5xx_PMU
 	CCI500_R0,
+	CCI550_R0,
 #endif
 	CCI_MODEL_MAX
 };
@@ -457,6 +459,7 @@ static inline struct cci_pmu_model *probe_cci_model(struct platform_device *pdev
 #define CCI5xx_PORT_M3			0xb
 #define CCI5xx_PORT_M4			0xc
 #define CCI5xx_PORT_M5			0xd
+#define CCI5xx_PORT_M6			0xe
 
 #define CCI5xx_PORT_GLOBAL		0xf
 
@@ -617,6 +620,58 @@ static int cci500_validate_hw_event(struct cci_pmu *cci_pmu,
 	return -ENOENT;
 }
 
+/*
+ * CCI550 provides 8 independent event counters that can count
+ * any of the events available.
+ * CCI550 PMU event source ids
+ *	0x0-0x6 - Slave interfaces
+ *	0x8-0xe - Master interfaces
+ *	0xf     - Global Events
+ *	0x7	- Reserved
+ */
+static int cci550_validate_hw_event(struct cci_pmu *cci_pmu,
+					unsigned long hw_event)
+{
+	u32 ev_source = CCI5xx_PMU_EVENT_SOURCE(hw_event);
+	u32 ev_code = CCI5xx_PMU_EVENT_CODE(hw_event);
+	int if_type;
+
+	if (hw_event & ~CCI5xx_PMU_EVENT_MASK)
+		return -ENOENT;
+
+	switch (ev_source) {
+	case CCI5xx_PORT_S0:
+	case CCI5xx_PORT_S1:
+	case CCI5xx_PORT_S2:
+	case CCI5xx_PORT_S3:
+	case CCI5xx_PORT_S4:
+	case CCI5xx_PORT_S5:
+	case CCI5xx_PORT_S6:
+		if_type = CCI_IF_SLAVE;
+		break;
+	case CCI5xx_PORT_M0:
+	case CCI5xx_PORT_M1:
+	case CCI5xx_PORT_M2:
+	case CCI5xx_PORT_M3:
+	case CCI5xx_PORT_M4:
+	case CCI5xx_PORT_M5:
+	case CCI5xx_PORT_M6:
+		if_type = CCI_IF_MASTER;
+		break;
+	case CCI5xx_PORT_GLOBAL:
+		if_type = CCI_IF_GLOBAL;
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	if (ev_code >= cci_pmu->model->event_ranges[if_type].min &&
+		ev_code <= cci_pmu->model->event_ranges[if_type].max)
+		return hw_event;
+
+	return -ENOENT;
+}
+
 #endif	/* CONFIG_ARM_CCI5xx_PMU */
 
 static ssize_t cci_pmu_format_show(struct device *dev,
@@ -844,7 +899,7 @@ static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
 #ifdef CONFIG_ARM_CCI5xx_PMU
 
 /*
- * CCI-500 has advanced power saving policies, which could gate the
+ * CCI-500/CCI-550 has advanced power saving policies, which could gate the
  * clocks to the PMU counters, which makes the writes to them ineffective.
  * The only way to write to those counters is when the global counters
  * are enabled and the particular counter is enabled.
@@ -1546,6 +1601,30 @@ static struct cci_pmu_model cci_pmu_models[] = {
 		.validate_hw_event = cci500_validate_hw_event,
 		.write_counters	= cci5xx_pmu_write_counters,
 	},
+	[CCI550_R0] = {
+		.name = "CCI_550",
+		.fixed_hw_cntrs = 0,
+		.num_hw_cntrs = 8,
+		.cntr_size = SZ_64K,
+		.format_attrs = cci5xx_pmu_format_attrs,
+		.event_attrs = cci5xx_pmu_event_attrs,
+		.event_ranges = {
+			[CCI_IF_SLAVE] = {
+				CCI5xx_SLAVE_PORT_MIN_EV,
+				CCI5xx_SLAVE_PORT_MAX_EV,
+			},
+			[CCI_IF_MASTER] = {
+				CCI5xx_MASTER_PORT_MIN_EV,
+				CCI5xx_MASTER_PORT_MAX_EV,
+			},
+			[CCI_IF_GLOBAL] = {
+				CCI5xx_GLOBAL_PORT_MIN_EV,
+				CCI5xx_GLOBAL_PORT_MAX_EV,
+			},
+		},
+		.validate_hw_event = cci550_validate_hw_event,
+		.write_counters	= cci5xx_pmu_write_counters,
+	},
 #endif
 };
 
@@ -1569,6 +1648,10 @@ static const struct of_device_id arm_cci_pmu_matches[] = {
 		.compatible = "arm,cci-500-pmu,r0",
 		.data = &cci_pmu_models[CCI500_R0],
 	},
+	{
+		.compatible = "arm,cci-550-pmu,r0",
+		.data = &cci_pmu_models[CCI550_R0],
+	},
 #endif
 	{},
 };
-- 
1.7.9.5
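
The only behavioural difference between the two validate functions is
how source id 0xe is treated. A minimal standalone sketch of that
difference follows. The enum, the function name and the is_cci550 flag
are hypothetical and exist only for illustration; the patch itself keeps
two separate functions, cci500_validate_hw_event() and
cci550_validate_hw_event(), selected through the model table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interface classes, mirroring CCI_IF_* in the driver. */
enum if_type { IF_SLAVE, IF_MASTER, IF_GLOBAL, IF_INVALID };

/*
 * Classify a 4-bit event source id using the ranges from the patch
 * comments:
 *   0x0-0x6 slave interfaces
 *   0x8-0xd master interfaces (0x8-0xe on CCI-550)
 *   0xf     global events
 *   0x7     reserved (and 0xe on CCI-500)
 */
static enum if_type classify_source(uint32_t source, bool is_cci550)
{
	if (source <= 0x6)
		return IF_SLAVE;
	if (source >= 0x8 && source <= 0xd)
		return IF_MASTER;
	if (source == 0xe)
		return is_cci550 ? IF_MASTER : IF_INVALID;
	if (source == 0xf)
		return IF_GLOBAL;
	return IF_INVALID;	/* 0x7, or anything wider than 4 bits */
}
```

In the driver, an invalid source makes the validate hook return -ENOENT;
the sketch returns IF_INVALID instead so that it stays self-contained.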



* Re: [PATCH v5 01/11] arm-cci: Define CCI counter period
  2016-01-04 11:54 ` [PATCH v5 01/11] arm-cci: Define CCI counter period Suzuki K. Poulose
@ 2016-01-04 18:27   ` Mark Rutland
  2016-01-05  9:50     ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-04 18:27 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Mon, Jan 04, 2016 at 11:54:40AM +0000, Suzuki K. Poulose wrote:
> Instead of hard coding the period we program on the PMU
> counters, define a symbol.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/bus/arm-cci.c |   19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
> index ee47e6b..3786879 100644
> --- a/drivers/bus/arm-cci.c
> +++ b/drivers/bus/arm-cci.c
> @@ -85,6 +85,14 @@ static const struct of_device_id arm_cci_matches[] = {
>  #define CCI_PMU_CNTR_MASK		((1ULL << 32) -1)
>  #define CCI_PMU_CNTR_LAST(cci_pmu)	(cci_pmu->num_cntrs - 1)
>  
> +/*
> + * The CCI PMU counters have a period of 2^32. To account for the
> + * possiblity of extreme interrupt latency we program for a period of
> + * half that. Hopefully we can handle the interrupt before another 2^31
> + * events occur and the counter overtakes its previous value.
> + */
> +#define CCI_CNTR_PERIOD		(1UL << 31)
> +
>  #define CCI_PMU_MAX_HW_CNTRS(model) \
>  	((model)->num_hw_cntrs + (model)->fixed_hw_cntrs)
>  
> @@ -797,15 +805,8 @@ static void pmu_read(struct perf_event *event)
>  void pmu_event_set_period(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
> -	/*
> -	 * The CCI PMU counters have a period of 2^32. To account for the
> -	 * possiblity of extreme interrupt latency we program for a period of
> -	 * half that. Hopefully we can handle the interrupt before another 2^31
> -	 * events occur and the counter overtakes its previous value.
> -	 */
> -	u64 val = 1ULL << 31;
> -	local64_set(&hwc->prev_count, val);
> -	pmu_write_counter(event, val);
> +	local64_set(&hwc->prev_count, CCI_CNTR_PERIOD);
> +	pmu_write_counter(event, CCI_CNTR_PERIOD);

I think this is a little misleading (and confusing), as we're conflating
the period with its inverse. This wouldn't work for any other value of
CCI_CNTR_PERIOD.

Perhaps s/PERIOD/START_VAL/, leaving everything else as-is?

Thanks,
Mark.
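
To make the period vs. start value distinction concrete, here is a small
standalone sketch. It assumes a 32-bit up-counter that raises its
overflow interrupt when it wraps past 2^32 - 1, which is how the comment
in the patch describes the hardware. The function name is hypothetical
and not part of the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the value to program into a 32-bit up-counter so
 * that it overflows after exactly 'period' events.
 * Assumes 1 <= period <= 2^32.
 */
static inline uint32_t start_val_for_period(uint64_t period)
{
	return (uint32_t)((1ULL << 32) - period);
}
```

For a period of 2^31 the start value is also 2^31, which is why the
patch happens to work. For a period of 2^30 the start value would have
to be 0xc0000000, so programming the period itself would give the wrong
result.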


* Re: [PATCH v5 02/11] arm-cci: Refactor pmu_write_counter
  2016-01-04 11:54 ` [PATCH v5 02/11] arm-cci: Refactor pmu_write_counter Suzuki K. Poulose
@ 2016-01-04 19:01   ` Mark Rutland
  0 siblings, 0 replies; 29+ messages in thread
From: Mark Rutland @ 2016-01-04 19:01 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Mon, Jan 04, 2016 at 11:54:41AM +0000, Suzuki K. Poulose wrote:
> Refactor pmu_write_counter to add __pmu_write_counter() which
> will actually write to the counter once the event is validated.
> This can be used by hooks specific to CCI PMU model to program
> the counter, where the event is already validated.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/bus/arm-cci.c |   12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
> index 3786879..ce0d3ef 100644
> --- a/drivers/bus/arm-cci.c
> +++ b/drivers/bus/arm-cci.c
> @@ -767,16 +767,22 @@ static u32 pmu_read_counter(struct perf_event *event)
>  	return value;
>  }
>  
> +static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
> +{
> +	pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
> +}
> +
>  static void pmu_write_counter(struct perf_event *event, u32 value)
>  {
>  	struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
>  	struct hw_perf_event *hw_counter = &event->hw;
>  	int idx = hw_counter->idx;
>  
> -	if (unlikely(!pmu_is_valid_counter(cci_pmu, idx)))
> +	if (unlikely(!pmu_is_valid_counter(cci_pmu, idx))) {
>  		dev_err(&cci_pmu->plat_device->dev, "Invalid CCI PMU counter %d\n", idx);
> -	else
> -		pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
> +		return;
> +	}
> +	__pmu_write_counter(cci_pmu, value, idx);
>  }
>  
>  static u64 pmu_event_update(struct perf_event *event)
> -- 
> 1.7.9.5
> 


* Re: [PATCH v5 03/11] arm-cci: Group writes to counter
  2016-01-04 11:54 ` [PATCH v5 03/11] arm-cci: Group writes to counter Suzuki K. Poulose
@ 2016-01-04 19:03   ` Mark Rutland
  2016-01-05 10:51     ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-04 19:03 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Mon, Jan 04, 2016 at 11:54:42AM +0000, Suzuki K. Poulose wrote:
> Add a helper to group the writes to PMU counter, this will be
> used to delay setting the event period to pmu::pmu_enable()
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/bus/arm-cci.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
> index ce0d3ef..f6b8717 100644
> --- a/drivers/bus/arm-cci.c
> +++ b/drivers/bus/arm-cci.c
> @@ -785,6 +785,21 @@ static void pmu_write_counter(struct perf_event *event, u32 value)
>  	__pmu_write_counter(cci_pmu, value, idx);
>  }
>  
> +/* Write a value to a given set of counters */
> +static void __pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
> +{
> +	int i;
> +
> +	for_each_set_bit(i, mask, cci_pmu->num_cntrs)
> +		__pmu_write_counter(cci_pmu, value, i);
> +}

I don't understand this as-is. Why do all the counters have the same
value?

> +static void __maybe_unused
> +pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
> +{
> +	__pmu_write_counters(cci_pmu, mask, value);
> +}

Why are these not just one function for now?

Mark.


* Re: [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable
  2016-01-04 11:54 ` [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable Suzuki K. Poulose
@ 2016-01-04 19:24   ` Mark Rutland
  2016-01-05  9:59     ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-04 19:24 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Mon, Jan 04, 2016 at 11:54:44AM +0000, Suzuki K. Poulose wrote:
> Delay setting the event periods for enabled events to pmu::pmu_enable().
> We mark the event.hw->state PERF_HES_ARCH for the events that we know
> have their counts recorded and have been started.

Please add a comment to the code stating exactly what PERF_HES_ARCH
means for the CCI PMU driver, so it's easy to find.

> Since we reprogram the counters every time before count, we can set
> the counters for all the event counters which are !STOPPED && ARCH.
> 
> Grouping the writes to counters can amortise the cost of the operation
> on PMUs where it is expensive (e.g, CCI-500).
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/bus/arm-cci.c |   42 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
> index 0189f3a..c768ee4 100644
> --- a/drivers/bus/arm-cci.c
> +++ b/drivers/bus/arm-cci.c
> @@ -916,6 +916,40 @@ static void hw_perf_event_destroy(struct perf_event *event)
>  	}
>  }
>  
> +/*
> + * Program the CCI PMU counters which have PERF_HES_ARCH set
> + * with the event period and mark them ready before we enable
> + * PMU.
> + */
> +void cci_pmu_update_counters(struct cci_pmu *cci_pmu)
> +{
> +	int i;
> +	unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];

I think this can be:

	DECLARE_BITMAP(mask, cci_pmu->num_cntrs);

> +
> +	memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));

Likewise:

	bitmap_zero(mask, cci_pmu->num_cntrs);

> +
> +	for_each_set_bit(i, cci_pmu->hw_events.used_mask, cci_pmu->num_cntrs) {
> +		struct hw_perf_event *hwe;
> +
> +		if (!cci_pmu->hw_events.events[i]) {
> +			WARN_ON(1);
> +			continue;
> +		}
> +

		if (WARN_ON(!cci_pmu->hw_events.events[i]))
			continue;

> +		hwe = &cci_pmu->hw_events.events[i]->hw;
> +		/* Leave the events which are not counting */
> +		if (hwe->state & PERF_HES_STOPPED)
> +			continue;
> +		if (hwe->state & PERF_HES_ARCH) {
> +			set_bit(i, mask);
> +			hwe->state &= ~PERF_HES_ARCH;
> +			local64_set(&hwe->prev_count, CCI_CNTR_PERIOD);
> +		}
> +	}
> +
> +	pmu_write_counters(cci_pmu, mask, CCI_CNTR_PERIOD);
> +}
> +
>  static void cci_pmu_enable(struct pmu *pmu)
>  {
>  	struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
> @@ -927,6 +961,7 @@ static void cci_pmu_enable(struct pmu *pmu)
>  		return;
>  
>  	raw_spin_lock_irqsave(&hw_events->pmu_lock, flags);
> +	cci_pmu_update_counters(cci_pmu);
>  	__cci_pmu_enable();
>  	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
>  
> @@ -980,8 +1015,11 @@ static void cci_pmu_start(struct perf_event *event, int pmu_flags)
>  	/* Configure the counter unless you are counting a fixed event */
>  	if (!pmu_fixed_hw_idx(cci_pmu, idx))
>  		pmu_set_event(cci_pmu, idx, hwc->config_base);
> -
> -	pmu_event_set_period(event);
> +	/*
> +	 * Mark this counter, so that we can program the
> +	 * counter with the event_period. see cci_pmu_enable()
> +	 */
> +	hwc->state = PERF_HES_ARCH;

Why couldn't we have kept pmu_event_set_period here, and have that set
prev_count and PERF_HES_ARCH?

Then we'd be able to do the same batching for overflow too.

What am I missing?

Mark.

>  	pmu_enable_counter(cci_pmu, idx);
>  
>  	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
> -- 
> 1.7.9.5
> 
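
The selection logic in cci_pmu_update_counters() can be sketched outside
the kernel as follows. This is only an illustration under simplifying
assumptions: counters are tracked in a single 32-bit word rather than a
kernel bitmap, the per-event state lives in a plain array, and the
HES_* macros and the function name are hypothetical stand-ins for the
PERF_HES_* flags and the driver function.

```c
#include <stdint.h>

/* Hypothetical stand-ins for PERF_HES_STOPPED and PERF_HES_ARCH. */
#define HES_STOPPED	0x01u
#define HES_ARCH	0x04u

/*
 * Return a mask of counters that still need their start value written.
 * A counter qualifies if it is in use, not stopped, and marked with
 * HES_ARCH. The mark is cleared so the counter is programmed only once.
 */
static uint32_t collect_pending(uint32_t used_mask, uint32_t *state,
				int num_cntrs)
{
	uint32_t pending = 0;
	int i;

	for (i = 0; i < num_cntrs; i++) {
		if (!(used_mask & (1u << i)))
			continue;	/* counter not allocated */
		if (state[i] & HES_STOPPED)
			continue;	/* event is not counting */
		if (state[i] & HES_ARCH) {
			pending |= 1u << i;
			state[i] &= ~HES_ARCH;
		}
	}
	return pending;
}
```

The returned mask plays the role of the bitmap handed to
pmu_write_counters(), so that all pending counters are written in one
grouped operation just before the PMU is enabled.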


* Re: [PATCH v5 01/11] arm-cci: Define CCI counter period
  2016-01-04 18:27   ` Mark Rutland
@ 2016-01-05  9:50     ` Suzuki K. Poulose
  0 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-05  9:50 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 04/01/16 18:27, Mark Rutland wrote:
> On Mon, Jan 04, 2016 at 11:54:40AM +0000, Suzuki K. Poulose wrote:
>> Instead of hard coding the period we program on the PMU
>> counters, define a symbol.
>>

>> -	u64 val = 1ULL << 31;
>> -	local64_set(&hwc->prev_count, val);
>> -	pmu_write_counter(event, val);
>> +	local64_set(&hwc->prev_count, CCI_CNTR_PERIOD);
>> +	pmu_write_counter(event, CCI_CNTR_PERIOD);
>
> I think this is a little misleading (and confusing), as we're conflating
> the period with its inverse. This wouldn't work for any other value of
> CCI_CNTR_PERIOD.
>
> Perhaps s/PERIOD/START_VAL/, leaving everything else as-is?

You are right, will change it.

Cheers
Suzuki


* Re: [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable
  2016-01-04 19:24   ` Mark Rutland
@ 2016-01-05  9:59     ` Suzuki K. Poulose
  2016-01-11 10:46       ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-05  9:59 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 04/01/16 19:24, Mark Rutland wrote:
> On Mon, Jan 04, 2016 at 11:54:44AM +0000, Suzuki K. Poulose wrote:
>> Delay setting the event periods for enabled events to pmu::pmu_enable().
>> We mark the event.hw->state PERF_HES_ARCH for the events that we know
>> have their counts recorded and have been started.
>
> Please add a comment to the code stating exactly what PERF_HES_ARCH
> means for the CCI PMU driver, so it's easy to find.
>

Sure.

>> +void cci_pmu_update_counters(struct cci_pmu *cci_pmu)
>> +{
>> +	int i;
>> +	unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
>
> I think this can be:
>
> 	DECLARE_BITMAP(mask, cci_pmu->num_cntrs);
>
>> +
>> +	memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
>
> Likewise:
>
> 	bitmap_zero(mask, cci_pmu->num_cntrs);

OK

>> +		if (!cci_pmu->hw_events.events[i]) {
>> +			WARN_ON(1);
>> +			continue;
>> +		}
>> +
>
> 		if (WARN_ON(!cci_pmu->hw_events.events[i]))
> 			continue;

OK
  
>> @@ -980,8 +1015,11 @@ static void cci_pmu_start(struct perf_event *event, int pmu_flags)
>>   	/* Configure the counter unless you are counting a fixed event */
>>   	if (!pmu_fixed_hw_idx(cci_pmu, idx))
>>   		pmu_set_event(cci_pmu, idx, hwc->config_base);
>> -
>> -	pmu_event_set_period(event);
>> +	/*
>> +	 * Mark this counter, so that we can program the
>> +	 * counter with the event_period. see cci_pmu_enable()
>> +	 */
>> +	hwc->state = PERF_HES_ARCH;
>
> Why couldn't we have kept pmu_event_set_period here, and have that set
> prev_count and PERF_HES_ARCH?
>
> Then we'd be able to do the same batching for overflow too.

The pmu is not disabled while we are in overflow irq handler. Hence there may
not be a pmu_enable() which would set the period for the counter which
overflowed, if we defer the write in that case. Is that assumption wrong?

Cheers
Suzuki
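
The bitmap suggestions above can be sketched in userspace as follows. The
macros and helpers here are simplified stand-ins modelled on the kernel's
DECLARE_BITMAP(), bitmap_zero(), set_bit() and test_bit(); they are not the
kernel implementations (in particular, they are not atomic). The function
collect_pending() and the PERF_HES_ARCH value used here are assumptions
made for illustration only.

```c
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
/* Userspace stand-in for the kernel macro of the same name. */
#define DECLARE_BITMAP(name, bits)	unsigned long name[BITS_TO_LONGS(bits)]

#define PERF_HES_ARCH	0x04	/* stand-in value for this sketch */

static void bitmap_zero(unsigned long *dst, unsigned int nbits)
{
	memset(dst, 0, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}

static void set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static int test_bit(unsigned int nr, const unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/*
 * Hypothetical helper: build a mask of the counters whose state carries
 * PERF_HES_ARCH, i.e. those still waiting to have their period written.
 * Returns the number of counters marked.
 */
static unsigned int collect_pending(const int *state, unsigned int num_cntrs,
				    unsigned long *mask)
{
	unsigned int i, n = 0;

	bitmap_zero(mask, num_cntrs);
	for (i = 0; i < num_cntrs; i++) {
		if (state[i] & PERF_HES_ARCH) {
			set_bit(i, mask);
			n++;
		}
	}
	return n;
}
```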

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 03/11] arm-cci: Group writes to counter
  2016-01-04 19:03   ` Mark Rutland
@ 2016-01-05 10:51     ` Suzuki K. Poulose
  2016-01-11 10:44       ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-05 10:51 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 04/01/16 19:03, Mark Rutland wrote:
> On Mon, Jan 04, 2016 at 11:54:42AM +0000, Suzuki K. Poulose wrote:
>> Add a helper to group the writes to PMU counter, this will be
>> used to delay setting the event period to pmu::pmu_enable()
>>

>> +/* Write a value to a given set of counters */
>> +static void __pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
>> +{
>> +	int i;
>> +
>> +	for_each_set_bit(i, mask, cci_pmu->num_cntrs)
>> +		__pmu_write_counter(cci_pmu, value, i);
>> +}
>
> I don't understand this as-is. Why do all the counters have the same
> value?

The only value we write to the counters is the period. This routine writes
a given value to a set of counters specified by the mask (not to be confused
with the PMU->hw_events->mask). This will help to group the writes to the counters,
especially since preparatory steps to write to a single counter itself is costly.
So, we do all the preparation only once for a batch of counters.

The other option is to use hw_events->prev_count (which should be set before calling
the function) for each counter specified in the mask. I am fine with either of the
two.

>
>> +static void __maybe_unused
>> +pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
>> +{
>> +	__pmu_write_counters(cci_pmu, mask, value);
>> +}
>
> Why are these not just one function for now?

Yes, this could be just one function for now, until we introduce the hooks. This was
written to avoid another refactoring in the later patch.

Thanks
Suzuki

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 03/11] arm-cci: Group writes to counter
  2016-01-05 10:51     ` Suzuki K. Poulose
@ 2016-01-11 10:44       ` Mark Rutland
  2016-01-11 10:48         ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-11 10:44 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Tue, Jan 05, 2016 at 10:51:47AM +0000, Suzuki K. Poulose wrote:
> On 04/01/16 19:03, Mark Rutland wrote:
> >On Mon, Jan 04, 2016 at 11:54:42AM +0000, Suzuki K. Poulose wrote:
> >>Add a helper to group the writes to PMU counter, this will be
> >>used to delay setting the event period to pmu::pmu_enable()
> >>
> 
> >>+/* Write a value to a given set of counters */
> >>+static void __pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
> >>+{
> >>+	int i;
> >>+
> >>+	for_each_set_bit(i, mask, cci_pmu->num_cntrs)
> >>+		__pmu_write_counter(cci_pmu, value, i);
> >>+}
> >
> >I don't understand this as-is. Why do all the counters have the same
> >value?
> 
> The only value we write to the counters is the period. This routine writes
> a given value to a set of counters specified by the mask (not to be confused
> with the PMU->hw_events->mask). This will help to group the writes to the counters,
> especially since preparatory steps to write to a single counter itself is costly.
> So, we do all the preparation only once for a batch of counters.
> 
> The other option is to use hw_events->prev_count (which should be set before calling
> the function) for each counter specified in the mask. I am fine with either of the
> two.

I think this would be clearer using prev_count.

I guess it doesn't matter since we won't support sampling, but it would
match the shape of other PMU drivers.

> >>+static void __maybe_unused
> >>+pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
> >>+{
> >>+	__pmu_write_counters(cci_pmu, mask, value);
> >>+}
> >
> >Why are these not just one function for now?
> 
> Yes, this could be just one function for now, until we introduce the hooks. This was
> written to avoid another refactoring in the later patch.

Ok. Either way is fine.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable
  2016-01-05  9:59     ` Suzuki K. Poulose
@ 2016-01-11 10:46       ` Mark Rutland
  2016-01-11 11:08         ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-11 10:46 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Tue, Jan 05, 2016 at 09:59:13AM +0000, Suzuki K. Poulose wrote:
> On 04/01/16 19:24, Mark Rutland wrote:
> >On Mon, Jan 04, 2016 at 11:54:44AM +0000, Suzuki K. Poulose wrote:
> >>Delay setting the event periods for enabled events to pmu::pmu_enable().
> >>We mark the event.hw->state PERF_HES_ARCH for the events that we know
> >>have their counts recorded and have been started.
> >
> >Please add a comment to the code stating exactly what PERF_HES_ARCH
> >means for the CCI PMU driver, so it's easy to find.
> >
> 
> Sure.
> 
> >>+void cci_pmu_update_counters(struct cci_pmu *cci_pmu)
> >>+{
> >>+	int i;
> >>+	unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
> >
> >I think this can be:
> >
> >	DECLARE_BITMAP(mask, cci_pmu->num_cntrs);
> >
> >>+
> >>+	memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
> >
> >Likewise:
> >
> >	bitmap_zero(mask, cci_pmu->num_cntrs);
> 
> OK
> 
> >>+		if (!cci_pmu->hw_events.events[i]) {
> >>+			WARN_ON(1);
> >>+			continue;
> >>+		}
> >>+
> >
> >		if (WARN_ON(!cci_pmu->hw_events.events[i]))
> >			continue;
> 
> OK
> >>@@ -980,8 +1015,11 @@ static void cci_pmu_start(struct perf_event *event, int pmu_flags)
> >>  	/* Configure the counter unless you are counting a fixed event */
> >>  	if (!pmu_fixed_hw_idx(cci_pmu, idx))
> >>  		pmu_set_event(cci_pmu, idx, hwc->config_base);
> >>-
> >>-	pmu_event_set_period(event);
> >>+	/*
> >>+	 * Mark this counter, so that we can program the
> >>+	 * counter with the event_period. see cci_pmu_enable()
> >>+	 */
> >>+	hwc->state = PERF_HES_ARCH;
> >
> >Why couldn't we have kept pmu_event_set_period here, and have that set
> >prev_count and PERF_HES_ARCH?
> >
> >Then we'd be able to do the same batching for overflow too.
> 
> The pmu is not disabled while we are in overflow irq handler. Hence there may
> not be a pmu_enable() which would set the period for the counter which
> overflowed, if we defer the write in that case. Is that assumption wrong?

As the driver stands today, yes.

However, wouldn't it make more sense to disable the PMU for the overflow
handler, such that we can reuse the batching logic?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 03/11] arm-cci: Group writes to counter
  2016-01-11 10:44       ` Mark Rutland
@ 2016-01-11 10:48         ` Suzuki K. Poulose
  0 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-11 10:48 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 11/01/16 10:44, Mark Rutland wrote:
> On Tue, Jan 05, 2016 at 10:51:47AM +0000, Suzuki K. Poulose wrote:
>> On 04/01/16 19:03, Mark Rutland wrote:
>>> On Mon, Jan 04, 2016 at 11:54:42AM +0000, Suzuki K. Poulose wrote:
>>>> Add a helper to group the writes to PMU counter, this will be
>>>> used to delay setting the event period to pmu::pmu_enable()
>>>>
>>
>>>> +/* Write a value to a given set of counters */
>>>> +static void __pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
>>>> +{
>>>> +	int i;
>>>> +
>>>> +	for_each_set_bit(i, mask, cci_pmu->num_cntrs)
>>>> +		__pmu_write_counter(cci_pmu, value, i);
>>>> +}
>>>
>>> I don't understand this as-is. Why do all the counters have the same
>>> value?
>>
>> The only value we write to the counters is the period. This routine writes
>> a given value to a set of counters specified by the mask (not to be confused
>> with the PMU->hw_events->mask). This will help to group the writes to the counters,
>> especially since preparatory steps to write to a single counter itself is costly.
>> So, we do all the preparation only once for a batch of counters.
>>
>> The other option is to use hw_events->prev_count (which should be set before calling
>> the function) for each counter specified in the mask. I am fine with either of the
>> two.
>
> I think this would be clearer using prev_count.
>
> I guess it doesn't matter since we won't support sampling, but it would
> match the shape of other PMU drivers.

OK, will use that.

Thanks
Suzuki
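
The alternative agreed on above, taking each counter's value from its own
prev_count rather than passing one shared value, might look roughly like
the following userspace sketch. The struct pc_pmu type and
pc_write_counters() are hypothetical stand-ins for the driver's structures
and register accessors, and the mask is a single unsigned long for brevity.

```c
#include <stdint.h>

#define PC_MAX_CNTRS	8	/* hypothetical upper bound for the sketch */

struct pc_pmu {
	unsigned int num_cntrs;
	uint32_t counter[PC_MAX_CNTRS];		/* stand-in for hardware counters */
	uint64_t prev_count[PC_MAX_CNTRS];	/* stand-in for hwc->prev_count */
};

/*
 * Write each counter selected in 'mask' with its own prev_count, which
 * the caller is expected to have set beforehand.
 */
static void pc_write_counters(struct pc_pmu *pmu, unsigned long mask)
{
	unsigned int i;

	for (i = 0; i < pmu->num_cntrs; i++) {
		if (mask & (1UL << i))
			pmu->counter[i] = (uint32_t)pmu->prev_count[i];
	}
}
```

Counters not selected in the mask are left untouched, so the caller decides
exactly which writes belong to the batch.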

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 07/11] arm-cci: Add routines to save/restore all counters
  2016-01-04 11:54 ` [PATCH v5 07/11] arm-cci: Add routines to save/restore all counters Suzuki K. Poulose
@ 2016-01-11 10:50   ` Mark Rutland
  2016-01-11 10:58     ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-11 10:50 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Mon, Jan 04, 2016 at 11:54:46AM +0000, Suzuki K. Poulose wrote:
> Adds helper routines to disable the counter controls for
> all the counters on the CCI PMU and restore it back, by
> preserving the original state in caller provided mask.
> 
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/bus/arm-cci.c |   38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
> index a3938ef..2f1fcf0 100644
> --- a/drivers/bus/arm-cci.c
> +++ b/drivers/bus/arm-cci.c
> @@ -672,6 +672,44 @@ static void pmu_set_event(struct cci_pmu *cci_pmu, int idx, unsigned long event)
>  }
>  
>  /*
> + * For all counters on the CCI-PMU, disable any 'enabled' counters,
> + * saving the changed counters in the mask, so that we can restore
> + * it later using pmu_restore_counters. The mask is private to the
> + * caller. We cannot rely on the used_mask maintained by the CCI_PMU
> + * as it only tells us if the counter is assigned to perf_event or not.
> + * The state of the perf_event cannot be locked by the PMU layer, hence
> + * we check the individual counter status (which can be locked by
> + * cci_pm->hw_events->pmu_lock).
> + *
> + * @mask should be initialised by the caller.

We should probably state "initialised to zero", or "empty".

> + */
> +static void __maybe_unused
> +pmu_save_counters(struct cci_pmu *cci_pmu, unsigned long *mask)
> +{
> +	int i;
> +
> +	for (i = 0; i < cci_pmu->num_cntrs; i++) {
> +		if (pmu_counter_is_enabled(cci_pmu, i)) {
> +			set_bit(i, mask);
> +			pmu_disable_counter(cci_pmu, i);
> +		}
> +	}
> +}
> +
> +/*
> + * Restore the status of the counters. Reversal of the pmu_disable_counters().
> + * For each counter set in the mask, enable the counter back.
> + */

Shouldn't that say pmu_save_counters?

With that:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> +static void __maybe_unused
> +pmu_restore_counters(struct cci_pmu *cci_pmu, unsigned long *mask)
> +{
> +	int i;
> +
> +	for_each_set_bit(i, mask, cci_pmu->num_cntrs)
> +		pmu_enable_counter(cci_pmu, i);
> +}
> +
> +/*
>   * Returns the number of programmable counters actually implemented
>   * by the cci
>   */
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 08/11] arm-cci: Provide hook for writing to PMU counters
  2016-01-04 11:54 ` [PATCH v5 08/11] arm-cci: Provide hook for writing to PMU counters Suzuki K. Poulose
@ 2016-01-11 10:54   ` Mark Rutland
  2016-01-11 12:14     ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-11 10:54 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Mon, Jan 04, 2016 at 11:54:47AM +0000, Suzuki K. Poulose wrote:
> Add a hook for writing to CCI PMU counters. This callback
> can be used for CCI models which requires some extra work
> to program the PMU counter values. To accommodate group writes
> and single counter writes, the call back accepts a bitmask
> of the counter indices which need to be programmed with the
> given value.
> 
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/bus/arm-cci.c |   16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
> index 2f1fcf0..47c9581 100644
> --- a/drivers/bus/arm-cci.c
> +++ b/drivers/bus/arm-cci.c
> @@ -134,6 +134,7 @@ struct cci_pmu_model {
>  	struct event_range event_ranges[CCI_IF_MAX];
>  	int (*validate_hw_event)(struct cci_pmu *, unsigned long);
>  	int (*get_event_idx)(struct cci_pmu *, struct cci_pmu_hw_events *, unsigned long);
> +	void (*write_counters)(struct cci_pmu *, unsigned long *, u32 val);
>  };
>  
>  static struct cci_pmu_model cci_pmu_models[];
> @@ -846,7 +847,15 @@ static void pmu_write_counter(struct perf_event *event, u32 value)
>  		dev_err(&cci_pmu->plat_device->dev, "Invalid CCI PMU counter %d\n", idx);
>  		return;
>  	}
> -	__pmu_write_counter(cci_pmu, value, idx);
> +
> +	if (cci_pmu->model->write_counters) {
> +		unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
> +
> +		memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
> +		set_bit(idx, mask);
> +		cci_pmu->model->write_counters(cci_pmu, mask, value);
> +	} else
> +		__pmu_write_counter(cci_pmu, value, idx);
>  }

It would be much simpler to always log writes here, and only do the real
writes in batches when we re-enable the PMU (with appropriate
disable/enable calls in the IRQ handler).

We'd still need special hooks for CCIs which require a special dance to
program them, but all the logic to handle the writes would be in one
place.

This should work, regardless.

Thanks,
Mark.

>  
>  /* Write a value to a given set of counters */
> @@ -861,7 +870,10 @@ static void __pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u
>  static void __maybe_unused
>  pmu_write_counters(struct cci_pmu *cci_pmu, unsigned long *mask, u32 value)
>  {
> -	__pmu_write_counters(cci_pmu, mask, value);
> +	if (cci_pmu->model->write_counters)
> +		cci_pmu->model->write_counters(cci_pmu, mask, value);
> +	else
> +		__pmu_write_counters(cci_pmu, mask, value);
>  }
>  
>  static u64 pmu_event_update(struct perf_event *event)
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 07/11] arm-cci: Add routines to save/restore all counters
  2016-01-11 10:50   ` Mark Rutland
@ 2016-01-11 10:58     ` Suzuki K. Poulose
  0 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-11 10:58 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 11/01/16 10:50, Mark Rutland wrote:
> On Mon, Jan 04, 2016 at 11:54:46AM +0000, Suzuki K. Poulose wrote:
>> Adds helper routines to disable the counter controls for
>> all the counters on the CCI PMU and restore it back, by
>> preserving the original state in caller provided mask.
>>
>> Cc: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
>> ---
>>   drivers/bus/arm-cci.c |   38 ++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 38 insertions(+)
>>
>> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
>> index a3938ef..2f1fcf0 100644
>> --- a/drivers/bus/arm-cci.c
>> +++ b/drivers/bus/arm-cci.c
>> @@ -672,6 +672,44 @@ static void pmu_set_event(struct cci_pmu *cci_pmu, int idx, unsigned long event)
>>   }
>>
>>   /*
>> + * For all counters on the CCI-PMU, disable any 'enabled' counters,
>> + * saving the changed counters in the mask, so that we can restore
>> + * it later using pmu_restore_counters. The mask is private to the
>> + * caller. We cannot rely on the used_mask maintained by the CCI_PMU
>> + * as it only tells us if the counter is assigned to perf_event or not.
>> + * The state of the perf_event cannot be locked by the PMU layer, hence
>> + * we check the individual counter status (which can be locked by
>> + * cci_pm->hw_events->pmu_lock).
>> + *
>> + * @mask should be initialised by the caller.
>
> We should probably state "initialised to zero", or "empty".

Yep, will fix it.

>> +/*
>> + * Restore the status of the counters. Reversal of the pmu_disable_counters().
>> + * For each counter set in the mask, enable the counter back.
>> + */
>
> Shouldn't that say pmu_save_counters?

Yea, missed it in rebase.

> With that:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
>

Thanks
Suzuki
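
A small userspace sketch of the save/restore pairing discussed above, which
also shows why the mask has to start out empty: the save routine only sets
bits, so stale bits would cause counters to be enabled on restore that were
never enabled to begin with. The struct sr_pmu type and both functions are
hypothetical stand-ins, not the driver code.

```c
#include <stdbool.h>

#define SR_NUM_CNTRS	4	/* hypothetical counter count */

struct sr_pmu {
	bool enabled[SR_NUM_CNTRS];	/* stand-in for per-counter enable bits */
};

/*
 * Disable every enabled counter and record it in *mask.
 * *mask must be zero on entry; this routine only ever sets bits.
 */
static void sr_save_counters(struct sr_pmu *pmu, unsigned long *mask)
{
	unsigned int i;

	for (i = 0; i < SR_NUM_CNTRS; i++) {
		if (pmu->enabled[i]) {
			*mask |= 1UL << i;
			pmu->enabled[i] = false;
		}
	}
}

/* Reverse of sr_save_counters(): re-enable each counter set in 'mask'. */
static void sr_restore_counters(struct sr_pmu *pmu, unsigned long mask)
{
	unsigned int i;

	for (i = 0; i < SR_NUM_CNTRS; i++) {
		if (mask & (1UL << i))
			pmu->enabled[i] = true;
	}
}
```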

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable
  2016-01-11 10:46       ` Mark Rutland
@ 2016-01-11 11:08         ` Suzuki K. Poulose
  2016-01-11 11:24           ` Mark Rutland
  0 siblings, 1 reply; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-11 11:08 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 11/01/16 10:46, Mark Rutland wrote:
> On Tue, Jan 05, 2016 at 09:59:13AM +0000, Suzuki K. Poulose wrote:
>> On 04/01/16 19:24, Mark Rutland wrote:
>>> On Mon, Jan 04, 2016 at 11:54:44AM +0000, Suzuki K. Poulose wrote:
>> The pmu is not disabled while we are in overflow irq handler. Hence there may
>> not be a pmu_enable() which would set the period for the counter which
>> overflowed, if we defer the write in that case. Is that assumption wrong?
>
> As the driver stands today, yes.
>
> However, wouldn't it make more sense to disable the PMU for the overflow
> handler, such that we can reuse the batching logic?

None of the PMU drivers do that AFAIK. Hence, didn't want to change it for
CCI. We could use the batching logic, if we decide to do so. I can go ahead
with that if there are no other side effects with that.

Suzuki

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable
  2016-01-11 11:08         ` Suzuki K. Poulose
@ 2016-01-11 11:24           ` Mark Rutland
  2016-01-11 18:12             ` Suzuki K. Poulose
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Rutland @ 2016-01-11 11:24 UTC (permalink / raw)
  To: Suzuki K. Poulose
  Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On Mon, Jan 11, 2016 at 11:08:27AM +0000, Suzuki K. Poulose wrote:
> On 11/01/16 10:46, Mark Rutland wrote:
> >On Tue, Jan 05, 2016 at 09:59:13AM +0000, Suzuki K. Poulose wrote:
> >>On 04/01/16 19:24, Mark Rutland wrote:
> >>>On Mon, Jan 04, 2016 at 11:54:44AM +0000, Suzuki K. Poulose wrote:
> >>The pmu is not disabled while we are in overflow irq handler. Hence there may
> >>not be a pmu_enable() which would set the period for the counter which
> >>overflowed, if we defer the write in that case. Is that assumption wrong?
> >
> >As the driver stands today, yes.
> >
> >However, wouldn't it make more sense to disable the PMU for the overflow
> >handler, such that we can reuse the batching logic?
> 
> None of the PMU drivers do that AFAIK.

I see.

The Intel PMU driver disables the PMU for the interrupt handler; see
intel_pmu_handle_irq in arch/x86/kernel/cpu/perf_event_intel.c. It looks
like that's a special-case for sampling.

I guess we may have the only case where it makes sense to batch counter
writes as opposed to batching configuration writes.

> Hence, didn't want to change it for CCI. We could use the batching
> logic, if we decide to do so. I can go ahead with that if there are no
> other side effects with that.

We'll lose events regardless as our RMW sequence will race against the
counters. Batching will make that window slightly larger, but other than
that I don't see a problem.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 08/11] arm-cci: Provide hook for writing to PMU counters
  2016-01-11 10:54   ` Mark Rutland
@ 2016-01-11 12:14     ` Suzuki K. Poulose
  0 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-11 12:14 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 11/01/16 10:54, Mark Rutland wrote:
> On Mon, Jan 04, 2016 at 11:54:47AM +0000, Suzuki K. Poulose wrote:

>>   static struct cci_pmu_model cci_pmu_models[];
>> @@ -846,7 +847,15 @@ static void pmu_write_counter(struct perf_event *event, u32 value)
>>   		dev_err(&cci_pmu->plat_device->dev, "Invalid CCI PMU counter %d\n", idx);
>>   		return;
>>   	}
>> -	__pmu_write_counter(cci_pmu, value, idx);
>> +
>> +	if (cci_pmu->model->write_counters) {
>> +		unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
>> +
>> +		memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
>> +		set_bit(idx, mask);
>> +		cci_pmu->model->write_counters(cci_pmu, mask, value);
>> +	} else
>> +		__pmu_write_counter(cci_pmu, value, idx);
>>   }
>
> It would be much simpler to always log writes here, and only do the real
> writes in batches when we re-enable the PMU (with appropriate
> disable/enable calls in the IRQ handler).
>
> We'd still need special hooks for CCIs which require a special dance to
> program them, but all the logic to handle the writes would be in one
> place.

This one is only there for the writes from the irq handler. Now that
we have decided to disable pmu there, we could batch this one too.

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable
  2016-01-11 11:24           ` Mark Rutland
@ 2016-01-11 18:12             ` Suzuki K. Poulose
  0 siblings, 0 replies; 29+ messages in thread
From: Suzuki K. Poulose @ 2016-01-11 18:12 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, arm, punit.agrawal, peterz

On 11/01/16 11:24, Mark Rutland wrote:
> On Mon, Jan 11, 2016 at 11:08:27AM +0000, Suzuki K. Poulose wrote:
>> On 11/01/16 10:46, Mark Rutland wrote:
>>> On Tue, Jan 05, 2016 at 09:59:13AM +0000, Suzuki K. Poulose wrote:
>>>> On 04/01/16 19:24, Mark Rutland wrote:
>> Hence, didn't want to change it for CCI. We could use the batching
>> logic, if we decide to do so. I can go ahead with that if there are no
>> other side effects with that.
>
> We'll lose events regardless as our RMW sequence will race against the
> counters. Batching will make that window slightly larger, but other than
> that I don't see a problem.

OK, will do that.

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-01-11 18:12 UTC | newest]

Thread overview: 29+ messages
2016-01-04 11:54 [PATCH v5 00/11] arm-cci: PMU updates Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 01/11] arm-cci: Define CCI counter period Suzuki K. Poulose
2016-01-04 18:27   ` Mark Rutland
2016-01-05  9:50     ` Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 02/11] arm-cci: Refactor pmu_write_counter Suzuki K. Poulose
2016-01-04 19:01   ` Mark Rutland
2016-01-04 11:54 ` [PATCH v5 03/11] arm-cci: Group writes to counter Suzuki K. Poulose
2016-01-04 19:03   ` Mark Rutland
2016-01-05 10:51     ` Suzuki K. Poulose
2016-01-11 10:44       ` Mark Rutland
2016-01-11 10:48         ` Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 04/11] arm-cci: Refactor CCI PMU enable/disable methods Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 05/11] arm-cci PMU: Delay counter writes to pmu_enable Suzuki K. Poulose
2016-01-04 19:24   ` Mark Rutland
2016-01-05  9:59     ` Suzuki K. Poulose
2016-01-11 10:46       ` Mark Rutland
2016-01-11 11:08         ` Suzuki K. Poulose
2016-01-11 11:24           ` Mark Rutland
2016-01-11 18:12             ` Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 06/11] arm-cci: Get the status of a counter Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 07/11] arm-cci: Add routines to save/restore all counters Suzuki K. Poulose
2016-01-11 10:50   ` Mark Rutland
2016-01-11 10:58     ` Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 08/11] arm-cci: Provide hook for writing to PMU counters Suzuki K. Poulose
2016-01-11 10:54   ` Mark Rutland
2016-01-11 12:14     ` Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 09/11] arm-cci: CCI-500: Work around PMU counter writes Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 10/11] arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU Suzuki K. Poulose
2016-01-04 11:54 ` [PATCH v5 11/11] arm-cci: CoreLink CCI-550 PMU driver Suzuki K. Poulose
