* [PATCH 00/11] arm: perf: add support for heterogeneous PMUs
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

In systems with heterogeneous CPUs (e.g. big.LITTLE) the associated PMUs
also differ in terms of the supported set of events, the precise
behaviour of each of those events, and the number of event counters.
Thus it is not possible to expose these PMUs as a single logical PMU.

Instead a logical PMU is created per CPU microarchitecture, which events
can target directly:

$ perf stat \
  -e armv7_cortex_a7/config=0x11/ \
  -e armv7_cortex_a15/config=0x11/ \
  ./test

 Performance counter stats for './test':

           7980455      armv7_cortex_a7/config=0x11/                                    [27.29%]
           9947934      armv7_cortex_a15/config=0x11/                                    [72.66%]

       0.016734833 seconds time elapsed

This series is based atop of my recent preparatory rework [1,2].

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/295820.html
[2] https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/log/?h=perf/updates

Mark Rutland (11):
  of: Add empty of_get_next_parent stub
  perf: allow for PMU-specific event filtering
  arm: perf: treat PMUs as CPU affine
  arm: perf: filter unschedulable events
  arm: perf: reject multi-pmu groups
  arm: perf: probe number of counters on affine CPUs
  arm: perf: document PMU affinity binding
  arm: perf: add functions to parse affinity from dt
  arm: perf: parse cpu affinity from dt
  arm: perf: remove singleton PMU restriction
  arm: dts: vexpress: describe all PMUs in TC2 dts

 Documentation/devicetree/bindings/arm/pmu.txt | 104 +++++++-
 arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts    |  36 ++-
 arch/arm/include/asm/pmu.h                    |  13 +
 arch/arm/kernel/perf_event.c                  |  61 ++++-
 arch/arm/kernel/perf_event_cpu.c              | 356 +++++++++++++++++++++-----
 arch/arm/kernel/perf_event_v7.c               |  41 +--
 include/linux/of.h                            |   5 +
 include/linux/perf_event.h                    |   5 +
 kernel/events/core.c                          |   8 +-
 9 files changed, 534 insertions(+), 95 deletions(-)

-- 
1.9.1



* [PATCH 01/11] of: Add empty of_get_next_parent stub
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, Mark Rutland, Rob Herring, Grant Likely

There's no stub version of of_get_next_parent, so its use in code
compiled without CONFIG_OF will cause the kernel build to fail.

This patch adds a stub version of of_get_next_parent, as with the other
!CONFIG_OF stub functions, so that such code can compile without
CONFIG_OF.
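
As an illustration, a hypothetical caller (not part of this patch) such
as the following now builds whether or not CONFIG_OF is enabled; with
the stub, of_get_next_parent() returns NULL and the walk terminates
immediately:

#include <linux/of.h>

/* Walk towards the root, testing for a given ancestor node. */
static bool node_has_ancestor(struct device_node *node,
			      struct device_node *ancestor)
{
	struct device_node *np;

	/* of_get_next_parent drops np's refcount, so take one first */
	for (np = of_node_get(node); np; np = of_get_next_parent(np)) {
		if (np == ancestor) {
			of_node_put(np);
			return true;
		}
	}

	return false;
}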

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <rob.herring@kernel.org>
Cc: Grant Likely <grant.likely@linaro.org>
---
 include/linux/of.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/of.h b/include/linux/of.h
index 6545e7a..2d4b7e0 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -392,6 +392,11 @@ static inline struct device_node *of_get_parent(const struct device_node *node)
 	return NULL;
 }
 
+static inline struct device_node *of_get_next_parent(struct device_node *node)
+{
+	return NULL;
+}
+
 static inline struct device_node *of_get_next_child(
 	const struct device_node *node, struct device_node *prev)
 {
-- 
1.9.1



* [PATCH 02/11] perf: allow for PMU-specific event filtering
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, Mark Rutland, Peter Zijlstra,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo

In certain circumstances it may not be possible to schedule particular
events due to constraints other than a lack of hardware counters (e.g.
on big.LITTLE systems where CPUs support different events). The core
perf event code does not distinguish these cases and pessimistically
assumes that any failure to schedule an event is due to a lack of
hardware counters, ending event group scheduling early despite hardware
counters remaining available.

When such an unschedulable event exists in a ctx->flexible_groups list
it can unnecessarily prevent event groups following it in the list from
being scheduled until it is rotated to the end of the list. This can
result in events being scheduled for only a portion of the time they
would otherwise be eligible, and for short running programs unfortunate
initial list ordering can result in no events being counted.

This patch adds a new (optional) filter_match function pointer to struct
pmu which backends can use to tell the perf core whether or not it is
worth attempting to schedule an event. This plugs into the existing
event_filter_match logic, and makes it possible to avoid the scheduling
problem described above.
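
As a sketch of the intended usage (my_pmu and my_supported_cpus are
placeholder names; the actual ARM user is added later in this series),
a backend hooks the callback up like so:

/* Return 0 to filter the event out, nonzero if it may be scheduled. */
static int my_pmu_filter_match(struct perf_event *event)
{
	/* e.g. only schedulable on CPUs this PMU instance supports */
	return cpumask_test_cpu(smp_processor_id(), &my_supported_cpus);
}

static struct pmu my_pmu = {
	/* ... the usual callbacks ... */
	.filter_match	= my_pmu_filter_match,
};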

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
---
 include/linux/perf_event.h | 5 +++++
 kernel/events/core.c       | 8 +++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 893a0d0..80c5f5f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -263,6 +263,11 @@ struct pmu {
 	 * flush branch stack on context-switches (needed in cpu-wide mode)
 	 */
 	void (*flush_branch_stack)	(void);
+
+	/*
+	 * Filter events for PMU-specific reasons.
+	 */
+	int (*filter_match)		(struct perf_event *event); /* optional */
 };
 
 /**
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2b02c9f..770b276 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1428,11 +1428,17 @@ static int __init perf_workqueue_init(void)
 
 core_initcall(perf_workqueue_init);
 
+static inline int pmu_filter_match(struct perf_event *event)
+{
+	struct pmu *pmu = event->pmu;
+	return pmu->filter_match ? pmu->filter_match(event) : 1;
+}
+
 static inline int
 event_filter_match(struct perf_event *event)
 {
 	return (event->cpu == -1 || event->cpu == smp_processor_id())
-	    && perf_cgroup_match(event);
+	    && perf_cgroup_match(event) && pmu_filter_match(event);
 }
 
 static void
-- 
1.9.1



* [PATCH 03/11] arm: perf: treat PMUs as CPU affine
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

In multi-cluster systems, the PMUs can be different across clusters, and
so our logical PMU may not be able to schedule events on all CPUs.

This patch adds a cpumask to encode which CPUs a PMU driver supports
controlling events for, limiting the driver to scheduling events on
those CPUs, and to enabling and disabling the physical PMUs only on
those CPUs.
Currently the cpumask is set to match all CPUs.
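
Later patches restrict the mask per cluster once the affinity is known.
As a rough sketch, with CPU numbering assumed purely for illustration
(e.g. a TC2-like system with two Cortex-A15s and three Cortex-A7s):

	cpumask_clear(&a15_pmu->supported_cpus);
	cpumask_set_cpu(0, &a15_pmu->supported_cpus);
	cpumask_set_cpu(1, &a15_pmu->supported_cpus);

	cpumask_clear(&a7_pmu->supported_cpus);
	for (cpu = 2; cpu <= 4; cpu++)
		cpumask_set_cpu(cpu, &a7_pmu->supported_cpus);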

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm/include/asm/pmu.h       |  1 +
 arch/arm/kernel/perf_event.c     | 25 +++++++++++++++++++++++++
 arch/arm/kernel/perf_event_cpu.c | 10 +++++++++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index b1596bd..b630a44 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -92,6 +92,7 @@ struct pmu_hw_events {
 struct arm_pmu {
 	struct pmu	pmu;
 	cpumask_t	active_irqs;
+	cpumask_t	supported_cpus;
 	char		*name;
 	irqreturn_t	(*handle_irq)(int irq_num, void *dev);
 	void		(*enable)(struct perf_event *event);
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index e34934f..9ad21ab 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -11,6 +11,7 @@
  */
 #define pr_fmt(fmt) "hw perfevents: " fmt
 
+#include <linux/cpumask.h>
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
@@ -223,6 +224,10 @@ armpmu_add(struct perf_event *event, int flags)
 	int idx;
 	int err = 0;
 
+	/* An event following a process won't be stopped earlier */
+	if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
+		return -ENOENT;
+
 	perf_pmu_disable(event->pmu);
 
 	/* If we don't have a space for the counter then finish early. */
@@ -439,6 +444,17 @@ static int armpmu_event_init(struct perf_event *event)
 	int err = 0;
 	atomic_t *active_events = &armpmu->active_events;
 
+	/*
+	 * Reject CPU-affine events for CPUs that are of a different class to
+	 * that which this PMU handles. Process-following events (where
+	 * event->cpu == -1) can be migrated between CPUs, and thus we have to
+	 * reject them later (in armpmu_add) if they're scheduled on a
+	 * different class of CPU.
+	 */
+	if (event->cpu != -1 &&
+		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
+		return -ENOENT;
+
 	/* does not support taken branch sampling */
 	if (has_branch_stack(event))
 		return -EOPNOTSUPP;
@@ -474,6 +490,10 @@ static void armpmu_enable(struct pmu *pmu)
 	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
 	int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
 
+	/* For task-bound events we may be called on other CPUs */
+	if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
+		return;
+
 	if (enabled)
 		armpmu->start(armpmu);
 }
@@ -481,6 +501,11 @@ static void armpmu_enable(struct pmu *pmu)
 static void armpmu_disable(struct pmu *pmu)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(pmu);
+
+	/* For task-bound events we may be called on other CPUs */
+	if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
+		return;
+
 	armpmu->stop(armpmu);
 }
 
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index 59c0642..ce35149 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -169,11 +169,15 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 static int cpu_pmu_notify(struct notifier_block *b, unsigned long action,
 			  void *hcpu)
 {
+	int cpu = (unsigned long)hcpu;
 	struct arm_pmu *pmu = container_of(b, struct arm_pmu, hotplug_nb);
 
 	if ((action & ~CPU_TASKS_FROZEN) != CPU_STARTING)
 		return NOTIFY_DONE;
 
+	if (!cpumask_test_cpu(cpu, &pmu->supported_cpus))
+		return NOTIFY_DONE;
+
 	if (pmu->reset)
 		pmu->reset(pmu);
 	else
@@ -209,7 +213,8 @@ static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
 
 	/* Ensure the PMU has sane values out of reset. */
 	if (cpu_pmu->reset)
-		on_each_cpu(cpu_pmu->reset, cpu_pmu, 1);
+		on_each_cpu_mask(&cpu_pmu->supported_cpus, cpu_pmu->reset,
+			 cpu_pmu, 1);
 
 	/* If no interrupts available, set the corresponding capability flag */
 	if (!platform_get_irq(cpu_pmu->plat_device, 0))
@@ -311,6 +316,9 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
 	cpu_pmu = pmu;
 	cpu_pmu->plat_device = pdev;
 
+	/* Assume by default that we're on a homogeneous system */
+	cpumask_setall(&pmu->supported_cpus);
+
 	if (node && (of_id = of_match_node(cpu_pmu_of_device_ids, pdev->dev.of_node))) {
 		init_fn = of_id->data;
 		ret = init_fn(pmu);
-- 
1.9.1



* [PATCH 04/11] arm: perf: filter unschedulable events
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

Different CPU microarchitectures implement different PMU events, and
thus events which can be scheduled on one microarchitecture cannot be
scheduled on another, and vice-versa. Some architected events behave
differently across microarchitectures, and thus their counts cannot be
meaningfully summed. Due to this, we reject the scheduling of an event
on a CPU of a different microarchitecture to the one the event targets.

When the core perf code is scheduling events and encounters an event
which cannot be scheduled, it stops attempting to schedule events. As
the perf core periodically rotates the list of events, for some
proportion of the time events which are unschedulable will block events
which are schedulable, resulting in low utilisation of the hardware
counters.

This patch implements a pmu::filter_match callback such that we can
detect and skip such events while scheduling early, before they can
block the schedulable events. This prevents the low HW counter
utilisation issue.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm/kernel/perf_event.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 9ad21ab..b00f6aa 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -509,6 +509,18 @@ static void armpmu_disable(struct pmu *pmu)
 	armpmu->stop(armpmu);
 }
 
+/*
+ * In heterogeneous systems, events are specific to a particular
+ * microarchitecture, and aren't suitable for another. Thus, only match CPUs of
+ * the same microarchitecture.
+ */
+static int armpmu_filter_match(struct perf_event *event)
+{
+	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+	unsigned int cpu = smp_processor_id();
+	return cpumask_test_cpu(cpu, &armpmu->supported_cpus);
+}
+
 #ifdef CONFIG_PM_RUNTIME
 static int armpmu_runtime_resume(struct device *dev)
 {
@@ -549,6 +561,7 @@ static void armpmu_init(struct arm_pmu *armpmu)
 		.start		= armpmu_start,
 		.stop		= armpmu_stop,
 		.read		= armpmu_read,
+		.filter_match	= armpmu_filter_match,
 	};
 }
 
-- 
1.9.1



* [PATCH 05/11] arm: perf: reject multi-pmu groups
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

An event group spanning multiple CPU PMUs can never be scheduled, as at
least one of its events will always fail to schedule; such groups are
therefore nonsensical. Additionally, groups spanning multiple PMUs
would require additional validation logic throughout the driver to
prevent CPU PMUs from stepping on each others' internal state. Given
that such groups are nonsensical to begin with, the simple option is to
reject them entirely.
Groups consisting of software events and CPU PMU events are benign so
long as the CPU PMU events only target a single CPU PMU.

This patch ensures that we reject the creation of event groups which
span multiple CPU PMUs, avoiding the issues described above. The
addition of this_pmu to the validation logic made the fake_pmu more
confusing than it already was; so this is renamed to the more accurate
hw_events. As hw_events was being modified anyway, the initialisation of
hw_events.used_mask is also simplified with the use of a designated
initializer rather than the existing memset.
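
With this in place, opening a group that spans both PMUs of a
big.LITTLE system fails with -EINVAL at perf_event_open time rather
than silently never counting, e.g. (reusing the event names from the
cover letter):

$ perf stat \
  -e '{armv7_cortex_a7/config=0x11/,armv7_cortex_a15/config=0x11/}' \
  ./test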

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm/kernel/perf_event.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index b00f6aa..41dcfc0 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -258,13 +258,17 @@ out:
 }
 
 static int
-validate_event(struct pmu_hw_events *hw_events,
+validate_event(struct pmu *this_pmu,
+	       struct pmu_hw_events *hw_events,
 	       struct perf_event *event)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 
 	if (is_software_event(event))
 		return 1;
+
+	if (event->pmu != this_pmu)
+		return 0;
 
 	if (event->state < PERF_EVENT_STATE_OFF)
 		return 1;
@@ -279,23 +283,20 @@ static int
 validate_group(struct perf_event *event)
 {
 	struct perf_event *sibling, *leader = event->group_leader;
-	struct pmu_hw_events fake_pmu;
-
-	/*
-	 * Initialise the fake PMU. We only need to populate the
-	 * used_mask for the purposes of validation.
-	 */
-	memset(&fake_pmu.used_mask, 0, sizeof(fake_pmu.used_mask));
+	struct pmu *this_pmu = event->pmu;
+	struct pmu_hw_events hw_events = {
+		.used_mask = { 0 },
+	};
 
-	if (!validate_event(&fake_pmu, leader))
+	if (!validate_event(this_pmu, &hw_events, leader))
 		return -EINVAL;
 
 	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
-		if (!validate_event(&fake_pmu, sibling))
+		if (!validate_event(this_pmu, &hw_events, sibling))
 			return -EINVAL;
 	}
 
-	if (!validate_event(&fake_pmu, event))
+	if (!validate_event(this_pmu, &hw_events, event))
 		return -EINVAL;
 
 	return 0;
-- 
1.9.1



* [PATCH 06/11] arm: perf: probe number of counters on affine CPUs
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

In heterogeneous systems, the number of counters may differ across
clusters. To find the number of counters for a cluster, we must probe
the PMU from a CPU in that cluster.
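
The probe below relies on smp_call_function_any(), which runs a
function on some CPU from the supplied mask (preferring the current CPU
when it is in the mask) and, with wait=1, returns only once the
function has completed. In outline (a sketch with placeholder names):

	int val = 0;
	int err;

	/* run read_fn(&val) on a CPU in 'mask', waiting for completion */
	err = smp_call_function_any(&mask, read_fn, &val, 1);
	if (err)
		return err;	/* e.g. no online CPU in the mask */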

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/kernel/perf_event_v7.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index fa76b25..dccd108 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -994,15 +994,22 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->max_period	= (1LLU << 32) - 1;
 };
 
-static u32 armv7_read_num_pmnc_events(void)
+static void armv7_read_num_pmnc_events(void *info)
 {
-	u32 nb_cnt;
+	int *nb_cnt = info;
 
 	/* Read the nb of CNTx counters supported from PMNC */
-	nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
+	*nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
 
-	/* Add the CPU cycles counter and return */
-	return nb_cnt + 1;
+	/* Add the CPU cycles counter */
+	*nb_cnt += 1;
+}
+
+static int armv7_probe_num_events(struct arm_pmu *arm_pmu)
+{
+	return smp_call_function_any(&arm_pmu->supported_cpus,
+				     armv7_read_num_pmnc_events,
+				     &arm_pmu->num_events, 1);
 }
 
 static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1010,8 +1017,7 @@ static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu)
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a8";
 	cpu_pmu->map_event	= armv7_a8_map_event;
-	cpu_pmu->num_events	= armv7_read_num_pmnc_events();
-	return 0;
+	return armv7_probe_num_events(cpu_pmu);
 }
 
 static int armv7_a9_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1019,8 +1025,7 @@ static int armv7_a9_pmu_init(struct arm_pmu *cpu_pmu)
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a9";
 	cpu_pmu->map_event	= armv7_a9_map_event;
-	cpu_pmu->num_events	= armv7_read_num_pmnc_events();
-	return 0;
+	return armv7_probe_num_events(cpu_pmu);
 }
 
 static int armv7_a5_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1028,8 +1033,7 @@ static int armv7_a5_pmu_init(struct arm_pmu *cpu_pmu)
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a5";
 	cpu_pmu->map_event	= armv7_a5_map_event;
-	cpu_pmu->num_events	= armv7_read_num_pmnc_events();
-	return 0;
+	return armv7_probe_num_events(cpu_pmu);
 }
 
 static int armv7_a15_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1037,9 +1041,8 @@ static int armv7_a15_pmu_init(struct arm_pmu *cpu_pmu)
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a15";
 	cpu_pmu->map_event	= armv7_a15_map_event;
-	cpu_pmu->num_events	= armv7_read_num_pmnc_events();
 	cpu_pmu->set_event_filter = armv7pmu_set_event_filter;
-	return 0;
+	return armv7_probe_num_events(cpu_pmu);
 }
 
 static int armv7_a7_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1047,9 +1050,8 @@ static int armv7_a7_pmu_init(struct arm_pmu *cpu_pmu)
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a7";
 	cpu_pmu->map_event	= armv7_a7_map_event;
-	cpu_pmu->num_events	= armv7_read_num_pmnc_events();
 	cpu_pmu->set_event_filter = armv7pmu_set_event_filter;
-	return 0;
+	return armv7_probe_num_events(cpu_pmu);
 }
 
 static int armv7_a12_pmu_init(struct arm_pmu *cpu_pmu)
@@ -1057,16 +1059,15 @@ static int armv7_a12_pmu_init(struct arm_pmu *cpu_pmu)
 	armv7pmu_init(cpu_pmu);
 	cpu_pmu->name		= "armv7_cortex_a12";
 	cpu_pmu->map_event	= armv7_a12_map_event;
-	cpu_pmu->num_events	= armv7_read_num_pmnc_events();
 	cpu_pmu->set_event_filter = armv7pmu_set_event_filter;
-	return 0;
+	return armv7_probe_num_events(cpu_pmu);
 }
 
 static int armv7_a17_pmu_init(struct arm_pmu *cpu_pmu)
 {
-	armv7_a12_pmu_init(cpu_pmu);
+	int ret = armv7_a12_pmu_init(cpu_pmu);
 	cpu_pmu->name = "armv7_cortex_a17";
-	return 0;
+	return ret;
 }
 
 /*
@@ -1453,7 +1454,7 @@ static int krait_pmu_init(struct arm_pmu *cpu_pmu)
 		cpu_pmu->map_event = krait_map_event_no_branch;
 	else
 		cpu_pmu->map_event = krait_map_event;
-	cpu_pmu->num_events	= armv7_read_num_pmnc_events();
+	armv7_probe_num_events(cpu_pmu);
 	cpu_pmu->set_event_filter = armv7pmu_set_event_filter;
 	cpu_pmu->reset		= krait_pmu_reset;
 	cpu_pmu->enable		= krait_pmu_enable_event;
-- 
1.9.1



* [PATCH 07/11] arm: perf: document PMU affinity binding
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

To describe the various ways CPU PMU interrupts might be wired up, we
can refer to the topology information in the device tree.

This patch adds a new property to the PMU binding, interrupts-affinity,
which describes the relationship between CPUs and interrupts. This
information is necessary to handle systems with heterogeneous PMU
implementations (e.g. big.LITTLE). Documentation is added describing the
use of said property.
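
A consumer of the binding might walk the property with the generic
phandle helpers, along these lines (a sketch; the parsing actually
added by this series follows in subsequent patches):

	/* one phandle per interrupt, each naming a topology (sub)tree */
	int i, nr = of_count_phandle_with_args(np, "interrupts-affinity",
					       NULL);

	for (i = 0; i < nr; i++) {
		struct device_node *affine =
			of_parse_phandle(np, "interrupts-affinity", i);
		/* ... map the topology node to a set of CPUs ... */
		of_node_put(affine);
	}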

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 Documentation/devicetree/bindings/arm/pmu.txt | 104 +++++++++++++++++++++++++-
 1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/arm/pmu.txt b/Documentation/devicetree/bindings/arm/pmu.txt
index 75ef91d..23a0675 100644
--- a/Documentation/devicetree/bindings/arm/pmu.txt
+++ b/Documentation/devicetree/bindings/arm/pmu.txt
@@ -24,12 +24,114 @@ Required properties:
 
 Optional properties:
 
+- interrupts-affinity : A list of phandles to topology nodes (see topology.txt) describing
+	     the set of CPUs associated with the interrupt at the same index.
 - qcom,no-pc-write : Indicates that this PMU doesn't support the 0xc and 0xd
                      events.
 
-Example:
+Example 1 (A single CPU):
 
 pmu {
         compatible = "arm,cortex-a9-pmu";
         interrupts = <100 101>;
 };
+
+Example 2 (Multiple clusters with single interrupts):
+
+cpus {
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	CPU0: cpu@0 {
+		reg = <0x0>;
+		compatible = "arm,cortex-a15-pmu";
+	};
+
+	CPU1: cpu@1 {
+		reg = <0x1>;
+		compatible = "arm,cotex-a15-pmu";
+	};
+
+	CPU100: cpu@100 {
+		reg = <0x100>;
+		compatible = "arm,cortex-a7-pmu";
+	};
+
+	cpu-map {
+		cluster0 {
+			CORE_0_0: core0 {
+				cpu = <&CPU0>;
+			};
+			CORE_0_1: core1 {
+				cpu = <&CPU1>;
+			};
+		};
+		cluster1 {
+			CORE_1_0: core0 {
+				cpu = <&CPU100>;
+			};
+		};
+	};
+};
+
+pmu_a15 {
+	compatible = "arm,cortex-a15-pmu";
+	interrupts = <100>, <101>;
+	interrupts-affinity = <&CORE0>, <&CORE1>;
+};
+
+pmu_a7 {
+	compatible = "arm,cortex-a7-pmu";
+	interrupts = <105>;
+	interrupts-affinity = <&CORE_1_0>;
+};
+
+Example 3 (Multiple clusters with per-cpu interrupts):
+
+cpus {
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	CPU0: cpu@0 {
+		reg = <0x0>;
+		compatible = "arm,cortex-a15-pmu";
+	};
+
+	CPU1: cpu@1 {
+		reg = <0x1>;
+		compatible = "arm,cotex-a15-pmu";
+	};
+
+	CPU100: cpu@100 {
+		reg = <0x100>;
+		compatible = "arm,cortex-a7-pmu";
+	};
+
+	cpu-map {
+		CLUSTER0: cluster0 {
+			core0 {
+				cpu = <&CPU0>;
+			};
+			core1 {
+				cpu = <&CPU1>;
+			};
+		};
+		CLUSTER1: cluster1 {
+			 core0 {
+				cpu = <&CPU100>;
+			};
+		};
+	};
+};
+
+pmu_a15 {
+	compatible = "arm,cortex-a15-pmu";
+	interrupts = <100>;
+	interrupts-affinity = <&CLUSTER0>;
+};
+
+pmu_a7 {
+	compatible = "arm,cortex-a7-pmu";
+	interrupts = <105>;
+	interrupts-affinity = <&CLUSTER1>;
+};
-- 
1.9.1



* [PATCH 08/11] arm: perf: add functions to parse affinity from dt
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, Mark Rutland, Grant Likely, Rob Herring

Depending on hardware configuration, some devices may only be accessible
from certain CPUs, may have interrupts wired up to a subset of CPUs, or
may have operations which affect subsets of CPUs. To handle these
devices it is necessary to describe this affinity information in
devicetree.

This patch adds functions to handle parsing the CPU affinity of
properties from devicetree, based on Lorenzo's topology binding,
allowing subsets of CPUs to be associated with interrupts, hardware
ports, etc. The functions can be used to build cpumasks and also to test
whether an affinity property only targets one CPU independent of the
current configuration (e.g. when the kernel supports fewer CPUs than are
physically present). This is useful for dealing with mixed SPI/PPI
devices.

A device may have an arbitrary number of affinity properties, the
meaning of which is device-specific and should be specified in a given
device's binding document.

For example, an affinity property describing interrupt routing may
consist of a phandle pointing to a subtree of the topology nodes,
indicating the set of CPUs an interrupt originates from or may be taken
on. Bindings may have restrictions on the topology nodes referenced -
for describing coherency controls an affinity property may indicate a
whole cluster (including any non-CPU logic it contains) is affected by
some configuration.
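
For example, a driver could build the cpumask for its idx-th interrupt
as below (a sketch of the helpers added by this patch; note that
arm_dt_affine_build_mask expects the mask to be empty on entry):

	cpumask_t cpus;
	int err;

	cpumask_clear(&cpus);
	err = arm_dt_affine_get_mask(pdev->dev.of_node,
				     "interrupts-affinity", idx, &cpus);
	if (err)
		return err;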

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@kernel.org>
---
 arch/arm/kernel/perf_event_cpu.c | 127 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index ce35149..dfcaba5 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -22,6 +22,7 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/of.h>
+#include <linux/of_device.h>
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
@@ -294,6 +295,132 @@ static int probe_current_pmu(struct arm_pmu *pmu)
 	return ret;
 }
 
+/*
+ * Test if the node is within the topology tree.
+ * Walk up to the root, keeping refcounts balanced.
+ */
+static bool is_topology_node(struct device_node *node)
+{
+	struct device_node *np, *cpu_map;
+	bool ret = false;
+
+	cpu_map = of_find_node_by_path("/cpus/cpu-map");
+	if (!cpu_map)
+		return false;
+
+	/*
+	 * of_get_next_parent decrements the refcount of the provided node.
+	 * Increment it first to keep things balanced.
+	 */
+	for (np = of_node_get(node); np; np = of_get_next_parent(np)) {
+		if (np != cpu_map)
+			continue;
+
+		ret = true;
+		break;
+	}
+
+	of_node_put(np);
+	of_node_put(cpu_map);
+	return ret;
+}
+
+static int cpu_node_to_id(struct device_node *node)
+{
+	int cpu;
+	for_each_possible_cpu(cpu)
+		if (of_cpu_device_node_get(cpu) == node)
+			return cpu;
+
+	return -EINVAL;
+}
+
+static int arm_dt_affine_build_mask(struct device_node *affine,
+				    cpumask_t *mask)
+{
+	struct device_node *child, *parent = NULL;
+	int ret = -EINVAL;
+
+	if (!is_topology_node(affine))
+		return -EINVAL;
+
+	child = of_node_get(affine);
+	if (!child)
+		goto out_invalid;
+
+	parent = of_get_parent(child);
+	if (!parent)
+		goto out_invalid;
+
+	if (!cpumask_empty(mask))
+		goto out_invalid;
+
+	/*
+	 * Depth-first search over the topology tree, iterating over leaf nodes
+	 * and adding all referenced CPUs to the cpumask. Almost all of the
+	 * of_* iterators are built for breadth-first search, which means we
+	 * have to do a little more work to ensure refcounts are balanced.
+	 */
+	do {
+		struct device_node *tmp, *cpu_node;
+		int cpu;
+
+		/* head down to the leaf */
+		while ((tmp = of_get_next_child(child, NULL))) {
+			of_node_put(parent);
+			parent = child;
+			child = tmp;
+		}
+
+		/*
+		 * In some cases cpu_node might be NULL, but cpu_node_to_id
+		 * will handle this (albeit slowly) and we don't need another
+		 * error path.
+		 */
+		cpu_node = of_parse_phandle(child, "cpu", 0);
+		cpu = cpu_node_to_id(cpu_node);
+
+		if (cpu < 0)
+			pr_warn("Invalid or unused node in topology description '%s', skipping\n",
+				child->full_name);
+		else
+			cpumask_set_cpu(cpu, mask);
+
+		of_node_put(cpu_node);
+
+		/*
+		 * Find the next sibling, or transitively a parent's sibling.
+		 * Don't go further up the tree than the affine node we were
+		 * handed.
+		 */
+		while (child != affine &&
+			!(child = of_get_next_child(parent, child))) {
+			child = parent;
+			parent = of_get_parent(parent);
+		}
+
+	} while (child != affine); /* all children covered. Time to stop */
+
+	ret = 0;
+
+out_invalid:
+	of_node_put(child);
+	of_node_put(parent);
+	return ret;
+}
+
+static int arm_dt_affine_get_mask(struct device_node *node, char *prop,
+				  int idx, cpumask_t *mask)
+{
+	int ret = -EINVAL;
+	struct device_node *affine = of_parse_phandle(node, prop, idx);
+
+	ret = arm_dt_affine_build_mask(affine, mask);
+
+	of_node_put(affine);
+	return ret;
+}
+
 static int cpu_pmu_device_probe(struct platform_device *pdev)
 {
 	const struct of_device_id *of_id;
-- 
1.9.1



* [PATCH 09/11] arm: perf: parse cpu affinity from dt
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

The current way we read interrupts from devicetree assumes that
interrupts are in increasing order of logical cpu id (MPIDR.Aff{2,1,0}),
and that these logical ids are in a contiguous block. This may not be
the case in general - after a kexec cpu ids may be arbitrarily assigned,
and multi-cluster systems do not have a contiguous range of cpu ids.

This patch parses cpu affinity information for interrupts from an
optional "interrupts-affinity" devicetree property described in the
devicetree binding document. Support for existing dts and board files
remains.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm/include/asm/pmu.h       |  12 +++
 arch/arm/kernel/perf_event_cpu.c | 196 +++++++++++++++++++++++++++++----------
 2 files changed, 161 insertions(+), 47 deletions(-)

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index b630a44..92fc1da 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -12,6 +12,7 @@
 #ifndef __ARM_PMU_H__
 #define __ARM_PMU_H__
 
+#include <linux/cpumask.h>
 #include <linux/interrupt.h>
 #include <linux/perf_event.h>
 
@@ -89,6 +90,15 @@ struct pmu_hw_events {
 	struct arm_pmu		*percpu_pmu;
 };
 
+/*
+ * For systems with heterogeneous PMUs, we need to know which CPUs each
+ * (possibly percpu) IRQ targets. Map between them with an array of these.
+ */
+struct cpu_irq {
+	cpumask_t cpus;
+	int irq;
+};
+
 struct arm_pmu {
 	struct pmu	pmu;
 	cpumask_t	active_irqs;
@@ -118,6 +128,8 @@ struct arm_pmu {
 	struct platform_device	*plat_device;
 	struct pmu_hw_events	__percpu *hw_events;
 	struct notifier_block	hotplug_nb;
+	int		nr_irqs;
+	struct cpu_irq *irq_map;
 };
 
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index dfcaba5..f09c8a0 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -85,20 +85,27 @@ static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
 	struct platform_device *pmu_device = cpu_pmu->plat_device;
 	struct pmu_hw_events __percpu *hw_events = cpu_pmu->hw_events;
 
-	irqs = min(pmu_device->num_resources, num_possible_cpus());
+	irqs = cpu_pmu->nr_irqs;
 
-	irq = platform_get_irq(pmu_device, 0);
-	if (irq >= 0 && irq_is_percpu(irq)) {
-		on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
-		free_percpu_irq(irq, &hw_events->percpu_pmu);
-	} else {
-		for (i = 0; i < irqs; ++i) {
-			if (!cpumask_test_and_clear_cpu(i, &cpu_pmu->active_irqs))
-				continue;
-			irq = platform_get_irq(pmu_device, i);
-			if (irq >= 0)
-				free_irq(irq, per_cpu_ptr(&hw_events->percpu_pmu, i));
+	for (i = 0; i < irqs; i++) {
+		struct cpu_irq *map = &cpu_pmu->irq_map[i];
+		irq = map->irq;
+
+		if (irq <= 0)
+			continue;
+
+		if (irq_is_percpu(irq)) {
+			on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
+			free_percpu_irq(irq, &hw_events->percpu_pmu);
+			return;
 		}
+
+		if (!cpumask_test_and_clear_cpu(i, &cpu_pmu->active_irqs))
+			continue;
+
+		irq = platform_get_irq(pmu_device, i);
+		if (irq >= 0)
+			free_irq(irq, per_cpu_ptr(&hw_events->percpu_pmu, i));
 	}
 }
 
@@ -111,51 +118,52 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 	if (!pmu_device)
 		return -ENODEV;
 
-	irqs = min(pmu_device->num_resources, num_possible_cpus());
+	irqs = cpu_pmu->nr_irqs;
 	if (irqs < 1) {
 		printk_once("perf/ARM: No irqs for PMU defined, sampling events not supported\n");
 		return 0;
 	}
 
-	irq = platform_get_irq(pmu_device, 0);
-	if (irq >= 0 && irq_is_percpu(irq)) {
-		err = request_percpu_irq(irq, handler, "arm-pmu",
-					 &hw_events->percpu_pmu);
-		if (err) {
-			pr_err("unable to request IRQ%d for ARM PMU counters\n",
-				irq);
-			return err;
-		}
-		on_each_cpu(cpu_pmu_enable_percpu_irq, &irq, 1);
-	} else {
-		for (i = 0; i < irqs; ++i) {
-			err = 0;
-			irq = platform_get_irq(pmu_device, i);
-			if (irq < 0)
-				continue;
-
-			/*
-			 * If we have a single PMU interrupt that we can't shift,
-			 * assume that we're running on a uniprocessor machine and
-			 * continue. Otherwise, continue without this interrupt.
-			 */
-			if (irq_set_affinity(irq, cpumask_of(i)) && irqs > 1) {
-				pr_warn("unable to set irq affinity (irq=%d, cpu=%u)\n",
-					irq, i);
-				continue;
-			}
+	for (i = 0; i < irqs; i++) {
+		struct cpu_irq *map = &cpu_pmu->irq_map[i];
+		irq = map->irq;
 
-			err = request_irq(irq, handler,
-					  IRQF_NOBALANCING | IRQF_NO_THREAD, "arm-pmu",
-					  per_cpu_ptr(&hw_events->percpu_pmu, i));
+		if (irq <= 0)
+			continue;
+
+		if (irq_is_percpu(map->irq)) {
+			err = request_percpu_irq(irq, handler, "arm-pmu",
+						 &hw_events->percpu_pmu);
 			if (err) {
 				pr_err("unable to request IRQ%d for ARM PMU counters\n",
 					irq);
 				return err;
 			}
+			on_each_cpu(cpu_pmu_enable_percpu_irq, &irq, 1);
+			return 0;
+		}
+
+		/*
+		 * If we have a single PMU interrupt that we can't shift,
+		 * assume that we're running on a uniprocessor machine and
+		 * continue. Otherwise, continue without this interrupt.
+		 */
+		if (irq_set_affinity(irq, &map->cpus) && irqs > 1) {
+			pr_warn("unable to set irq affinity (irq=%d, cpu=%u)\n",
+				irq, cpumask_first(&map->cpus));
+			continue;
+		}
 
-			cpumask_set_cpu(i, &cpu_pmu->active_irqs);
+		err = request_irq(irq, handler,
+				  IRQF_NOBALANCING | IRQF_NO_THREAD, "arm-pmu",
+				  per_cpu_ptr(&hw_events->percpu_pmu, i));
+		if (err) {
+			pr_err("unable to request IRQ%d for ARM PMU counters\n",
+				irq);
+			return err;
 		}
+
+		cpumask_set_cpu(i, &cpu_pmu->active_irqs);
 	}
 
 	return 0;
@@ -421,6 +429,97 @@ static int arm_dt_affine_get_mask(struct device_node *node, char *prop,
 	return ret;
 }
 
+static int cpu_pmu_parse_interrupt(struct arm_pmu *pmu, int idx)
+{
+	struct cpu_irq *map = &pmu->irq_map[idx];
+	struct platform_device *pdev = pmu->plat_device;
+	struct device_node *np = pdev->dev.of_node;
+
+	map->irq = platform_get_irq(pdev, idx);
+	if (map->irq <= 0)
+		return -ENOENT;
+
+	cpumask_clear(&map->cpus);
+
+	if (!of_property_read_bool(np, "interrupts-affinity")) {
+		/*
+		 * If we don't have any affinity information, assume a
+		 * homogeneous system. We assume that CPUs are ordered as in
+		 * the DT, even in the absence of affinity information.
+		 */
+		if (irq_is_percpu(map->irq))
+			cpumask_setall(&map->cpus);
+		else
+			cpumask_set_cpu(idx, &map->cpus);
+	} else {
+		return arm_dt_affine_get_mask(np, "interrupts-affinity", idx,
+					      &map->cpus);
+	}
+
+	return 0;
+}
+
+static int cpu_pmu_parse_interrupts(struct arm_pmu *pmu)
+{
+	struct platform_device *pdev = pmu->plat_device;
+	int ret;
+	int i, irqs;
+
+	/*
+	 * Figure out how many IRQs there are. This may be larger than NR_CPUS,
+	 * and they may appear in arbitrary order...
+	 */
+	for (irqs = 0; platform_get_irq(pdev, irqs) > 0; irqs++);
+	if (!irqs) {
+		pr_warn("Unable to find interrupts\n");
+		return -EINVAL;
+	}
+
+	pmu->nr_irqs = irqs;
+	pmu->irq_map = kmalloc_array(irqs, sizeof(*pmu->irq_map), GFP_KERNEL);
+	if (!pmu->irq_map) {
+		pr_warn("Unable to allocate irqmap data\n");
+		return -ENOMEM;
+	}
+
+	/*
+	 * Some platforms are insane enough to mux all the PMU IRQs into a
+	 * single IRQ. To enable handling of those cases, assume that if we
+	 * have a single interrupt it targets all CPUs.
+	 */
+	if (irqs == 1 && num_possible_cpus() > 1) {
+		cpumask_copy(&pmu->irq_map[0].cpus, cpu_present_mask);
+	} else {
+		for (i = 0; i < irqs; i++) {
+			ret = cpu_pmu_parse_interrupt(pmu, i);
+			if (ret)
+				goto out_free;
+		}
+	}
+
+	if (of_property_read_bool(pdev->dev.of_node, "interrupts-affinity")) {
+		/* The PMU can work on any CPU for which it has an interrupt. */
+		for (i = 0; i < irqs; i++) {
+			struct cpu_irq *map = &pmu->irq_map[i];
+			cpumask_or(&pmu->supported_cpus, &pmu->supported_cpus,
+				   &map->cpus);
+		}
+	} else {
+		/*
+		 * Without affinity info, assume a homogeneous system with
+		 * potentially missing interrupts, to keep existing DTBs
+		 * working.
+		 */
+		cpumask_setall(&pmu->supported_cpus);
+	}
+
+	return 0;
+
+out_free:
+	kfree(pmu->irq_map);
+	return ret;
+}
+
 static int cpu_pmu_device_probe(struct platform_device *pdev)
 {
 	const struct of_device_id *of_id;
@@ -443,8 +542,9 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
 	cpu_pmu = pmu;
 	cpu_pmu->plat_device = pdev;
 
-	/* Assume by default that we're on a homogeneous system */
-	cpumask_setall(&pmu->supported_cpus);
+	ret = cpu_pmu_parse_interrupts(pmu);
+	if (ret)
+		goto out_free_pmu;
 
 	if (node && (of_id = of_match_node(cpu_pmu_of_device_ids, pdev->dev.of_node))) {
 		init_fn = of_id->data;
@@ -471,8 +571,10 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
 out_destroy:
 	cpu_pmu_destroy(cpu_pmu);
 out_free:
-	pr_info("failed to register PMU devices!\n");
+	kfree(pmu->irq_map);
+out_free_pmu:
 	kfree(pmu);
+	pr_info("failed to register PMU devices!\n");
 	return ret;
 }
 
-- 
1.9.1



* [PATCH 10/11] arm: perf: remove singleton PMU restriction
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

Now that we can describe PMUs in heterogeneous systems, the only item in
the way of perf support for big.LITTLE is the singleton cpu_pmu variable
used for OProfile compatibility.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm/kernel/perf_event_cpu.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index f09c8a0..09de0e6 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -34,7 +34,7 @@
 #include <asm/pmu.h>
 
 /* Set at runtime when we know what CPU type we are. */
-static struct arm_pmu *cpu_pmu;
+static struct arm_pmu *__oprofile_cpu_pmu;
 
 /*
  * Despite the names, these two functions are CPU-specific and are used
@@ -42,10 +42,10 @@ static struct arm_pmu *cpu_pmu;
  */
 const char *perf_pmu_name(void)
 {
-	if (!cpu_pmu)
+	if (!__oprofile_cpu_pmu)
 		return NULL;
 
-	return cpu_pmu->name;
+	return __oprofile_cpu_pmu->name;
 }
 EXPORT_SYMBOL_GPL(perf_pmu_name);
 
@@ -53,8 +53,8 @@ int perf_num_counters(void)
 {
 	int max_events = 0;
 
-	if (cpu_pmu != NULL)
-		max_events = cpu_pmu->num_events;
+	if (__oprofile_cpu_pmu != NULL)
+		max_events = __oprofile_cpu_pmu->num_events;
 
 	return max_events;
 }
@@ -528,19 +528,16 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
 	struct arm_pmu *pmu;
 	int ret = -ENODEV;
 
-	if (cpu_pmu) {
-		pr_info("attempt to register multiple PMU devices!\n");
-		return -ENOSPC;
-	}
-
 	pmu = kzalloc(sizeof(struct arm_pmu), GFP_KERNEL);
 	if (!pmu) {
 		pr_info("failed to allocate PMU device!\n");
 		return -ENOMEM;
 	}
 
-	cpu_pmu = pmu;
-	cpu_pmu->plat_device = pdev;
+	if (!__oprofile_cpu_pmu)
+		__oprofile_cpu_pmu = pmu;
+
+	pmu->plat_device = pdev;
 
 	ret = cpu_pmu_parse_interrupts(pmu);
 	if (ret)
@@ -558,18 +555,18 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
 		goto out_free;
 	}
 
-	ret = cpu_pmu_init(cpu_pmu);
+	ret = cpu_pmu_init(pmu);
 	if (ret)
 		goto out_free;
 
-	ret = armpmu_register(cpu_pmu, -1);
+	ret = armpmu_register(pmu, -1);
 	if (ret)
 		goto out_destroy;
 
 	return 0;
 
 out_destroy:
-	cpu_pmu_destroy(cpu_pmu);
+	cpu_pmu_destroy(pmu);
 out_free:
 	kfree(pmu->irq_map);
 out_free_pmu:
-- 
1.9.1



* [PATCH 11/11] arm: dts: vexpress: describe all PMUs in TC2 dts
From: Mark Rutland @ 2014-11-07 16:25 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel, will.deacon, Mark Rutland

The dts for the CoreTile Express A15x2 A7x3 (TC2) only describes the
PMUs of the Cortex-A15 CPUs, and not the Cortex-A7 CPUs.

Now that we have a mechanism for describing disparate PMUs and their
interrupts in device tree, this patch makes use of it to describe the
PMUs for all CPUs in the system.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts | 36 +++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
index 322fd15..52416f9 100644
--- a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
+++ b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
@@ -90,6 +90,28 @@
 				min-residency-us = <2500>;
 			};
 		};
+
+		cpu-map {
+			cluster0 {
+				core_0_0: core0 {
+					cpu = <&cpu0>;
+				};
+				core_0_1: core1 {
+					cpu = <&cpu1>;
+				};
+			};
+			cluster1 {
+				core_1_0: core0 {
+					cpu = <&cpu2>;
+				};
+				core_1_1: core1 {
+					cpu = <&cpu3>;
+				};
+				core_1_2: core2 {
+					cpu = <&cpu4>;
+				};
+			};
+		};
 	};
 
 	memory@80000000 {
@@ -187,10 +209,22 @@
 			     <1 10 0xf08>;
 	};
 
-	pmu {
+	pmu_a15 {
 		compatible = "arm,cortex-a15-pmu";
 		interrupts = <0 68 4>,
 			     <0 69 4>;
+		interrupts-affinity = <&core_0_0>,
+				      <&core_0_1>;
+	};
+
+	pmu_a7 {
+		compatible = "arm,cortex-a7-pmu";
+		interrupts = <0 128 4>,
+			     <0 129 4>,
+			     <0 130 4>;
+		interrupts-affinity = <&core_1_0>,
+				      <&core_1_1>,
+				      <&core_1_2>;
 	};
 
 	oscclk6a: oscclk6a {
-- 
1.9.1



* Re: [PATCH 07/11] arm: perf: document PMU affinity binding
From: Will Deacon @ 2014-11-17 11:14 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel

Hi Mark,

On Fri, Nov 07, 2014 at 04:25:32PM +0000, Mark Rutland wrote:
> To describe the various ways CPU PMU interrupts might be wired up, we
> can refer to the topology information in the device tree.
> 
> This patch adds a new property to the PMU binding, interrupts-affinity,
> which describes the relationship between CPUs and interrupts. This
> information is necessary to handle systems with heterogeneous PMU
> implementations (e.g. big.LITTLE). Documentation is added describing the
> use of said property.

I'm not entirely comfortable with using interrupt affinity to convey
PMU affinity. It seems perfectly plausible for somebody to play the usual
trick of ORing all the irq lines together, despite having a big/little
PMU configuration.

Can you describe such a system with this binding?

> +Example 2 (Multiple clusters with single interrupts):
> +
> +cpus {
> +	#address-cells = <1>;
> +	#size-cells = <1>;
> +
> +	CPU0: cpu@0 {
> +		reg = <0x0>;
> +		compatible = "arm,cortex-a15-pmu";
> +	};
> +
> +	CPU1: cpu@1 {
> +		reg = <0x1>;
> +		compatible = "arm,cotex-a15-pmu";

cortex

> +	};
> +
> +	CPU100: cpu@100 {
> +		reg = <0x100>;
> +		compatible = "arm,cortex-a7-pmu";
> +	};
> +
> +	cpu-map {
> +		cluster0 {
> +			CORE_0_0: core0 {
> +				cpu = <&CPU0>;
> +			};
> +			CORE_0_1: core1 {
> +				cpu = <&CPU1>;
> +			};
> +		};
> +		cluster1 {
> +			CORE_1_0: core0 {
> +				cpu = <&CPU100>;
> +			};
> +		};
> +	};
> +};
> +
> +pmu_a15 {
> +	compatible = "arm,cortex-a15-pmu";
> +	interrupts = <100>, <101>;
> +	interrupts-affinity = <&CORE0>, <&CORE1>;
> +};
> +
> +pmu_a7 {
> +	compatible = "arm,cortex-a7-pmu";
> +	interrupts = <105>;
> +	interrupts-affinity = <&CORE_1_0>;
> +};
> +
> +Example 3 (Multiple clusters with per-cpu interrupts):
> +
> +cpus {
> +	#address-cells = <1>;
> +	#size-cells = <1>;
> +
> +	CPU0: cpu@0 {
> +		reg = <0x0>;
> +		compatible = "arm,cortex-a15-pmu";
> +	};
> +
> +	CPU1: cpu@1 {
> +		reg = <0x1>;
> +		compatible = "arm,cotex-a15-pmu";

Same here.

Will


* Re: [PATCH 08/11] arm: perf: add functions to parse affinity from dt
From: Will Deacon @ 2014-11-17 11:16 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, grant.likely, Rob Herring

On Fri, Nov 07, 2014 at 04:25:33PM +0000, Mark Rutland wrote:
> Depending on hardware configuration, some devices may only be accessible
> from certain CPUs, may have interrupts wired up to a subset of CPUs, or
> may have operations which affect subsets of CPUs. To handle these
> devices it is necessary to describe this affinity information in
> devicetree.
> 
> This patch adds functions to handle parsing the CPU affinity of
> properties from devicetree, based on Lorenzo's topology binding,
> allowing subsets of CPUs to be associated with interrupts, hardware
> ports, etc. The functions can be used to build cpumasks and also to test
> whether an affinity property only targets one CPU independent of the
> current configuration (e.g. when the kernel supports fewer CPUs than are
> physically present). This is useful for dealing with mixed SPI/PPI
> devices.
> 
> A device may have an arbitrary number of affinity properties, the
> meaning of which is device-specific and should be specified in a given
> device's binding document.
> 
> For example, an affinity property describing interrupt routing may
> consist of a phandle pointing to a subtree of the topology nodes,
> indicating the set of CPUs an interrupt originates from or may be taken
> on. Bindings may have restrictions on the topology nodes referenced -
> for describing coherency controls an affinity property may indicate a
> whole cluster (including any non-CPU logic it contains) is affected by
> some configuration.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Grant Likely <grant.likely@linaro.org>
> Cc: Rob Herring <rob.herring@kernel.org>
> ---
>  arch/arm/kernel/perf_event_cpu.c | 127 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 127 insertions(+)
> 
> diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> index ce35149..dfcaba5 100644
> --- a/arch/arm/kernel/perf_event_cpu.c
> +++ b/arch/arm/kernel/perf_event_cpu.c
> @@ -22,6 +22,7 @@
>  #include <linux/export.h>
>  #include <linux/kernel.h>
>  #include <linux/of.h>
> +#include <linux/of_device.h>
>  #include <linux/platform_device.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> @@ -294,6 +295,132 @@ static int probe_current_pmu(struct arm_pmu *pmu)
>  	return ret;
>  }
>  
> +/*
> + * Test if the node is within the topology tree.
> + * Walk up to the root, keeping refcounts balanced.
> + */
> +static bool is_topology_node(struct device_node *node)
> +{
> +	struct device_node *np, *cpu_map;
> +	bool ret = false;
> +
> +	cpu_map = of_find_node_by_path("/cpus/cpu-map");
> +	if (!cpu_map)
> +		return false;
> +
> +	/*
> +	 * of_get_next_parent decrements the refcount of the provided node.
> +	 * Increment it first to keep things balanced.
> +	 */
> +	for (np = of_node_get(node); np; np = of_get_next_parent(np)) {
> +		if (np != cpu_map)
> +			continue;
> +
> +		ret = true;
> +		break;
> +	}
> +
> +	of_node_put(np);
> +	of_node_put(cpu_map);
> +	return ret;
> +}

Wouldn't this be more at home in topology.c, or somewhere where others can
make use of it?

Will


* Re: [PATCH 09/11] arm: perf: parse cpu affinity from dt
From: Will Deacon @ 2014-11-17 11:20 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel

On Fri, Nov 07, 2014 at 04:25:34PM +0000, Mark Rutland wrote:
> The current way we read interrupts from devicetree assumes that
> interrupts are in increasing order of logical cpu id (MPIDR.Aff{2,1,0}),
> and that these logical ids are in a contiguous block. This may not be
> the case in general - after a kexec cpu ids may be arbitrarily assigned,
> and multi-cluster systems do not have a contiguous range of cpu ids.
> 
> This patch parses cpu affinity information for interrupts from an
> optional "interrupts-affinity" devicetree property described in the
> devicetree binding document. Support for existing dts and board files
> remains.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> ---
>  arch/arm/include/asm/pmu.h       |  12 +++
>  arch/arm/kernel/perf_event_cpu.c | 196 +++++++++++++++++++++++++++++----------
>  2 files changed, 161 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
> index b630a44..92fc1da 100644
> --- a/arch/arm/include/asm/pmu.h
> +++ b/arch/arm/include/asm/pmu.h
> @@ -12,6 +12,7 @@
>  #ifndef __ARM_PMU_H__
>  #define __ARM_PMU_H__
>  
> +#include <linux/cpumask.h>
>  #include <linux/interrupt.h>
>  #include <linux/perf_event.h>
>  
> @@ -89,6 +90,15 @@ struct pmu_hw_events {
>  	struct arm_pmu		*percpu_pmu;
>  };
>  
> +/*
> + * For systems with heterogeneous PMUs, we need to know which CPUs each
> + * (possibly percpu) IRQ targets. Map between them with an array of these.
> + */
> +struct cpu_irq {
> +	cpumask_t cpus;
> +	int irq;
> +};
> +
>  struct arm_pmu {
>  	struct pmu	pmu;
>  	cpumask_t	active_irqs;
> @@ -118,6 +128,8 @@ struct arm_pmu {
>  	struct platform_device	*plat_device;
>  	struct pmu_hw_events	__percpu *hw_events;
>  	struct notifier_block	hotplug_nb;
> +	int		nr_irqs;
> +	struct cpu_irq *irq_map;
>  };
>  
>  #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
> diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> index dfcaba5..f09c8a0 100644
> --- a/arch/arm/kernel/perf_event_cpu.c
> +++ b/arch/arm/kernel/perf_event_cpu.c
> @@ -85,20 +85,27 @@ static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
>  	struct platform_device *pmu_device = cpu_pmu->plat_device;
>  	struct pmu_hw_events __percpu *hw_events = cpu_pmu->hw_events;
>  
> -	irqs = min(pmu_device->num_resources, num_possible_cpus());
> +	irqs = cpu_pmu->nr_irqs;
>  
> -	irq = platform_get_irq(pmu_device, 0);
> -	if (irq >= 0 && irq_is_percpu(irq)) {
> -		on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
> -		free_percpu_irq(irq, &hw_events->percpu_pmu);
> -	} else {
> -		for (i = 0; i < irqs; ++i) {
> -			if (!cpumask_test_and_clear_cpu(i, &cpu_pmu->active_irqs))
> -				continue;
> -			irq = platform_get_irq(pmu_device, i);
> -			if (irq >= 0)
> -				free_irq(irq, per_cpu_ptr(&hw_events->percpu_pmu, i));
> +	for (i = 0; i < irqs; i++) {
> +		struct cpu_irq *map = &cpu_pmu->irq_map[i];
> +		irq = map->irq;
> +
> +		if (irq <= 0)
> +			continue;
> +
> +		if (irq_is_percpu(irq)) {
> +			on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);

Hmm, ok, so we're assuming that all the PMUs will be wired with PPIs in this
case. I have a patch allowing per-cpu interrupts to be requested for a
cpumask, but I suppose that can wait until it's actually needed.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread
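
For context, the request side implied by the new irq_map might look
roughly like the sketch below. Only nr_irqs, irq_map and struct cpu_irq
come from the patch; the helper name, the "arm-pmu" devname and the
error handling are assumptions:

static int cpu_pmu_request_irqs(struct arm_pmu *cpu_pmu, irq_handler_t handler)
{
        struct pmu_hw_events __percpu *hw_events = cpu_pmu->hw_events;
        int i, err = 0;

        for (i = 0; i < cpu_pmu->nr_irqs; i++) {
                struct cpu_irq *map = &cpu_pmu->irq_map[i];

                if (map->irq <= 0)
                        continue;

                if (irq_is_percpu(map->irq)) {
                        /* PPI: a single request covers all wired CPUs. */
                        err = request_percpu_irq(map->irq, handler, "arm-pmu",
                                                 &hw_events->percpu_pmu);
                } else {
                        /* SPI: route the interrupt to the CPUs in map->cpus. */
                        int cpu = cpumask_first(&map->cpus);

                        err = request_irq(map->irq, handler, IRQF_NOBALANCING,
                                          "arm-pmu",
                                          per_cpu_ptr(&hw_events->percpu_pmu, cpu));
                        if (!err)
                                irq_set_affinity(map->irq, &map->cpus);
                }

                if (err)
                        return err;
        }

        return 0;
}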

* Re: [PATCH 00/11] arm: perf: add support for heterogeneous PMUs
  2014-11-07 16:25 [PATCH 00/11] arm: perf: add support for heterogeneous PMUs Mark Rutland
                   ` (10 preceding siblings ...)
  2014-11-07 16:25 ` [PATCH 11/11] arm: dts: vexpress: describe all PMUs in TC2 dts Mark Rutland
@ 2014-11-17 11:24 ` Will Deacon
  11 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2014-11-17 11:24 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel

On Fri, Nov 07, 2014 at 04:25:25PM +0000, Mark Rutland wrote:
> In systems with heterogeneous CPUs (e.g. big.LITTLE) the associated PMUs
> also differ in terms of the supported set of events, the precise
> behaviour of each of those events, and the number of event counters.
> Thus it is not possible to expose these PMUs as a single logical PMU.
> 
> Instead a logical PMU is created per CPU microarchitecture, which events
> can target directly:
> 
> $ perf stat \
>   -e armv7_cortex_a7/config=0x11/ \
>   -e armv7_cortex_a15/config=0x11/ \
>   ./test
> 
>  Performance counter stats for './test':
> 
>            7980455      armv7_cortex_a7/config=0x11/                                    [27.29%]
>            9947934      armv7_cortex_a15/config=0x11/                                    [72.66%]
> 
>        0.016734833 seconds time elapsed
> 
> This series is based atop of my recent preparatory rework [1,2].

Modulo the patches I commented on, the ARM perf bits look fine to me. For
those:

  Acked-by: Will Deacon <will.deacon@arm.com>

However, you need to get the event_filter_match change into the core code
before I can queue anything.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/11] arm: perf: document PMU affinity binding
  2014-11-07 16:25 ` [PATCH 07/11] arm: perf: document PMU affinity binding Mark Rutland
  2014-11-17 11:14   ` Will Deacon
@ 2014-11-17 14:32   ` Rob Herring
  2014-11-17 15:01     ` Mark Rutland
  1 sibling, 1 reply; 21+ messages in thread
From: Rob Herring @ 2014-11-17 14:32 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, Will Deacon, linux-kernel

On Fri, Nov 7, 2014 at 10:25 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> To describe the various ways CPU PMU interrupts might be wired up, we
> can refer to the topology information in the device tree.
>
> This patch adds a new property to the PMU binding, interrupts-affinity,
> which describes the relationship between CPUs and interrupts. This
> information is necessary to handle systems with heterogeneous PMU
> implementations (e.g. big.LITTLE). Documentation is added describing the
> use of said property.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> ---
>  Documentation/devicetree/bindings/arm/pmu.txt | 104 +++++++++++++++++++++++++-
>  1 file changed, 103 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/devicetree/bindings/arm/pmu.txt b/Documentation/devicetree/bindings/arm/pmu.txt
> index 75ef91d..23a0675 100644
> --- a/Documentation/devicetree/bindings/arm/pmu.txt
> +++ b/Documentation/devicetree/bindings/arm/pmu.txt
> @@ -24,12 +24,114 @@ Required properties:
>
>  Optional properties:
>
> +- interrupts-affinity : A list of phandles to topology nodes (see topology.txt) describing
> +            the set of CPUs associated with the interrupt at the same index.

Are there cases beyond PMUs we need to handle? I would think so, so we
should document this generically.

>  - qcom,no-pc-write : Indicates that this PMU doesn't support the 0xc and 0xd
>                       events.
>
> -Example:
> +Example 1 (A single CPU):

Isn't this a single cluster of 2 cpus?

>
>  pmu {
>          compatible = "arm,cortex-a9-pmu";
>          interrupts = <100 101>;
>  };
> +
> +Example 2 (Multiple clusters with single interrupts):

The meaning of "single" could be made a bit clearer, especially if you
consider Will's case. But I haven't really thought of better
wording...

> +
> +cpus {
> +       #address-cells = <1>;
> +       #size-cells = <1>;
> +
> +       CPU0: cpu@0 {
> +               reg = <0x0>;
> +               compatible = "arm,cortex-a15-pmu";
> +       };
> +
> +       CPU1: cpu@1 {
> +               reg = <0x1>;
> +               compatible = "arm,cotex-a15-pmu";
> +       };
> +
> +       CPU100: cpu@100 {
> +               reg = <0x100>;
> +               compatible = "arm,cortex-a7-pmu";
> +       };
> +
> +       cpu-map {
> +               cluster0 {
> +                       CORE_0_0: core0 {
> +                               cpu = <&CPU0>;
> +                       };
> +                       CORE_0_1: core1 {
> +                               cpu = <&CPU1>;
> +                       };
> +               };
> +               cluster1 {
> +                       CORE_1_0: core0 {
> +                               cpu = <&CPU100>;
> +                       };
> +               };
> +       };
> +};
> +
> +pmu_a15 {
> +       compatible = "arm,cortex-a15-pmu";
> +       interrupts = <100>, <101>;
> +       interrupts-affinity = <&CORE0>, <&CORE1>;

The phandle names are wrong here.

> +};
> +
> +pmu_a7 {
> +       compatible = "arm,cortex-a7-pmu";
> +       interrupts = <105>;
> +       interrupts-affinity = <&CORE_1_0>;
> +};
> +
> +Example 3 (Multiple clusters with per-cpu interrupts):
> +
> +cpus {
> +       #address-cells = <1>;
> +       #size-cells = <1>;
> +
> +       CPU0: cpu@0 {
> +               reg = <0x0>;
> +               compatible = "arm,cortex-a15-pmu";
> +       };
> +
> +       CPU1: cpu@1 {
> +               reg = <0x1>;
> +               compatible = "arm,cotex-a15-pmu";
> +       };
> +
> +       CPU100: cpu@100 {
> +               reg = <0x100>;
> +               compatible = "arm,cortex-a7-pmu";
> +       };
> +
> +       cpu-map {
> +               CLUSTER0: cluster0 {
> +                       core0 {
> +                               cpu = <&CPU0>;
> +                       };
> +                       core1 {
> +                               cpu = <&CPU1>;
> +                       };
> +               };
> +               CLUSTER1: cluster1 {
> +                        core0 {
> +                               cpu = <&CPU100>;
> +                       };
> +               };
> +       };
> +};
> +
> +pmu_a15 {
> +       compatible = "arm,cortex-a15-pmu";
> +       interrupts = <100>;
> +       interrupts-affinity = <&CLUSTER0>;
> +};
> +
> +pmu_a7 {
> +       compatible = "arm,cortex-a7-pmu";
> +       interrupts = <105>;
> +       interrupts-affinity = <&CLUSTER1>;
> +};
> --
> 1.9.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/11] arm: perf: document PMU affinity binding
  2014-11-17 14:32   ` Rob Herring
@ 2014-11-17 15:01     ` Mark Rutland
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Rutland @ 2014-11-17 15:01 UTC (permalink / raw)
  To: Rob Herring; +Cc: linux-arm-kernel, Will Deacon, linux-kernel

Hi Rob,

I appear to have typo'd your address when posting this. Sorry about
that; I'll make sure it doesn't happen again.

On Mon, Nov 17, 2014 at 02:32:57PM +0000, Rob Herring wrote:
> On Fri, Nov 7, 2014 at 10:25 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > To describe the various ways CPU PMU interrupts might be wired up, we
> > can refer to the topology information in the device tree.
> >
> > This patch adds a new property to the PMU binding, interrupts-affinity,
> > which describes the relationship between CPUs and interrupts. This
> > information is necessary to handle systems with heterogeneous PMU
> > implementations (e.g. big.LITTLE). Documentation is added describing the
> > use of said property.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > ---
> >  Documentation/devicetree/bindings/arm/pmu.txt | 104 +++++++++++++++++++++++++-
> >  1 file changed, 103 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/devicetree/bindings/arm/pmu.txt b/Documentation/devicetree/bindings/arm/pmu.txt
> > index 75ef91d..23a0675 100644
> > --- a/Documentation/devicetree/bindings/arm/pmu.txt
> > +++ b/Documentation/devicetree/bindings/arm/pmu.txt
> > @@ -24,12 +24,114 @@ Required properties:
> >
> >  Optional properties:
> >
> > +- interrupts-affinity : A list of phandles to topology nodes (see topology.txt) describing
> > +            the set of CPUs associated with the interrupt at the same index.
> 
> Are there cases beyond PMUs we need to handle? I would think so, so we
> should document this generically.

That was what I tried way back when I first tried to upstream all of
this, but in the meantime I've not encountered other devices that are
really CPU-affine, use SPIs, and hence need a CPU<->IRQ relationship
described.

That said, I'm happy to document whatever approach we settle on for
referring to a set of CPUs, if that seems more general than PMU IRQ
mapping.

> > -Example:
> > +Example 1 (A single CPU):
> 
> Isn't this a single cluster of 2 cpus?

Yes, it is. My bad.

> >  pmu {
> >          compatible = "arm,cortex-a9-pmu";
> >          interrupts = <100 101>;
> >  };
> > +
> > +Example 2 (Multiple clusters with single interrupts):
> 
> The meaning of single could be made a bit more clear especially if you
> consider Will's case. But I haven't really thought of better
> wording...

How about "A cluster of homogeneous CPUs"?

> > +
> > +cpus {
> > +       #address-cells = <1>;
> > +       #size-cells = <1>;
> > +
> > +       CPU0: cpu@0 {
> > +               reg = <0x0>;
> > +               compatible = "arm,cortex-a15-pmu";
> > +       };
> > +
> > +       CPU1: cpu@1 {
> > +               reg = <0x1>;
> > +               compatible = "arm,cotex-a15-pmu";
> > +       };
> > +
> > +       CPU100: cpu@100 {
> > +               reg = <0x100>;
> > +               compatible = "arm,cortex-a7-pmu";
> > +       };
> > +
> > +       cpu-map {
> > +               cluster0 {
> > +                       CORE_0_0: core0 {
> > +                               cpu = <&CPU0>;
> > +                       };
> > +                       CORE_0_1: core1 {
> > +                               cpu = <&CPU1>;
> > +                       };
> > +               };
> > +               cluster1 {
> > +                       CORE_1_0: core0 {
> > +                               cpu = <&CPU100>;
> > +                       };
> > +               };
> > +       };
> > +};
> > +
> > +pmu_a15 {
> > +       compatible = "arm,cortex-a15-pmu";
> > +       interrupts = <100>, <101>;
> > +       interrupts-affinity = <&CORE0>, <&CORE1>;
> 
> The phandle names are wrong here.

Whoops. I've fixed that up locally now.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 21+ messages in thread
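
For reference, with Rob's comment folded in, the A15 node in example 2
would presumably use the labels actually defined in the cpu-map above:

pmu_a15 {
        compatible = "arm,cortex-a15-pmu";
        interrupts = <100>, <101>;
        interrupts-affinity = <&CORE_0_0>, <&CORE_0_1>;
};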

* Re: [PATCH 08/11] arm: perf: add functions to parse affinity from dt
  2014-11-17 11:16   ` Will Deacon
@ 2014-11-17 15:02     ` Mark Rutland
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Rutland @ 2014-11-17 15:02 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, linux-kernel, grant.likely, Rob Herring

On Mon, Nov 17, 2014 at 11:16:25AM +0000, Will Deacon wrote:
> On Fri, Nov 07, 2014 at 04:25:33PM +0000, Mark Rutland wrote:
> > Depending on hardware configuration, some devices may only be accessible
> > from certain CPUs, may have interrupts wired up to a subset of CPUs, or
> > may have operations which affect subsets of CPUs. To handle these
> > devices it is necessary to describe this affinity information in
> > devicetree.
> > 
> > This patch adds functions to handle parsing the CPU affinity of
> > properties from devicetree, based on Lorenzo's topology binding,
> > allowing subsets of CPUs to be associated with interrupts, hardware
> > ports, etc. The functions can be used to build cpumasks and also to test
> > whether an affinity property only targets one CPU independent of the
> > current configuration (e.g. when the kernel supports fewer CPUs than are
> > physically present). This is useful for dealing with mixed SPI/PPI
> > devices.
> > 
> > A device may have an arbitrary number of affinity properties, the
> > meaning of which is device-specific and should be specified in a given
> > device's binding document.
> > 
> > For example, an affinity property describing interrupt routing may
> > consist of a phandle pointing to a subtree of the topology nodes,
> > indicating the set of CPUs an interrupt originates from or may be taken
> > on. Bindings may have restrictions on the topology nodes referenced -
> > for describing coherency controls an affinity property may indicate a
> > whole cluster (including any non-CPU logic it contains) is affected by
> > some configuration.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Grant Likely <grant.likely@linaro.org>
> > Cc: Rob Herring <rob.herring@kernel.org>
> > ---
> >  arch/arm/kernel/perf_event_cpu.c | 127 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 127 insertions(+)
> > 
> > diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> > index ce35149..dfcaba5 100644
> > --- a/arch/arm/kernel/perf_event_cpu.c
> > +++ b/arch/arm/kernel/perf_event_cpu.c
> > @@ -22,6 +22,7 @@
> >  #include <linux/export.h>
> >  #include <linux/kernel.h>
> >  #include <linux/of.h>
> > +#include <linux/of_device.h>
> >  #include <linux/platform_device.h>
> >  #include <linux/slab.h>
> >  #include <linux/spinlock.h>
> > @@ -294,6 +295,132 @@ static int probe_current_pmu(struct arm_pmu *pmu)
> >  	return ret;
> >  }
> >  
> > +/*
> > + * Test if the node is within the topology tree.
> > + * Walk up to the root, keeping refcounts balanced.
> > + */
> > +static bool is_topology_node(struct device_node *node)
> > +{
> > +	struct device_node *np, *cpu_map;
> > +	bool ret = false;
> > +
> > +	cpu_map = of_find_node_by_path("/cpus/cpu-map");
> > +	if (!cpu_map)
> > +		return false;
> > +
> > +	/*
> > +	 * of_get_next_parent decrements the refcount of the provided node.
> > +	 * Increment it first to keep things balanced.
> > +	 */
> > +	for (np = of_node_get(node); np; np = of_get_next_parent(np)) {
> > +		if (np != cpu_map)
> > +			continue;
> > +
> > +		ret = true;
> > +		break;
> > +	}
> > +
> > +	of_node_put(np);
> > +	of_node_put(cpu_map);
> > +	return ret;
> > +}
> 
> Wouldn't this be more at home in topology.c, or somewhere where others can
> make use of it?

Perhaps. I'll need this for arm64 too and I don't know where that should
live.

Mark.

^ permalink raw reply	[flat|nested] 21+ messages in thread
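
To make the approach concrete, a cpumask could be built from a cpu-map
subtree along the lines of the sketch below. The helper name is
invented and the recursion is simplified; this is not the code from the
patch:

static void topology_node_to_cpumask(struct device_node *topo, cpumask_t *mask)
{
        struct device_node *child, *cpu_node;
        int cpu;

        /* Leaf core/thread nodes carry a "cpu" phandle; match it up. */
        cpu_node = of_parse_phandle(topo, "cpu", 0);
        if (cpu_node) {
                for_each_possible_cpu(cpu) {
                        struct device_node *np = of_get_cpu_node(cpu, NULL);

                        if (np == cpu_node)
                                cpumask_set_cpu(cpu, mask);
                        of_node_put(np);
                }
                of_node_put(cpu_node);
                return;
        }

        /* A cluster node: accumulate all CPUs beneath it. */
        for_each_child_of_node(topo, child)
                topology_node_to_cpumask(child, mask);
}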

* Re: [PATCH 09/11] arm: perf: parse cpu affinity from dt
  2014-11-17 11:20   ` Will Deacon
@ 2014-11-17 15:08     ` Mark Rutland
  2014-11-18 10:40       ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Rutland @ 2014-11-17 15:08 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, linux-kernel

On Mon, Nov 17, 2014 at 11:20:35AM +0000, Will Deacon wrote:
> On Fri, Nov 07, 2014 at 04:25:34PM +0000, Mark Rutland wrote:
> > > The current way we read interrupts from devicetree assumes that
> > interrupts are in increasing order of logical cpu id (MPIDR.Aff{2,1,0}),
> > and that these logical ids are in a contiguous block. This may not be
> > the case in general - after a kexec cpu ids may be arbitrarily assigned,
> > and multi-cluster systems do not have a contiguous range of cpu ids.
> > 
> > This patch parses cpu affinity information for interrupts from an
> > optional "interrupts-affinity" devicetree property described in the
> > devicetree binding document. Support for existing dts and board files
> > remains.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > ---
> >  arch/arm/include/asm/pmu.h       |  12 +++
> >  arch/arm/kernel/perf_event_cpu.c | 196 +++++++++++++++++++++++++++++----------
> >  2 files changed, 161 insertions(+), 47 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
> > index b630a44..92fc1da 100644
> > --- a/arch/arm/include/asm/pmu.h
> > +++ b/arch/arm/include/asm/pmu.h
> > @@ -12,6 +12,7 @@
> >  #ifndef __ARM_PMU_H__
> >  #define __ARM_PMU_H__
> >  
> > +#include <linux/cpumask.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/perf_event.h>
> >  
> > @@ -89,6 +90,15 @@ struct pmu_hw_events {
> >  	struct arm_pmu		*percpu_pmu;
> >  };
> >  
> > +/*
> > + * For systems with heterogeneous PMUs, we need to know which CPUs each
> > + * (possibly percpu) IRQ targets. Map between them with an array of these.
> > + */
> > +struct cpu_irq {
> > +	cpumask_t cpus;
> > +	int irq;
> > +};
> > +
> >  struct arm_pmu {
> >  	struct pmu	pmu;
> >  	cpumask_t	active_irqs;
> > @@ -118,6 +128,8 @@ struct arm_pmu {
> >  	struct platform_device	*plat_device;
> >  	struct pmu_hw_events	__percpu *hw_events;
> >  	struct notifier_block	hotplug_nb;
> > +	int		nr_irqs;
> > +	struct cpu_irq *irq_map;
> >  };
> >  
> >  #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
> > diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> > index dfcaba5..f09c8a0 100644
> > --- a/arch/arm/kernel/perf_event_cpu.c
> > +++ b/arch/arm/kernel/perf_event_cpu.c
> > @@ -85,20 +85,27 @@ static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
> >  	struct platform_device *pmu_device = cpu_pmu->plat_device;
> >  	struct pmu_hw_events __percpu *hw_events = cpu_pmu->hw_events;
> >  
> > -	irqs = min(pmu_device->num_resources, num_possible_cpus());
> > +	irqs = cpu_pmu->nr_irqs;
> >  
> > -	irq = platform_get_irq(pmu_device, 0);
> > -	if (irq >= 0 && irq_is_percpu(irq)) {
> > -		on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
> > -		free_percpu_irq(irq, &hw_events->percpu_pmu);
> > -	} else {
> > -		for (i = 0; i < irqs; ++i) {
> > -			if (!cpumask_test_and_clear_cpu(i, &cpu_pmu->active_irqs))
> > -				continue;
> > -			irq = platform_get_irq(pmu_device, i);
> > -			if (irq >= 0)
> > -				free_irq(irq, per_cpu_ptr(&hw_events->percpu_pmu, i));
> > +	for (i = 0; i < irqs; i++) {
> > +		struct cpu_irq *map = &cpu_pmu->irq_map[i];
> > +		irq = map->irq;
> > +
> > +		if (irq <= 0)
> > +			continue;
> > +
> > +		if (irq_is_percpu(irq)) {
> > +			on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
> 
> Hmm, ok, so we're assuming that all the PMUs will be wired with PPIs in this
> case. I have a patch allowing per-cpu interrupts to be requested for a
> cpumask, but I suppose that can wait until it's actually needed.

I wasn't too keen on assuming all CPUs, but I didn't have the facility
to request a PPI on a subset of CPUs. If you can point me at your patch,
I'd be happy to take a look.

I should have the target CPU mask decoded from whatever the binding
settles on, so at this point it's just plumbing.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 09/11] arm: perf: parse cpu affinity from dt
  2014-11-17 15:08     ` Mark Rutland
@ 2014-11-18 10:40       ` Will Deacon
  0 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2014-11-18 10:40 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel

On Mon, Nov 17, 2014 at 03:08:04PM +0000, Mark Rutland wrote:
> On Mon, Nov 17, 2014 at 11:20:35AM +0000, Will Deacon wrote:
> > On Fri, Nov 07, 2014 at 04:25:34PM +0000, Mark Rutland wrote:
> > > diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> > > index dfcaba5..f09c8a0 100644
> > > --- a/arch/arm/kernel/perf_event_cpu.c
> > > +++ b/arch/arm/kernel/perf_event_cpu.c
> > > @@ -85,20 +85,27 @@ static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
> > >  	struct platform_device *pmu_device = cpu_pmu->plat_device;
> > >  	struct pmu_hw_events __percpu *hw_events = cpu_pmu->hw_events;
> > >  
> > > -	irqs = min(pmu_device->num_resources, num_possible_cpus());
> > > +	irqs = cpu_pmu->nr_irqs;
> > >  
> > > -	irq = platform_get_irq(pmu_device, 0);
> > > -	if (irq >= 0 && irq_is_percpu(irq)) {
> > > -		on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
> > > -		free_percpu_irq(irq, &hw_events->percpu_pmu);
> > > -	} else {
> > > -		for (i = 0; i < irqs; ++i) {
> > > -			if (!cpumask_test_and_clear_cpu(i, &cpu_pmu->active_irqs))
> > > -				continue;
> > > -			irq = platform_get_irq(pmu_device, i);
> > > -			if (irq >= 0)
> > > -				free_irq(irq, per_cpu_ptr(&hw_events->percpu_pmu, i));
> > > +	for (i = 0; i < irqs; i++) {
> > > +		struct cpu_irq *map = &cpu_pmu->irq_map[i];
> > > +		irq = map->irq;
> > > +
> > > +		if (irq <= 0)
> > > +			continue;
> > > +
> > > +		if (irq_is_percpu(irq)) {
> > > +			on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
> > 
> > Hmm, ok, so we're assuming that all the PMUs will be wired with PPIs in this
> > case. I have a patch allowing per-cpu interrupts to be requested for a
> > cpumask, but I suppose that can wait until it's actually needed.
> 
> I wasn't too keen on assuming all CPUs, but I didn't have the facility
> to request a PPI on a subset of CPUs. If you can point me at your patch,
> I'd be happy to take a look.

The patch is here:

https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/commit/?h=irq&id=774f7bc54577b6875d96e670ee34580077fc10be

But I think we can avoid it until we find a platform that needs it. I can't
see a DT/ABI issue with that, can you?

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread
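
Until such a platform appears, the subset case can at least be narrowed
on the teardown side with on_each_cpu_mask(). A sketch reusing the
irq_map from patch 09, not code from Will's linked patch:

        if (irq_is_percpu(map->irq)) {
                /* Only IPI the CPUs this PMU's PPI is actually wired to. */
                on_each_cpu_mask(&map->cpus, cpu_pmu_disable_percpu_irq,
                                 &map->irq, 1);
                free_percpu_irq(map->irq, &hw_events->percpu_pmu);
        }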

end of thread

Thread overview: 21+ messages
2014-11-07 16:25 [PATCH 00/11] arm: perf: add support for heterogeneous PMUs Mark Rutland
2014-11-07 16:25 ` [PATCH 01/11] of: Add empty of_get_next_parent stub Mark Rutland
2014-11-07 16:25 ` [PATCH 02/11] perf: allow for PMU-specific event filtering Mark Rutland
2014-11-07 16:25 ` [PATCH 03/11] arm: perf: treat PMUs as CPU affine Mark Rutland
2014-11-07 16:25 ` [PATCH 04/11] arm: perf: filter unschedulable events Mark Rutland
2014-11-07 16:25 ` [PATCH 05/11] arm: perf: reject multi-pmu groups Mark Rutland
2014-11-07 16:25 ` [PATCH 06/11] arm: perf: probe number of counters on affine CPUs Mark Rutland
2014-11-07 16:25 ` [PATCH 07/11] arm: perf: document PMU affinity binding Mark Rutland
2014-11-17 11:14   ` Will Deacon
2014-11-17 14:32   ` Rob Herring
2014-11-17 15:01     ` Mark Rutland
2014-11-07 16:25 ` [PATCH 08/11] arm: perf: add functions to parse affinity from dt Mark Rutland
2014-11-17 11:16   ` Will Deacon
2014-11-17 15:02     ` Mark Rutland
2014-11-07 16:25 ` [PATCH 09/11] arm: perf: parse cpu " Mark Rutland
2014-11-17 11:20   ` Will Deacon
2014-11-17 15:08     ` Mark Rutland
2014-11-18 10:40       ` Will Deacon
2014-11-07 16:25 ` [PATCH 10/11] arm: perf: remove singleton PMU restriction Mark Rutland
2014-11-07 16:25 ` [PATCH 11/11] arm: dts: vexpress: describe all PMUs in TC2 dts Mark Rutland
2014-11-17 11:24 ` [PATCH 00/11] arm: perf: add support for heterogeneous PMUs Will Deacon
