* [PATCH v8 0/8] perf: Support for ARM DynamIQ Shared Unit
@ 2017-10-10 10:32 ` Suzuki K Poulose
  0 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:32 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

This series adds support for the PMU in ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
allows counting the various events related to L3, SCU etc, using 32bit
independent counters along with a 64bit cycle counter.
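
As an illustration of what 32bit event counters imply for software (this
is a sketch, not code taken from the driver), a consumer that samples such
a counter has to tolerate wrap-around between two reads. The names
counter_delta32(), struct soft_counter and soft_counter_update() below are
hypothetical:

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of events between two reads of a 32-bit
 * counter. Unsigned subtraction is modulo 2^32, so a single wrap-around
 * between the two reads is absorbed automatically.
 */
static uint64_t counter_delta32(uint32_t prev, uint32_t now)
{
	return (uint32_t)(now - prev);
}

/* Hypothetical 64-bit software view of a 32-bit hardware counter. */
struct soft_counter {
	uint32_t prev_raw;	/* last raw value read from the counter */
	uint64_t total;		/* accumulated event count */
};

/* Fold a fresh raw reading into the 64-bit running total. */
static void soft_counter_update(struct soft_counter *c, uint32_t raw)
{
	c->total += counter_delta32(c->prev_raw, raw);
	c->prev_raw = raw;
}
```

This only works if the counter is sampled at least once per 2^32 events,
which is why an overflow interrupt is normally used as a backstop.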

The PMU can only be accessed via CPU system registers, which are common
to the cores connected to the same DSU. The PMU registers mostly follow
the semantics of the ARMv8 PMU, except that the counters record
cluster-wide events.

Tested on a Fast model with DSU. The driver only supports ARM64 at the
moment. It can be extended to support ARM32 by providing register
accessors like we do in arch/arm64/include/asm/arm_dsu_pmu.h.

The firmware should set up the appropriate bits in ACTLR_EL3/EL2 to
allow EL1 access to the PMU registers.

Series applies on v4.14-rc4 and is also available at:

  git://linux-arm.org/linux-skp.git 4.14/dsu-v8

Changes since V7:
 - No changes to the Core DSU PMU code.
 - Rebased to v4.14-rc4
 - Added Tested-by from Leo
 - Convert arm64 CPU topology parsing, ARM PMU irq_affinity parsing
   to use the new helper.

Changes since V6:
 - Rebased to v4.14-rc3
 - Use of_cpu_device_node_get() instead of the slower of_get_cpu_node();
   the former uses per_cpu data when available and falls back to the
   latter otherwise.
 - Added Reviewed-by tags from Jonathan

Changes since V5:
 - Picked up ack from Rob
 - Address comments on V5 by Mark.
 - Use IRQ_NOBALANCING for IRQ handler
 - Don't expose events which could be unimplemented.
 - Get rid of dsu_pmu_event_supported and allow raw event
   code to be used without validating whether it is supported.
 - Rename "supported_cpus" mask to "associated_cpus"
 - Add Documentation for the PMU driver
 - Don't disable IRQ for dsu_pmu_{enable/disable}_counters
 - Use consistent return codes for validate_event/group calls.
 - Check PERF_ATTACH_TASK flag in event_init.
 - Allow missing CPUs in dsu_pmu_dt_get_cpus, to handle cases
   where kernel could have capped nr_cpus.
 - Clean up sanity checking for the CPU before accessing DSU
 - Reject events with counting CPU not associated with DSU

Changes since V4:
 - Fix regressions introduced by v4, with the rename of generic
   helper.
 - Added reviewed-by tag from Rob

Changes since V3:
 - Rename the generic OF helper to of_cpu_node_to_id(), and return
   -ENODEV upon failure rather than nr_cpu_ids
 - Fix node name in device tree node example.

Suzuki K Poulose (8):
  perf: Export perf_event_update_userpage
  of: Add helper for mapping device node to logical CPU number
  coresight: of: Use of_cpu_node_to_id helper
  irqchip: gic-v3: Use of_cpu_node_to_id helper
  arm64: Use of_cpu_node_to_id helper for CPU topology parsing
  arm_pmu: Use of_cpu_node_to_id helper
  dt-bindings: Document devicetree binding for ARM DSU PMU
  perf: ARM DynamIQ Shared Unit PMU support

 .../devicetree/bindings/arm/arm-dsu-pmu.txt        |  27 +
 Documentation/perf/arm_dsu_pmu.txt                 |  28 +
 arch/arm64/include/asm/arm_dsu_pmu.h               | 124 ++++
 arch/arm64/kernel/topology.c                       |  16 +-
 drivers/hwtracing/coresight/of_coresight.c         |  15 +-
 drivers/irqchip/irq-gic-v3.c                       |  29 +-
 drivers/of/base.c                                  |  26 +
 drivers/perf/Kconfig                               |   9 +
 drivers/perf/Makefile                              |   1 +
 drivers/perf/arm_dsu_pmu.c                         | 826 +++++++++++++++++++++
 drivers/perf/arm_pmu_platform.c                    |  15 +-
 include/linux/of.h                                 |   7 +
 kernel/events/core.c                               |   1 +
 13 files changed, 1063 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
 create mode 100644 Documentation/perf/arm_dsu_pmu.txt
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c

-- 
2.13.6

* [PATCH v8 1/8] perf: Export perf_event_update_userpage
  2017-10-10 10:32 ` Suzuki K Poulose
@ 2017-10-10 10:32   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:32 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

Export perf_event_update_userpage() so that PMU drivers using it
can be built as modules.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6bc21e202ae4..162f5ba756a9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4982,6 +4982,7 @@ void perf_event_update_userpage(struct perf_event *event)
 unlock:
 	rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(perf_event_update_userpage);
 
 static int perf_mmap_fault(struct vm_fault *vmf)
 {
-- 
2.13.6

* [PATCH v8 2/8] of: Add helper for mapping device node to logical CPU number
  2017-10-10 10:32 ` Suzuki K Poulose
  (?)
@ 2017-10-10 10:32   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:32 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

Add a helper to map a device node to a logical CPU number to avoid
duplication. Currently this is open coded in different places (e.g.
gic-v3, coresight). The helper tries to map the device node to a
"possible" logical CPU id, which may not be online yet. It is the
responsibility of the caller to make sure that the CPU is online. The
helper uses of_cpu_device_node_get() to retrieve the device node for a
given CPU (which uses per_cpu data if available, else falls back to the
slower of_get_cpu_node()).
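
The contract described above can be modelled outside the kernel. The
sketch below is a userspace approximation only: struct device_node is a
stand-in, the cpu_nodes[] table and NR_POSSIBLE_CPUS replace
of_cpu_device_node_get() and for_each_possible_cpu(), and the model_*
names are hypothetical. It shows the lookup by pointer identity, the
-ENODEV result for an unknown node, and a caller that picks its own
default on failure.

```c
#include <errno.h>
#include <stddef.h>

#define NR_POSSIBLE_CPUS 4	/* hypothetical size of the possible-CPU set */

/* Minimal stand-in for the kernel's struct device_node. */
struct device_node {
	const char *name;
};

/* Hypothetical table standing in for of_cpu_device_node_get(cpu). */
static struct device_node *cpu_nodes[NR_POSSIBLE_CPUS];

/*
 * Model of the helper: compare node pointers for every possible CPU and
 * return the matching logical CPU number, or -ENODEV if none matches.
 * Rejecting NULL up front is a choice made by this sketch.
 */
static int model_cpu_node_to_id(const struct device_node *cpu_node)
{
	int cpu;

	if (!cpu_node)
		return -ENODEV;

	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
		if (cpu_nodes[cpu] == cpu_node)
			return cpu;
	}

	return -ENODEV;
}

/* Caller pattern: treat any negative result as "use CPU0 instead". */
static int model_get_cpu_or_default(const struct device_node *cpu_node)
{
	int cpu = model_cpu_node_to_id(cpu_node);

	return (cpu < 0) ? 0 : cpu;
}
```

Returning a negative errno rather than a sentinel CPU number lets callers
use a plain "cpu < 0" check, independent of how many CPUs are possible.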

Cc: devicetree@vger.kernel.org
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V6:
 - Use the faster of_cpu_device_node_get() instead of of_get_cpu_node();
   the former now falls back to the latter if called early.
Changes since V3:
 - Renamed the helper to of_cpu_node_to_id(), suggested by Rob
 - Return -ENODEV on failure rather than nr_cpu_ids
---
 drivers/of/base.c  | 26 ++++++++++++++++++++++++++
 include/linux/of.h |  7 +++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 260d33c0f26c..e3f46daeee07 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -418,6 +418,32 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int *thread)
 EXPORT_SYMBOL(of_get_cpu_node);
 
 /**
+ * of_cpu_node_to_id: Get the logical CPU number for a given device_node
+ *
+ * @cpu_node: Pointer to the device_node for CPU.
+ *
+ * Returns the logical CPU number of the given CPU device_node.
+ * Returns -ENODEV if the CPU is not found.
+ */
+int of_cpu_node_to_id(struct device_node *cpu_node)
+{
+	int cpu;
+	bool found = false;
+	struct device_node *np;
+
+	for_each_possible_cpu(cpu) {
+		np = of_cpu_device_node_get(cpu);
+		found = (cpu_node == np);
+		of_node_put(np);
+		if (found)
+			return cpu;
+	}
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(of_cpu_node_to_id);
+
+/**
  * __of_device_is_compatible() - Check if the node matches given constraints
  * @device: pointer to node
  * @compat: required compatible string, NULL or "" for any match
diff --git a/include/linux/of.h b/include/linux/of.h
index cfc34117fc92..27b54bcc420a 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -538,6 +538,8 @@ const char *of_prop_next_string(struct property *prop, const char *cur);
 
 bool of_console_check(struct device_node *dn, char *name, int index);
 
+extern int of_cpu_node_to_id(struct device_node *np);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -862,6 +864,11 @@ static inline void of_property_clear_flag(struct property *p, unsigned long flag
 {
 }
 
+static inline int of_cpu_node_to_id(struct device_node *np)
+{
+	return -ENODEV;
+}
+
 #define of_match_ptr(_ptr)	NULL
 #define of_match_node(_matches, _node)	NULL
 #endif /* CONFIG_OF */
-- 
2.13.6

* [PATCH v8 3/8] coresight: of: Use of_cpu_node_to_id helper
  2017-10-10 10:32 ` Suzuki K Poulose
@ 2017-10-10 10:32   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:32 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

Reuse the new generic helper, of_cpu_node_to_id(), to map a
given CPU phandle to a logical CPU number.

Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V4:
 - Fix a regression introduced in v4, reported by bugrobot
Changes since V3:
 - Reflect the renaming of the helper and return value changes
---
 drivers/hwtracing/coresight/of_coresight.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/coresight/of_coresight.c b/drivers/hwtracing/coresight/of_coresight.c
index a18794128bf8..7c375443ede6 100644
--- a/drivers/hwtracing/coresight/of_coresight.c
+++ b/drivers/hwtracing/coresight/of_coresight.c
@@ -104,26 +104,17 @@ static int of_coresight_alloc_memory(struct device *dev,
 int of_coresight_get_cpu(const struct device_node *node)
 {
 	int cpu;
-	bool found;
-	struct device_node *dn, *np;
+	struct device_node *dn;
 
 	dn = of_parse_phandle(node, "cpu", 0);
-
 	/* Affinity defaults to CPU0 */
 	if (!dn)
 		return 0;
-
-	for_each_possible_cpu(cpu) {
-		np = of_cpu_device_node_get(cpu);
-		found = (dn == np);
-		of_node_put(np);
-		if (found)
-			break;
-	}
+	cpu = of_cpu_node_to_id(dn);
 	of_node_put(dn);
 
 	/* Affinity to CPU0 if no cpu nodes are found */
-	return found ? cpu : 0;
+	return (cpu < 0) ? 0 : cpu;
 }
 EXPORT_SYMBOL_GPL(of_coresight_get_cpu);
 
-- 
2.13.6

* [PATCH v8 4/8] irqchip: gic-v3: Use of_cpu_node_to_id helper
  2017-10-10 10:32 ` Suzuki K Poulose
@ 2017-10-10 10:32   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:32 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

Use the new generic helper of_cpu_node_to_id() instead
of using our own version to map a device node to logical CPU
number.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V3:
 - Reflect the change in the helper name and return value.
---
 drivers/irqchip/irq-gic-v3.c | 29 ++---------------------------
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index b5df99c6f680..04cecab71597 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1038,31 +1038,6 @@ static int __init gic_validate_dist_version(void __iomem *dist_base)
 	return 0;
 }
 
-static int get_cpu_number(struct device_node *dn)
-{
-	const __be32 *cell;
-	u64 hwid;
-	int cpu;
-
-	cell = of_get_property(dn, "reg", NULL);
-	if (!cell)
-		return -1;
-
-	hwid = of_read_number(cell, of_n_addr_cells(dn));
-
-	/*
-	 * Non affinity bits must be set to 0 in the DT
-	 */
-	if (hwid & ~MPIDR_HWID_BITMASK)
-		return -1;
-
-	for_each_possible_cpu(cpu)
-		if (cpu_logical_map(cpu) == hwid)
-			return cpu;
-
-	return -1;
-}
-
 /* Create all possible partitions at boot time */
 static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
 {
@@ -1113,8 +1088,8 @@ static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
 			if (WARN_ON(!cpu_node))
 				continue;
 
-			cpu = get_cpu_number(cpu_node);
-			if (WARN_ON(cpu == -1))
+			cpu = of_cpu_node_to_id(cpu_node);
+			if (WARN_ON(cpu < 0))
 				continue;
 
 			pr_cont("%pOF[%d] ", cpu_node, cpu);
-- 
2.13.6

* [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing
  2017-10-10 10:32 ` Suzuki K Poulose
@ 2017-10-10 10:33   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:33 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose, Catalin Marinas

Make use of the new generic helper to convert an of_node of a CPU
to the logical CPU id in parsing the topology.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/topology.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..21868530018e 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
 	if (!cpu_node)
 		return -1;
 
-	for_each_possible_cpu(cpu) {
-		if (of_get_cpu_node(cpu, NULL) == cpu_node) {
-			topology_parse_cpu_capacity(cpu_node, cpu);
-			of_node_put(cpu_node);
-			return cpu;
-		}
-	}
-
-	pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
+	cpu = of_cpu_node_to_id(cpu_node);
+	if (cpu >= 0)
+		topology_parse_cpu_capacity(cpu_node, cpu);
+	else
+		pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
 
 	of_node_put(cpu_node);
-	return -1;
+	return cpu;
 }
 
 static int __init parse_core(struct device_node *core, int cluster_id,
-- 
2.13.6

* [PATCH v8 6/8] arm_pmu: Use of_cpu_node_to_id helper
  2017-10-10 10:32 ` Suzuki K Poulose
@ 2017-10-10 10:33   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:33 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

Use the new generic helper, of_cpu_node_to_id(), to map a
phandle to the logical CPU number while parsing the
PMU irq affinity.
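
The hunk below keeps the function's existing failure convention: a
result that is >= nr_cpu_ids means "no CPU found", presumably because
callers test against nr_cpu_ids. A minimal userspace sketch of that
conversion follows; SKETCH_NR_CPU_IDS and sketch_affinity_cpu() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's nr_cpu_ids. */
#define SKETCH_NR_CPU_IDS 4

/*
 * Convert a helper-style result (logical CPU number, or a negative
 * error code) into the older convention where any value
 * >= nr_cpu_ids signals that no logical CPU was found.
 */
static int sketch_affinity_cpu(int helper_ret)
{
	if (helper_ret < 0)
		return SKETCH_NR_CPU_IDS;
	return helper_ret;
}
```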

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/perf/arm_pmu_platform.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index 4eafa7a42e52..a96884e37eaf 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -81,19 +81,10 @@ static int pmu_parse_irq_affinity(struct device_node *node, int i)
 		return -EINVAL;
 	}
 
-	/* Now look up the logical CPU number */
-	for_each_possible_cpu(cpu) {
-		struct device_node *cpu_dn;
-
-		cpu_dn = of_cpu_device_node_get(cpu);
-		of_node_put(cpu_dn);
-
-		if (dn == cpu_dn)
-			break;
-	}
-
-	if (cpu >= nr_cpu_ids) {
+	cpu = of_cpu_node_to_id(dn);
+	if (cpu < 0) {
 		pr_warn("failed to find logical CPU for %s\n", dn->name);
+		cpu = nr_cpu_ids;
 	}
 
 	of_node_put(dn);
-- 
2.13.6

* [PATCH v8 7/8] dt-bindings: Document devicetree binding for ARM DSU PMU
  2017-10-10 10:32 ` Suzuki K Poulose
@ 2017-10-10 10:33   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:33 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

This patch documents the devicetree bindings for ARM DSU PMU.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: devicetree@vger.kernel.org
Cc: frowand.list@gmail.com
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V3:
 - Fixed node name in the example, suggested by Rob
---
 .../devicetree/bindings/arm/arm-dsu-pmu.txt        | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt

diff --git a/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
new file mode 100644
index 000000000000..6efabba530f1
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
@@ -0,0 +1,27 @@
+* ARM DynamIQ Shared Unit (DSU) Performance Monitor Unit (PMU)
+
+ARM DynamIQ Shared Unit (DSU) integrates one or more CPU cores
+with a shared L3 memory system, control logic and external interfaces to
+form a multicore cluster. The PMU enables gathering various statistics on
+the operations of the DSU. The PMU provides independent 32bit counters that
+can count any of the supported events, along with a 64bit cycle counter.
+The PMU is accessed via CPU system registers and has no MMIO component.
+
+** DSU PMU required properties:
+
+- compatible	: should be one of :
+
+		"arm,dsu-pmu"
+
+- interrupts	: Exactly 1 SPI must be listed.
+
+- cpus		: List of phandles for the CPUs connected to this DSU instance.
+
+
+** Example:
+
+dsu-pmu-0 {
+	compatible = "arm,dsu-pmu";
+	interrupts = <GIC_SPI 02 IRQ_TYPE_LEVEL_HIGH>;
+	cpus = <&cpu_0>, <&cpu_1>;
+};
-- 
2.13.6

* [PATCH v8 8/8] perf: ARM DynamIQ Shared Unit PMU support
  2017-10-10 10:32 ` Suzuki K Poulose
@ 2017-10-10 10:33   ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:33 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mark.rutland, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Suzuki K Poulose

Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
allows counting the various events related to L3, SCU etc, along with
providing a cycle counter.

The PMU can be accessed via system registers, which are common
to the cores in the same cluster. The PMU registers follow the
semantics of the ARMv8 PMU, mostly, with the exception that
the counters record the cluster wide events.

This driver is mostly based on the ARMv8 and CCI PMU drivers.
The driver only supports ARM64 at the moment. It can be extended
to support ARM32 by providing register accessors like we do in
arch/arm64/include/asm/arm_dsu_pmu.h.
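
For readers less familiar with the counter handling, the driver in this
patch starts each counter at half of its maximum value and computes
deltas modulo the counter width (32 bits for event counters, 64 bits for
the cycle counter at index 31). The userspace sketch below restates that
arithmetic only; the sketch_* names are hypothetical and no register
access is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the 64bit cycle counter, as in the driver. */
#define SKETCH_IDX_CYCLE_COUNTER 31

/* Event counters are 32bit wide; the cycle counter is 64bit wide. */
static unsigned int sketch_counter_width(int idx)
{
	return idx == SKETCH_IDX_CYCLE_COUNTER ? 64 : 32;
}

/* All-ones mask covering the width of the given counter. */
static uint64_t sketch_counter_mask(int idx)
{
	unsigned int width = sketch_counter_width(idx);

	assert(width == 32 || width == 64);
	return width == 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/* Initial value: half of the max count, to tolerate interrupt latency. */
static uint64_t sketch_start_value(int idx)
{
	return sketch_counter_mask(idx) >> 1;
}

/* Events counted between two reads, correct across one wrap-around. */
static uint64_t sketch_delta(int idx, uint64_t prev, uint64_t now)
{
	return (now - prev) & sketch_counter_mask(idx);
}
```

With a 32bit counter read as 0xfffffff0 and later as 0x10, the masked
subtraction yields 0x20 events even though the raw value wrapped.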

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V6:
 - Address comments from Jonathan
 - Add Reviewed-by tags from Jonathan
Changes since V5:
 - Address comments on V5 by Mark.
 - Use IRQ_NOBALANCING for IRQ handler
 - Don't expose events which could be unimplemented.
 - Get rid of dsu_pmu_event_supported and allow raw event
   code to be used without validating whether it is supported.
 - Rename "supported_cpus" mask to "associated_cpus"
 - Add Documentation for the PMU driver
 - Don't disable IRQ for dsu_pmu_{enable/disable}_counters
 - Use consistent return codes for validate_event/group calls.
 - Check PERF_ATTACH_TASK flag in event_init.
 - Allow missing CPUs in dsu_pmu_dt_get_cpus, to handle cases
   where kernel could have capped nr_cpus.
 - Cleanup sanity checking for the CPU before accessing DSU
 - Reject events with counting CPU not associated with the DSU.
Changes since V4:
 - Reflect the changed generic helper for mapping CPU id
Changes since V2:
 - Cleanup dsu_pmu_device_probe error handling.
 - Fix event validate_group to invert the result check of validate_event
 - Return errors if we failed to parse CPUs in the DSU.
 - Add MODULE_DEVICE_TABLE entry
 - Use hlist_entry_safe for converting cpuhp_node to dsu_pmu.
---
 Documentation/perf/arm_dsu_pmu.txt   |  28 ++
 arch/arm64/include/asm/arm_dsu_pmu.h | 124 ++++++
 drivers/perf/Kconfig                 |   9 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/arm_dsu_pmu.c           | 826 +++++++++++++++++++++++++++++++++++
 5 files changed, 988 insertions(+)
 create mode 100644 Documentation/perf/arm_dsu_pmu.txt
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c

diff --git a/Documentation/perf/arm_dsu_pmu.txt b/Documentation/perf/arm_dsu_pmu.txt
new file mode 100644
index 000000000000..d611e15f5add
--- /dev/null
+++ b/Documentation/perf/arm_dsu_pmu.txt
@@ -0,0 +1,28 @@
+ARM DynamIQ Shared Unit (DSU) PMU
+==================================
+
+ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
+control logic and external interfaces to form a multicore cluster. The PMU
+allows counting the various events related to the L3 cache, Snoop Control Unit
+etc, using 32bit independent counters. It also provides a 64bit cycle counter.
+
+The PMU can only be accessed via CPU system registers and is common to the
+cores connected to the same DSU. Like most of the other uncore PMUs, DSU
+PMU doesn't support process specific events and cannot be used in sampling mode.
+
+The DSU provides a bitmap for a subset of implemented events via hardware
+registers. There is no way for the driver to determine if the other events
+are available or not. Hence the driver exposes only those events advertised
+by the DSU, in "events" directory under :
+
+  /sys/bus/event_source/devices/arm_dsu_<N>/
+
+The user should refer to the TRM of the product to figure out the supported events
+and use the raw event code for the unlisted events.
+
+The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".
+
+
+e.g. usage:
+
+	perf stat -a -e arm_dsu_0/cycles/
diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
new file mode 100644
index 000000000000..5d1b0d9ff5bb
--- /dev/null
+++ b/arch/arm64/include/asm/arm_dsu_pmu.h
@@ -0,0 +1,124 @@
+/*
+ * ARM DynamIQ Shared Unit (DSU) PMU Low level register access routines.
+ *
+ * Copyright (C) ARM Limited, 2017.
+ *
+ * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2, as published by the Free Software Foundation.
+ */
+
+#include <asm/sysreg.h>
+
+
+#define CLUSTERPMCR_EL1			sys_reg(3, 0, 15, 5, 0)
+#define CLUSTERPMCNTENSET_EL1		sys_reg(3, 0, 15, 5, 1)
+#define CLUSTERPMCNTENCLR_EL1		sys_reg(3, 0, 15, 5, 2)
+#define CLUSTERPMOVSSET_EL1		sys_reg(3, 0, 15, 5, 3)
+#define CLUSTERPMOVSCLR_EL1		sys_reg(3, 0, 15, 5, 4)
+#define CLUSTERPMSELR_EL1		sys_reg(3, 0, 15, 5, 5)
+#define CLUSTERPMINTENSET_EL1		sys_reg(3, 0, 15, 5, 6)
+#define CLUSTERPMINTENCLR_EL1		sys_reg(3, 0, 15, 5, 7)
+#define CLUSTERPMCCNTR_EL1		sys_reg(3, 0, 15, 6, 0)
+#define CLUSTERPMXEVTYPER_EL1		sys_reg(3, 0, 15, 6, 1)
+#define CLUSTERPMXEVCNTR_EL1		sys_reg(3, 0, 15, 6, 2)
+#define CLUSTERPMMDCR_EL1		sys_reg(3, 0, 15, 6, 3)
+#define CLUSTERPMCEID0_EL1		sys_reg(3, 0, 15, 6, 4)
+#define CLUSTERPMCEID1_EL1		sys_reg(3, 0, 15, 6, 5)
+
+static inline u32 __dsu_pmu_read_pmcr(void)
+{
+	return read_sysreg_s(CLUSTERPMCR_EL1);
+}
+
+static inline void __dsu_pmu_write_pmcr(u32 val)
+{
+	write_sysreg_s(val, CLUSTERPMCR_EL1);
+	isb();
+}
+
+static inline u32 __dsu_pmu_getreset_overflow(void)
+{
+	u32 val = read_sysreg_s(CLUSTERPMOVSCLR_EL1);
+	/* Clear the bit */
+	write_sysreg_s(val, CLUSTERPMOVSCLR_EL1);
+	isb();
+	return val;
+}
+
+static inline void __dsu_pmu_select_counter(int counter)
+{
+	write_sysreg_s(counter, CLUSTERPMSELR_EL1);
+	isb();
+}
+
+static inline u64 __dsu_pmu_read_counter(int counter)
+{
+	__dsu_pmu_select_counter(counter);
+	return read_sysreg_s(CLUSTERPMXEVCNTR_EL1);
+}
+
+static inline void __dsu_pmu_write_counter(int counter, u64 val)
+{
+	__dsu_pmu_select_counter(counter);
+	write_sysreg_s(val, CLUSTERPMXEVCNTR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_set_event(int counter, u32 event)
+{
+	__dsu_pmu_select_counter(counter);
+	write_sysreg_s(event, CLUSTERPMXEVTYPER_EL1);
+	isb();
+}
+
+static inline u64 __dsu_pmu_read_pmccntr(void)
+{
+	return read_sysreg_s(CLUSTERPMCCNTR_EL1);
+}
+
+static inline void __dsu_pmu_write_pmccntr(u64 val)
+{
+	write_sysreg_s(val, CLUSTERPMCCNTR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_disable_counter(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMCNTENCLR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_enable_counter(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMCNTENSET_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_counter_interrupt_enable(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMINTENSET_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_counter_interrupt_disable(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMINTENCLR_EL1);
+	isb();
+}
+
+
+static inline u32 __dsu_pmu_read_pmceid(int n)
+{
+	switch (n) {
+	case 0:
+		return read_sysreg_s(CLUSTERPMCEID0_EL1);
+	case 1:
+		return read_sysreg_s(CLUSTERPMCEID1_EL1);
+	default:
+		BUILD_BUG();
+		return 0;
+	}
+}
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index e5197ffb7422..ee3d7d13977c 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -17,6 +17,15 @@ config ARM_PMU_ACPI
 	depends on ARM_PMU && ACPI
 	def_bool y
 
+config ARM_DSU_PMU
+	tristate "ARM DynamIQ Shared Unit (DSU) PMU"
+	depends on ARM64 && PERF_EVENTS
+	help
+	  Provides support for performance monitor unit in ARM DynamIQ Shared
+	  Unit (DSU). The DSU integrates one or more cores with an L3 memory
+	  system and control logic. The PMU allows counting various events
+	  related to DSU.
+
 config QCOM_L2_PMU
 	bool "Qualcomm Technologies L2-cache PMU"
 	depends on ARCH_QCOM && ARM64 && ACPI
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 6420bd4394d5..0adb4f6926a4 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
 obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
+obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
 obj-$(CONFIG_QCOM_L2_PMU)	+= qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
 obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
new file mode 100644
index 000000000000..6352e5f3fa0a
--- /dev/null
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -0,0 +1,826 @@
+/*
+ * ARM DynamIQ Shared Unit (DSU) PMU driver
+ *
+ * Copyright (C) ARM Limited, 2017.
+ *
+ * Based on ARM CCI-PMU, ARMv8 PMU-v3 drivers.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ */
+
+#define PMUNAME		"arm_dsu"
+#define DRVNAME		PMUNAME "_pmu"
+#define pr_fmt(fmt)	DRVNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock.h>
+
+#include <asm/arm_dsu_pmu.h>
+
+/* PMU event codes */
+#define DSU_PMU_EVT_CYCLES		0x11
+#define DSU_PMU_EVT_CHAIN		0x1e
+
+#define DSU_PMU_MAX_COMMON_EVENTS	0x40
+
+#define DSU_PMU_MAX_HW_CNTRS		32
+#define DSU_PMU_HW_COUNTER_MASK		(DSU_PMU_MAX_HW_CNTRS - 1)
+
+#define CLUSTERPMCR_E			BIT(0)
+#define CLUSTERPMCR_P			BIT(1)
+#define CLUSTERPMCR_C			BIT(2)
+#define CLUSTERPMCR_N_SHIFT		11
+#define CLUSTERPMCR_N_MASK		0x1f
+#define CLUSTERPMCR_IDCODE_SHIFT	16
+#define CLUSTERPMCR_IDCODE_MASK		0xff
+#define CLUSTERPMCR_IMP_SHIFT		24
+#define CLUSTERPMCR_IMP_MASK		0xff
+#define CLUSTERPMCR_RES_MASK		0x7e8
+#define CLUSTERPMCR_RES_VAL		0x40
+
+#define DSU_ACTIVE_CPU_MASK		0x0
+#define DSU_ASSOCIATED_CPU_MASK		0x1
+
+/*
+ * We use the index of the counters as they appear in the counter
+ * bit maps in the PMU registers (e.g CLUSTERPMSELR).
+ * i.e,
+ *	counter 0	- Bit 0
+ *	counter 1	- Bit 1
+ *	...
+ *	Cycle counter	- Bit 31
+ */
+#define DSU_PMU_IDX_CYCLE_COUNTER	31
+
+/* All event counters are 32bit, with a 64bit Cycle counter */
+#define DSU_PMU_COUNTER_WIDTH(idx)	\
+	(((idx) == DSU_PMU_IDX_CYCLE_COUNTER) ? 64 : 32)
+
+#define DSU_PMU_COUNTER_MASK(idx)	\
+	GENMASK_ULL((DSU_PMU_COUNTER_WIDTH((idx)) - 1), 0)
+
+#define DSU_EXT_ATTR(_name, _func, _config)		\
+	(&((struct dev_ext_attribute[]) {				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define DSU_EVENT_ATTR(_name, _config)		\
+	DSU_EXT_ATTR(_name, dsu_pmu_sysfs_event_show, (unsigned long)_config)
+
+#define DSU_FORMAT_ATTR(_name, _config)		\
+	DSU_EXT_ATTR(_name, dsu_pmu_sysfs_format_show, (char *)_config)
+
+#define DSU_CPUMASK_ATTR(_name, _config)	\
+	DSU_EXT_ATTR(_name, dsu_pmu_cpumask_show, (unsigned long)_config)
+
+struct dsu_hw_events {
+	DECLARE_BITMAP(used_mask, DSU_PMU_MAX_HW_CNTRS);
+	struct perf_event	*events[DSU_PMU_MAX_HW_CNTRS];
+};
+
+/*
+ * struct dsu_pmu	- DSU PMU descriptor
+ *
+ * @pmu_lock		: Protects accesses to DSU PMU register from normal vs
+ *			  interrupt handler contexts.
+ * @hw_events		: Holds the event counter state.
+ * @associated_cpus	: CPUs attached to the DSU.
+ * @active_cpu		: CPU to which the PMU is bound for accesses.
+ * @cpuhp_node		: Node for CPU hotplug notifier link.
+ * @num_counters	: Number of event counters implemented by the PMU,
+ *			  excluding the cycle counter.
+ * @irq			: Interrupt line for counter overflow.
+ * @cpmceid_bitmap	: Bitmap for the availability of architected common
+ *			  events (event_code < 0x40).
+ */
+struct dsu_pmu {
+	struct pmu			pmu;
+	struct device			*dev;
+	raw_spinlock_t			pmu_lock;
+	struct dsu_hw_events		hw_events;
+	cpumask_t			associated_cpus;
+	cpumask_t			active_cpu;
+	struct hlist_node		cpuhp_node;
+	u8				num_counters;
+	int				irq;
+	DECLARE_BITMAP(cpmceid_bitmap, DSU_PMU_MAX_COMMON_EVENTS);
+};
+
+static unsigned long dsu_pmu_cpuhp_state;
+
+static inline struct dsu_pmu *to_dsu_pmu(struct pmu *pmu)
+{
+	return container_of(pmu, struct dsu_pmu, pmu);
+}
+
+static ssize_t dsu_pmu_sysfs_event_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	return snprintf(buf, PAGE_SIZE, "event=0x%lx\n",
+					 (unsigned long)eattr->var);
+}
+
+static ssize_t dsu_pmu_sysfs_format_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	return snprintf(buf, PAGE_SIZE, "%s\n", (char *)eattr->var);
+}
+
+static ssize_t dsu_pmu_cpumask_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case DSU_ACTIVE_CPU_MASK:
+		cpumask = &dsu_pmu->active_cpu;
+		break;
+	case DSU_ASSOCIATED_CPU_MASK:
+		cpumask = &dsu_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+static struct attribute *dsu_pmu_format_attrs[] = {
+	DSU_FORMAT_ATTR(event, "config:0-31"),
+	NULL,
+};
+
+static const struct attribute_group dsu_pmu_format_attr_group = {
+	.name = "format",
+	.attrs = dsu_pmu_format_attrs,
+};
+
+static struct attribute *dsu_pmu_event_attrs[] = {
+	DSU_EVENT_ATTR(cycles, 0x11),
+	DSU_EVENT_ATTR(bus_access, 0x19),
+	DSU_EVENT_ATTR(memory_error, 0x1a),
+	DSU_EVENT_ATTR(bus_cycles, 0x1d),
+	DSU_EVENT_ATTR(l3d_cache_allocate, 0x29),
+	DSU_EVENT_ATTR(l3d_cache_refill, 0x2a),
+	DSU_EVENT_ATTR(l3d_cache, 0x2b),
+	DSU_EVENT_ATTR(l3d_cache_wb, 0x2c),
+	NULL,
+};
+
+static umode_t
+dsu_pmu_event_attr_is_visible(struct kobject *kobj, struct attribute *attr,
+				int unused)
+{
+	struct pmu *pmu = dev_get_drvdata(kobj_to_dev(kobj));
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr.attr);
+	unsigned long evt = (unsigned long)eattr->var;
+
+	return test_bit(evt, dsu_pmu->cpmceid_bitmap) ? attr->mode : 0;
+}
+
+static const struct attribute_group dsu_pmu_events_attr_group = {
+	.name = "events",
+	.attrs = dsu_pmu_event_attrs,
+	.is_visible = dsu_pmu_event_attr_is_visible,
+};
+
+static struct attribute *dsu_pmu_cpumask_attrs[] = {
+	DSU_CPUMASK_ATTR(cpumask, DSU_ACTIVE_CPU_MASK),
+	DSU_CPUMASK_ATTR(associated_cpus, DSU_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static const struct attribute_group dsu_pmu_cpumask_attr_group = {
+	.attrs = dsu_pmu_cpumask_attrs,
+};
+
+static const struct attribute_group *dsu_pmu_attr_groups[] = {
+	&dsu_pmu_cpumask_attr_group,
+	&dsu_pmu_events_attr_group,
+	&dsu_pmu_format_attr_group,
+	NULL,
+};
+
+static int dsu_pmu_get_online_cpu(struct dsu_pmu *dsu_pmu)
+{
+	return cpumask_first_and(&dsu_pmu->associated_cpus, cpu_online_mask);
+}
+
+static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
+{
+	struct cpumask online_supported;
+
+	cpumask_and(&online_supported,
+			 &dsu_pmu->associated_cpus, cpu_online_mask);
+	return cpumask_any_but(&online_supported, cpu);
+}
+
+static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
+{
+	return (idx < dsu_pmu->num_counters) ||
+	       (idx == DSU_PMU_IDX_CYCLE_COUNTER);
+}
+
+static inline u64 dsu_pmu_read_counter(struct perf_event *event)
+{
+	u64 val;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	int idx = event->hw.idx;
+
+	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+				 &dsu_pmu->active_cpu)))
+		return 0;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"Trying to read invalid counter %d\n", idx);
+		return 0;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	if (idx == DSU_PMU_IDX_CYCLE_COUNTER)
+		val = __dsu_pmu_read_pmccntr();
+	else
+		val = __dsu_pmu_read_counter(idx);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+
+	return val;
+}
+
+static void dsu_pmu_write_counter(struct perf_event *event, u64 val)
+{
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	int idx = event->hw.idx;
+
+	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+			 &dsu_pmu->active_cpu)))
+		return;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"writing to invalid counter %d\n", idx);
+		return;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	if (idx == DSU_PMU_IDX_CYCLE_COUNTER)
+		__dsu_pmu_write_pmccntr(val);
+	else
+		__dsu_pmu_write_counter(idx, val);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static int dsu_pmu_get_event_idx(struct dsu_hw_events *hw_events,
+				 struct perf_event *event)
+{
+	int idx;
+	unsigned long evtype = event->attr.config;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	unsigned long *used_mask = hw_events->used_mask;
+
+	if (evtype == DSU_PMU_EVT_CYCLES) {
+		if (test_and_set_bit(DSU_PMU_IDX_CYCLE_COUNTER, used_mask))
+			return -EAGAIN;
+		return DSU_PMU_IDX_CYCLE_COUNTER;
+	}
+
+	idx = find_next_zero_bit(used_mask, dsu_pmu->num_counters, 0);
+	if (idx >= dsu_pmu->num_counters)
+		return -EAGAIN;
+	set_bit(idx, hw_events->used_mask);
+	return idx;
+}
+
+static void dsu_pmu_enable_counter(struct dsu_pmu *dsu_pmu, int idx)
+{
+	__dsu_pmu_counter_interrupt_enable(idx);
+	__dsu_pmu_enable_counter(idx);
+}
+
+static void dsu_pmu_disable_counter(struct dsu_pmu *dsu_pmu, int idx)
+{
+	__dsu_pmu_disable_counter(idx);
+	__dsu_pmu_counter_interrupt_disable(idx);
+}
+
+static inline void dsu_pmu_set_event(struct dsu_pmu *dsu_pmu,
+					struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	unsigned long flags;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"Trying to set invalid counter %d\n", idx);
+		return;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	__dsu_pmu_set_event(idx, event->hw.config_base);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static void dsu_pmu_event_update(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 delta, prev_count, new_count;
+
+	do {
+		/* We may also be called from the irq handler */
+		prev_count = local64_read(&hwc->prev_count);
+		new_count = dsu_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev_count, new_count) !=
+			prev_count);
+	delta = (new_count - prev_count) & DSU_PMU_COUNTER_MASK(hwc->idx);
+	local64_add(delta, &event->count);
+}
+
+static void dsu_pmu_read(struct perf_event *event)
+{
+	dsu_pmu_event_update(event);
+}
+
+static inline u32 dsu_pmu_getreset_overflow(void)
+{
+	return __dsu_pmu_getreset_overflow();
+}
+
+/**
+ * dsu_pmu_set_event_period: Set the period for the counter.
+ *
+ * All DSU PMU event counters, except the cycle counter, are 32bit
+ * counters. To handle cases of extreme interrupt latency, we program
+ * the counter with half of the max count for the counters.
+ */
+static void dsu_pmu_set_event_period(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	u64 val = DSU_PMU_COUNTER_MASK(idx) >> 1;
+
+	local64_set(&event->hw.prev_count, val);
+	dsu_pmu_write_counter(event, val);
+}
+
+static irqreturn_t dsu_pmu_handle_irq(int irq_num, void *dev)
+{
+	int i;
+	bool handled = false;
+	struct dsu_pmu *dsu_pmu = dev;
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	unsigned long overflow, workset;
+
+	overflow = dsu_pmu_getreset_overflow();
+	bitmap_and(&workset, &overflow, hw_events->used_mask,
+		   DSU_PMU_MAX_HW_CNTRS);
+
+	if (!workset)
+		return IRQ_NONE;
+
+	for_each_set_bit(i, &workset, DSU_PMU_MAX_HW_CNTRS) {
+		struct perf_event *event = hw_events->events[i];
+
+		if (!event)
+			continue;
+		dsu_pmu_event_update(event);
+		dsu_pmu_set_event_period(event);
+
+		handled = true;
+	}
+
+	return IRQ_RETVAL(handled);
+}
+
+static void dsu_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	/* We always reprogram the counter */
+	if (pmu_flags & PERF_EF_RELOAD)
+		WARN_ON(!(event->hw.state & PERF_HES_UPTODATE));
+	dsu_pmu_set_event_period(event);
+	if (event->hw.idx != DSU_PMU_IDX_CYCLE_COUNTER)
+		dsu_pmu_set_event(dsu_pmu, event);
+	event->hw.state = 0;
+	dsu_pmu_enable_counter(dsu_pmu, event->hw.idx);
+}
+
+static void dsu_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+	dsu_pmu_disable_counter(dsu_pmu, event->hw.idx);
+	dsu_pmu_event_update(event);
+	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static int dsu_pmu_add(struct perf_event *event, int flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &dsu_pmu->active_cpu)))
+		return -ENOENT;
+
+	idx = dsu_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hwc->idx = idx;
+	hw_events->events[idx] = event;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		dsu_pmu_start(event, PERF_EF_RELOAD);
+
+	perf_event_update_userpage(event);
+	return 0;
+}
+
+static void dsu_pmu_del(struct perf_event *event, int flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	dsu_pmu_stop(event, PERF_EF_UPDATE);
+	hw_events->events[idx] = NULL;
+	clear_bit(idx, hw_events->used_mask);
+	perf_event_update_userpage(event);
+}
+
+static void dsu_pmu_enable(struct pmu *pmu)
+{
+	u32 pmcr;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+
+	/* If no counters are added, skip enabling the PMU */
+	if (bitmap_empty(dsu_pmu->hw_events.used_mask, DSU_PMU_MAX_HW_CNTRS))
+		return;
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	pmcr = __dsu_pmu_read_pmcr();
+	pmcr |= CLUSTERPMCR_E;
+	__dsu_pmu_write_pmcr(pmcr);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static void dsu_pmu_disable(struct pmu *pmu)
+{
+	u32 pmcr;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	pmcr = __dsu_pmu_read_pmcr();
+	pmcr &= ~CLUSTERPMCR_E;
+	__dsu_pmu_write_pmcr(pmcr);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static bool dsu_pmu_validate_event(struct pmu *pmu,
+				  struct dsu_hw_events *hw_events,
+				  struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+	return dsu_pmu_get_event_idx(hw_events, event) >= 0;
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool dsu_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct dsu_hw_events fake_hw;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(fake_hw.used_mask, 0, sizeof(fake_hw.used_mask));
+	if (!dsu_pmu_validate_event(event->pmu, &fake_hw, leader))
+		return false;
+	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+		if (!dsu_pmu_validate_event(event->pmu, &fake_hw, sibling))
+			return false;
+	}
+	return dsu_pmu_validate_event(event->pmu, &fake_hw, event);
+}
+
+static int dsu_pmu_event_init(struct perf_event *event)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* We don't support sampling */
+	if (is_sampling_event(event)) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* We cannot support task bound events */
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	if (has_branch_stack(event) ||
+	    event->attr.exclude_user ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv ||
+	    event->attr.exclude_idle ||
+	    event->attr.exclude_host ||
+	    event->attr.exclude_guest) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
+		return -EINVAL;
+	}
+
+	if (!dsu_pmu_validate_group(event))
+		return -EINVAL;
+	if (!cpumask_test_cpu(event->cpu, &dsu_pmu->associated_cpus)) {
+		dev_dbg(dsu_pmu->pmu.dev,
+			 "Requested cpu is not associated with the DSU\n");
+		return -EINVAL;
+	}
+	/*
+	 * Choose the current active CPU to read the events. We don't want
+	 * to migrate the event contexts, irq handling etc to the requested
+	 * CPU. As long as the requested CPU is within the same DSU, we
+	 * are fine.
+	 */
+	event->cpu = cpumask_first(&dsu_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	event->hw.config_base = event->attr.config;
+	return 0;
+}
+
+static struct dsu_pmu *dsu_pmu_alloc(struct platform_device *pdev)
+{
+	struct dsu_pmu *dsu_pmu;
+
+	dsu_pmu = devm_kzalloc(&pdev->dev, sizeof(*dsu_pmu), GFP_KERNEL);
+	if (!dsu_pmu)
+		return ERR_PTR(-ENOMEM);
+
+	raw_spin_lock_init(&dsu_pmu->pmu_lock);
+	return dsu_pmu;
+}
+
+/**
+ * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
+ */
+static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
+{
+	int i = 0, n, cpu;
+	struct device_node *cpu_node;
+
+	n = of_count_phandle_with_args(dev, "cpus", NULL);
+	if (n <= 0)
+		return -ENODEV;
+	for (; i < n; i++) {
+		cpu_node = of_parse_phandle(dev, "cpus", i);
+		if (!cpu_node)
+			break;
+		cpu = of_cpu_node_to_id(cpu_node);
+		of_node_put(cpu_node);
+		/*
+		 * We have to ignore the failures here and continue scanning
+		 * the list to handle cases where the nr_cpus could be capped
+		 * in the running kernel.
+		 */
+		if (cpu < 0)
+			continue;
+		cpumask_set_cpu(cpu, mask);
+	}
+	return 0;
+}
+
+/*
+ * dsu_pmu_probe_pmu: Probe the PMU details on a CPU in the cluster.
+ */
+static void dsu_pmu_probe_pmu(void *data)
+{
+	struct dsu_pmu *dsu_pmu = data;
+	u64 num_counters;
+	u32 cpmceid[2];
+
+	num_counters = (__dsu_pmu_read_pmcr() >> CLUSTERPMCR_N_SHIFT) &
+						CLUSTERPMCR_N_MASK;
+	/* We can only support up to 31 independent counters */
+	if (WARN_ON(num_counters > 31))
+		num_counters = 31;
+	dsu_pmu->num_counters = num_counters;
+	if (!dsu_pmu->num_counters)
+		return;
+	cpmceid[0] = __dsu_pmu_read_pmceid(0);
+	cpmceid[1] = __dsu_pmu_read_pmceid(1);
+	bitmap_from_u32array(dsu_pmu->cpmceid_bitmap,
+				DSU_PMU_MAX_COMMON_EVENTS,
+				cpmceid,
+				ARRAY_SIZE(cpmceid));
+}
+
+static int dsu_pmu_device_probe(struct platform_device *pdev)
+{
+	int irq, rc, cpu;
+	struct dsu_pmu *dsu_pmu;
+	char *name;
+	static atomic_t pmu_idx = ATOMIC_INIT(-1);
+
+	dsu_pmu = dsu_pmu_alloc(pdev);
+	if (IS_ERR(dsu_pmu))
+		return PTR_ERR(dsu_pmu);
+
+	rc = dsu_pmu_dt_get_cpus(pdev->dev.of_node, &dsu_pmu->associated_cpus);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to parse the CPUs\n");
+		return rc;
+	}
+
+	rc = smp_call_function_any(&dsu_pmu->associated_cpus,
+					dsu_pmu_probe_pmu,
+					dsu_pmu, 1);
+	if (rc)
+		return rc;
+	if (!dsu_pmu->num_counters)
+		return -ENODEV;
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		dev_warn(&pdev->dev, "Failed to find IRQ\n");
+		return -EINVAL;
+	}
+
+	name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s_%d",
+				PMUNAME, atomic_inc_return(&pmu_idx));
+	rc = devm_request_irq(&pdev->dev, irq, dsu_pmu_handle_irq,
+			      IRQF_NOBALANCING, name, dsu_pmu);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to request IRQ %d\n", irq);
+		return rc;
+	}
+
+	/*
+	 * Find one CPU in the DSU to handle the IRQs.
+	 * It is highly unlikely that we would fail
+	 * to find one, given the probing has succeeded.
+	 */
+	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
+	if (cpu >= nr_cpu_ids)
+		return -ENODEV;
+	cpumask_set_cpu(cpu, &dsu_pmu->active_cpu);
+	rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to force IRQ affinity for %d\n",
+					 irq);
+		return rc;
+	}
+
+	platform_set_drvdata(pdev, dsu_pmu);
+	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
+						&dsu_pmu->cpuhp_node);
+	if (rc)
+		goto irq_cleanup;
+
+	dsu_pmu->irq = irq;
+	dsu_pmu->pmu = (struct pmu) {
+		.task_ctx_nr	= perf_invalid_context,
+
+		.pmu_enable	= dsu_pmu_enable,
+		.pmu_disable	= dsu_pmu_disable,
+		.event_init	= dsu_pmu_event_init,
+		.add		= dsu_pmu_add,
+		.del		= dsu_pmu_del,
+		.start		= dsu_pmu_start,
+		.stop		= dsu_pmu_stop,
+		.read		= dsu_pmu_read,
+
+		.attr_groups	= dsu_pmu_attr_groups,
+	};
+
+	rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
+	if (rc)
+		goto cpuhp_cleanup;
+
+	dev_info(&pdev->dev, "Registered %s with %d event counters\n",
+				name, dsu_pmu->num_counters);
+	return 0;
+
+cpuhp_cleanup:
+	cpuhp_state_remove_instance(dsu_pmu_cpuhp_state, &dsu_pmu->cpuhp_node);
+irq_cleanup:
+	irq_set_affinity_hint(irq, NULL);
+	return rc;
+}
+
+static int dsu_pmu_device_remove(struct platform_device *pdev)
+{
+	struct dsu_pmu *dsu_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&dsu_pmu->pmu);
+	cpuhp_state_remove_instance(dsu_pmu_cpuhp_state, &dsu_pmu->cpuhp_node);
+	irq_set_affinity_hint(dsu_pmu->irq, NULL);
+
+	return 0;
+}
+
+static const struct of_device_id dsu_pmu_of_match[] = {
+	{ .compatible = "arm,dsu-pmu", },
+	{},
+};
+
+static struct platform_driver dsu_pmu_driver = {
+	.driver = {
+		.name	= DRVNAME,
+		.of_match_table = of_match_ptr(dsu_pmu_of_match),
+	},
+	.probe = dsu_pmu_device_probe,
+	.remove = dsu_pmu_device_remove,
+};
+
+static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	int dst;
+	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
+						   cpuhp_node);
+
+	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
+		return 0;
+
+	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
+	if (dst < nr_cpu_ids) {
+		cpumask_set_cpu(dst, &dsu_pmu->active_cpu);
+		perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
+		irq_set_affinity_hint(dsu_pmu->irq, &dsu_pmu->active_cpu);
+	}
+
+	return 0;
+}
+
+static int __init dsu_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+					DRVNAME,
+					NULL,
+					dsu_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+	dsu_pmu_cpuhp_state = ret;
+	return platform_driver_register(&dsu_pmu_driver);
+}
+
+static void __exit dsu_pmu_exit(void)
+{
+	platform_driver_unregister(&dsu_pmu_driver);
+	cpuhp_remove_multi_state(dsu_pmu_cpuhp_state);
+}
+
+module_init(dsu_pmu_init);
+module_exit(dsu_pmu_exit);
+
+MODULE_DEVICE_TABLE(of, dsu_pmu_of_match);
+MODULE_DESCRIPTION("Perf driver for ARM DynamIQ Shared Unit");
+MODULE_AUTHOR("Suzuki K Poulose <suzuki.poulose@arm.com>");
+MODULE_LICENSE("GPL v2");
-- 
2.13.6
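
Note on the counter arithmetic in dsu_pmu_set_event_period() and
dsu_pmu_event_update(): the following is a minimal userspace sketch of
the same masking logic, not driver code. The names counter_mask(),
counter_start_value() and counter_delta() are hypothetical stand-ins for
DSU_PMU_COUNTER_MASK() and the surrounding helpers; only the arithmetic
is taken from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for DSU_PMU_IDX_CYCLE_COUNTER. */
#define IDX_CYCLE_COUNTER 31

/* Width mask: 64 bits for the cycle counter, 32 bits for event counters. */
static uint64_t counter_mask(int idx)
{
	return (idx == IDX_CYCLE_COUNTER) ? UINT64_MAX : 0xffffffffULL;
}

/* Value programmed at start: half of the maximum count for that counter. */
static uint64_t counter_start_value(int idx)
{
	return counter_mask(idx) >> 1;
}

/* Events elapsed between two raw reads, tolerating a single wrap. */
static uint64_t counter_delta(int idx, uint64_t prev, uint64_t now)
{
	return (now - prev) & counter_mask(idx);
}
```

Starting a 32bit counter at 0x7fffffff leaves half of the counter range
before the overflow interrupt fires, and the masked subtraction still
yields the right delta if the counter wraps once before it is read.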


* [PATCH v8 8/8] perf: ARM DynamIQ Shared Unit PMU support
@ 2017-10-10 10:33   ` Suzuki K Poulose
  0 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-10 10:33 UTC (permalink / raw)
  To: linux-arm-kernel

Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
allows counting the various events related to L3, SCU etc, along with
providing a cycle counter.

The PMU can be accessed via system registers, which are common
to the cores in the same cluster. The PMU registers follow the
semantics of the ARMv8 PMU, mostly, with the exception that
the counters record the cluster wide events.

This driver is mostly based on the ARMv8 and CCI PMU drivers.
The driver only supports ARM64 at the moment. It can be extended
to support ARM32 by providing register accessors like we do in
arch/arm64/include/asm/arm_dsu_pmu.h.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V6:
 - Address comments from Jonathan
 - Add Reviewed-by tags from Jonathan
Changes since V5:
 - Address comments on V5 by Mark.
 - Use IRQF_NOBALANCING for IRQ handler
 - Don't expose events which could be unimplemented.
 - Get rid of dsu_pmu_event_supported and allow raw event
   code to be used without validating whether it is supported.
 - Rename "supported_cpus" mask to "associated_cpus"
 - Add Documentation for the PMU driver
 - Don't disable IRQ for dsu_pmu_{enable/disable}_counters
 - Use consistent return codes for validate_event/group calls.
 - Check PERF_ATTACH_TASK flag in event_init.
 - Allow missing CPUs in dsu_pmu_dt_get_cpus, to handle cases
   where kernel could have capped nr_cpus.
 - Cleanup sanity checking for the CPU before accessing DSU
 - Reject events with counting CPU not associated with the DSU.
Changes since V4:
 - Reflect the changed generic helper for mapping CPU id
Changes since V2:
 - Cleanup dsu_pmu_device_probe error handling.
 - Fix event validate_group to invert the result check of validate_event
 - Return errors if we failed to parse CPUs in the DSU.
 - Add MODULE_DEVICE_TABLE entry
 - Use hlist_entry_safe for converting cpuhp_node to dsu_pmu.
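
For reviewers, the counter allocation policy of dsu_pmu_get_event_idx()
(shared by the real path and by the fake_hw validation in
validate_group) can be modelled in userspace roughly as follows. This is
only a sketch: get_event_idx() here is a hypothetical stand-in, the
bitmap is reduced to a single 32bit word, and -1 replaces the driver's
-EAGAIN.

```c
#include <stdint.h>

/* Values taken from the patch; the macro names are shortened. */
#define EVT_CYCLES        0x11
#define IDX_CYCLE_COUNTER 31

/*
 * Allocate a counter for evtype. The cycles event may only use the
 * dedicated cycle counter (bit 31); any other event takes the lowest
 * free event counter below num_counters. Returns the index, or -1 when
 * nothing is free.
 */
static int get_event_idx(uint32_t *used_mask, unsigned int num_counters,
			 uint32_t evtype)
{
	unsigned int idx;

	if (evtype == EVT_CYCLES) {
		if (*used_mask & (1u << IDX_CYCLE_COUNTER))
			return -1;
		*used_mask |= 1u << IDX_CYCLE_COUNTER;
		return IDX_CYCLE_COUNTER;
	}

	for (idx = 0; idx < num_counters && idx < IDX_CYCLE_COUNTER; idx++) {
		if (!(*used_mask & (1u << idx))) {
			*used_mask |= 1u << idx;
			return (int)idx;
		}
	}
	return -1;
}
```

One consequence of this policy is that a second cycles event in the same
group cannot fall back to a generic event counter, so such a group is
rejected at event_init time.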
---
 Documentation/perf/arm_dsu_pmu.txt   |  28 ++
 arch/arm64/include/asm/arm_dsu_pmu.h | 124 ++++++
 drivers/perf/Kconfig                 |   9 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/arm_dsu_pmu.c           | 826 +++++++++++++++++++++++++++++++++++
 5 files changed, 988 insertions(+)
 create mode 100644 Documentation/perf/arm_dsu_pmu.txt
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c
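
The overflow handling in dsu_pmu_handle_irq() can be summarised by the
userspace model below. It is a sketch under stated assumptions: struct
hw_events_model and irq_service_set() are hypothetical names, the
bitmaps are plain 32bit words, and the per-event update and reprogram
steps are left out. A zero result corresponds to the handler returning
IRQ_NONE.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_HW_CNTRS 32

/* Hypothetical stand-in for the driver's struct dsu_hw_events. */
struct hw_events_model {
	uint32_t used_mask;			/* counters currently allocated */
	const void *events[MAX_HW_CNTRS];	/* owner of each counter, or NULL */
};

/*
 * Return the bitmap of counters that would be serviced for a given
 * overflow status: the counter must have overflowed, be allocated,
 * and still have an event attached.
 */
static uint32_t irq_service_set(const struct hw_events_model *hw,
				uint32_t overflow)
{
	uint32_t workset = overflow & hw->used_mask;
	uint32_t serviced = 0;
	int i;

	for (i = 0; i < MAX_HW_CNTRS; i++) {
		if (!(workset & (UINT32_C(1) << i)))
			continue;
		if (!hw->events[i])
			continue;
		serviced |= UINT32_C(1) << i;
	}
	return serviced;
}
```

Overflow bits for counters that are not in used_mask are cleared by the
read of the overflow status but otherwise ignored.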

diff --git a/Documentation/perf/arm_dsu_pmu.txt b/Documentation/perf/arm_dsu_pmu.txt
new file mode 100644
index 000000000000..d611e15f5add
--- /dev/null
+++ b/Documentation/perf/arm_dsu_pmu.txt
@@ -0,0 +1,28 @@
+ARM DynamIQ Shared Unit (DSU) PMU
+==================================
+
+ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
+control logic and external interfaces to form a multicore cluster. The PMU
+allows counting the various events related to the L3 cache, Snoop Control Unit
+etc, using 32bit independent counters. It also provides a 64bit cycle counter.
+
+The PMU can only be accessed via CPU system registers, which are common to the
+cores connected to the same DSU. Like most of the other uncore PMUs, the DSU
+PMU doesn't support process specific events and cannot be used in sampling mode.
+
+The DSU provides a bitmap for a subset of implemented events via hardware
+registers. There is no way for the driver to determine if the other events
+are available or not. Hence the driver exposes only those events advertised
+by the DSU, in the "events" directory under:
+
+  /sys/bus/event_source/devices/arm_dsu_<N>/
+
+The user should refer to the TRM of the product to figure out the supported events
+and use the raw event code for the unlisted events.
+
+The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".
+
+
+e.g. usage:
+
+	perf stat -a -e arm_dsu_0/cycles/
diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
new file mode 100644
index 000000000000..5d1b0d9ff5bb
--- /dev/null
+++ b/arch/arm64/include/asm/arm_dsu_pmu.h
@@ -0,0 +1,124 @@
+/*
+ * ARM DynamIQ Shared Unit (DSU) PMU Low level register access routines.
+ *
+ * Copyright (C) ARM Limited, 2017.
+ *
+ * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2, as published by the Free Software Foundation.
+ */
+
+#include <asm/sysreg.h>
+
+
+#define CLUSTERPMCR_EL1			sys_reg(3, 0, 15, 5, 0)
+#define CLUSTERPMCNTENSET_EL1		sys_reg(3, 0, 15, 5, 1)
+#define CLUSTERPMCNTENCLR_EL1		sys_reg(3, 0, 15, 5, 2)
+#define CLUSTERPMOVSSET_EL1		sys_reg(3, 0, 15, 5, 3)
+#define CLUSTERPMOVSCLR_EL1		sys_reg(3, 0, 15, 5, 4)
+#define CLUSTERPMSELR_EL1		sys_reg(3, 0, 15, 5, 5)
+#define CLUSTERPMINTENSET_EL1		sys_reg(3, 0, 15, 5, 6)
+#define CLUSTERPMINTENCLR_EL1		sys_reg(3, 0, 15, 5, 7)
+#define CLUSTERPMCCNTR_EL1		sys_reg(3, 0, 15, 6, 0)
+#define CLUSTERPMXEVTYPER_EL1		sys_reg(3, 0, 15, 6, 1)
+#define CLUSTERPMXEVCNTR_EL1		sys_reg(3, 0, 15, 6, 2)
+#define CLUSTERPMMDCR_EL1		sys_reg(3, 0, 15, 6, 3)
+#define CLUSTERPMCEID0_EL1		sys_reg(3, 0, 15, 6, 4)
+#define CLUSTERPMCEID1_EL1		sys_reg(3, 0, 15, 6, 5)
+
+static inline u32 __dsu_pmu_read_pmcr(void)
+{
+	return read_sysreg_s(CLUSTERPMCR_EL1);
+}
+
+static inline void __dsu_pmu_write_pmcr(u32 val)
+{
+	write_sysreg_s(val, CLUSTERPMCR_EL1);
+	isb();
+}
+
+static inline u32 __dsu_pmu_getreset_overflow(void)
+{
+	u32 val = read_sysreg_s(CLUSTERPMOVSCLR_EL1);
+	/* Clear the bit */
+	write_sysreg_s(val, CLUSTERPMOVSCLR_EL1);
+	isb();
+	return val;
+}
+
+static inline void __dsu_pmu_select_counter(int counter)
+{
+	write_sysreg_s(counter, CLUSTERPMSELR_EL1);
+	isb();
+}
+
+static inline u64 __dsu_pmu_read_counter(int counter)
+{
+	__dsu_pmu_select_counter(counter);
+	return read_sysreg_s(CLUSTERPMXEVCNTR_EL1);
+}
+
+static inline void __dsu_pmu_write_counter(int counter, u64 val)
+{
+	__dsu_pmu_select_counter(counter);
+	write_sysreg_s(val, CLUSTERPMXEVCNTR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_set_event(int counter, u32 event)
+{
+	__dsu_pmu_select_counter(counter);
+	write_sysreg_s(event, CLUSTERPMXEVTYPER_EL1);
+	isb();
+}
+
+static inline u64 __dsu_pmu_read_pmccntr(void)
+{
+	return read_sysreg_s(CLUSTERPMCCNTR_EL1);
+}
+
+static inline void __dsu_pmu_write_pmccntr(u64 val)
+{
+	write_sysreg_s(val, CLUSTERPMCCNTR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_disable_counter(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMCNTENCLR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_enable_counter(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMCNTENSET_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_counter_interrupt_enable(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMINTENSET_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_counter_interrupt_disable(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMINTENCLR_EL1);
+	isb();
+}
+
+
+static inline u32 __dsu_pmu_read_pmceid(int n)
+{
+	switch (n) {
+	case 0:
+		return read_sysreg_s(CLUSTERPMCEID0_EL1);
+	case 1:
+		return read_sysreg_s(CLUSTERPMCEID1_EL1);
+	default:
+		BUILD_BUG();
+		return 0;
+	}
+}
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index e5197ffb7422..ee3d7d13977c 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -17,6 +17,15 @@ config ARM_PMU_ACPI
 	depends on ARM_PMU && ACPI
 	def_bool y
 
+config ARM_DSU_PMU
+	tristate "ARM DynamIQ Shared Unit (DSU) PMU"
+	depends on ARM64 && PERF_EVENTS
+	help
+	  Provides support for the performance monitor unit in the ARM DynamIQ
+	  Shared Unit (DSU). The DSU integrates one or more cores with an L3
+	  memory system and control logic. The PMU allows counting various
+	  events related to the DSU.
+
 config QCOM_L2_PMU
 	bool "Qualcomm Technologies L2-cache PMU"
 	depends on ARCH_QCOM && ARM64 && ACPI
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 6420bd4394d5..0adb4f6926a4 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
 obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
+obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
 obj-$(CONFIG_QCOM_L2_PMU)	+= qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
 obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
new file mode 100644
index 000000000000..6352e5f3fa0a
--- /dev/null
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -0,0 +1,826 @@
+/*
+ * ARM DynamIQ Shared Unit (DSU) PMU driver
+ *
+ * Copyright (C) ARM Limited, 2017.
+ *
+ * Based on ARM CCI-PMU, ARMv8 PMU-v3 drivers.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ */
+
+#define PMUNAME		"arm_dsu"
+#define DRVNAME		PMUNAME "_pmu"
+#define pr_fmt(fmt)	DRVNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock.h>
+
+#include <asm/arm_dsu_pmu.h>
+
+/* PMU event codes */
+#define DSU_PMU_EVT_CYCLES		0x11
+#define DSU_PMU_EVT_CHAIN		0x1e
+
+#define DSU_PMU_MAX_COMMON_EVENTS	0x40
+
+#define DSU_PMU_MAX_HW_CNTRS		32
+#define DSU_PMU_HW_COUNTER_MASK		(DSU_PMU_MAX_HW_CNTRS - 1)
+
+#define CLUSTERPMCR_E			BIT(0)
+#define CLUSTERPMCR_P			BIT(1)
+#define CLUSTERPMCR_C			BIT(2)
+#define CLUSTERPMCR_N_SHIFT		11
+#define CLUSTERPMCR_N_MASK		0x1f
+#define CLUSTERPMCR_IDCODE_SHIFT	16
+#define CLUSTERPMCR_IDCODE_MASK		0xff
+#define CLUSTERPMCR_IMP_SHIFT		24
+#define CLUSTERPMCR_IMP_MASK		0xff
+#define CLUSTERPMCR_RES_MASK		0x7e8
+#define CLUSTERPMCR_RES_VAL		0x40
+
+#define DSU_ACTIVE_CPU_MASK		0x0
+#define DSU_ASSOCIATED_CPU_MASK		0x1
+
+/*
+ * We use the index of the counters as they appear in the counter
+ * bit maps in the PMU registers (e.g. CLUSTERPMSELR).
+ * i.e.,
+ *	counter 0	- Bit 0
+ *	counter 1	- Bit 1
+ *	...
+ *	Cycle counter	- Bit 31
+ */
+#define DSU_PMU_IDX_CYCLE_COUNTER	31
+
+/* All event counters are 32bit, with a 64bit Cycle counter */
+#define DSU_PMU_COUNTER_WIDTH(idx)	\
+	(((idx) == DSU_PMU_IDX_CYCLE_COUNTER) ? 64 : 32)
+
+#define DSU_PMU_COUNTER_MASK(idx)	\
+	GENMASK_ULL((DSU_PMU_COUNTER_WIDTH((idx)) - 1), 0)
+
+#define DSU_EXT_ATTR(_name, _func, _config)		\
+	(&((struct dev_ext_attribute[]) {				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define DSU_EVENT_ATTR(_name, _config)		\
+	DSU_EXT_ATTR(_name, dsu_pmu_sysfs_event_show, (unsigned long)_config)
+
+#define DSU_FORMAT_ATTR(_name, _config)		\
+	DSU_EXT_ATTR(_name, dsu_pmu_sysfs_format_show, (char *)_config)
+
+#define DSU_CPUMASK_ATTR(_name, _config)	\
+	DSU_EXT_ATTR(_name, dsu_pmu_cpumask_show, (unsigned long)_config)
+
+struct dsu_hw_events {
+	DECLARE_BITMAP(used_mask, DSU_PMU_MAX_HW_CNTRS);
+	struct perf_event	*events[DSU_PMU_MAX_HW_CNTRS];
+};
+
+/*
+ * struct dsu_pmu	- DSU PMU descriptor
+ *
+ * @pmu_lock		: Protects accesses to DSU PMU register from normal vs
+ *			  interrupt handler contexts.
+ * @hw_events		: Holds the event counter state.
+ * @associated_cpus	: CPUs attached to the DSU.
+ * @active_cpu		: CPU to which the PMU is bound for accesses.
+ * @cpuhp_node		: Node for CPU hotplug notifier link.
+ * @num_counters	: Number of event counters implemented by the PMU,
+ *			  excluding the cycle counter.
+ * @irq			: Interrupt line for counter overflow.
+ * @cpmceid_bitmap	: Bitmap for the availability of architected common
+ *			  events (event_code < 0x40).
+ */
+struct dsu_pmu {
+	struct pmu			pmu;
+	struct device			*dev;
+	raw_spinlock_t			pmu_lock;
+	struct dsu_hw_events		hw_events;
+	cpumask_t			associated_cpus;
+	cpumask_t			active_cpu;
+	struct hlist_node		cpuhp_node;
+	u8				num_counters;
+	int				irq;
+	DECLARE_BITMAP(cpmceid_bitmap, DSU_PMU_MAX_COMMON_EVENTS);
+};
+
+static unsigned long dsu_pmu_cpuhp_state;
+
+static inline struct dsu_pmu *to_dsu_pmu(struct pmu *pmu)
+{
+	return container_of(pmu, struct dsu_pmu, pmu);
+}
+
+static ssize_t dsu_pmu_sysfs_event_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	return snprintf(buf, PAGE_SIZE, "event=0x%lx\n",
+					 (unsigned long)eattr->var);
+}
+
+static ssize_t dsu_pmu_sysfs_format_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	return snprintf(buf, PAGE_SIZE, "%s\n", (char *)eattr->var);
+}
+
+static ssize_t dsu_pmu_cpumask_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case DSU_ACTIVE_CPU_MASK:
+		cpumask = &dsu_pmu->active_cpu;
+		break;
+	case DSU_ASSOCIATED_CPU_MASK:
+		cpumask = &dsu_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+static struct attribute *dsu_pmu_format_attrs[] = {
+	DSU_FORMAT_ATTR(event, "config:0-31"),
+	NULL,
+};
+
+static const struct attribute_group dsu_pmu_format_attr_group = {
+	.name = "format",
+	.attrs = dsu_pmu_format_attrs,
+};
+
+static struct attribute *dsu_pmu_event_attrs[] = {
+	DSU_EVENT_ATTR(cycles, 0x11),
+	DSU_EVENT_ATTR(bus_access, 0x19),
+	DSU_EVENT_ATTR(memory_error, 0x1a),
+	DSU_EVENT_ATTR(bus_cycles, 0x1d),
+	DSU_EVENT_ATTR(l3d_cache_allocate, 0x29),
+	DSU_EVENT_ATTR(l3d_cache_refill, 0x2a),
+	DSU_EVENT_ATTR(l3d_cache, 0x2b),
+	DSU_EVENT_ATTR(l3d_cache_wb, 0x2c),
+	NULL,
+};
+
+static umode_t
+dsu_pmu_event_attr_is_visible(struct kobject *kobj, struct attribute *attr,
+				int unused)
+{
+	struct pmu *pmu = dev_get_drvdata(kobj_to_dev(kobj));
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr.attr);
+	unsigned long evt = (unsigned long)eattr->var;
+
+	return test_bit(evt, dsu_pmu->cpmceid_bitmap) ? attr->mode : 0;
+}
+
+static const struct attribute_group dsu_pmu_events_attr_group = {
+	.name = "events",
+	.attrs = dsu_pmu_event_attrs,
+	.is_visible = dsu_pmu_event_attr_is_visible,
+};
+
+static struct attribute *dsu_pmu_cpumask_attrs[] = {
+	DSU_CPUMASK_ATTR(cpumask, DSU_ACTIVE_CPU_MASK),
+	DSU_CPUMASK_ATTR(associated_cpus, DSU_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static const struct attribute_group dsu_pmu_cpumask_attr_group = {
+	.attrs = dsu_pmu_cpumask_attrs,
+};
+
+static const struct attribute_group *dsu_pmu_attr_groups[] = {
+	&dsu_pmu_cpumask_attr_group,
+	&dsu_pmu_events_attr_group,
+	&dsu_pmu_format_attr_group,
+	NULL,
+};
+
+static int dsu_pmu_get_online_cpu(struct dsu_pmu *dsu_pmu)
+{
+	return cpumask_first_and(&dsu_pmu->associated_cpus, cpu_online_mask);
+}
+
+static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
+{
+	struct cpumask online_supported;
+
+	cpumask_and(&online_supported,
+			 &dsu_pmu->associated_cpus, cpu_online_mask);
+	return cpumask_any_but(&online_supported, cpu);
+}
+
+static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
+{
+	return (idx < dsu_pmu->num_counters) ||
+	       (idx == DSU_PMU_IDX_CYCLE_COUNTER);
+}
+
+static inline u64 dsu_pmu_read_counter(struct perf_event *event)
+{
+	u64 val;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	int idx = event->hw.idx;
+
+	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+				 &dsu_pmu->active_cpu)))
+		return 0;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"Trying to read invalid counter %d\n", idx);
+		return 0;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	if (idx == DSU_PMU_IDX_CYCLE_COUNTER)
+		val = __dsu_pmu_read_pmccntr();
+	else
+		val = __dsu_pmu_read_counter(idx);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+
+	return val;
+}
+
+static void dsu_pmu_write_counter(struct perf_event *event, u64 val)
+{
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	int idx = event->hw.idx;
+
+	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+			 &dsu_pmu->active_cpu)))
+		return;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"writing to invalid counter %d\n", idx);
+		return;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	if (idx == DSU_PMU_IDX_CYCLE_COUNTER)
+		__dsu_pmu_write_pmccntr(val);
+	else
+		__dsu_pmu_write_counter(idx, val);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static int dsu_pmu_get_event_idx(struct dsu_hw_events *hw_events,
+				 struct perf_event *event)
+{
+	int idx;
+	unsigned long evtype = event->attr.config;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	unsigned long *used_mask = hw_events->used_mask;
+
+	if (evtype == DSU_PMU_EVT_CYCLES) {
+		if (test_and_set_bit(DSU_PMU_IDX_CYCLE_COUNTER, used_mask))
+			return -EAGAIN;
+		return DSU_PMU_IDX_CYCLE_COUNTER;
+	}
+
+	idx = find_next_zero_bit(used_mask, dsu_pmu->num_counters, 0);
+	if (idx >= dsu_pmu->num_counters)
+		return -EAGAIN;
+	set_bit(idx, hw_events->used_mask);
+	return idx;
+}
+
+static void dsu_pmu_enable_counter(struct dsu_pmu *dsu_pmu, int idx)
+{
+	__dsu_pmu_counter_interrupt_enable(idx);
+	__dsu_pmu_enable_counter(idx);
+}
+
+static void dsu_pmu_disable_counter(struct dsu_pmu *dsu_pmu, int idx)
+{
+	__dsu_pmu_disable_counter(idx);
+	__dsu_pmu_counter_interrupt_disable(idx);
+}
+
+static inline void dsu_pmu_set_event(struct dsu_pmu *dsu_pmu,
+					struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	unsigned long flags;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"Trying to set invalid counter %d\n", idx);
+		return;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	__dsu_pmu_set_event(idx, event->hw.config_base);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static void dsu_pmu_event_update(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 delta, prev_count, new_count;
+
+	do {
+		/* We may also be called from the irq handler */
+		prev_count = local64_read(&hwc->prev_count);
+		new_count = dsu_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev_count, new_count) !=
+			prev_count);
+	delta = (new_count - prev_count) & DSU_PMU_COUNTER_MASK(hwc->idx);
+	local64_add(delta, &event->count);
+}
+
+static void dsu_pmu_read(struct perf_event *event)
+{
+	dsu_pmu_event_update(event);
+}
+
+static inline u32 dsu_pmu_getreset_overflow(void)
+{
+	return __dsu_pmu_getreset_overflow();
+}
+
+/**
+ * dsu_pmu_set_event_period: Set the period for the counter.
+ *
+ * All DSU PMU event counters, except the cycle counter, are 32bit
+ * counters. To handle cases of extreme interrupt latency, we program
+ * the counter with half of the max count for the counters.
+ */
+static void dsu_pmu_set_event_period(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	u64 val = DSU_PMU_COUNTER_MASK(idx) >> 1;
+
+	local64_set(&event->hw.prev_count, val);
+	dsu_pmu_write_counter(event, val);
+}
+
+static irqreturn_t dsu_pmu_handle_irq(int irq_num, void *dev)
+{
+	int i;
+	bool handled = false;
+	struct dsu_pmu *dsu_pmu = dev;
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	unsigned long overflow, workset;
+
+	overflow = dsu_pmu_getreset_overflow();
+	bitmap_and(&workset, &overflow, hw_events->used_mask,
+		   DSU_PMU_MAX_HW_CNTRS);
+
+	if (!workset)
+		return IRQ_NONE;
+
+	for_each_set_bit(i, &workset, DSU_PMU_MAX_HW_CNTRS) {
+		struct perf_event *event = hw_events->events[i];
+
+		if (!event)
+			continue;
+		dsu_pmu_event_update(event);
+		dsu_pmu_set_event_period(event);
+
+		handled = true;
+	}
+
+	return IRQ_RETVAL(handled);
+}
+
+static void dsu_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	/* We always reprogram the counter */
+	if (pmu_flags & PERF_EF_RELOAD)
+		WARN_ON(!(event->hw.state & PERF_HES_UPTODATE));
+	dsu_pmu_set_event_period(event);
+	if (event->hw.idx != DSU_PMU_IDX_CYCLE_COUNTER)
+		dsu_pmu_set_event(dsu_pmu, event);
+	event->hw.state = 0;
+	dsu_pmu_enable_counter(dsu_pmu, event->hw.idx);
+}
+
+static void dsu_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+	dsu_pmu_disable_counter(dsu_pmu, event->hw.idx);
+	dsu_pmu_event_update(event);
+	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static int dsu_pmu_add(struct perf_event *event, int flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &dsu_pmu->active_cpu)))
+		return -ENOENT;
+
+	idx = dsu_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hwc->idx = idx;
+	hw_events->events[idx] = event;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		dsu_pmu_start(event, PERF_EF_RELOAD);
+
+	perf_event_update_userpage(event);
+	return 0;
+}
+
+static void dsu_pmu_del(struct perf_event *event, int flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	dsu_pmu_stop(event, PERF_EF_UPDATE);
+	hw_events->events[idx] = NULL;
+	clear_bit(idx, hw_events->used_mask);
+	perf_event_update_userpage(event);
+}
+
+static void dsu_pmu_enable(struct pmu *pmu)
+{
+	u32 pmcr;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+
+	/* If no counters are added, skip enabling the PMU */
+	if (bitmap_empty(dsu_pmu->hw_events.used_mask, DSU_PMU_MAX_HW_CNTRS))
+		return;
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	pmcr = __dsu_pmu_read_pmcr();
+	pmcr |= CLUSTERPMCR_E;
+	__dsu_pmu_write_pmcr(pmcr);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static void dsu_pmu_disable(struct pmu *pmu)
+{
+	u32 pmcr;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	pmcr = __dsu_pmu_read_pmcr();
+	pmcr &= ~CLUSTERPMCR_E;
+	__dsu_pmu_write_pmcr(pmcr);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static bool dsu_pmu_validate_event(struct pmu *pmu,
+				  struct dsu_hw_events *hw_events,
+				  struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+	return dsu_pmu_get_event_idx(hw_events, event) >= 0;
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool dsu_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct dsu_hw_events fake_hw;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(fake_hw.used_mask, 0, sizeof(fake_hw.used_mask));
+	if (!dsu_pmu_validate_event(event->pmu, &fake_hw, leader))
+		return false;
+	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+		if (!dsu_pmu_validate_event(event->pmu, &fake_hw, sibling))
+			return false;
+	}
+	return dsu_pmu_validate_event(event->pmu, &fake_hw, event);
+}
+
+static int dsu_pmu_event_init(struct perf_event *event)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* We don't support sampling */
+	if (is_sampling_event(event)) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* We cannot support task bound events */
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	if (has_branch_stack(event) ||
+	    event->attr.exclude_user ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv ||
+	    event->attr.exclude_idle ||
+	    event->attr.exclude_host ||
+	    event->attr.exclude_guest) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
+		return -EINVAL;
+	}
+
+	if (!dsu_pmu_validate_group(event))
+		return -EINVAL;
+	if (!cpumask_test_cpu(event->cpu, &dsu_pmu->associated_cpus)) {
+		dev_dbg(dsu_pmu->pmu.dev,
+			 "Requested cpu is not associated with the DSU\n");
+		return -EINVAL;
+	}
+	/*
+	 * Choose the current active CPU to read the events. We don't want
+	 * to migrate the event contexts, irq handling etc to the requested
+	 * CPU. As long as the requested CPU is within the same DSU, we
+	 * are fine.
+	 */
+	event->cpu = cpumask_first(&dsu_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	event->hw.config_base = event->attr.config;
+	return 0;
+}
+
+static struct dsu_pmu *dsu_pmu_alloc(struct platform_device *pdev)
+{
+	struct dsu_pmu *dsu_pmu;
+
+	dsu_pmu = devm_kzalloc(&pdev->dev, sizeof(*dsu_pmu), GFP_KERNEL);
+	if (!dsu_pmu)
+		return ERR_PTR(-ENOMEM);
+
+	raw_spin_lock_init(&dsu_pmu->pmu_lock);
+	return dsu_pmu;
+}
+
+/**
+ * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
+ */
+static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
+{
+	int i = 0, n, cpu;
+	struct device_node *cpu_node;
+
+	n = of_count_phandle_with_args(dev, "cpus", NULL);
+	if (n <= 0)
+		return -ENODEV;
+	for (; i < n; i++) {
+		cpu_node = of_parse_phandle(dev, "cpus", i);
+		if (!cpu_node)
+			break;
+		cpu = of_cpu_node_to_id(cpu_node);
+		of_node_put(cpu_node);
+		/*
+		 * We have to ignore the failures here and continue scanning
+		 * the list to handle cases where the nr_cpus could be capped
+		 * in the running kernel.
+		 */
+		if (cpu < 0)
+			continue;
+		cpumask_set_cpu(cpu, mask);
+	}
+	return 0;
+}
+
+/*
+ * dsu_pmu_probe_pmu: Probe the PMU details on a CPU in the cluster.
+ */
+static void dsu_pmu_probe_pmu(void *data)
+{
+	struct dsu_pmu *dsu_pmu = data;
+	u64 num_counters;
+	u32 cpmceid[2];
+
+	num_counters = (__dsu_pmu_read_pmcr() >> CLUSTERPMCR_N_SHIFT) &
+						CLUSTERPMCR_N_MASK;
+	/* We can only support up to 31 independent counters */
+	if (WARN_ON(num_counters > 31))
+		num_counters = 31;
+	dsu_pmu->num_counters = num_counters;
+	if (!dsu_pmu->num_counters)
+		return;
+	cpmceid[0] = __dsu_pmu_read_pmceid(0);
+	cpmceid[1] = __dsu_pmu_read_pmceid(1);
+	bitmap_from_u32array(dsu_pmu->cpmceid_bitmap,
+				DSU_PMU_MAX_COMMON_EVENTS,
+				cpmceid,
+				ARRAY_SIZE(cpmceid));
+}
+
+static int dsu_pmu_device_probe(struct platform_device *pdev)
+{
+	int irq, rc, cpu;
+	struct dsu_pmu *dsu_pmu;
+	char *name;
+	static atomic_t pmu_idx = ATOMIC_INIT(-1);
+
+	dsu_pmu = dsu_pmu_alloc(pdev);
+	if (IS_ERR(dsu_pmu))
+		return PTR_ERR(dsu_pmu);
+
+	rc = dsu_pmu_dt_get_cpus(pdev->dev.of_node, &dsu_pmu->associated_cpus);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to parse the CPUs\n");
+		return rc;
+	}
+
+	rc = smp_call_function_any(&dsu_pmu->associated_cpus,
+					dsu_pmu_probe_pmu,
+					dsu_pmu, 1);
+	if (rc)
+		return rc;
+	if (!dsu_pmu->num_counters)
+		return -ENODEV;
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		dev_warn(&pdev->dev, "Failed to find IRQ\n");
+		return -EINVAL;
+	}
+
+	name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s_%d",
+				PMUNAME, atomic_inc_return(&pmu_idx));
+	rc = devm_request_irq(&pdev->dev, irq, dsu_pmu_handle_irq,
+			      IRQF_NOBALANCING, name, dsu_pmu);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to request IRQ %d\n", irq);
+		return rc;
+	}
+
+	/*
+	 * Find one CPU in the DSU to handle the IRQs.
+	 * It is highly unlikely that we would fail
+	 * to find one, given the probing has succeeded.
+	 */
+	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
+	if (cpu >= nr_cpu_ids)
+		return -ENODEV;
+	cpumask_set_cpu(cpu, &dsu_pmu->active_cpu);
+	rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to force IRQ affinity for %d\n",
+					 irq);
+		return rc;
+	}
+
+	platform_set_drvdata(pdev, dsu_pmu);
+	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
+						&dsu_pmu->cpuhp_node);
+	if (rc)
+		goto irq_cleanup;
+
+	dsu_pmu->irq = irq;
+	dsu_pmu->pmu = (struct pmu) {
+		.task_ctx_nr	= perf_invalid_context,
+
+		.pmu_enable	= dsu_pmu_enable,
+		.pmu_disable	= dsu_pmu_disable,
+		.event_init	= dsu_pmu_event_init,
+		.add		= dsu_pmu_add,
+		.del		= dsu_pmu_del,
+		.start		= dsu_pmu_start,
+		.stop		= dsu_pmu_stop,
+		.read		= dsu_pmu_read,
+
+		.attr_groups	= dsu_pmu_attr_groups,
+	};
+
+	rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
+	if (rc)
+		goto cpuhp_cleanup;
+
+	dev_info(&pdev->dev, "Registered %s with %d event counters",
+				name, dsu_pmu->num_counters);
+	return 0;
+
+cpuhp_cleanup:
+	cpuhp_state_remove_instance(dsu_pmu_cpuhp_state, &dsu_pmu->cpuhp_node);
+irq_cleanup:
+	irq_set_affinity_hint(irq, NULL);
+	return rc;
+}
+
+static int dsu_pmu_device_remove(struct platform_device *pdev)
+{
+	struct dsu_pmu *dsu_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&dsu_pmu->pmu);
+	cpuhp_state_remove_instance(dsu_pmu_cpuhp_state, &dsu_pmu->cpuhp_node);
+	irq_set_affinity_hint(dsu_pmu->irq, NULL);
+
+	return 0;
+}
+
+static const struct of_device_id dsu_pmu_of_match[] = {
+	{ .compatible = "arm,dsu-pmu", },
+	{},
+};
+
+static struct platform_driver dsu_pmu_driver = {
+	.driver = {
+		.name	= DRVNAME,
+		.of_match_table = of_match_ptr(dsu_pmu_of_match),
+	},
+	.probe = dsu_pmu_device_probe,
+	.remove = dsu_pmu_device_remove,
+};
+
+static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	int dst;
+	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
+						   cpuhp_node);
+
+	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
+		return 0;
+
+	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
+	if (dst < nr_cpu_ids) {
+		cpumask_set_cpu(dst, &dsu_pmu->active_cpu);
+		perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
+		irq_set_affinity_hint(dsu_pmu->irq, &dsu_pmu->active_cpu);
+	}
+
+	return 0;
+}
+
+static int __init dsu_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+					DRVNAME,
+					NULL,
+					dsu_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+	dsu_pmu_cpuhp_state = ret;
+	return platform_driver_register(&dsu_pmu_driver);
+}
+
+static void __exit dsu_pmu_exit(void)
+{
+	platform_driver_unregister(&dsu_pmu_driver);
+	cpuhp_remove_multi_state(dsu_pmu_cpuhp_state);
+}
+
+module_init(dsu_pmu_init);
+module_exit(dsu_pmu_exit);
+
+MODULE_DEVICE_TABLE(of, dsu_pmu_of_match);
+MODULE_DESCRIPTION("Perf driver for ARM DynamIQ Shared Unit");
+MODULE_AUTHOR("Suzuki K Poulose <suzuki.poulose@arm.com>");
+MODULE_LICENSE("GPL v2");
-- 
2.13.6
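
The counter-period handling in dsu_pmu_set_event_period() and
dsu_pmu_event_update() above can be sketched in user space as follows.
This is a minimal illustration under stated assumptions, not the driver
code: counter_start_value() and counter_delta() are hypothetical names,
and the counter mask is passed in explicitly rather than derived from
DSU_PMU_COUNTER_MASK().

```c
#include <stdint.h>

/*
 * Hypothetical helper: the value a counter is programmed with, i.e. half
 * of its maximum count, mirroring dsu_pmu_set_event_period().
 */
static uint64_t counter_start_value(uint64_t mask)
{
	return mask >> 1;
}

/*
 * Hypothetical helper: number of events between two raw reads of a
 * counter whose width is described by mask. The unsigned subtraction
 * followed by the mask stays correct across a single wrap, mirroring
 * the delta computation in dsu_pmu_event_update().
 */
static uint64_t counter_delta(uint64_t prev, uint64_t now, uint64_t mask)
{
	return (now - prev) & mask;
}
```

With a 32-bit mask the counter starts at 0x7fffffff, so the overflow
interrupt fires after roughly 2^31 events, and roughly 2^31 more can be
counted before the delta becomes ambiguous. That margin appears to be the
interrupt-latency headroom the comment on dsu_pmu_set_event_period()
refers to.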

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing
@ 2017-10-17 15:24     ` Mark Rutland
  0 siblings, 0 replies; 40+ messages in thread
From: Mark Rutland @ 2017-10-17 15:24 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Catalin Marinas

On Tue, Oct 10, 2017 at 11:33:00AM +0100, Suzuki K Poulose wrote:
> Make use of the new generic helper to convert an of_node of a CPU
> to the logical CPU id in parsing the topology.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

This looks sane to me, but it will need an ack from Will or Catalin.

FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
>  arch/arm64/kernel/topology.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 8d48b233e6ce..21868530018e 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
>  	if (!cpu_node)
>  		return -1;
>  
> -	for_each_possible_cpu(cpu) {
> -		if (of_get_cpu_node(cpu, NULL) == cpu_node) {
> -			topology_parse_cpu_capacity(cpu_node, cpu);
> -			of_node_put(cpu_node);
> -			return cpu;
> -		}
> -	}
> -
> -	pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
> +	cpu = of_cpu_node_to_id(cpu_node);
> +	if (cpu >= 0)
> +		topology_parse_cpu_capacity(cpu_node, cpu);
> +	else
> +		pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
>  
>  	of_node_put(cpu_node);
> -	return -1;
> +	return cpu;
>  }
>  
>  static int __init parse_core(struct device_node *core, int cluster_id,
> -- 
> 2.13.6
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 6/8] arm_pmu: Use of_cpu_node_to_id helper
  2017-10-10 10:33   ` Suzuki K Poulose
@ 2017-10-17 15:26     ` Mark Rutland
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Rutland @ 2017-10-17 15:26 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan

On Tue, Oct 10, 2017 at 11:33:01AM +0100, Suzuki K Poulose wrote:
> Use the new generic helper, of_cpu_node_to_id(), to map a
> phandle to the logical CPU number while parsing the
> PMU irq affinity.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
>  drivers/perf/arm_pmu_platform.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
> index 4eafa7a42e52..a96884e37eaf 100644
> --- a/drivers/perf/arm_pmu_platform.c
> +++ b/drivers/perf/arm_pmu_platform.c
> @@ -81,19 +81,10 @@ static int pmu_parse_irq_affinity(struct device_node *node, int i)
>  		return -EINVAL;
>  	}
>  
> -	/* Now look up the logical CPU number */
> -	for_each_possible_cpu(cpu) {
> -		struct device_node *cpu_dn;
> -
> -		cpu_dn = of_cpu_device_node_get(cpu);
> -		of_node_put(cpu_dn);
> -
> -		if (dn == cpu_dn)
> -			break;
> -	}
> -
> -	if (cpu >= nr_cpu_ids) {
> +	cpu = of_cpu_node_to_id(dn);
> +	if (cpu < 0) {
>  		pr_warn("failed to find logical CPU for %s\n", dn->name);
> +		cpu = nr_cpu_ids;
>  	}
>  
>  	of_node_put(dn);
> -- 
> 2.13.6
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing
@ 2017-10-17 16:11       ` Will Deacon
  0 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2017-10-17 16:11 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Suzuki K Poulose, linux-arm-kernel, linux-kernel, robh,
	sudeep.holla, frowand.list, devicetree, Jonathan.Cameron,
	marc.zyngier, peterz, mathieu.poirier, leo.yan, Catalin Marinas

On Tue, Oct 17, 2017 at 04:24:23PM +0100, Mark Rutland wrote:
> On Tue, Oct 10, 2017 at 11:33:00AM +0100, Suzuki K Poulose wrote:
> > Make use of the new generic helper to convert an of_node of a CPU
> > to the logical CPU id in parsing the topology.
> > 
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Leo Yan <leo.yan@linaro.org>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> This looks sane to me, but it will need an ack from Will or Catalin.
> 
> FWIW:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> Thanks,
> Mark.
> 
> > ---
> >  arch/arm64/kernel/topology.c | 16 ++++++----------
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index 8d48b233e6ce..21868530018e 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
> >  	if (!cpu_node)
> >  		return -1;
> >  
> > -	for_each_possible_cpu(cpu) {
> > -		if (of_get_cpu_node(cpu, NULL) == cpu_node) {
> > -			topology_parse_cpu_capacity(cpu_node, cpu);
> > -			of_node_put(cpu_node);
> > -			return cpu;
> > -		}
> > -	}
> > -
> > -	pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
> > +	cpu = of_cpu_node_to_id(cpu_node);
> > +	if (cpu >= 0)
> > +		topology_parse_cpu_capacity(cpu_node, cpu);
> > +	else
> > +		pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
> >  
> >  	of_node_put(cpu_node);

This of_node_put is confusing me. Since of_cpu_node_to_id appears to be
balanced with its use of the node refcount, is this one intended to pair
with the earlier call to of_parse_phandle? If so, does that mean mainline is
currently broken here because it doesn't drop the refcount twice for the
matching node? Or do we need to return with that held?

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing
  2017-10-17 16:11       ` Will Deacon
@ 2017-10-17 16:20         ` Suzuki K Poulose
  -1 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-17 16:20 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, robh, sudeep.holla, frowand.list,
	devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Catalin Marinas

On 17/10/17 17:11, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 04:24:23PM +0100, Mark Rutland wrote:
>> On Tue, Oct 10, 2017 at 11:33:00AM +0100, Suzuki K Poulose wrote:
>>> Make use of the new generic helper to convert an of_node of a CPU
>>> to the logical CPU id in parsing the topology.
>>>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Leo Yan <leo.yan@linaro.org>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>> This looks sane to me, but it will need an ack from Will or Catalin.
>>
>> FWIW:
>>
>> Acked-by: Mark Rutland <mark.rutland@arm.com>
>>
>> Thanks,
>> Mark.
>>
>>> ---
>>>   arch/arm64/kernel/topology.c | 16 ++++++----------
>>>   1 file changed, 6 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 8d48b233e6ce..21868530018e 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
>>>   	if (!cpu_node)
>>>   		return -1;
>>>   
>>> -	for_each_possible_cpu(cpu) {
>>> -		if (of_get_cpu_node(cpu, NULL) == cpu_node) {
>>> -			topology_parse_cpu_capacity(cpu_node, cpu);
>>> -			of_node_put(cpu_node);
>>> -			return cpu;
>>> -		}
>>> -	}
>>> -
>>> -	pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
>>> +	cpu = of_cpu_node_to_id(cpu_node);
>>> +	if (cpu >= 0)
>>> +		topology_parse_cpu_capacity(cpu_node, cpu);
>>> +	else
>>> +		pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
>>>   
>>>   	of_node_put(cpu_node);
> 
> This of_node_put is confusing me. Since of_cpu_node_to_id appears to be
> balanced with its use of the node refcount, is this one intended to pair
> with the earlier call to of_parse_phandle?

Yes.

> If so, does that mean mainline is
> currently broken here because it doesn't drop the refcount twice for the
> matching node?

No. This of_node_put is for the failure case where we couldn't match a CPU.
In the success case, it is dropped just before we return the result within
the loop.

Cheers
Suzuki


> Or do we need to return with that held?
> 
> Will
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing
@ 2017-10-17 16:42           ` Suzuki K Poulose
  0 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-17 16:42 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, robh, sudeep.holla, frowand.list,
	devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan, Catalin Marinas

On 17/10/17 17:20, Suzuki K Poulose wrote:
> On 17/10/17 17:11, Will Deacon wrote:
>> On Tue, Oct 17, 2017 at 04:24:23PM +0100, Mark Rutland wrote:
>>> On Tue, Oct 10, 2017 at 11:33:00AM +0100, Suzuki K Poulose wrote:
>>>> Make use of the new generic helper to convert an of_node of a CPU
>>>> to the logical CPU id in parsing the topology.
>>>>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Leo Yan <leo.yan@linaro.org>
>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>
>>> This looks sane to me, but it will need an ack from Will or Catalin.
>>>
>>> FWIW:
>>>
>>> Acked-by: Mark Rutland <mark.rutland@arm.com>
>>>
>>> Thanks,
>>> Mark.
>>>
>>>> ---
>>>>   arch/arm64/kernel/topology.c | 16 ++++++----------
>>>>   1 file changed, 6 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>> index 8d48b233e6ce..21868530018e 100644
>>>> --- a/arch/arm64/kernel/topology.c
>>>> +++ b/arch/arm64/kernel/topology.c
>>>> @@ -37,18 +37,14 @@ static int __init get_cpu_for_node(struct device_node *node)
>>>>       if (!cpu_node)
>>>>           return -1;
>>>> -    for_each_possible_cpu(cpu) {
>>>> -        if (of_get_cpu_node(cpu, NULL) == cpu_node) {
>>>> -            topology_parse_cpu_capacity(cpu_node, cpu);
>>>> -            of_node_put(cpu_node);
>>>> -            return cpu;
>>>> -        }
>>>> -    }
>>>> -
>>>> -    pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
>>>> +    cpu = of_cpu_node_to_id(cpu_node);
>>>> +    if (cpu >= 0)
>>>> +        topology_parse_cpu_capacity(cpu_node, cpu);
>>>> +    else
>>>> +        pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
>>>>       of_node_put(cpu_node);
>>
>> This of_node_put is confusing me. Since of_cpu_node_to_id appears to be
>> balanced with its use of the node refcount, is this one intended to pair
>> with the earlier call to of_parse_phandle?
> 
> Yes.
> 
>> If so, does that mean mainline is
>> currently broken here because it doesn't drop the refcount twice for the
>> matching node?
> 
> No. This of_node_put is for the failure case where we couldn't match a CPU.
> In the success case, it is dropped just before we return the result within
> the loop.

As we discussed offline, there is indeed a missing of_node_put() for all the
nodes we loop through to match. But with this change, we have fixed that.
And I don't think it is worth sending to stable.
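
For readers following the refcount discussion, here is a minimal user-space
sketch of the imbalance (not kernel code). struct node, get_cpu_node() and
node_put() are hypothetical stand-ins for struct device_node,
of_get_cpu_node() and of_node_put(). The "balanced" variant only assumes that
a helper drops every reference it takes for the comparison; it is not the
actual of_cpu_node_to_id() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node: only the refcount matters. */
struct node {
	int refcount;
};

#define MODEL_NR_CPUS 4

static struct node cpu_nodes[MODEL_NR_CPUS];

/* Models of_get_cpu_node(): returns the node with an extra reference held. */
static struct node *get_cpu_node(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	cpu_nodes[cpu].refcount++;
	return &cpu_nodes[cpu];
}

/* Models of_node_put(): drops one reference. */
static void node_put(struct node *np)
{
	if (np)
		np->refcount--;
}

/* Old loop: every node visited keeps the reference taken for the compare. */
static int lookup_leaky(struct node *target)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (get_cpu_node(cpu) == target)
			return cpu;
	}
	return -1;
}

/* Balanced lookup: the reference taken for each compare is dropped again. */
static int lookup_balanced(struct node *target)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct node *np = get_cpu_node(cpu);
		int found = (np == target);

		node_put(np);
		if (found)
			return cpu;
	}
	return -1;
}

/* Sum of all refcounts, used to observe leaked references. */
static int total_refs(void)
{
	int sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += cpu_nodes[cpu].refcount;
	return sum;
}
```

In this model, looking up CPU 2 with lookup_leaky() leaves three extra
references behind (CPUs 0, 1 and 2), while lookup_balanced() leaves the total
unchanged.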

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 8/8] perf: ARM DynamIQ Shared Unit PMU support
  2017-10-10 10:33   ` Suzuki K Poulose
@ 2017-10-18  9:20     ` Mark Rutland
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Rutland @ 2017-10-18  9:20 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan

Hi Suzuki,

This generally looks good. My comments below are mostly minor, modulo
the probing/hotplug bit at the end.

On Tue, Oct 10, 2017 at 11:33:03AM +0100, Suzuki K Poulose wrote:
> diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
> new file mode 100644
> index 000000000000..5d1b0d9ff5bb
> --- /dev/null
> +++ b/arch/arm64/include/asm/arm_dsu_pmu.h
> @@ -0,0 +1,124 @@
> +/*
> + * ARM DynamIQ Shared Unit (DSU) PMU Low level register access routines.
> + *
> + * Copyright (C) ARM Limited, 2017.
> + *
> + * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2, as published by the Free Software Foundation.
> + */
> +
> +#include <asm/sysreg.h>
> +

I believe you also need the following headers:

#include <linux/bitops.h>
#include <linux/bug.h>
#include <linux/compiler.h>
#include <linux/types.h>

#include <asm/barrier.h>

[...]

> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index e5197ffb7422..ee3d7d13977c 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -17,6 +17,15 @@ config ARM_PMU_ACPI
>  	depends on ARM_PMU && ACPI
>  	def_bool y
>  
> +config ARM_DSU_PMU
> +	tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> +	depends on ARM64 && PERF_EVENTS

The PERF_EVENTS dependency is handled at the top of the
drivers/perf/Kconfig file since commit:

  bddb9b68d3fb0dfb ("drivers/perf: commonise PERF_EVENTS dependency")

... so it can be dropped here.

> +	  help
> +	  Provides support for performance monitor unit in ARM DynamIQ Shared
> +	  Unit (DSU). The DSU integrates one or more cores  with an L3 memory

Nit: double spacing in "cores  with".

[...]

> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index 6420bd4394d5..0adb4f6926a4 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -1,5 +1,6 @@
>  obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
>  obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
> +obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o

Nit: this should go first in the list, to keep alphabetical order.

[...]

> diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
> new file mode 100644
> index 000000000000..6352e5f3fa0a
> --- /dev/null
> +++ b/drivers/perf/arm_dsu_pmu.c
> @@ -0,0 +1,826 @@

> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/of_device.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +#include <linux/spinlock.h>
> +
> +#include <asm/arm_dsu_pmu.h>

I believe you also need the following headers:

#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/bug.h>
#include <linux/cpumask.h>
#include <linux/smp.h>
#include <linux/sysfs.h>
#include <linux/types.h>

#include <asm/local64.h>

[...]

> +static struct attribute *dsu_pmu_event_attrs[] = {
> +	DSU_EVENT_ATTR(cycles, 0x11),
> +	DSU_EVENT_ATTR(bus_acecss, 0x19),

Typo for 'bus_access'?

> +	DSU_EVENT_ATTR(memory_error, 0x1a),
> +	DSU_EVENT_ATTR(bus_cycles, 0x1d),
> +	DSU_EVENT_ATTR(l3d_cache_allocate, 0x29),
> +	DSU_EVENT_ATTR(l3d_cache_refill, 0x2a),
> +	DSU_EVENT_ATTR(l3d_cache, 0x2b),
> +	DSU_EVENT_ATTR(l3d_cache_wb, 0x2c),
> +	NULL,
> +};

[...]

> +static int dsu_pmu_get_event_idx(struct dsu_hw_events *hw_events,
> +				 struct perf_event *event)
> +{

> +	idx = find_next_zero_bit(used_mask, dsu_pmu->num_counters, 0);

Perhaps:

	idx = find_first_zero_bit(used_mask, dsu_pmu->num_counters);
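
As a rough illustration of why the two calls are interchangeable here, a
single-word user-space model of the bitmap search follows. The model_*
functions are hypothetical and only mimic the documented behaviour (return
the index of the first clear bit, or the size if none is clear). The -1
return for "no free counter" is an assumption, since the error path of
dsu_pmu_get_event_idx() is not quoted above.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Single-word model of find_next_zero_bit(): index of the first clear bit
 * at or after 'offset', or 'size' if every bit in [offset, size) is set.
 */
static unsigned int model_find_next_zero_bit(unsigned long word,
					     unsigned int size,
					     unsigned int offset)
{
	assert(size <= MODEL_BITS_PER_LONG);
	for (unsigned int i = offset; i < size; i++) {
		if (!(word & (1UL << i)))
			return i;
	}
	return size;
}

/* Model of find_first_zero_bit(): the same search starting at bit 0. */
static unsigned int model_find_first_zero_bit(unsigned long word,
					      unsigned int size)
{
	return model_find_next_zero_bit(word, size, 0);
}

/* Allocate the lowest free counter, or return -1 if all are in use. */
static int model_get_event_idx(unsigned long *used_mask,
			       unsigned int num_counters)
{
	unsigned int idx = model_find_first_zero_bit(*used_mask, num_counters);

	if (idx >= num_counters)
		return -1;
	*used_mask |= 1UL << idx;
	return (int)idx;
}
```

With an offset of 0 the two searches visit the same bits, so the shorter
spelling only states the intent more directly.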

[...]

> +static irqreturn_t dsu_pmu_handle_irq(int irq_num, void *dev)
> +{
> +	int i;
> +	bool handled = false;
> +	struct dsu_pmu *dsu_pmu = dev;
> +	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
> +	unsigned long overflow, workset;
> +
> +	overflow = dsu_pmu_getreset_overflow();
> +	bitmap_and(&workset, &overflow, hw_events->used_mask,
> +		   DSU_PMU_MAX_HW_CNTRS);

Why do we need this bitmap_and()? Surely the bits for unused counters
shouldn't be set?

It would probably be better to reset those when we associate the first
CPU with a DSU.
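
To make the question concrete, a small user-space sketch of what the
bitmap_and() filters out. handle_overflow() is a hypothetical model working
on plain 32-bit masks; it counts the overflow bits that belong to counters in
use and reports the remaining "stale" bits, which are the ones a reset at
association time would be expected to prevent.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_HW_CNTRS 32u	/* assumed width of the overflow status */

/*
 * Returns how many overflowed counters are actually in use, and stores the
 * overflow bits of unused counters in *stale.
 */
static int handle_overflow(uint32_t overflow, uint32_t used_mask,
			   uint32_t *stale)
{
	uint32_t workset = overflow & used_mask;
	int handled = 0;

	assert(stale != 0);
	*stale = overflow & ~used_mask;

	for (unsigned int i = 0; i < MODEL_MAX_HW_CNTRS; i++) {
		if (workset & (UINT32_C(1) << i))
			handled++;
	}
	return handled;
}
```

If unused counters are always reset before use, *stale stays zero and the
masking step becomes redundant.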

[...]

> +static int dsu_pmu_event_init(struct perf_event *event)
> +{

> +	if (!dsu_pmu_validate_group(event))
> +		return -EINVAL;
> +	if (!cpumask_test_cpu(event->cpu, &dsu_pmu->associated_cpus)) {
> +		dev_dbg(dsu_pmu->pmu.dev,
> +			 "Requested cpu is not associated with the DSU\n");
> +		return -EINVAL;
> +	}

It might make sense to flip these two checks, so as to do the simpler
one first.

[...]

> +static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
> +{
> +	int i = 0, n, cpu;
> +	struct device_node *cpu_node;
> +
> +	n = of_count_phandle_with_args(dev, "cpus", NULL);
> +	if (n <= 0)
> +		return -ENODEV;
> +	for (; i < n; i++) {

Can we put the i = 0 here? That would make this easier to read.

[...]

> +static void dsu_pmu_probe_pmu(void *data)
> +{
> +	struct dsu_pmu *dsu_pmu = data;
> +	u64 num_counters;
> +	u32 cpmceid[2];
> +
> +	num_counters = (__dsu_pmu_read_pmcr() >> CLUSTERPMCR_N_SHIFT) &
> +						CLUSTERPMCR_N_MASK;
> +	/* We can only support upto 31 independent counters */

s/upto/up to/

Does the hardware spec allow for more than this?

> +	if (WARN_ON(num_counters > 31))
> +		num_counters = 31;
> +	dsu_pmu->num_counters = num_counters;
> +	if (!dsu_pmu->num_counters)
> +		return;
> +	cpmceid[0] = __dsu_pmu_read_pmceid(0);
> +	cpmceid[1] = __dsu_pmu_read_pmceid(1);
> +	bitmap_from_u32array(dsu_pmu->cpmceid_bitmap,
> +				DSU_PMU_MAX_COMMON_EVENTS,
> +				cpmceid,
> +				ARRAY_SIZE(cpmceid));
> +}

[...]

> +
> +static int dsu_pmu_device_probe(struct platform_device *pdev)
> +{
> +	int irq, rc, cpu;
> +	struct dsu_pmu *dsu_pmu;
> +	char *name;
> +	static atomic_t pmu_idx = ATOMIC_INIT(-1);
> +
> +	dsu_pmu = dsu_pmu_alloc(pdev);
> +	if (IS_ERR(dsu_pmu))
> +		return PTR_ERR(dsu_pmu);
> +
> +	rc = dsu_pmu_dt_get_cpus(pdev->dev.of_node, &dsu_pmu->associated_cpus);
> +	if (rc) {
> +		dev_warn(&pdev->dev, "Failed to parse the CPUs\n");
> +		return rc;
> +	}
> +
> +	rc = smp_call_function_any(&dsu_pmu->associated_cpus,
> +					dsu_pmu_probe_pmu,
> +					dsu_pmu, 1);

Can we probe the relevant CPUs in the same way as the qcom l2 pmu, using
notifiers?

That way we'd work better with maxcpus=.

We have to do this cross-call in the arm_pmu driver because we need the
name at probe time, but that doesn't apply here. We should be able to
lazily initialize the set of events and number of counters.

That also keeps the IRQ affinity management in one place.
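
Roughly, the shape being suggested can be modelled in user space as below.
All names are hypothetical and this is a sketch, not the driver: in the
kernel the callbacks would be registered with the CPU hotplug state machine
(for example via cpuhp_setup_state_multi()), and model_read_num_counters()
stands in for reading the cluster PMU registers on the CPU coming online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_dsu_pmu {
	uint64_t associated_cpus;	/* bit n set: CPU n belongs to this DSU */
	int active_cpu;			/* CPU owning the PMU and IRQ, or -1 */
	bool probed;			/* counters/events discovered yet? */
	unsigned int num_counters;
};

/* Stand-in for reading the counter count from hardware; value is made up. */
static unsigned int model_read_num_counters(void)
{
	return 6;
}

/* Online callback: probe lazily on the first associated CPU to come up. */
static void model_cpu_online(struct model_dsu_pmu *pmu, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	if (!(pmu->associated_cpus & (UINT64_C(1) << cpu)))
		return;
	if (!pmu->probed) {
		pmu->num_counters = model_read_num_counters();
		pmu->probed = true;
	}
	if (pmu->active_cpu < 0)
		pmu->active_cpu = cpu;	/* IRQ affinity would follow this CPU */
}

/* Offline callback: hand the PMU to another online associated CPU, if any. */
static void model_cpu_offline(struct model_dsu_pmu *pmu, int cpu,
			      uint64_t online_mask)
{
	uint64_t candidates;

	assert(cpu >= 0 && cpu < 64);
	if (pmu->active_cpu != cpu)
		return;

	candidates = pmu->associated_cpus & online_mask &
		     ~(UINT64_C(1) << cpu);
	pmu->active_cpu = -1;
	for (int i = 0; i < 64; i++) {
		if (candidates & (UINT64_C(1) << i)) {
			pmu->active_cpu = i;
			break;
		}
	}
}
```

With this shape nothing is read from hardware until an associated CPU is
actually online, which is what makes it behave sensibly with maxcpus=.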

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 7/8] dt-bindings: Document devicetree binding for ARM DSU PMU
  2017-10-10 10:33   ` Suzuki K Poulose
@ 2017-10-18  9:20     ` Mark Rutland
  -1 siblings, 0 replies; 40+ messages in thread
From: Mark Rutland @ 2017-10-18  9:20 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan

On Tue, Oct 10, 2017 at 11:33:02AM +0100, Suzuki K Poulose wrote:
> This patch documents the devicetree bindings for ARM DSU PMU.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: devicetree@vger.kernel.org
> Cc: frowand.list@gmail.com
> Acked-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
> Changes since V3:
>  - Fixed node name in the example, suggested by Rob
> ---
>  .../devicetree/bindings/arm/arm-dsu-pmu.txt        | 27 ++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
> new file mode 100644
> index 000000000000..6efabba530f1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
> @@ -0,0 +1,27 @@
> +* ARM DynamIQ Shared Unit (DSU) Performance Monitor Unit (PMU)
> +
> +ARM DyanmIQ Shared Unit (DSU) integrates one or more CPU cores
> +with a shared L3 memory system, control logic and external interfaces to
> +form a multicore cluster. The PMU enables to gather various statistics on
> +the operations of the DSU. The PMU provides independent 32bit counters that
> +can count any of the supported events, along with a 64bit cycle counter.
> +The PMU is accessed via CPU system registers and has no MMIO component.
> +
> +** DSU PMU required properties:
> +
> +- compatible	: should be one of :
> +
> +		"arm,dsu-pmu"
> +
> +- interrupts	: Exactly 1 SPI must be listed.
> +
> +- cpus		: List of phandles for the CPUs connected to this DSU instance.
> +
> +
> +** Example:
> +
> +dsu-pmu-0 {
> +	compatible = "arm,dsu-pmu";
> +	interrupts = <GIC_SPI 02 IRQ_TYPE_LEVEL_HIGH>;
> +	cpus = <&cpu_0>, <&cpu_1>;
> +};
> -- 
> 2.13.6
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 8/8] perf: ARM DynamIQ Shared Unit PMU support
@ 2017-10-20 10:17       ` Suzuki K Poulose
  0 siblings, 0 replies; 40+ messages in thread
From: Suzuki K Poulose @ 2017-10-20 10:17 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, robh, will.deacon, sudeep.holla,
	frowand.list, devicetree, Jonathan.Cameron, marc.zyngier, peterz,
	mathieu.poirier, leo.yan

On 18/10/17 10:20, Mark Rutland wrote:
> Hi Suzuki,
> 
> This generally looks good. My comments below are mostly minor, modulo
> the probing/hotplug bit at the end.
> 

...

>> +static void dsu_pmu_probe_pmu(void *data)
>> +{
>> +	struct dsu_pmu *dsu_pmu = data;
>> +	u64 num_counters;
>> +	u32 cpmceid[2];
>> +
>> +	num_counters = (__dsu_pmu_read_pmcr() >> CLUSTERPMCR_N_SHIFT) &
>> +						CLUSTERPMCR_N_MASK;
>> +	/* We can only support upto 31 independent counters */
> 
> s/upto/up to/
> 
> Does the hardware spec allow for more than this?

No, the "counter" mask registers are 32bit wide, with Bit 31 reserved for the
Cycle counter.
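
To illustrate the layout described above, here is a minimal userspace sketch in C, not the driver code. It assumes bits 0..30 of a 32-bit counter mask select the event counters and bit 31 selects the cycle counter, as stated in the reply. All `sketch_*` / `SKETCH_*` names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not taken from the patch. */
#define SKETCH_CYCLE_COUNTER_IDX   31u	/* bit 31: cycle counter */
#define SKETCH_MAX_EVENT_COUNTERS  31u	/* bits 0..30: event counters */

/* Clamp the counter count reported by hardware to what the mask can hold. */
static uint32_t sketch_clamp_num_counters(uint64_t reported)
{
	return reported > SKETCH_MAX_EVENT_COUNTERS ?
		SKETCH_MAX_EVENT_COUNTERS : (uint32_t)reported;
}

/* One bit per implemented event counter, plus the cycle counter bit. */
static uint32_t sketch_all_counters_mask(uint32_t num_counters)
{
	uint32_t events;

	assert(num_counters <= SKETCH_MAX_EVENT_COUNTERS);
	/* Shift in 64 bits so that num_counters == 31 stays well defined. */
	events = (uint32_t)((UINT64_C(1) << num_counters) - 1);
	return events | (UINT32_C(1) << SKETCH_CYCLE_COUNTER_IDX);
}
```

With 31 event counters the mask is fully populated, which is why a 32nd event counter could not be represented in it.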

> 
>> +	if (WARN_ON(num_counters > 31))
>> +		num_counters = 31;
>> +	dsu_pmu->num_counters = num_counters;
>> +	if (!dsu_pmu->num_counters)
>> +		return;
>> +	cpmceid[0] = __dsu_pmu_read_pmceid(0);
>> +	cpmceid[1] = __dsu_pmu_read_pmceid(1);
>> +	bitmap_from_u32array(dsu_pmu->cpmceid_bitmap,
>> +				DSU_PMU_MAX_COMMON_EVENTS,
>> +				cpmceid,
>> +				ARRAY_SIZE(cpmceid));
>> +}
> 
> [...]
> 
>> +
>> +static int dsu_pmu_device_probe(struct platform_device *pdev)
>> +{
>> +	int irq, rc, cpu;
>> +	struct dsu_pmu *dsu_pmu;
>> +	char *name;
>> +	static atomic_t pmu_idx = ATOMIC_INIT(-1);
>> +
>> +	dsu_pmu = dsu_pmu_alloc(pdev);
>> +	if (IS_ERR(dsu_pmu))
>> +		return PTR_ERR(dsu_pmu);
>> +
>> +	rc = dsu_pmu_dt_get_cpus(pdev->dev.of_node, &dsu_pmu->associated_cpus);
>> +	if (rc) {
>> +		dev_warn(&pdev->dev, "Failed to parse the CPUs\n");
>> +		return rc;
>> +	}
>> +
>> +	rc = smp_call_function_any(&dsu_pmu->associated_cpus,
>> +					dsu_pmu_probe_pmu,
>> +					dsu_pmu, 1);
> 
> Can we probe the relevant CPUs in the same way as the qcom l2 pmu, using
> notifiers?
> 
> That way we'd work better with maxcpus=.
> 
> We have to do this cross-call in the arm_pmu driver because we need the
> name at probe time, but that doesn't apply here. We should be able to
> lazily initialize the set of events and number of counters.

OK, I will make the necessary changes.
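
A rough userspace model of the lazy initialization suggested above, in C. It is a sketch under assumptions, not the planned change: the hardware read is replaced by a stub returning an arbitrary value, CPUs are modelled as bits in a 32-bit mask, and every `sketch_*` name is hypothetical. The idea is that the first associated CPU to come online fills in the counter information, so probe itself does not need a cross-call to a CPU that may not be online (for example when booting with maxcpus=).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from the patch. */
struct sketch_pmu {
	uint32_t associated_cpus;	/* bitmask of CPUs attached to this DSU */
	bool     probed;		/* set once the hardware has been read */
	uint32_t num_counters;		/* valid only when probed is true */
	int      probe_calls;		/* instrumentation for this example */
};

/* Stand-in for reading the counter count from the cluster PMU registers. */
static uint32_t sketch_read_num_counters(void)
{
	return 6;	/* arbitrary value for the model */
}

/*
 * Model of a "CPU came online" callback. Only a CPU that belongs to
 * this DSU may read its registers, and the read happens at most once.
 */
static void sketch_cpu_online(struct sketch_pmu *pmu, unsigned int cpu)
{
	assert(cpu < 32);
	if (!(pmu->associated_cpus & (UINT32_C(1) << cpu)))
		return;		/* CPU belongs to another cluster */
	if (pmu->probed)
		return;		/* already initialized by an earlier CPU */
	pmu->num_counters = sketch_read_num_counters();
	pmu->probe_calls++;
	pmu->probed = true;
}
```

In this model a PMU whose CPUs are all offline simply stays unprobed until one of them comes up, instead of failing at probe time.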

Thanks a lot for the review.

Cheers
Suzuki

end of thread, other threads:[~2017-10-20 10:17 UTC | newest]

Thread overview: 40+ messages
2017-10-10 10:32 [PATCH v8 0/8] perf: Support for ARM DynamIQ Shared Unit Suzuki K Poulose
2017-10-10 10:32 ` Suzuki K Poulose
2017-10-10 10:32 ` Suzuki K Poulose
2017-10-10 10:32 ` [PATCH v8 1/8] perf: Export perf_event_update_userpage Suzuki K Poulose
2017-10-10 10:32   ` Suzuki K Poulose
2017-10-10 10:32 ` [PATCH v8 2/8] of: Add helper for mapping device node to logical CPU number Suzuki K Poulose
2017-10-10 10:32   ` Suzuki K Poulose
2017-10-10 10:32   ` Suzuki K Poulose
2017-10-10 10:32 ` [PATCH v8 3/8] coresight: of: Use of_cpu_node_to_id helper Suzuki K Poulose
2017-10-10 10:32   ` Suzuki K Poulose
2017-10-10 10:32 ` [PATCH v8 4/8] irqchip: gic-v3: " Suzuki K Poulose
2017-10-10 10:32   ` Suzuki K Poulose
2017-10-10 10:33 ` [PATCH v8 5/8] arm64: Use of_cpu_node_to_id helper for CPU topology parsing Suzuki K Poulose
2017-10-10 10:33   ` Suzuki K Poulose
2017-10-17 15:24   ` Mark Rutland
2017-10-17 15:24     ` Mark Rutland
2017-10-17 15:24     ` Mark Rutland
2017-10-17 16:11     ` Will Deacon
2017-10-17 16:11       ` Will Deacon
2017-10-17 16:11       ` Will Deacon
2017-10-17 16:20       ` Suzuki K Poulose
2017-10-17 16:20         ` Suzuki K Poulose
2017-10-17 16:42         ` Suzuki K Poulose
2017-10-17 16:42           ` Suzuki K Poulose
2017-10-17 16:42           ` Suzuki K Poulose
2017-10-10 10:33 ` [PATCH v8 6/8] arm_pmu: Use of_cpu_node_to_id helper Suzuki K Poulose
2017-10-10 10:33   ` Suzuki K Poulose
2017-10-17 15:26   ` Mark Rutland
2017-10-17 15:26     ` Mark Rutland
2017-10-10 10:33 ` [PATCH v8 7/8] dt-bindings: Document devicetree binding for ARM DSU PMU Suzuki K Poulose
2017-10-10 10:33   ` Suzuki K Poulose
2017-10-18  9:20   ` Mark Rutland
2017-10-18  9:20     ` Mark Rutland
2017-10-10 10:33 ` [PATCH v8 8/8] perf: ARM DynamIQ Shared Unit PMU support Suzuki K Poulose
2017-10-10 10:33   ` Suzuki K Poulose
2017-10-18  9:20   ` Mark Rutland
2017-10-18  9:20     ` Mark Rutland
2017-10-20 10:17     ` Suzuki K Poulose
2017-10-20 10:17       ` Suzuki K Poulose
2017-10-20 10:17       ` Suzuki K Poulose
