* [RFC 00/11] arm64: coresight: Enable ETE and TRBE
@ 2020-11-10 12:44 ` Anshuman Khandual
  0 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:44 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

This series enables two future IP trace features: the Embedded Trace
Extension (ETE) and the Trace Buffer Extension (TRBE). It depends on the ETM
system register instruction support series [0] and the v8.4 self-hosted
tracing support series (Jonathan Zhou) [1]. The tree is available at [2] for
quick access.

ETE is the PE (CPU) trace unit for CPUs implementing future architecture
extensions. ETE overlaps with the ETMv4 architecture, with additions to
support the newer architecture features and some restrictions on the
features supported relative to ETMv4. ETE support is added by extending the
ETMv4 driver to recognise the ETE and handle the features as exposed by the
TRCIDRx registers. ETE only supports system instruction access from the
host CPU. The ETE could be integrated with a TRBE (see below), or with the
legacy CoreSight trace bus (e.g. ETRs). Thus the ETE follows the same
firmware description as the ETMs and requires a node per instance.

The Trace Buffer Extension (TRBE) implements a per-CPU trace buffer, which
is accessible via system registers and can be combined with the ETE to
provide a 1x1 configuration of source and sink. TRBE is represented here as
a CoreSight sink, primarily because the ETE source can also work with other
traditional CoreSight sink devices. As TRBE only captures the trace data
produced by ETE, it cannot work alone.

The TRBE representation here has some distinct deviations from a traditional
CoreSight sink device. The CoreSight path between ETE and TRBE is not built
during boot from the respective DT or ACPI entries. Instead, TRBE is probed
on each available CPU and, when found, is connected to the ETE source device
on the same CPU after altering the source's outward connections. The
ETE-TRBE path connection lasts only while the CPU is online. The ETE-TRBE
coupling/decoupling method implemented here is not optimal and will be
reworked later.

Unlike traditional sinks, TRBE can generate interrupts to signal, among
other things, that the buffer has filled up. The interrupt is a PPI and
should be communicated by the platform: the DT or ACPI entry representing
TRBE should carry the PPI number for a given platform. During a perf
session, the TRBE IRQ handler should capture the trace into the perf
auxiliary buffer before restarting the TRBE. The system registers used here
to configure ETE and TRBE are described at the link below.

https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
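
As a rough illustration of the IRQ handling described above, here is a
user-space sketch (not the driver code) of how a handler might classify the
TRBSR_EL1 status before deciding whether to capture and restart. The bit
positions are taken from patch 01 of this series; the enum, the function
name and the policy itself are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Status bits as defined in patch 01 of this series. */
#define TRBSR_IRQ	(UINT64_C(1) << 22)
#define TRBSR_WRAP	(UINT64_C(1) << 20)
#define TRBSR_ABORT	(UINT64_C(1) << 18)
#define TRBSR_STOP	(UINT64_C(1) << 17)

static_assert((TRBSR_IRQ & (TRBSR_WRAP | TRBSR_ABORT | TRBSR_STOP)) == 0,
	      "status flags must be distinct bits");

/* Hypothetical classification of what the handler should do next. */
enum trbe_irq_action {
	TRBE_ACT_SPURIOUS,	/* IRQ flag clear: not raised by this TRBE */
	TRBE_ACT_FATAL,		/* abort reported: assumed not restartable */
	TRBE_ACT_DRAIN_RESTART,	/* wrapped or stopped: capture, then restart */
	TRBE_ACT_IGNORE,	/* IRQ set but nothing to capture */
};

/* Hypothetical helper: map a raw TRBSR_EL1 value to an action. */
static enum trbe_irq_action trbe_classify_status(uint64_t trbsr)
{
	if (!(trbsr & TRBSR_IRQ))
		return TRBE_ACT_SPURIOUS;
	if (trbsr & TRBSR_ABORT)
		return TRBE_ACT_FATAL;
	if (trbsr & (TRBSR_WRAP | TRBSR_STOP))
		return TRBE_ACT_DRAIN_RESTART;
	return TRBE_ACT_IGNORE;
}
```

Only the drain-and-restart outcome corresponds to the buffer-full case
mentioned above; the other outcomes stand in for corner cases that a real
handler would need to treat more carefully.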

This series adds another change: the CoreSight sink device needs to be
disabled before capturing the trace data for perf, in order to avoid a race
condition with simultaneous TRBE IRQ handling. This might cause problems for
traditional sink devices, which can be operated in both sysfs and perf mode,
and needs to be addressed correctly. One option would be to move the
update_buffer callback into the respective sink devices, e.g. into disable().

This series is primarily looking for some early feedback on both the
proposed design and its implementation. It acknowledges that it might be
incomplete and has scope for improvement.

Things to do:
- Improve ETE-TRBE coupling and decoupling method
- Improve TRBE IRQ handling for all possible corner cases
- Implement sysfs based trace sessions

[0] https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
[1] https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/ 
[2] https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted

Anshuman Khandual (6):
  arm64: Add TRBE definitions
  coresight: sink: Add TRBE driver
  coresight: etm-perf: Truncate the perf record if handle has no space
  coresight: etm-perf: Disable the path before capturing the trace data
  coresgith: etm-perf: Connect TRBE sink with ETE source
  dts: bindings: Document device tree binding for Arm TRBE

Suzuki K Poulose (5):
  coresight: etm-perf: Allow an event to use different sinks
  coresight: Do not scan for graph if none is present
  coresight: etm4x: Add support for PE OS lock
  coresight: ete: Add support for sysreg support
  coresight: ete: Detect ETE as one of the supported ETMs

 .../devicetree/bindings/arm/coresight.txt          |   3 +
 Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
 Documentation/trace/coresight/coresight-trbe.rst   |  36 +
 arch/arm64/include/asm/sysreg.h                    |  51 ++
 drivers/hwtracing/coresight/Kconfig                |  11 +
 drivers/hwtracing/coresight/Makefile               |   1 +
 drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
 drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
 drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
 drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
 drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
 include/linux/coresight.h                          |   2 +
 14 files changed, 1680 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h

-- 
2.7.4



* [RFC 01/11] arm64: Add TRBE definitions
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:44   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:44 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

This adds TRBE related registers and corresponding feature macros.
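
For readers unfamiliar with the mask/shift convention used below, here is a
small user-space sketch of how such definitions are typically consumed. Note
that each *_MASK here applies to the field after it has been shifted down,
not to the field in place. GENMASK64() is a local stand-in for the kernel's
GENMASK(), and the trbe_field_*() and trblimitr_limit_addr() helpers are
hypothetical names used only for this illustration; they are not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's GENMASK(): bits h..l inclusive. */
#define GENMASK64(h, l) \
	(((~UINT64_C(0)) >> (63 - (h))) & ((~UINT64_C(0)) << (l)))

/* Values copied from the patch; masks are relative to the shifted field. */
#define TRBLIMITR_LIMIT_MASK		GENMASK64(51, 0)
#define TRBLIMITR_LIMIT_SHIFT		12
#define TRBLIMITR_FILL_MODE_MASK	GENMASK64(1, 0)
#define TRBLIMITR_FILL_MODE_SHIFT	1
#define TRBLIMITR_ENABLE		(UINT64_C(1) << 0)

/* Hypothetical helper: extract a field given its shift and mask. */
static inline uint64_t trbe_field_get(uint64_t reg, unsigned int shift,
				      uint64_t mask)
{
	return (reg >> shift) & mask;
}

/* Hypothetical helper: return reg with the field replaced by val. */
static inline uint64_t trbe_field_set(uint64_t reg, unsigned int shift,
				      uint64_t mask, uint64_t val)
{
	assert(!(val & ~mask));		/* value must fit in the field */
	reg &= ~(mask << shift);
	return reg | (val << shift);
}

/* Hypothetical helper: limit address encoded in a TRBLIMITR_EL1 value. */
static inline uint64_t trblimitr_limit_addr(uint64_t trblimitr)
{
	return trbe_field_get(trblimitr, TRBLIMITR_LIMIT_SHIFT,
			      TRBLIMITR_LIMIT_MASK) << TRBLIMITR_LIMIT_SHIFT;
}
```

Keeping the masks unshifted means the same get/set helpers work for every
field, at the cost of always having to pass the shift alongside the mask.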

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/sysreg.h | 49 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 8bfca08..14cb156 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -330,6 +330,55 @@
 
 #define SYS_PMMIR_EL1			sys_reg(3, 0, 9, 14, 6)
 
+/*
+ * TRBE Registers
+ */
+#define SYS_TRBLIMITR_EL1		sys_reg(3, 0, 9, 11, 0)
+#define SYS_TRBPTR_EL1			sys_reg(3, 0, 9, 11, 1)
+#define SYS_TRBBASER_EL1		sys_reg(3, 0, 9, 11, 2)
+#define SYS_TRBSR_EL1			sys_reg(3, 0, 9, 11, 3)
+#define SYS_TRBMAR_EL1			sys_reg(3, 0, 9, 11, 4)
+#define SYS_TRBTRG_EL1			sys_reg(3, 0, 9, 11, 6)
+#define SYS_TRBIDR_EL1			sys_reg(3, 0, 9, 11, 7)
+
+#define TRBLIMITR_LIMIT_MASK		GENMASK(51, 0)
+#define TRBLIMITR_LIMIT_SHIFT		12
+#define TRBLIMITR_NVM			(1UL << 5)
+#define TRBLIMITR_TRIG_MODE_MASK	GENMASK(1, 0)
+#define TRBLIMITR_TRIG_MODE_SHIFT	3
+#define TRBLIMITR_FILL_MODE_MASK	GENMASK(1, 0)
+#define TRBLIMITR_FILL_MODE_SHIFT	1
+#define TRBLIMITR_ENABLE		(1UL << 0)
+#define TRBPTR_PTR_MASK			GENMASK(63, 0)
+#define TRBPTR_PTR_SHIFT		0
+#define TRBBASER_BASE_MASK		GENMASK(51, 0)
+#define TRBBASER_BASE_SHIFT		12
+#define TRBSR_EC_MASK			GENMASK(5, 0)
+#define TRBSR_EC_SHIFT			26
+#define TRBSR_IRQ			(1UL << 22)
+#define TRBSR_TRG			(1UL << 21)
+#define TRBSR_WRAP			(1UL << 20)
+#define TRBSR_ABORT			(1UL << 18)
+#define TRBSR_STOP			(1UL << 17)
+#define TRBSR_MSS_MASK			GENMASK(15, 0)
+#define TRBSR_MSS_SHIFT			0
+#define TRBSR_BSC_MASK			GENMASK(5, 0)
+#define TRBSR_BSC_SHIFT			0
+#define TRBSR_FSC_MASK			GENMASK(5, 0)
+#define TRBSR_FSC_SHIFT			0
+#define TRBMAR_SHARE_MASK		GENMASK(1, 0)
+#define TRBMAR_SHARE_SHIFT		8
+#define TRBMAR_OUTER_MASK		GENMASK(3, 0)
+#define TRBMAR_OUTER_SHIFT		4
+#define TRBMAR_INNER_MASK		GENMASK(3, 0)
+#define TRBMAR_INNER_SHIFT		0
+#define TRBTRG_TRG_MASK			GENMASK(31, 0)
+#define TRBTRG_TRG_SHIFT		0
+#define TRBIDR_FLAG			(1UL << 5)
+#define TRBIDR_PROG			(1UL << 4)
+#define TRBIDR_ALIGN_MASK		GENMASK(3, 0)
+#define TRBIDR_ALIGN_SHIFT		0
+
 #define SYS_MAIR_EL1			sys_reg(3, 0, 10, 2, 0)
 #define SYS_AMAIR_EL1			sys_reg(3, 0, 10, 3, 0)
 
-- 
2.7.4



* [RFC 02/11] coresight: etm-perf: Allow an event to use different sinks
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

From: Suzuki K Poulose <suzuki.poulose@arm.com>

When there are multiple sinks on the system and no sink is
specified, the default sink for one ETM could differ from
that of another ETM. However, we do not yet support multiple
sinks for an event. This patch allows the event to use the
default sinks on the ETMs where it is scheduled, as long as
the sinks are of the same type.

For example, with a 1x1 topology with per-CPU ETRs, the event
can use the per-CPU ETR for the session. However, if the sinks
are of different types, e.g. TMC-ETR on one and a custom sink
on another, the event will only trace on the first detected
sink.
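
The selection rule can be sketched in user space as follows. This is a
simplified model and not the driver code: struct sink_ops, struct sink_dev
and select_default_sinks() are hypothetical stand-ins, the per-CPU default
sinks are passed in as a plain array, and the CPU mask is a single unsigned
long. Only sinks_match() mirrors the helper added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sink_ops { const char *name; };
struct sink_dev { const struct sink_ops *ops; };

/* Mirrors sinks_match() in the patch: same ops table, same sink type. */
static bool sinks_match(const struct sink_dev *a, const struct sink_dev *b)
{
	if (!a || !b)
		return false;
	return a->ops == b->ops;
}

/*
 * Walk the per-CPU default sinks. The first sink found fixes the sink
 * type for the session; CPUs with no default sink, or with a sink of a
 * different type, are cleared from the mask. Returns the sink used for
 * buffer allocation, or NULL if no CPU had a default sink.
 */
static const struct sink_dev *
select_default_sinks(const struct sink_dev *const *per_cpu_sink,
		     size_t ncpus, unsigned long *mask)
{
	const struct sink_dev *sink = NULL;

	assert(ncpus <= sizeof(*mask) * 8);
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		const struct sink_dev *new_sink = per_cpu_sink[cpu];

		if (!(*mask & (1UL << cpu)))
			continue;
		if (!new_sink) {
			*mask &= ~(1UL << cpu);
			continue;
		}
		if (!sink)
			sink = new_sink;	/* first sink: no check */
		else if (!sinks_match(new_sink, sink))
			*mask &= ~(1UL << cpu);
	}
	return sink;
}
```

With default sinks {ETR, custom, none, ETR} on four CPUs, only the first
and last CPU stay in the mask and the first ETR is used for the buffer
allocation.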

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 50 ++++++++++++++++++------
 1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index c2c9b12..ea73cfa 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -204,14 +204,22 @@ static void etm_free_aux(void *data)
 	schedule_work(&event_data->work);
 }
 
+static bool sinks_match(struct coresight_device *a, struct coresight_device *b)
+{
+	if (!a || !b)
+		return false;
+	return (sink_ops(a) == sink_ops(b));
+}
+
 static void *etm_setup_aux(struct perf_event *event, void **pages,
 			   int nr_pages, bool overwrite)
 {
 	u32 id;
 	int cpu = event->cpu;
 	cpumask_t *mask;
-	struct coresight_device *sink;
+	struct coresight_device *sink = NULL;
 	struct etm_event_data *event_data = NULL;
+	bool sink_forced = false;
 
 	event_data = alloc_event_data(cpu);
 	if (!event_data)
@@ -222,6 +230,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 	if (event->attr.config2) {
 		id = (u32)event->attr.config2;
 		sink = coresight_get_sink_by_id(id);
+		sink_forced = true;
 	}
 
 	mask = &event_data->mask;
@@ -235,7 +244,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 	 */
 	for_each_cpu(cpu, mask) {
 		struct list_head *path;
-		struct coresight_device *csdev;
+		struct coresight_device *csdev, *new_sink;
 
 		csdev = per_cpu(csdev_src, cpu);
 		/*
@@ -249,21 +258,35 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 		}
 
 		/*
-		 * No sink provided - look for a default sink for one of the
-		 * devices. At present we only support topology where all CPUs
-		 * use the same sink [N:1], so only need to find one sink. The
-		 * coresight_build_path later will remove any CPU that does not
-		 * attach to the sink, or if we have not found a sink.
+		 * No sink provided - look for a default sink for all the devices.
+		 * We support multiple sinks only if all the default sinks are
+		 * of the same type, so that the sink buffer can be shared as
+		 * the event moves around. We don't trace on a CPU if its
+		 * default sink is missing or of a different type.
 		 */
-		if (!sink)
-			sink = coresight_find_default_sink(csdev);
+		if (!sink_forced) {
+			new_sink = coresight_find_default_sink(csdev);
+			if (!new_sink) {
+				cpumask_clear_cpu(cpu, mask);
+				continue;
+			}
+			/* Skip checks for the first sink */
+			if (!sink) {
+				sink = new_sink;
+			} else if (!sinks_match(new_sink, sink)) {
+				cpumask_clear_cpu(cpu, mask);
+				continue;
+			}
+		} else {
+			new_sink = sink;
+		}
 
 		/*
 		 * Building a path doesn't enable it, it simply builds a
 		 * list of devices from source to sink that can be
 		 * referenced later when the path is actually needed.
 		 */
-		path = coresight_build_path(csdev, sink);
+		path = coresight_build_path(csdev, new_sink);
 		if (IS_ERR(path)) {
 			cpumask_clear_cpu(cpu, mask);
 			continue;
@@ -284,7 +307,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 	if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
 		goto err;
 
-	/* Allocate the sink buffer for this session */
+	/*
+	 * Allocate the sink buffer for this session. All the sinks
+	 * where this event can be scheduled are ensured to be of the
+	 * same type. Thus the same sink configuration is used by the
+	 * sinks.
+	 */
 	event_data->snk_config =
 			sink_ops(sink)->alloc_buffer(sink, event, pages,
 						     nr_pages, overwrite);
-- 
2.7.4



* [RFC 03/11] coresight: Do not scan for graph if none is present
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

From: Suzuki K Poulose <suzuki.poulose@arm.com>

If a graph node is not found for a given node, of_graph_get_next_endpoint()
will emit the following error message:

 OF: graph: no port node found in /<node_name>

If the given component doesn't have any explicit connections (e.g. ETE),
we can simply skip the graph parsing.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/hwtracing/coresight/coresight-platform.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c
index 3629b78..c594f45 100644
--- a/drivers/hwtracing/coresight/coresight-platform.c
+++ b/drivers/hwtracing/coresight/coresight-platform.c
@@ -90,6 +90,12 @@ static void of_coresight_get_ports_legacy(const struct device_node *node,
 	struct of_endpoint endpoint;
 	int in = 0, out = 0;
 
+	/*
+	 * Avoid warnings in of_graph_get_next_endpoint()
+	 * if the device doesn't have any graph connections
+	 */
+	if (!of_graph_is_present(node))
+		return;
 	do {
 		ep = of_graph_get_next_endpoint(node, ep);
 		if (!ep)
-- 
2.7.4



* [RFC 04/11] coresight: etm4x: Add support for PE OS lock
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

From: Suzuki K Poulose <suzuki.poulose@arm.com>

ETE may not implement the OS Lock and could instead rely on
the PE OS Lock for trace unit access. This is indicated
by TRCOSLSR.OSLM == 0b100. Add support for handling the
PE OS Lock.
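
The model detection and dispatch can be sketched in user space as below.
This is an illustration under stated assumptions, not the driver code: the
OSLM bit layout (OSLM[2:1] in TRCOSLSR bits [4:3], OSLM[0] in bit [0]) and
the 0b000/0b010 encodings are assumed here, and the OSLSR_OSLM() macro, the
enums and os_lock_target() are hypothetical names. Only the 0b100 value
comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: OSLM[2:1] in bits [4:3], OSLM[0] in bit [0]. */
#define OSLSR_OSLM(oslsr) \
	((((((uint32_t)(oslsr)) >> 3) & 0x3) << 1) | \
	 (((uint32_t)(oslsr)) & 0x1))

enum os_lock_model {
	OSLOCK_NI	= 0x0,	/* 0b000: OS Lock not implemented (assumed) */
	OSLOCK_PRESENT	= 0x2,	/* 0b010: trace unit OS Lock (assumed) */
	OSLOCK_PE	= 0x4,	/* 0b100: PE OS Lock */
};

static_assert(OSLSR_OSLM(0x10) == OSLOCK_PE, "bit 4 maps to OSLM[2]");

/* Hypothetical: which register a lock/unlock write should go to. */
enum os_lock_target {
	TARGET_NONE,		/* nothing to write */
	TARGET_TRCOSLAR,	/* trace unit OS Lock Access Register */
	TARGET_OSLAR_EL1,	/* PE OS Lock Access Register */
};

static enum os_lock_target os_lock_target(uint32_t oslsr)
{
	switch (OSLSR_OSLM(oslsr)) {
	case OSLOCK_PRESENT:
		return TARGET_TRCOSLAR;
	case OSLOCK_PE:
		return TARGET_OSLAR_EL1;
	case OSLOCK_NI:
	default:
		/* Not implemented or unknown model: skip the write. */
		return TARGET_NONE;
	}
}
```

Unknown models fall through to the same path as "not implemented", which
matches the intent of warning once and skipping the write.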

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 50 ++++++++++++++++++----
 drivers/hwtracing/coresight/coresight-etm4x.h      | 15 +++++++
 2 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index fd945c1..0269b4c 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -101,30 +101,59 @@ void etm4x_sysreg_write(struct csdev_access *csa,
 	}
 }
 
-static void etm4_os_unlock_csa(struct etmv4_drvdata *drvdata, struct csdev_access *csa)
+static void etm_detect_os_lock(struct etmv4_drvdata *drvdata,
+			       struct csdev_access *csa)
 {
-	/* Writing 0 to TRCOSLAR unlocks the trace registers */
-	etm4x_relaxed_write32(csa, 0x0, TRCOSLAR);
-	drvdata->os_unlock = true;
+	u32 oslsr = etm4x_relaxed_read32(csa, TRCOSLSR);
+
+	drvdata->os_lock_model = ETM_OSLSR_OSLM(oslsr);
+}
+
+static void etm_write_os_lock(struct etmv4_drvdata *drvdata,
+			      struct csdev_access *csa, u32 val)
+{
+	val = !!val;
+
+	switch (drvdata->os_lock_model) {
+	case ETM_OSLOCK_PRESENT:
+		etm4x_relaxed_write32(csa, val, TRCOSLAR);
+		break;
+	case ETM_OSLOCK_PE:
+		write_sysreg_s(val, SYS_OSLAR_EL1);
+		break;
+	default:
+		pr_warn_once("CPU%d: Unsupported Trace OSLock model: %x\n",
+			     smp_processor_id(), drvdata->os_lock_model);
+		fallthrough;
+	case ETM_OSLOCK_NI:
+		return;
+	}
 	isb();
 }
 
+static inline void etm4_os_unlock_csa(struct etmv4_drvdata *drvdata,
+				      struct csdev_access *csa)
+{
+	WARN_ON(drvdata->cpu != smp_processor_id());
+
+	/* Writing 0 to OS Lock unlocks the trace unit registers */
+	etm_write_os_lock(drvdata, csa, 0x0);
+	drvdata->os_unlock = true;
+}
+
 static void etm4_os_unlock(struct etmv4_drvdata *drvdata)
 {
 	if (!WARN_ON(!drvdata->csdev))
 		etm4_os_unlock_csa(drvdata, &drvdata->csdev->access);
-
 }
 
 static void etm4_os_lock(struct etmv4_drvdata *drvdata)
 {
 	if (WARN_ON(!drvdata->csdev))
 		return;
-
-	/* Writing 0x1 to TRCOSLAR locks the trace registers */
-	etm4x_relaxed_write32(&drvdata->csdev->access, 0x1, TRCOSLAR);
+	/* Writing 0x1 to OS Lock locks the trace registers */
+	etm_write_os_lock(drvdata, &drvdata->csdev->access, 0x1);
 	drvdata->os_unlock = false;
-	isb();
 }
 
 static void etm4_cs_lock(struct etmv4_drvdata *drvdata,
@@ -794,6 +823,9 @@ static void etm4_init_arch_data(void *info)
 	if (!etm_init_csdev_access(drvdata, csa))
 		return;
 
+	/* Detect the support for OS Lock before we actually use it */
+	etm_detect_os_lock(drvdata, csa);
+
 	/* Make sure all registers are accessible */
 	etm4_os_unlock_csa(drvdata, csa);
 	etm4_cs_unlock(drvdata, csa);
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index fe71072..4b1bfc2 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -497,6 +497,20 @@
 					 ETM_MODE_EXCL_USER)
 
 /*
+ * TRCOSLSR.OSLM advertises the OS Lock model.
+ * OSLM[2:0] = TRCOSLSR[4:3,0]
+ *
+ *	0b000 - Trace OS Lock is not implemented.
+ *	0b010 - Trace OS Lock is implemented.
+ *	0b100 - Trace OS Lock is not implemented, unit is controlled by PE OS Lock.
+ */
+#define ETM_OSLOCK_NI		0b000
+#define ETM_OSLOCK_PRESENT	0b010
+#define ETM_OSLOCK_PE		0b100
+
+#define ETM_OSLSR_OSLM(oslsr)	((((oslsr) & GENMASK(4, 3)) >> 2) | ((oslsr) & 0x1))
+
+/*
  * TRCDEVARCH Bit field definitions
  * Bits[31:21]	- ARCHITECT = Always Arm Ltd.
  *                * Bits[31:28] = 0x4
@@ -879,6 +893,7 @@ struct etmv4_drvdata {
 	u8				s_ex_level;
 	u8				ns_ex_level;
 	u8				q_support;
+	u8				os_lock_model;
 	bool				sticky_enable;
 	bool				boot_enable;
 	bool				os_unlock;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC 05/11] coresight: ete: Add support for sysreg support
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

From: Suzuki K Poulose <suzuki.poulose@arm.com>

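Add system register (sysreg) access support for ETE. The register case
list is split into a list common to ETMv4 and ETE, an ETM-only list and
an ETE-only list, and ETE-specific read/write accessors are built from
the common and ETE-only lists.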
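
For illustration only (not part of the patch): a reduced standalone sketch
of the list-splitting technique, where one common list is combined with a
per-architecture list to generate the case labels of a switch. Only a few
registers are listed. The TRCOSLAR offset (0x300) is taken from the ETMv4
register map rather than from this patch, and CASE_KNOWN() and the two
*_reg_supported() helpers are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets as they appear in coresight-etm4x.h, except where noted. */
#define TRCAUXCTLR		0x018
#define TRCRSR			0x028
#define TRCSTALLCTLR		0x02C
#define TRCEXTINSELRn(n)	(0x120 + ((n) * 4))
#define TRCOSLAR		0x300	/* assumption: ETMv4 register map */

/* Hypothetical operation: each listed register becomes a case label. */
#define CASE_KNOWN(x)	case (x): return true;

#define COMMON_LIST(op)			\
	CASE_##op(TRCAUXCTLR)		\
	CASE_##op(TRCSTALLCTLR)

#define ETE_ONLY_LIST(op)		\
	CASE_##op(TRCRSR)		\
	CASE_##op(TRCEXTINSELRn(1))	\
	CASE_##op(TRCEXTINSELRn(2))	\
	CASE_##op(TRCEXTINSELRn(3))

#define ETM_ONLY_LIST(op)		\
	CASE_##op(TRCOSLAR)

/* ETMv4 accepts the common registers plus the ETM-only ones. */
static inline bool etm4x_reg_supported(uint32_t offset)
{
	switch (offset) {
	COMMON_LIST(KNOWN)
	ETM_ONLY_LIST(KNOWN)
	default:
		return false;
	}
}

/* ETE accepts the common registers plus the ETE-only ones. */
static inline bool ete_reg_supported(uint32_t offset)
{
	switch (offset) {
	COMMON_LIST(KNOWN)
	ETE_ONLY_LIST(KNOWN)
	default:
		return false;
	}
}
```

The point of the split is that a register such as TRCRSR is only reachable
through the ETE accessor, while TRCOSLAR is only reachable through the
ETMv4 one; anything else falls into the default (warning) path.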

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 39 ++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-etm4x.h      | 42 +++++++++++++++++-----
 2 files changed, 72 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 0269b4c..15b6e94 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -101,6 +101,45 @@ void etm4x_sysreg_write(struct csdev_access *csa,
 	}
 }
 
+u64 ete_sysreg_read(struct csdev_access *csa,
+		      u32 offset,
+		      bool _relaxed,
+		      bool _64bit)
+{
+	u64 res = 0;
+
+	switch (offset) {
+	ETE_READ_CASES(res)
+	default :
+		WARN_ONCE(1, "ete: trying to read unsupported register @%x\n",
+			 offset);
+	}
+
+	if (!_relaxed)
+		__iormb(res);	/* Imitate the !relaxed I/O helpers */
+
+	return res;
+}
+
+void ete_sysreg_write(struct csdev_access *csa,
+			u64 val,
+			u32 offset,
+			bool _relaxed,
+			bool _64bit)
+{
+	if (!_relaxed)
+		__iowmb();	/* Imitate the !relaxed I/O helpers */
+	if (!_64bit)
+		val &= GENMASK(31, 0);
+
+	switch (offset) {
+	ETE_WRITE_CASES(val)
+	default :
+		WARN_ONCE(1, "ete: trying to write to unsupported register @%x\n",
+			offset);
+	}
+}
+
 static void etm_detect_os_lock(struct etmv4_drvdata *drvdata,
 			       struct csdev_access *csa)
 {
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 4b1bfc2..00c0367 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -28,6 +28,7 @@
 #define TRCAUXCTLR			0x018
 #define TRCEVENTCTL0R			0x020
 #define TRCEVENTCTL1R			0x024
+#define TRCRSR				0x028
 #define TRCSTALLCTLR			0x02C
 #define TRCTSCTLR			0x030
 #define TRCSYNCPR			0x034
@@ -48,6 +49,7 @@
 #define TRCSEQRSTEVR			0x118
 #define TRCSEQSTR			0x11C
 #define TRCEXTINSELR			0x120
+#define TRCEXTINSELRn(n)		(0x120 + (n * 4)) /* n = 0-3 */
 #define TRCCNTRLDVRn(n)			(0x140 + (n * 4)) /* n = 0-3 */
 #define TRCCNTCTLRn(n)			(0x150 + (n * 4)) /* n = 0-3 */
 #define TRCCNTVRn(n)			(0x160 + (n * 4)) /* n = 0-3 */
@@ -156,9 +158,22 @@
 #define CASE_WRITE(val, x)					\
 	case (x): { write_etm4x_sysreg_const_offset((val), (x)); break; }
 
-#define CASE_LIST(op, val)			\
-	CASE_##op((val), TRCPRGCTLR)		\
+#define ETE_ONLY_LIST(op, val)			\
+	CASE_##op((val), TRCRSR)		\
+	CASE_##op((val), TRCEXTINSELRn(1))	\
+	CASE_##op((val), TRCEXTINSELRn(2))	\
+	CASE_##op((val), TRCEXTINSELRn(3))
+
+#define ETM_ONLY_LIST(op, val)			\
 	CASE_##op((val), TRCPROCSELR)		\
+	CASE_##op((val), TRCVDCTLR)		\
+	CASE_##op((val), TRCVDSACCTLR)		\
+	CASE_##op((val), TRCVDARCCTLR)		\
+	CASE_##op((val), TRCITCTRL)		\
+	CASE_##op((val), TRCOSLAR)
+
+#define COMMON_LIST(op, val)		\
+	CASE_##op((val), TRCPRGCTLR)		\
 	CASE_##op((val), TRCSTATR)		\
 	CASE_##op((val), TRCCONFIGR)		\
 	CASE_##op((val), TRCAUXCTLR)		\
@@ -175,9 +190,6 @@
 	CASE_##op((val), TRCVIIECTLR)		\
 	CASE_##op((val), TRCVISSCTLR)		\
 	CASE_##op((val), TRCVIPCSSCTLR)		\
-	CASE_##op((val), TRCVDCTLR)		\
-	CASE_##op((val), TRCVDSACCTLR)		\
-	CASE_##op((val), TRCVDARCCTLR)		\
 	CASE_##op((val), TRCSEQEVRn(0))		\
 	CASE_##op((val), TRCSEQEVRn(1))		\
 	CASE_##op((val), TRCSEQEVRn(2))		\
@@ -272,7 +284,6 @@
 	CASE_##op((val), TRCSSPCICRn(5))	\
 	CASE_##op((val), TRCSSPCICRn(6))	\
 	CASE_##op((val), TRCSSPCICRn(7))	\
-	CASE_##op((val), TRCOSLAR)		\
 	CASE_##op((val), TRCOSLSR)		\
 	CASE_##op((val), TRCPDCR)		\
 	CASE_##op((val), TRCPDSR)		\
@@ -344,7 +355,6 @@
 	CASE_##op((val), TRCCIDCCTLR1)		\
 	CASE_##op((val), TRCVMIDCCTLR0)		\
 	CASE_##op((val), TRCVMIDCCTLR1)		\
-	CASE_##op((val), TRCITCTRL)		\
 	CASE_##op((val), TRCCLAIMSET)		\
 	CASE_##op((val), TRCCLAIMCLR)		\
 	CASE_##op((val), TRCDEVAFF0)		\
@@ -364,8 +374,22 @@
 	CASE_##op((val), TRCPIDR2)		\
 	CASE_##op((val), TRCPIDR3)
 
-#define ETM4x_READ_CASES(res)	CASE_LIST(READ, (res))
-#define ETM4x_WRITE_CASES(val)	CASE_LIST(WRITE, (val))
+#define ETM4x_READ_CASES(res)			\
+	COMMON_LIST(READ, (res))		\
+	ETM_ONLY_LIST(READ, (res))
+
+#define ETM4x_WRITE_CASES(res)			\
+	COMMON_LIST(WRITE, (res))		\
+	ETM_ONLY_LIST(WRITE, (res))
+
+#define ETE_READ_CASES(res)			\
+	COMMON_LIST(READ, (res))		\
+	ETE_ONLY_LIST(READ, (res))
+
+#define ETE_WRITE_CASES(res)			\
+	COMMON_LIST(WRITE, (res))		\
+	ETE_ONLY_LIST(WRITE, (res))
+
 
 #define read_etm4x_sysreg_offset(csa, offset, _64bit)				\
 	({									\
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC 06/11] coresight: ete: Detect ETE as one of the supported ETMs
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Add ETE as one of the device types supported by the
ETM4x driver. The devices are named following the
existing convention as ete<N>.

ETE mandates that the trace resource status register (TRCRSR) is
programmed before tracing is turned on. For the moment simply write
to it indicating TraceActive.
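
For illustration only (not part of the patch): a standalone sketch of the
naming and version reporting done in etm4_probe(). It assumes the arch
value packs the major version in bits [7:4] and the minor version in bits
[3:0]; only the minor-version macro is visible in this patch, so the rest
of that layout is an assumption. format_trace_unit_banner() is a
hypothetical helper, and its output string is simplified compared to the
dev_info() message in the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed encoding: major in bits [7:4], minor in bits [3:0]. */
#define ARCH_VERSION(major, minor)	((((major) & 0xfU) << 4) | ((minor) & 0xfU))
#define ARCH_MAJOR(arch)		(((arch) >> 4) & 0xfU)
#define ARCH_MINOR(arch)		((arch) & 0xfU)
#define ARCH_ETE			ARCH_VERSION(5, 0)

/*
 * Hypothetical helper: build "<type><cpu>: <type> v<major>.<minor>".
 * ETE is encoded as architecture major version 5, so 4 is subtracted
 * to report it as ETE v1.x, as the patch does with "major -= 4".
 */
static inline int format_trace_unit_banner(char *buf, size_t len,
					   uint8_t arch, int cpu)
{
	unsigned int major = ARCH_MAJOR(arch);
	unsigned int minor = ARCH_MINOR(arch);
	const char *type_name = "etm";

	if (arch >= ARCH_ETE) {
		type_name = "ete";
		major -= 4;
	}

	return snprintf(buf, len, "%s%d: %s v%u.%u",
			type_name, cpu, type_name, major, minor);
}
```

Under these assumptions an arch value of ARCH_VERSION(5, 0) on CPU 2 is
named ete2 and reported as v1.0, while ARCH_VERSION(4, 4) on CPU 0 stays
etm0 and is reported as v4.4.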

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 .../devicetree/bindings/arm/coresight.txt          |  3 ++
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 55 +++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-etm4x.h      |  7 +++
 3 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
index bff96a5..784cc1b 100644
--- a/Documentation/devicetree/bindings/arm/coresight.txt
+++ b/Documentation/devicetree/bindings/arm/coresight.txt
@@ -40,6 +40,9 @@ its hardware characteristcs.
 		- Embedded Trace Macrocell with system register access only.
 			"arm,coresight-etm-sysreg";
 
+		- Embedded Trace Extensions.
+			"arm,ete"
+
 		- Coresight programmable Replicator :
 			"arm,coresight-dynamic-replicator", "arm,primecell";
 
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 15b6e94..0fea349 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -331,6 +331,13 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
 		etm4x_relaxed_write32(csa, trcpdcr | TRCPDCR_PU, TRCPDCR);
 	}
 
+	/*
+	 * ETE mandates that the TRCRSR is written to before
+	 * enabling it.
+	 */
+	if (drvdata->arch >= ETM_ARCH_ETE)
+		etm4x_relaxed_write32(csa, TRCRSR_TA, TRCRSR);
+
 	/* Enable the trace unit */
 	etm4x_relaxed_write32(csa, 1, TRCPRGCTLR);
 
@@ -763,13 +770,24 @@ static bool etm_init_sysreg_access(struct etmv4_drvdata *drvdata,
 	 * ETMs implementing sysreg access must implement TRCDEVARCH.
 	 */
 	devarch = read_etm4x_sysreg_const_offset(TRCDEVARCH);
-	if ((devarch & ETM_DEVARCH_ID_MASK) != ETM_DEVARCH_ETMv4x_ARCH)
+	switch (devarch & ETM_DEVARCH_ID_MASK) {
+	case ETM_DEVARCH_ETMv4x_ARCH:
+		*csa = (struct csdev_access) {
+			.io_mem	= false,
+			.read	= etm4x_sysreg_read,
+			.write	= etm4x_sysreg_write,
+		};
+		break;
+	case ETM_DEVARCH_ETE_ARCH:
+		*csa = (struct csdev_access) {
+			.io_mem	= false,
+			.read	= ete_sysreg_read,
+			.write	= ete_sysreg_write,
+		};
+		break;
+	default:
 		return false;
-	*csa = (struct csdev_access) {
-		.io_mem	= false,
-		.read	= etm4x_sysreg_read,
-		.write	= etm4x_sysreg_write,
-	};
+	}
 
 	drvdata->arch = etm_devarch_to_arch(devarch);
 	return true;
@@ -1698,6 +1716,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)
 	struct etmv4_drvdata *drvdata;
 	struct coresight_desc desc = { 0 };
 	struct etm_init_arg init_arg = { 0 };
+	u8 major, minor;
+	char *type_name;
 
 	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
 	if (!drvdata)
@@ -1724,10 +1744,6 @@ static int etm4_probe(struct device *dev, void __iomem *base)
 	if (drvdata->cpu < 0)
 		return drvdata->cpu;
 
-	desc.name = devm_kasprintf(dev, GFP_KERNEL, "etm%d", drvdata->cpu);
-	if (!desc.name)
-		return -ENOMEM;
-
 	init_arg.drvdata = drvdata;
 	init_arg.csa = &desc.access;
 
@@ -1742,6 +1758,19 @@ static int etm4_probe(struct device *dev, void __iomem *base)
 	if (!desc.access.io_mem ||
 	    fwnode_property_present(dev_fwnode(dev), "qcom,skip-power-up"))
 		drvdata->skip_power_up = true;
+	major = ETM_ARCH_MAJOR_VERSION(drvdata->arch);
+	minor = ETM_ARCH_MINOR_VERSION(drvdata->arch);
+	if (drvdata->arch >= ETM_ARCH_ETE) {
+		type_name = "ete";
+		major -= 4;
+	} else {
+		type_name = "etm";
+	}
+
+	desc.name = devm_kasprintf(dev, GFP_KERNEL,
+				   "%s%d", type_name, drvdata->cpu);
+	if (!desc.name)
+		return -ENOMEM;
 
 	etm4_init_trace_id(drvdata);
 	etm4_set_default(&drvdata->config);
@@ -1770,9 +1799,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)
 
 	etmdrvdata[drvdata->cpu] = drvdata;
 
-	dev_info(&drvdata->csdev->dev, "CPU%d: ETM v%d.%d initialized\n",
-		 drvdata->cpu, ETM_ARCH_MAJOR_VERSION(drvdata->arch),
-		 ETM_ARCH_MINOR_VERSION(drvdata->arch));
+	dev_info(&drvdata->csdev->dev, "CPU%d: %s v%d.%d initialized\n",
+		 drvdata->cpu, type_name, major, minor);
 
 	if (boot_enable) {
 		coresight_enable(drvdata->csdev);
@@ -1892,6 +1920,7 @@ static struct amba_driver etm4x_amba_driver = {
 
 static const struct of_device_id etm_sysreg_match[] = {
 	{ .compatible	= "arm,coresight-etm-sysreg" },
+	{ .compatible	= "arm,ete" },
 	{}
 };
 
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 00c0367..05fd0e5 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -127,6 +127,8 @@
 #define TRCCIDR2			0xFF8
 #define TRCCIDR3			0xFFC
 
+#define TRCRSR_TA			BIT(12)
+
 /*
  * System instructions to access ETM registers.
  * See ETMv4.4 spec ARM IHI0064F section 4.3.6 System instructions
@@ -570,11 +572,14 @@
 	((ETM_DEVARCH_MAKE_ARCHID_ARCH_VER(major)) | ETM_DEVARCH_ARCHID_ARCH_PART(0xA13))
 
 #define ETM_DEVARCH_ARCHID_ETMv4x		ETM_DEVARCH_MAKE_ARCHID(0x4)
+#define ETM_DEVARCH_ARCHID_ETE			ETM_DEVARCH_MAKE_ARCHID(0x5)
 
 #define ETM_DEVARCH_ID_MASK						\
 	(ETM_DEVARCH_ARCHITECT_MASK | ETM_DEVARCH_ARCHID_MASK | ETM_DEVARCH_PRESENT)
 #define ETM_DEVARCH_ETMv4x_ARCH						\
 	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETMv4x | ETM_DEVARCH_PRESENT)
+#define ETM_DEVARCH_ETE_ARCH						\
+	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETE | ETM_DEVARCH_PRESENT)
 
 #define TRCSTATR_IDLE_BIT		0
 #define TRCSTATR_PMSTABLE_BIT		1
@@ -661,6 +666,8 @@
 #define ETM_ARCH_MINOR_VERSION(arch)	((arch) & 0xfU)
 
 #define ETM_ARCH_V4	ETM_ARCH_VERSION(4, 0)
+#define ETM_ARCH_ETE	ETM_ARCH_VERSION(5, 0)
+
 /* Interpretation of resource numbers change at ETM v4.3 architecture */
 #define ETM_ARCH_V4_3	ETM_ARCH_VERSION(4, 3)
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC 07/11] coresight: sink: Add TRBE driver
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

Trace Buffer Extension (TRBE) implements a per-CPU trace buffer which is
accessible via system registers. The TRBE supports different addressing
modes, including CPU virtual addresses, and different buffer modes,
including a circular buffer mode. The TRBE buffer is addressed by a base
pointer (TRBBASER_EL1), a write pointer (TRBPTR_EL1) and a limit pointer
(TRBLIMITR_EL1). Access to the trace buffer can be prohibited by a higher
exception level (EL3 or EL2), as indicated by TRBIDR_EL1.P. The TRBE can
also generate a CPU private interrupt (PPI) on address translation errors
and when the buffer is full. The overall implementation here is inspired
by the Arm SPE driver.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
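For reviewers, here is a minimal user-space sketch of how the limit offset
is chosen in snapshot mode, modelled on PERF_IDX2OFF() and
trbe_snapshot_offset() from this patch. It is not driver code:
SKETCH_PAGE_SIZE (fixed at 4K) and the names idx2off() and
snapshot_limit_offset() are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Wrap a free-running perf AUX index into an offset inside the buffer. */
static uint64_t idx2off(uint64_t idx, unsigned int nr_pages)
{
	assert(nr_pages > 0);
	return idx % ((uint64_t)nr_pages * SKETCH_PAGE_SIZE);
}

/*
 * Snapshot mode: stop at the midpoint when head lies in the first half of
 * the buffer, otherwise stop at the end of the buffer.
 */
static uint64_t snapshot_limit_offset(uint64_t head_idx, unsigned int nr_pages)
{
	uint64_t head = idx2off(head_idx, nr_pages);
	uint64_t limit = (uint64_t)nr_pages * SKETCH_PAGE_SIZE;

	if (head < limit / 2)
		limit /= 2;
	return limit;
}
```

With a four page buffer, a head index of 0x1000 yields a limit offset of
0x2000 and a head index of 0x3000 yields 0x4000. The driver adds this
offset to trbe_base to obtain the limit pointer.
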
 Documentation/trace/coresight/coresight-trbe.rst |  36 ++
 arch/arm64/include/asm/sysreg.h                  |   2 +
 drivers/hwtracing/coresight/Kconfig              |  11 +
 drivers/hwtracing/coresight/Makefile             |   1 +
 drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
 6 files changed, 1341 insertions(+)
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
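
The interrupt handler's fault classification can be read as a pure function
of the decoded status fields. The sketch below mirrors the decision order of
trbe_get_fault_act() in this patch, with the enum values taken from
coresight-trbe.h. The struct sketch_status type, the SK_ prefixed names and
sketch_fault_action() are hypothetical and exist only for this illustration;
the real driver reads these fields from TRBSR_EL1 and the pointer registers.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_ec { SK_EC_OTHERS = 0, SK_EC_STAGE1_ABORT = 36, SK_EC_STAGE2_ABORT = 37 };
enum sketch_bsc { SK_BSC_NOT_STOPPED = 0, SK_BSC_FILLED = 1, SK_BSC_TRIGGERED = 2 };
enum sketch_action { SK_ACT_WRAP, SK_ACT_SPURIOUS, SK_ACT_FATAL };

/* Hypothetical snapshot of the already decoded TRBE status. */
struct sketch_status {
	bool trg;		/* trigger event recorded */
	bool aborted;		/* external abort recorded */
	bool wrap;		/* write pointer wrapped */
	enum sketch_ec ec;
	enum sketch_bsc bsc;
	bool write_at_base;	/* write pointer equals base pointer */
};

static enum sketch_action sketch_fault_action(const struct sketch_status *s)
{
	assert(s);

	/* Triggers and aborts are not recoverable in this scheme. */
	if (s->trg || s->aborted)
		return SK_ACT_FATAL;

	/* Stage 1 and stage 2 translation aborts are fatal as well. */
	if (s->ec == SK_EC_STAGE1_ABORT || s->ec == SK_EC_STAGE2_ABORT)
		return SK_ACT_FATAL;

	/* A genuine buffer-full event wraps the write pointer to base. */
	if (s->wrap && s->ec == SK_EC_OTHERS && s->bsc == SK_BSC_FILLED &&
	    s->write_at_base)
		return SK_ACT_WRAP;

	return SK_ACT_SPURIOUS;
}
```

Any status that is neither fatal nor a clean buffer-full wrap falls through
to the spurious case, which in the patch simply reprograms and re-enables
the TRBE.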

diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
new file mode 100644
index 0000000..4320a8b
--- /dev/null
+++ b/Documentation/trace/coresight/coresight-trbe.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Trace Buffer Extension (TRBE).
+==============================
+
+    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
+    :Date:     November 2020
+
+Hardware Description
+--------------------
+
+Trace Buffer Extension (TRBE) is a per-CPU hardware unit which captures, in
+system memory, CPU traces generated by the corresponding per-CPU tracing unit.
+It is plugged in as a coresight sink device because the corresponding trace
+generators (ETE) are plugged in as source devices.
+
+Sysfs files and directories
+---------------------------
+
+The TRBE devices appear on the existing coresight bus alongside the other
+coresight devices::
+
+	>$ ls /sys/bus/coresight/devices
+	trbe0  trbe1  trbe2 trbe3
+
+Each ``trbe<N>`` device is associated with a CPU::
+
+	>$ ls /sys/bus/coresight/devices/trbe0/
+	irq align dbm
+
+*Key file items are:*
+   * ``irq``: TRBE maintenance interrupt number
+   * ``align``: TRBE write pointer alignment
+   * ``dbm``: whether TRBE updates memory with access and dirty flags
+
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 14cb156..61136f6 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -97,6 +97,7 @@
 #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
+#define TSB_CSYNC			__emit_inst(0xd503225f)
 
 #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
 	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
@@ -865,6 +866,7 @@
 #define ID_AA64MMFR2_CNP_SHIFT		0
 
 /* id_aa64dfr0 */
+#define ID_AA64DFR0_TRBE_SHIFT		44
 #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
 #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
 #define ID_AA64DFR0_PMSVER_SHIFT	32
diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index c119824..0f5e101 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -156,6 +156,17 @@ config CORESIGHT_CTI
 	  To compile this driver as a module, choose M here: the
 	  module will be called coresight-cti.
 
+config CORESIGHT_TRBE
+	bool "Trace Buffer Extension (TRBE) driver"
+	depends on ARM64
+	help
+	  This driver provides support for percpu Trace Buffer Extension (TRBE).
+	  TRBE always needs to be used along with its corresponding percpu ETE
+	  component. ETE generates trace data which is then captured with TRBE.
+	  Unlike traditional sink devices, TRBE is a CPU feature accessible via
+	  system registers. But its explicit dependency on the trace unit (ETE)
+	  requires it to be plugged in as a coresight sink device.
+
 config CORESIGHT_CTI_INTEGRATION_REGS
 	bool "Access CTI CoreSight Integration Registers"
 	depends on CORESIGHT_CTI
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index f20e357..d608165 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
 obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
 obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
 obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
+obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
 coresight-cti-y := coresight-cti-core.o	coresight-cti-platform.o \
 		   coresight-cti-sysfs.o
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
new file mode 100644
index 0000000..48a8ec3
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
+ * sink device, which can then pair with an appropriate per-cpu coresight
+ * source device (ETE) that generates the required trace data. Trace can be
+ * enabled via the perf framework.
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define DRVNAME "arm_trbe"
+
+#define pr_fmt(fmt) DRVNAME ": " fmt
+
+#include "coresight-trbe.h"
+
+#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
+
+#define ETE_IGNORE_PACKET 0x70
+
+static const char trbe_name[] = "trbe";
+
+enum trbe_fault_action {
+	TRBE_FAULT_ACT_WRAP,
+	TRBE_FAULT_ACT_SPURIOUS,
+	TRBE_FAULT_ACT_FATAL,
+};
+
+struct trbe_perf {
+	unsigned long trbe_base;
+	unsigned long trbe_limit;
+	unsigned long trbe_write;
+	pid_t pid;
+	int nr_pages;
+	void **pages;
+	bool snapshot;
+	struct trbe_cpudata *cpudata;
+};
+
+struct trbe_cpudata {
+	struct coresight_device	*csdev;
+	bool trbe_dbm;
+	u64 trbe_align;
+	int cpu;
+	enum cs_mode mode;
+	struct trbe_perf *perf;
+	struct trbe_drvdata *drvdata;
+};
+
+struct trbe_drvdata {
+	struct trbe_cpudata __percpu *cpudata;
+	struct perf_output_handle __percpu *handle;
+	struct hlist_node hotplug_node;
+	int irq;
+	cpumask_t supported_cpus;
+	enum cpuhp_state trbe_online;
+	struct platform_device *pdev;
+	struct clk *atclk;
+};
+
+static int trbe_alloc_node(struct perf_event *event)
+{
+	if (event->cpu == -1)
+		return NUMA_NO_NODE;
+	return cpu_to_node(event->cpu);
+}
+
+static void trbe_disable_and_drain_local(void)
+{
+	write_sysreg_s(0, SYS_TRBLIMITR_EL1);
+	isb();
+	dsb(nsh);
+	asm(TSB_CSYNC);
+}
+
+static void trbe_reset_local(void)
+{
+	trbe_disable_and_drain_local();
+	write_sysreg_s(0, SYS_TRBPTR_EL1);
+	isb();
+
+	write_sysreg_s(0, SYS_TRBBASER_EL1);
+	isb();
+
+	write_sysreg_s(0, SYS_TRBSR_EL1);
+	isb();
+}
+
+static void trbe_pad_buf(struct perf_output_handle *handle, int len)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	u64 head = PERF_IDX2OFF(handle->head, perf);
+
+	memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
+	if (!perf->snapshot)
+		perf_aux_output_skip(handle, len);
+}
+
+static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	u64 head = PERF_IDX2OFF(handle->head, perf);
+	u64 limit = perf->nr_pages * PAGE_SIZE;
+
+	if (head < limit >> 1)
+		limit >>= 1;
+
+	return limit;
+}
+
+static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = perf->cpudata;
+	const u64 bufsize = perf->nr_pages * PAGE_SIZE;
+	u64 limit = bufsize;
+	u64 head, tail, wakeup;
+
+	head = PERF_IDX2OFF(handle->head, perf);
+	if (!IS_ALIGNED(head, cpudata->trbe_align)) {
+		unsigned long delta = roundup(head, cpudata->trbe_align) - head;
+
+		delta = min(delta, handle->size);
+		trbe_pad_buf(handle, delta);
+		head = PERF_IDX2OFF(handle->head, perf);
+	}
+
+	if (!handle->size) {
+		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
+		return 0;
+	}
+
+	tail = PERF_IDX2OFF(handle->head + handle->size, perf);
+	wakeup = PERF_IDX2OFF(handle->wakeup, perf);
+
+	if (head < tail)
+		limit = round_down(tail, PAGE_SIZE);
+
+	if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
+		limit = min(limit, round_up(wakeup, PAGE_SIZE));
+
+	if (limit > head)
+		return limit;
+
+	trbe_pad_buf(handle, handle->size);
+	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
+	return 0;
+}
+
+static unsigned long get_trbe_limit(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	unsigned long offset;
+
+	if (perf->snapshot)
+		offset = trbe_snapshot_offset(handle);
+	else
+		offset = trbe_normal_offset(handle);
+	return perf->trbe_base + offset;
+}
+
+static void trbe_enable_hw(struct trbe_perf *perf)
+{
+	WARN_ON(perf->trbe_write < perf->trbe_base);
+	WARN_ON(perf->trbe_write >= perf->trbe_limit);
+	set_trbe_disabled();
+	clr_trbe_irq();
+	clr_trbe_wrap();
+	clr_trbe_abort();
+	clr_trbe_ec();
+	clr_trbe_bsc();
+	clr_trbe_fsc();
+	set_trbe_virtual_mode();
+	set_trbe_fill_mode(TRBE_FILL_STOP);
+	set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
+	isb();
+	set_trbe_base_pointer(perf->trbe_base);
+	set_trbe_limit_pointer(perf->trbe_limit);
+	set_trbe_write_pointer(perf->trbe_write);
+	isb();
+	dsb(ishst);
+	flush_tlb_all();
+	set_trbe_running();
+	set_trbe_enabled();
+	asm(TSB_CSYNC);
+}
+
+static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
+				   struct perf_event *event, void **pages,
+				   int nr_pages, bool snapshot)
+{
+	struct trbe_perf *perf;
+	struct page **pglist;
+	int i;
+
+	if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
+		return NULL;
+
+	perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
+	if (!perf)
+		return ERR_PTR(-ENOMEM);
+
+	pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
+	if (!pglist) {
+		kfree(perf);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	for (i = 0; i < nr_pages; i++)
+		pglist[i] = virt_to_page(pages[i]);
+
+	perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (!perf->trbe_base) {
+		kfree(pglist);
+		kfree(perf);
+		return ERR_PTR(-ENOMEM);
+	}
+	perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
+	perf->trbe_write = perf->trbe_base;
+	perf->pid = task_pid_nr(event->owner);
+	perf->snapshot = snapshot;
+	perf->nr_pages = nr_pages;
+	perf->pages = pages;
+	kfree(pglist);
+	return perf;
+}
+
+void arm_trbe_free_buffer(void *config)
+{
+	struct trbe_perf *perf = config;
+
+	vunmap((void *) perf->trbe_base);
+	kfree(perf);
+}
+
+static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
+					    struct perf_output_handle *handle,
+					    void *config)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
+	struct trbe_perf *perf = config;
+	unsigned long size, offset;
+
+	WARN_ON(perf->cpudata != cpudata);
+	WARN_ON(cpudata->cpu != smp_processor_id());
+	WARN_ON(cpudata->mode != CS_MODE_PERF);
+	WARN_ON(cpudata->drvdata != drvdata);
+
+	offset = get_trbe_write_pointer() - get_trbe_base_pointer();
+	size = offset - PERF_IDX2OFF(handle->head, perf);
+	if (perf->snapshot)
+		handle->head += size;
+	return size;
+}
+
+static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
+	struct perf_output_handle *handle = data;
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+
+	WARN_ON(cpudata->cpu != smp_processor_id());
+	WARN_ON(mode != CS_MODE_PERF);
+	WARN_ON(cpudata->drvdata != drvdata);
+
+	*this_cpu_ptr(drvdata->handle) = *handle;
+	cpudata->perf = perf;
+	cpudata->mode = mode;
+	perf->cpudata = cpudata;
+	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
+	perf->trbe_limit = get_trbe_limit(handle);
+	if (perf->trbe_limit == perf->trbe_base) {
+		trbe_disable_and_drain_local();
+		return 0;
+	}
+	trbe_enable_hw(perf);
+	return 0;
+}
+
+static int arm_trbe_disable(struct coresight_device *csdev)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
+	struct trbe_perf *perf = cpudata->perf;
+
+	WARN_ON(perf->cpudata != cpudata);
+	WARN_ON(cpudata->cpu != smp_processor_id());
+	WARN_ON(cpudata->mode != CS_MODE_PERF);
+	WARN_ON(cpudata->drvdata != drvdata);
+
+	trbe_disable_and_drain_local();
+	perf->cpudata = NULL;
+	cpudata->perf = NULL;
+	cpudata->mode = CS_MODE_DISABLED;
+	return 0;
+}
+
+static void trbe_handle_fatal(struct perf_output_handle *handle)
+{
+	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
+	perf_aux_output_end(handle, 0);
+	trbe_disable_and_drain_local();
+}
+
+static void trbe_handle_spurious(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+
+	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
+	perf->trbe_limit = get_trbe_limit(handle);
+	if (perf->trbe_limit == perf->trbe_base) {
+		trbe_disable_and_drain_local();
+		return;
+	}
+	trbe_enable_hw(perf);
+}
+
+static void trbe_handle_overflow(struct perf_output_handle *handle)
+{
+	struct perf_event *event = handle->event;
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	unsigned long offset, size;
+	struct etm_event_data *event_data;
+
+	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
+	size = offset - PERF_IDX2OFF(handle->head, perf);
+	if (perf->snapshot)
+		handle->head = offset;
+	perf_aux_output_end(handle, size);
+
+	event_data = perf_aux_output_begin(handle, event);
+	if (!event_data) {
+		event->hw.state |= PERF_HES_STOPPED;
+		trbe_disable_and_drain_local();
+		return;
+	}
+	perf->trbe_write = perf->trbe_base;
+	perf->trbe_limit = get_trbe_limit(handle);
+	if (perf->trbe_limit == perf->trbe_base) {
+		trbe_disable_and_drain_local();
+		return;
+	}
+	*this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
+	trbe_enable_hw(perf);
+}
+
+static bool is_perf_trbe(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = perf->cpudata;
+	struct trbe_drvdata *drvdata = cpudata->drvdata;
+	int cpu = smp_processor_id();
+
+	WARN_ON(perf->trbe_base != get_trbe_base_pointer());
+	WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
+
+	if (cpudata->mode != CS_MODE_PERF)
+		return false;
+
+	if (cpudata->cpu != cpu)
+		return false;
+
+	if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
+		return false;
+
+	return true;
+}
+
+static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
+{
+	enum trbe_ec ec = get_trbe_ec();
+	enum trbe_bsc bsc = get_trbe_bsc();
+
+	WARN_ON(is_trbe_running());
+	asm(TSB_CSYNC);
+	dsb(nsh);
+	isb();
+
+	if (is_trbe_trg() || is_trbe_abort())
+		return TRBE_FAULT_ACT_FATAL;
+
+	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
+		return TRBE_FAULT_ACT_FATAL;
+
+	if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
+		if (get_trbe_write_pointer() == get_trbe_base_pointer())
+			return TRBE_FAULT_ACT_WRAP;
+	}
+	return TRBE_FAULT_ACT_SPURIOUS;
+}
+
+static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
+{
+	struct perf_output_handle *handle = dev;
+	enum trbe_fault_action act;
+
+	WARN_ON(!is_trbe_irq());
+	clr_trbe_irq();
+
+	if (!perf_get_aux(handle))
+		return IRQ_NONE;
+
+	if (!is_perf_trbe(handle))
+		return IRQ_NONE;
+
+	irq_work_run();
+
+	act = trbe_get_fault_act(handle);
+	switch (act) {
+	case TRBE_FAULT_ACT_WRAP:
+		trbe_handle_overflow(handle);
+		break;
+	case TRBE_FAULT_ACT_SPURIOUS:
+		trbe_handle_spurious(handle);
+		break;
+	case TRBE_FAULT_ACT_FATAL:
+		trbe_handle_fatal(handle);
+		break;
+	}
+	return IRQ_HANDLED;
+}
+
+static const struct coresight_ops_sink arm_trbe_sink_ops = {
+	.enable		= arm_trbe_enable,
+	.disable	= arm_trbe_disable,
+	.alloc_buffer	= arm_trbe_alloc_buffer,
+	.free_buffer	= arm_trbe_free_buffer,
+	.update_buffer	= arm_trbe_update_buffer,
+};
+
+static const struct coresight_ops arm_trbe_cs_ops = {
+	.sink_ops	= &arm_trbe_sink_ops,
+};
+
+static ssize_t irq_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	return sprintf(buf, "%d\n", drvdata->irq);
+}
+static DEVICE_ATTR_RO(irq);
+
+static ssize_t align_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%s\n", trbe_buffer_align_str[ilog2(cpudata->trbe_align)]);
+}
+static DEVICE_ATTR_RO(align);
+
+static ssize_t dbm_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%d\n", cpudata->trbe_dbm);
+}
+static DEVICE_ATTR_RO(dbm);
+
+static struct attribute *arm_trbe_attrs[] = {
+	&dev_attr_align.attr,
+	&dev_attr_irq.attr,
+	&dev_attr_dbm.attr,
+	NULL,
+};
+
+static const struct attribute_group arm_trbe_group = {
+	.attrs = arm_trbe_attrs,
+};
+
+static const struct attribute_group *arm_trbe_groups[] = {
+	&arm_trbe_group,
+	NULL,
+};
+
+static void arm_trbe_probe_coresight_cpu(void *info)
+{
+	struct trbe_cpudata *cpudata = info;
+	struct device *dev = &cpudata->drvdata->pdev->dev;
+	struct coresight_desc desc = { 0 };
+
+	if (WARN_ON(!cpudata))
+		goto cpu_clear;
+
+	if (!is_trbe_available()) {
+		pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
+		goto cpu_clear;
+	}
+
+	if (!is_trbe_programmable()) {
+		pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
+		goto cpu_clear;
+	}
+	desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
+	if (!desc.name)
+		goto cpu_clear;
+
+	desc.type = CORESIGHT_DEV_TYPE_SINK;
+	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
+	desc.ops = &arm_trbe_cs_ops;
+	desc.pdata = dev_get_platdata(dev);
+	desc.groups = arm_trbe_groups;
+	desc.dev = dev;
+	cpudata->csdev = coresight_register(&desc);
+	if (IS_ERR(cpudata->csdev))
+		goto cpu_clear;
+
+	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
+	cpudata->trbe_dbm = get_trbe_flag_update();
+	cpudata->trbe_align = 1ULL << get_trbe_address_align();
+	if (cpudata->trbe_align > SZ_2K) {
+		pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
+		goto cpu_clear;
+	}
+	return;
+cpu_clear:
+	cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
+}
+
+static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
+{
+	struct trbe_cpudata *cpudata;
+	int cpu;
+
+	drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
+	if (!drvdata->cpudata)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, &drvdata->supported_cpus) {
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		cpudata->cpu = cpu;
+		cpudata->drvdata = drvdata;
+		smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
+	}
+	return 0;
+}
+
+static void arm_trbe_remove_coresight_cpu(void *info)
+{
+	struct trbe_drvdata *drvdata = info;
+
+	disable_percpu_irq(drvdata->irq);
+}
+
+static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
+{
+	struct trbe_cpudata *cpudata;
+	int cpu;
+
+	for_each_cpu(cpu, &drvdata->supported_cpus) {
+		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		if (cpudata->csdev) {
+			coresight_unregister(cpudata->csdev);
+			cpudata->drvdata = NULL;
+			cpudata->csdev = NULL;
+		}
+	}
+	free_percpu(drvdata->cpudata);
+	return 0;
+}
+
+static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
+{
+	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
+	struct trbe_cpudata *cpudata;
+
+	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		if (!cpudata->csdev) {
+			cpudata->drvdata = drvdata;
+			smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
+		}
+		trbe_reset_local();
+		enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
+	}
+	return 0;
+}
+
+static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
+	struct trbe_cpudata *cpudata;
+
+	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		if (cpudata->csdev) {
+			coresight_unregister(cpudata->csdev);
+			cpudata->drvdata = NULL;
+			cpudata->csdev = NULL;
+		}
+		disable_percpu_irq(drvdata->irq);
+		trbe_reset_local();
+	}
+	return 0;
+}
+
+static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
+{
+	enum cpuhp_state trbe_online;
+
+	trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
+					arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
+	if (trbe_online < 0)
+		return -EINVAL;
+
+	if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
+		return -EINVAL;
+
+	drvdata->trbe_online = trbe_online;
+	return 0;
+}
+
+static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
+{
+	cpuhp_remove_multi_state(drvdata->trbe_online);
+}
+
+static int arm_trbe_probe_irq(struct platform_device *pdev,
+			      struct trbe_drvdata *drvdata)
+{
+	drvdata->irq = platform_get_irq(pdev, 0);
+	if (drvdata->irq <= 0) {
+		pr_err("IRQ not found for the platform device\n");
+		return -ENXIO;
+	}
+
+	if (!irq_is_percpu(drvdata->irq)) {
+		pr_err("IRQ is not a PPI\n");
+		return -EINVAL;
+	}
+
+	if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
+		return -EINVAL;
+
+	drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
+	if (!drvdata->handle)
+		return -ENOMEM;
+
+	if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
+		free_percpu(drvdata->handle);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
+{
+	free_percpu_irq(drvdata->irq, drvdata->handle);
+	free_percpu(drvdata->handle);
+}
+
+static int arm_trbe_device_probe(struct platform_device *pdev)
+{
+	struct coresight_platform_data *pdata;
+	struct trbe_drvdata *drvdata;
+	struct device *dev = &pdev->dev;
+	int ret;
+
+	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
+	if (!drvdata)
+		return -ENOMEM;
+
+	pdata = coresight_get_platform_data(dev);
+	if (IS_ERR(pdata)) {
+		/* drvdata is devm-managed, so it must not be kfree()d here */
+		return PTR_ERR(pdata);
+	}
+
+	drvdata->atclk = devm_clk_get(dev, "atclk");
+	if (!IS_ERR(drvdata->atclk)) {
+		ret = clk_prepare_enable(drvdata->atclk);
+		if (ret)
+			return ret;
+	}
+	dev_set_drvdata(dev, drvdata);
+	dev->platform_data = pdata;
+	drvdata->pdev = pdev;
+	ret = arm_trbe_probe_irq(pdev, drvdata);
+	if (ret)
+		goto irq_failed;
+
+	ret = arm_trbe_probe_coresight(drvdata);
+	if (ret)
+		goto probe_failed;
+
+	ret = arm_trbe_probe_cpuhp(drvdata);
+	if (ret)
+		goto cpuhp_failed;
+
+	return 0;
+cpuhp_failed:
+	arm_trbe_remove_coresight(drvdata);
+probe_failed:
+	arm_trbe_remove_irq(drvdata);
+irq_failed:
+	kfree(pdata);
+	kfree(drvdata);
+	return ret;
+}
+
+static int arm_trbe_device_remove(struct platform_device *pdev)
+{
+	struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
+	struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
+
+	arm_trbe_remove_coresight(drvdata);
+	arm_trbe_remove_cpuhp(drvdata);
+	arm_trbe_remove_irq(drvdata);
+	kfree(pdata);
+	kfree(drvdata);
+	return 0;
+}
+
+#ifdef CONFIG_PM
+static int arm_trbe_runtime_suspend(struct device *dev)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
+
+	if (drvdata && !IS_ERR(drvdata->atclk))
+		clk_disable_unprepare(drvdata->atclk);
+
+	return 0;
+}
+
+static int arm_trbe_runtime_resume(struct device *dev)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
+
+	if (drvdata && !IS_ERR(drvdata->atclk))
+		clk_prepare_enable(drvdata->atclk);
+
+	return 0;
+}
+#endif
+
+static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
+	SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
+};
+
+static const struct of_device_id arm_trbe_of_match[] = {
+	{ .compatible = "arm,arm-trbe",	.data = (void *)1 },
+	{},
+};
+MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
+
+static const struct platform_device_id arm_trbe_match[] = {
+	{ "arm,trbe", 0},
+	{ }
+};
+MODULE_DEVICE_TABLE(platform, arm_trbe_match);
+
+static struct platform_driver arm_trbe_driver = {
+	.id_table = arm_trbe_match,
+	.driver	= {
+		.name = DRVNAME,
+		.of_match_table = of_match_ptr(arm_trbe_of_match),
+		.pm = &arm_trbe_dev_pm_ops,
+		.suppress_bind_attrs = true,
+	},
+	.probe	= arm_trbe_device_probe,
+	.remove	= arm_trbe_device_remove,
+};
+builtin_platform_driver(arm_trbe_driver)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
new file mode 100644
index 0000000..82ffbfc
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trbe.h
@@ -0,0 +1,525 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This contains all required hardware related helper functions for
+ * Trace Buffer Extension (TRBE) driver in the coresight framework.
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#include <linux/coresight.h>
+#include <linux/device.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/smp.h>
+
+#include "coresight-etm-perf.h"
+
+static inline bool is_trbe_available(void)
+{
+	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
+	int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
+
+	return trbe >= 0b0001;
+}
+
+static inline bool is_ete_available(void)
+{
+	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
+	int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
+
+	return (tracever != 0b0000);
+}
+
+static inline bool is_trbe_enabled(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return trblimitr & TRBLIMITR_ENABLE;
+}
+
+enum trbe_ec {
+	TRBE_EC_OTHERS		= 0,
+	TRBE_EC_STAGE1_ABORT	= 36,
+	TRBE_EC_STAGE2_ABORT	= 37,
+};
+
+static const char *const trbe_ec_str[] = {
+	[TRBE_EC_OTHERS]	= "Maintenance exception",
+	[TRBE_EC_STAGE1_ABORT]	= "Stage-1 exception",
+	[TRBE_EC_STAGE2_ABORT]	= "Stage-2 exception",
+};
+
+static inline enum trbe_ec get_trbe_ec(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
+}
+
+static inline void clr_trbe_ec(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+enum trbe_bsc {
+	TRBE_BSC_NOT_STOPPED	= 0,
+	TRBE_BSC_FILLED		= 1,
+	TRBE_BSC_TRIGGERED	= 2,
+};
+
+static const char *const trbe_bsc_str[] = {
+	[TRBE_BSC_NOT_STOPPED]	= "TRBE collection not stopped",
+	[TRBE_BSC_FILLED]	= "TRBE filled",
+	[TRBE_BSC_TRIGGERED]	= "TRBE triggered",
+};
+
+static inline enum trbe_bsc get_trbe_bsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
+}
+
+static inline void clr_trbe_bsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+enum trbe_fsc {
+	TRBE_FSC_ASF_LEVEL0	= 0,
+	TRBE_FSC_ASF_LEVEL1	= 1,
+	TRBE_FSC_ASF_LEVEL2	= 2,
+	TRBE_FSC_ASF_LEVEL3	= 3,
+	TRBE_FSC_TF_LEVEL0	= 4,
+	TRBE_FSC_TF_LEVEL1	= 5,
+	TRBE_FSC_TF_LEVEL2	= 6,
+	TRBE_FSC_TF_LEVEL3	= 7,
+	TRBE_FSC_AFF_LEVEL0	= 8,
+	TRBE_FSC_AFF_LEVEL1	= 9,
+	TRBE_FSC_AFF_LEVEL2	= 10,
+	TRBE_FSC_AFF_LEVEL3	= 11,
+	TRBE_FSC_PF_LEVEL0	= 12,
+	TRBE_FSC_PF_LEVEL1	= 13,
+	TRBE_FSC_PF_LEVEL2	= 14,
+	TRBE_FSC_PF_LEVEL3	= 15,
+	TRBE_FSC_SEA_WRITE	= 16,
+	TRBE_FSC_ASEA_WRITE	= 17,
+	TRBE_FSC_SEA_LEVEL0	= 20,
+	TRBE_FSC_SEA_LEVEL1	= 21,
+	TRBE_FSC_SEA_LEVEL2	= 22,
+	TRBE_FSC_SEA_LEVEL3	= 23,
+	TRBE_FSC_ALIGN_FAULT	= 33,
+	TRBE_FSC_TLB_FAULT	= 48,
+	TRBE_FSC_ATOMIC_FAULT	= 49,
+};
+
+static const char *const trbe_fsc_str[] = {
+	[TRBE_FSC_ASF_LEVEL0]	= "Address size fault - level 0",
+	[TRBE_FSC_ASF_LEVEL1]	= "Address size fault - level 1",
+	[TRBE_FSC_ASF_LEVEL2]	= "Address size fault - level 2",
+	[TRBE_FSC_ASF_LEVEL3]	= "Address size fault - level 3",
+	[TRBE_FSC_TF_LEVEL0]	= "Translation fault - level 0",
+	[TRBE_FSC_TF_LEVEL1]	= "Translation fault - level 1",
+	[TRBE_FSC_TF_LEVEL2]	= "Translation fault - level 2",
+	[TRBE_FSC_TF_LEVEL3]	= "Translation fault - level 3",
+	[TRBE_FSC_AFF_LEVEL0]	= "Access flag fault - level 0",
+	[TRBE_FSC_AFF_LEVEL1]	= "Access flag fault - level 1",
+	[TRBE_FSC_AFF_LEVEL2]	= "Access flag fault - level 2",
+	[TRBE_FSC_AFF_LEVEL3]	= "Access flag fault - level 3",
+	[TRBE_FSC_PF_LEVEL0]	= "Permission fault - level 0",
+	[TRBE_FSC_PF_LEVEL1]	= "Permission fault - level 1",
+	[TRBE_FSC_PF_LEVEL2]	= "Permission fault - level 2",
+	[TRBE_FSC_PF_LEVEL3]	= "Permission fault - level 3",
+	[TRBE_FSC_SEA_WRITE]	= "Synchronous external abort on write",
+	[TRBE_FSC_ASEA_WRITE]	= "Asynchronous external abort on write",
+	[TRBE_FSC_SEA_LEVEL0]	= "Synchronous external abort on table walk - level 0",
+	[TRBE_FSC_SEA_LEVEL1]	= "Synchronous external abort on table walk - level 1",
+	[TRBE_FSC_SEA_LEVEL2]	= "Synchronous external abort on table walk - level 2",
+	[TRBE_FSC_SEA_LEVEL3]	= "Synchronous external abort on table walk - level 3",
+	[TRBE_FSC_ALIGN_FAULT]	= "Alignment fault",
+	[TRBE_FSC_TLB_FAULT]	= "TLB conflict fault",
+	[TRBE_FSC_ATOMIC_FAULT]	= "Atomic fault",
+};
+
+static inline enum trbe_fsc get_trbe_fsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return (trbsr >> TRBSR_FSC_SHIFT) & TRBSR_FSC_MASK;
+}
+
+static inline void clr_trbe_fsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~(TRBSR_FSC_MASK << TRBSR_FSC_SHIFT);
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_irq(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_IRQ;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_irq(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~TRBSR_IRQ;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_trg(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_TRG;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_trg(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr &= ~TRBSR_TRG;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_wrap(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_WRAP;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_wrap(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr &= ~TRBSR_WRAP;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_abort(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_ABORT;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_abort(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr &= ~TRBSR_ABORT;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline bool is_trbe_irq(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_IRQ;
+}
+
+static inline bool is_trbe_trg(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_TRG;
+}
+
+static inline bool is_trbe_wrap(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_WRAP;
+}
+
+static inline bool is_trbe_abort(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_ABORT;
+}
+
+static inline bool is_trbe_running(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return !(trbsr & TRBSR_STOP);
+}
+
+static inline void set_trbe_running(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~TRBSR_STOP;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+enum trbe_address_mode {
+	TRBE_ADDRESS_VIRTUAL,
+	TRBE_ADDRESS_PHYSICAL,
+};
+
+static const char *const trbe_address_mode_str[] = {
+	[TRBE_ADDRESS_VIRTUAL]	= "Address mode - virtual",
+	[TRBE_ADDRESS_PHYSICAL]	= "Address mode - physical",
+};
+
+static inline bool is_trbe_virtual_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return !(trblimitr & TRBLIMITR_NVM);
+}
+
+static inline bool is_trbe_physical_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return trblimitr & TRBLIMITR_NVM;
+}
+
+static inline void set_trbe_virtual_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~TRBLIMITR_NVM;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline void set_trbe_physical_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr |= TRBLIMITR_NVM;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+enum trbe_trig_mode {
+	TRBE_TRIGGER_STOP	= 0,
+	TRBE_TRIGGER_IRQ	= 1,
+	TRBE_TRIGGER_IGNORE	= 3,
+};
+
+static const char *const trbe_trig_mode_str[] = {
+	[TRBE_TRIGGER_STOP]	= "Trigger mode - stop",
+	[TRBE_TRIGGER_IRQ]	= "Trigger mode - irq",
+	[TRBE_TRIGGER_IGNORE]	= "Trigger mode - ignore",
+};
+
+static inline enum trbe_trig_mode get_trbe_trig_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
+}
+
+static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
+	trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+enum trbe_fill_mode {
+	TRBE_FILL_STOP		= 0,
+	TRBE_FILL_WRAP		= 1,
+	TRBE_FILL_CIRCULAR	= 3,
+};
+
+static const char *const trbe_fill_mode_str[] = {
+	[TRBE_FILL_STOP]	= "Buffer mode - stop",
+	[TRBE_FILL_WRAP]	= "Buffer mode - wrap",
+	[TRBE_FILL_CIRCULAR]	= "Buffer mode - circular",
+};
+
+static inline enum trbe_fill_mode get_trbe_fill_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
+}
+
+static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
+	trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline void set_trbe_disabled(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~TRBLIMITR_ENABLE;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline void set_trbe_enabled(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr |= TRBLIMITR_ENABLE;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline bool get_trbe_flag_update(void)
+{
+	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+
+	return trbidr & TRBIDR_FLAG;
+}
+
+static inline bool is_trbe_programmable(void)
+{
+	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+
+	return !(trbidr & TRBIDR_PROG);
+}
+
+enum trbe_buffer_align {
+	TRBE_BUFFER_BYTE,
+	TRBE_BUFFER_HALF_WORD,
+	TRBE_BUFFER_WORD,
+	TRBE_BUFFER_DOUBLE_WORD,
+	TRBE_BUFFER_16_BYTES,
+	TRBE_BUFFER_32_BYTES,
+	TRBE_BUFFER_64_BYTES,
+	TRBE_BUFFER_128_BYTES,
+	TRBE_BUFFER_256_BYTES,
+	TRBE_BUFFER_512_BYTES,
+	TRBE_BUFFER_1K_BYTES,
+	TRBE_BUFFER_2K_BYTES,
+};
+
+static const char *const trbe_buffer_align_str[] = {
+	[TRBE_BUFFER_BYTE]		= "Byte",
+	[TRBE_BUFFER_HALF_WORD]		= "Half word",
+	[TRBE_BUFFER_WORD]		= "Word",
+	[TRBE_BUFFER_DOUBLE_WORD]	= "Double word",
+	[TRBE_BUFFER_16_BYTES]		= "16 bytes",
+	[TRBE_BUFFER_32_BYTES]		= "32 bytes",
+	[TRBE_BUFFER_64_BYTES]		= "64 bytes",
+	[TRBE_BUFFER_128_BYTES]		= "128 bytes",
+	[TRBE_BUFFER_256_BYTES]		= "256 bytes",
+	[TRBE_BUFFER_512_BYTES]		= "512 bytes",
+	[TRBE_BUFFER_1K_BYTES]		= "1K bytes",
+	[TRBE_BUFFER_2K_BYTES]		= "2K bytes",
+};
+
+static inline enum trbe_buffer_align get_trbe_address_align(void)
+{
+	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+
+	return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
+}
+
+static inline void assert_trbe_address_mode(unsigned long addr)
+{
+	bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
+	bool virt_mode = is_trbe_virtual_mode();
+
+	WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
+}
+
+static inline void assert_trbe_address_align(unsigned long addr)
+{
+	unsigned long nr_bytes = 1ULL << get_trbe_address_align();
+
+	WARN_ON(addr & (nr_bytes - 1));
+}
+
+static inline unsigned long get_trbe_write_pointer(void)
+{
+	u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
+	unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
+
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	return addr;
+}
+
+static inline void set_trbe_write_pointer(unsigned long addr)
+{
+	WARN_ON(is_trbe_enabled());
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
+	write_sysreg_s(addr, SYS_TRBPTR_EL1);
+}
+
+static inline unsigned long get_trbe_limit_pointer(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+	unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK;
+	unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
+
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	return addr;
+}
+
+static inline void set_trbe_limit_pointer(unsigned long addr)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
+	trblimitr |= (addr & PAGE_MASK);
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline unsigned long get_trbe_base_pointer(void)
+{
+	u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
+	unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK;
+
+	addr = addr << TRBBASER_BASE_SHIFT;
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	return addr;
+}
+
+static inline void set_trbe_base_pointer(unsigned long addr)
+{
+	WARN_ON(is_trbe_enabled());
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	write_sysreg_s(addr, SYS_TRBBASER_EL1);
+}
-- 
2.7.4



* [RFC 07/11] coresight: sink: Add TRBE driver
@ 2020-11-10 12:45   ` Anshuman Khandual
  0 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: mike.leach, Anshuman Khandual, linux-kernel, mathieu.poirier,
	suzuki.poulose

Trace Buffer Extension (TRBE) implements a per-CPU trace buffer which is
accessible via the system registers. The TRBE supports different addressing
modes, including CPU virtual addresses, and different buffer modes,
including the circular buffer mode. The TRBE buffer is addressed by a base
pointer (TRBBASER_EL1), a write pointer (TRBPTR_EL1) and a limit pointer
(TRBLIMITR_EL1). Access to the trace buffer can be prohibited by a higher
exception level (EL3 or EL2), which is indicated by TRBIDR_EL1.P. The TRBE
can also generate a CPU private interrupt (PPI) on address translation
errors and when the buffer is full. The overall implementation here is
inspired by the Arm SPE driver.
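
As a rough illustration of the base/write/limit addressing described above,
here is a minimal userspace sketch of how a perf AUX head could be folded
into the three pointers for snapshot mode. It is modeled loosely on
PERF_IDX2OFF() and trbe_snapshot_offset() from this patch; the names
`trbe_window`, `idx_to_off`, `snapshot_limit_off`, `make_snapshot_window`
and the 4 KiB `SKETCH_PAGE_SIZE` are assumptions made for the example and
are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for the three TRBE pointer registers. */
struct trbe_window {
	uint64_t base;   /* models TRBBASER_EL1: start of the buffer */
	uint64_t write;  /* models TRBPTR_EL1: next byte to be written */
	uint64_t limit;  /* models TRBLIMITR_EL1: end of the writable area */
};

/* Fold a monotonically increasing perf head into a buffer offset. */
static uint64_t idx_to_off(uint64_t idx, uint64_t nr_pages)
{
	assert(nr_pages > 0);
	return idx % (nr_pages * SKETCH_PAGE_SIZE);
}

/*
 * Snapshot mode: if the head sits in the first half of the buffer, the
 * limit is the midpoint; otherwise the limit is the end of the buffer.
 */
static uint64_t snapshot_limit_off(uint64_t head_off, uint64_t nr_pages)
{
	uint64_t limit = nr_pages * SKETCH_PAGE_SIZE;

	if (head_off < limit / 2)
		limit /= 2;
	return limit;
}

/* Derive all three pointers from a buffer base address and a perf head. */
static struct trbe_window make_snapshot_window(uint64_t base, uint64_t head,
					       uint64_t nr_pages)
{
	uint64_t off = idx_to_off(head, nr_pages);
	struct trbe_window w = {
		.base = base,
		.write = base + off,
		.limit = base + snapshot_limit_off(off, nr_pages),
	};

	/* The write pointer must stay inside [base, limit). */
	assert(w.write >= w.base && w.write < w.limit);
	return w;
}
```

For a four-page buffer at 0x100000 with the head one page in,
make_snapshot_window(0x100000, 4096, 4) yields write = 0x101000 and
limit = 0x102000, i.e. the TRBE would stop at the midpoint of the buffer.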

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 Documentation/trace/coresight/coresight-trbe.rst |  36 ++
 arch/arm64/include/asm/sysreg.h                  |   2 +
 drivers/hwtracing/coresight/Kconfig              |  11 +
 drivers/hwtracing/coresight/Makefile             |   1 +
 drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
 6 files changed, 1341 insertions(+)
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h

diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
new file mode 100644
index 0000000..4320a8b
--- /dev/null
+++ b/Documentation/trace/coresight/coresight-trbe.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Trace Buffer Extension (TRBE).
+==============================
+
+    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
+    :Date:     November 2020
+
+Hardware Description
+--------------------
+
+Trace Buffer Extension (TRBE) is per-CPU hardware which captures, in system
+memory, CPU traces generated by a corresponding per-CPU tracing unit. It
+gets plugged in as a coresight sink device because the corresponding trace
+generators (ETE) are plugged in as source devices.
+
+Sysfs files and directories
+---------------------------
+
+The TRBE devices appear on the existing coresight bus alongside the other
+coresight devices::
+
+	>$ ls /sys/bus/coresight/devices
+	trbe0  trbe1  trbe2 trbe3
+
+The ``trbe<N>`` named TRBEs are associated with a CPU::
+
+	>$ ls /sys/bus/coresight/devices/trbe0/
+	irq align dbm
+
+*Key file items are:*
+   * ``irq``: TRBE maintenance interrupt number
+   * ``align``: TRBE write pointer alignment
+   * ``dbm``: TRBE updates memory with access and dirty flags
+
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 14cb156..61136f6 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -97,6 +97,7 @@
 #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
+#define TSB_CSYNC			__emit_inst(0xd503225f)
 
 #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
 	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
@@ -865,6 +866,7 @@
 #define ID_AA64MMFR2_CNP_SHIFT		0
 
 /* id_aa64dfr0 */
+#define ID_AA64DFR0_TRBE_SHIFT		44
 #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
 #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
 #define ID_AA64DFR0_PMSVER_SHIFT	32
diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index c119824..0f5e101 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -156,6 +156,17 @@ config CORESIGHT_CTI
 	  To compile this driver as a module, choose M here: the
 	  module will be called coresight-cti.
 
+config CORESIGHT_TRBE
+	bool "Trace Buffer Extension (TRBE) driver"
+	depends on ARM64
+	help
+	  This driver provides support for percpu Trace Buffer Extension (TRBE).
+	  TRBE always needs to be used along with its corresponding percpu ETE
+	  component. ETE generates trace data which is then captured with TRBE.
+	  Unlike traditional sink devices, TRBE is a CPU feature accessible via
+	  system registers. But its explicit dependency on the trace unit (ETE)
+	  requires it to be plugged in as a coresight sink device.
+
 config CORESIGHT_CTI_INTEGRATION_REGS
 	bool "Access CTI CoreSight Integration Registers"
 	depends on CORESIGHT_CTI
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index f20e357..d608165 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
 obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
 obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
 obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
+obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
 coresight-cti-y := coresight-cti-core.o	coresight-cti-platform.o \
 		   coresight-cti-sysfs.o
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
new file mode 100644
index 0000000..48a8ec3
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
+ * sink device, which can then pair with an appropriate per-cpu coresight
+ * source device (ETE) that generates the required trace data. Trace can be
+ * enabled via the perf framework.
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define DRVNAME "arm_trbe"
+
+#define pr_fmt(fmt) DRVNAME ": " fmt
+
+#include "coresight-trbe.h"
+
+#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
+
+#define ETE_IGNORE_PACKET 0x70
+
+static const char trbe_name[] = "trbe";
+
+enum trbe_fault_action {
+	TRBE_FAULT_ACT_WRAP,
+	TRBE_FAULT_ACT_SPURIOUS,
+	TRBE_FAULT_ACT_FATAL,
+};
+
+struct trbe_perf {
+	unsigned long trbe_base;
+	unsigned long trbe_limit;
+	unsigned long trbe_write;
+	pid_t pid;
+	int nr_pages;
+	void **pages;
+	bool snapshot;
+	struct trbe_cpudata *cpudata;
+};
+
+struct trbe_cpudata {
+	struct coresight_device	*csdev;
+	bool trbe_dbm;
+	u64 trbe_align;
+	int cpu;
+	enum cs_mode mode;
+	struct trbe_perf *perf;
+	struct trbe_drvdata *drvdata;
+};
+
+struct trbe_drvdata {
+	struct trbe_cpudata __percpu *cpudata;
+	struct perf_output_handle __percpu *handle;
+	struct hlist_node hotplug_node;
+	int irq;
+	cpumask_t supported_cpus;
+	enum cpuhp_state trbe_online;
+	struct platform_device *pdev;
+	struct clk *atclk;
+};
+
+static int trbe_alloc_node(struct perf_event *event)
+{
+	if (event->cpu == -1)
+		return NUMA_NO_NODE;
+	return cpu_to_node(event->cpu);
+}
+
+static void trbe_disable_and_drain_local(void)
+{
+	write_sysreg_s(0, SYS_TRBLIMITR_EL1);
+	isb();
+	dsb(nsh);
+	asm(TSB_CSYNC);
+}
+
+static void trbe_reset_local(void)
+{
+	trbe_disable_and_drain_local();
+	write_sysreg_s(0, SYS_TRBPTR_EL1);
+	isb();
+
+	write_sysreg_s(0, SYS_TRBBASER_EL1);
+	isb();
+
+	write_sysreg_s(0, SYS_TRBSR_EL1);
+	isb();
+}
+
+static void trbe_pad_buf(struct perf_output_handle *handle, int len)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	u64 head = PERF_IDX2OFF(handle->head, perf);
+
+	memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
+	if (!perf->snapshot)
+		perf_aux_output_skip(handle, len);
+}
+
+static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	u64 head = PERF_IDX2OFF(handle->head, perf);
+	u64 limit = perf->nr_pages * PAGE_SIZE;
+
+	if (head < limit >> 1)
+		limit >>= 1;
+
+	return limit;
+}
+
+static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = perf->cpudata;
+	const u64 bufsize = perf->nr_pages * PAGE_SIZE;
+	u64 limit = bufsize;
+	u64 head, tail, wakeup;
+
+	head = PERF_IDX2OFF(handle->head, perf);
+	if (!IS_ALIGNED(head, cpudata->trbe_align)) {
+		unsigned long delta = roundup(head, cpudata->trbe_align) - head;
+
+		delta = min(delta, handle->size);
+		trbe_pad_buf(handle, delta);
+		head = PERF_IDX2OFF(handle->head, perf);
+	}
+
+	if (!handle->size) {
+		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
+		return 0;
+	}
+
+	tail = PERF_IDX2OFF(handle->head + handle->size, perf);
+	wakeup = PERF_IDX2OFF(handle->wakeup, perf);
+
+	if (head < tail)
+		limit = round_down(tail, PAGE_SIZE);
+
+	if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
+		limit = min(limit, round_up(wakeup, PAGE_SIZE));
+
+	if (limit > head)
+		return limit;
+
+	trbe_pad_buf(handle, handle->size);
+	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
+	return 0;
+}
+
+static unsigned long get_trbe_limit(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	unsigned long offset;
+
+	if (perf->snapshot)
+		offset = trbe_snapshot_offset(handle);
+	else
+		offset = trbe_normal_offset(handle);
+	return perf->trbe_base + offset;
+}
+
+static void trbe_enable_hw(struct trbe_perf *perf)
+{
+	WARN_ON(perf->trbe_write < perf->trbe_base);
+	WARN_ON(perf->trbe_write >= perf->trbe_limit);
+	set_trbe_disabled();
+	clr_trbe_irq();
+	clr_trbe_wrap();
+	clr_trbe_abort();
+	clr_trbe_ec();
+	clr_trbe_bsc();
+	clr_trbe_fsc();
+	set_trbe_virtual_mode();
+	set_trbe_fill_mode(TRBE_FILL_STOP);
+	set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
+	isb();
+	set_trbe_base_pointer(perf->trbe_base);
+	set_trbe_limit_pointer(perf->trbe_limit);
+	set_trbe_write_pointer(perf->trbe_write);
+	isb();
+	dsb(ishst);
+	flush_tlb_all();
+	set_trbe_running();
+	set_trbe_enabled();
+	asm(TSB_CSYNC);
+}
+
+static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
+				   struct perf_event *event, void **pages,
+				   int nr_pages, bool snapshot)
+{
+	struct trbe_perf *perf;
+	struct page **pglist;
+	int i;
+
+	if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
+		return NULL;
+
+	perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
+	if (!perf)
+		return ERR_PTR(-ENOMEM);
+
+	pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
+	if (!pglist) {
+		kfree(perf);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	for (i = 0; i < nr_pages; i++)
+		pglist[i] = virt_to_page(pages[i]);
+
+	perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (!perf->trbe_base) {
+		kfree(pglist);
+		kfree(perf);
+		return ERR_PTR(-ENOMEM);
+	}
+	perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
+	perf->trbe_write = perf->trbe_base;
+	perf->pid = task_pid_nr(event->owner);
+	perf->snapshot = snapshot;
+	perf->nr_pages = nr_pages;
+	perf->pages = pages;
+	kfree(pglist);
+	return perf;
+}
+
+void arm_trbe_free_buffer(void *config)
+{
+	struct trbe_perf *perf = config;
+
+	vunmap((void *) perf->trbe_base);
+	kfree(perf);
+}
+
+static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
+					    struct perf_output_handle *handle,
+					    void *config)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
+	struct trbe_perf *perf = config;
+	unsigned long size, offset;
+
+	WARN_ON(perf->cpudata != cpudata);
+	WARN_ON(cpudata->cpu != smp_processor_id());
+	WARN_ON(cpudata->mode != CS_MODE_PERF);
+	WARN_ON(cpudata->drvdata != drvdata);
+
+	offset = get_trbe_write_pointer() - get_trbe_base_pointer();
+	size = offset - PERF_IDX2OFF(handle->head, perf);
+	if (perf->snapshot)
+		handle->head += size;
+	return size;
+}
+
+static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
+	struct perf_output_handle *handle = data;
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+
+	WARN_ON(cpudata->cpu != smp_processor_id());
+	WARN_ON(mode != CS_MODE_PERF);
+	WARN_ON(cpudata->drvdata != drvdata);
+
+	*this_cpu_ptr(drvdata->handle) = *handle;
+	cpudata->perf = perf;
+	cpudata->mode = mode;
+	perf->cpudata = cpudata;
+	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
+	perf->trbe_limit = get_trbe_limit(handle);
+	if (perf->trbe_limit == perf->trbe_base) {
+		trbe_disable_and_drain_local();
+		return 0;
+	}
+	trbe_enable_hw(perf);
+	return 0;
+}
+
+static int arm_trbe_disable(struct coresight_device *csdev)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
+	struct trbe_perf *perf = cpudata->perf;
+
+	WARN_ON(perf->cpudata != cpudata);
+	WARN_ON(cpudata->cpu != smp_processor_id());
+	WARN_ON(cpudata->mode != CS_MODE_PERF);
+	WARN_ON(cpudata->drvdata != drvdata);
+
+	trbe_disable_and_drain_local();
+	perf->cpudata = NULL;
+	cpudata->perf = NULL;
+	cpudata->mode = CS_MODE_DISABLED;
+	return 0;
+}
+
+static void trbe_handle_fatal(struct perf_output_handle *handle)
+{
+	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
+	perf_aux_output_end(handle, 0);
+	trbe_disable_and_drain_local();
+}
+
+static void trbe_handle_spurious(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+
+	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
+	perf->trbe_limit = get_trbe_limit(handle);
+	if (perf->trbe_limit == perf->trbe_base) {
+		trbe_disable_and_drain_local();
+		return;
+	}
+	trbe_enable_hw(perf);
+}
+
+static void trbe_handle_overflow(struct perf_output_handle *handle)
+{
+	struct perf_event *event = handle->event;
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	unsigned long offset, size;
+	struct etm_event_data *event_data;
+
+	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
+	size = offset - PERF_IDX2OFF(handle->head, perf);
+	if (perf->snapshot)
+		handle->head = offset;
+	perf_aux_output_end(handle, size);
+
+	event_data = perf_aux_output_begin(handle, event);
+	if (!event_data) {
+		event->hw.state |= PERF_HES_STOPPED;
+		trbe_disable_and_drain_local();
+		return;
+	}
+	perf->trbe_write = perf->trbe_base;
+	perf->trbe_limit = get_trbe_limit(handle);
+	if (perf->trbe_limit == perf->trbe_base) {
+		trbe_disable_and_drain_local();
+		return;
+	}
+	*this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
+	trbe_enable_hw(perf);
+}
+
+static bool is_perf_trbe(struct perf_output_handle *handle)
+{
+	struct trbe_perf *perf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = perf->cpudata;
+	struct trbe_drvdata *drvdata = cpudata->drvdata;
+	int cpu = smp_processor_id();
+
+	WARN_ON(perf->trbe_base != get_trbe_base_pointer());
+	WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
+
+	if (cpudata->mode != CS_MODE_PERF)
+		return false;
+
+	if (cpudata->cpu != cpu)
+		return false;
+
+	if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
+		return false;
+
+	return true;
+}
+
+static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
+{
+	enum trbe_ec ec = get_trbe_ec();
+	enum trbe_bsc bsc = get_trbe_bsc();
+
+	WARN_ON(is_trbe_running());
+	asm(TSB_CSYNC);
+	dsb(nsh);
+	isb();
+
+	if (is_trbe_trg() || is_trbe_abort())
+		return TRBE_FAULT_ACT_FATAL;
+
+	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
+		return TRBE_FAULT_ACT_FATAL;
+
+	if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
+		if (get_trbe_write_pointer() == get_trbe_base_pointer())
+			return TRBE_FAULT_ACT_WRAP;
+	}
+	return TRBE_FAULT_ACT_SPURIOUS;
+}
+
+static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
+{
+	struct perf_output_handle *handle = dev;
+	enum trbe_fault_action act;
+
+	WARN_ON(!is_trbe_irq());
+	clr_trbe_irq();
+
+	if (!perf_get_aux(handle))
+		return IRQ_NONE;
+
+	if (!is_perf_trbe(handle))
+		return IRQ_NONE;
+
+	irq_work_run();
+
+	act = trbe_get_fault_act(handle);
+	switch (act) {
+	case TRBE_FAULT_ACT_WRAP:
+		trbe_handle_overflow(handle);
+		break;
+	case TRBE_FAULT_ACT_SPURIOUS:
+		trbe_handle_spurious(handle);
+		break;
+	case TRBE_FAULT_ACT_FATAL:
+		trbe_handle_fatal(handle);
+		break;
+	}
+	return IRQ_HANDLED;
+}
+
+static const struct coresight_ops_sink arm_trbe_sink_ops = {
+	.enable		= arm_trbe_enable,
+	.disable	= arm_trbe_disable,
+	.alloc_buffer	= arm_trbe_alloc_buffer,
+	.free_buffer	= arm_trbe_free_buffer,
+	.update_buffer	= arm_trbe_update_buffer,
+};
+
+static const struct coresight_ops arm_trbe_cs_ops = {
+	.sink_ops	= &arm_trbe_sink_ops,
+};
+
+static ssize_t irq_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	return sprintf(buf, "%d\n", drvdata->irq);
+}
+static DEVICE_ATTR_RO(irq);
+
+static ssize_t align_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%s\n", trbe_buffer_align_str[ilog2(cpudata->trbe_align)]);
+}
+static DEVICE_ATTR_RO(align);
+
+static ssize_t dbm_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%d\n", cpudata->trbe_dbm);
+}
+static DEVICE_ATTR_RO(dbm);
+
+static struct attribute *arm_trbe_attrs[] = {
+	&dev_attr_align.attr,
+	&dev_attr_irq.attr,
+	&dev_attr_dbm.attr,
+	NULL,
+};
+
+static const struct attribute_group arm_trbe_group = {
+	.attrs = arm_trbe_attrs,
+};
+
+static const struct attribute_group *arm_trbe_groups[] = {
+	&arm_trbe_group,
+	NULL,
+};
+
+static void arm_trbe_probe_coresight_cpu(void *info)
+{
+	struct trbe_cpudata *cpudata = info;
+	struct device *dev = &cpudata->drvdata->pdev->dev;
+	struct coresight_desc desc = { 0 };
+
+	if (WARN_ON(!cpudata))
+		goto cpu_clear;
+
+	if (!is_trbe_available()) {
+		pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
+		goto cpu_clear;
+	}
+
+	if (!is_trbe_programmable()) {
+		pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
+		goto cpu_clear;
+	}
+	desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
+	if (!desc.name)
+		goto cpu_clear;
+
+	desc.type = CORESIGHT_DEV_TYPE_SINK;
+	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
+	desc.ops = &arm_trbe_cs_ops;
+	desc.pdata = dev_get_platdata(dev);
+	desc.groups = arm_trbe_groups;
+	desc.dev = dev;
+	cpudata->csdev = coresight_register(&desc);
+	if (IS_ERR(cpudata->csdev))
+		goto cpu_clear;
+
+	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
+	cpudata->trbe_dbm = get_trbe_flag_update();
+	cpudata->trbe_align = 1ULL << get_trbe_address_align();
+	if (cpudata->trbe_align > SZ_2K) {
+		pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
+		goto cpu_clear;
+	}
+	return;
+cpu_clear:
+	cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
+}
+
+static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
+{
+	struct trbe_cpudata *cpudata;
+	int cpu;
+
+	drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
+	if (!drvdata->cpudata)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, &drvdata->supported_cpus) {
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		cpudata->cpu = cpu;
+		cpudata->drvdata = drvdata;
+		smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
+	}
+	return 0;
+}
+
+static void arm_trbe_remove_coresight_cpu(void *info)
+{
+	struct trbe_drvdata *drvdata = info;
+
+	disable_percpu_irq(drvdata->irq);
+}
+
+static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
+{
+	struct trbe_cpudata *cpudata;
+	int cpu;
+
+	for_each_cpu(cpu, &drvdata->supported_cpus) {
+		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		if (cpudata->csdev) {
+			coresight_unregister(cpudata->csdev);
+			cpudata->drvdata = NULL;
+			cpudata->csdev = NULL;
+		}
+	}
+	free_percpu(drvdata->cpudata);
+	return 0;
+}
+
+static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
+{
+	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
+	struct trbe_cpudata *cpudata;
+
+	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		if (!cpudata->csdev) {
+			cpudata->drvdata = drvdata;
+			smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
+		}
+		trbe_reset_local();
+		enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
+	}
+	return 0;
+}
+
+static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
+	struct trbe_cpudata *cpudata;
+
+	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
+		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		if (cpudata->csdev) {
+			coresight_unregister(cpudata->csdev);
+			cpudata->drvdata = NULL;
+			cpudata->csdev = NULL;
+		}
+		disable_percpu_irq(drvdata->irq);
+		trbe_reset_local();
+	}
+	return 0;
+}
+
+static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
+{
+	enum cpuhp_state trbe_online;
+
+	trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
+					arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
+	if (trbe_online < 0)
+		return -EINVAL;
+
+	if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
+		return -EINVAL;
+
+	drvdata->trbe_online = trbe_online;
+	return 0;
+}
+
+static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
+{
+	cpuhp_remove_multi_state(drvdata->trbe_online);
+}
+
+static int arm_trbe_probe_irq(struct platform_device *pdev,
+			      struct trbe_drvdata *drvdata)
+{
+	drvdata->irq = platform_get_irq(pdev, 0);
+	if (drvdata->irq <= 0) {
+		pr_err("IRQ not found for the platform device\n");
+		return -ENXIO;
+	}
+
+	if (!irq_is_percpu(drvdata->irq)) {
+		pr_err("IRQ is not a PPI\n");
+		return -EINVAL;
+	}
+
+	if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
+		return -EINVAL;
+
+	drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
+	if (!drvdata->handle)
+		return -ENOMEM;
+
+	if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
+		free_percpu(drvdata->handle);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
+{
+	free_percpu_irq(drvdata->irq, drvdata->handle);
+	free_percpu(drvdata->handle);
+}
+
+static int arm_trbe_device_probe(struct platform_device *pdev)
+{
+	struct coresight_platform_data *pdata;
+	struct trbe_drvdata *drvdata;
+	struct device *dev = &pdev->dev;
+	int ret;
+
+	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
+	if (!drvdata)
+		return -ENOMEM;
+
+	pdata = coresight_get_platform_data(dev);
+	if (IS_ERR(pdata)) {
+		kfree(drvdata);
+		return -ENOMEM;
+	}
+
+	drvdata->atclk = devm_clk_get(dev, "atclk");
+	if (!IS_ERR(drvdata->atclk)) {
+		ret = clk_prepare_enable(drvdata->atclk);
+		if (ret)
+			return ret;
+	}
+	dev_set_drvdata(dev, drvdata);
+	dev->platform_data = pdata;
+	drvdata->pdev = pdev;
+	ret = arm_trbe_probe_irq(pdev, drvdata);
+	if (ret)
+		goto irq_failed;
+
+	ret = arm_trbe_probe_coresight(drvdata);
+	if (ret)
+		goto probe_failed;
+
+	ret = arm_trbe_probe_cpuhp(drvdata);
+	if (ret)
+		goto cpuhp_failed;
+
+	return 0;
+cpuhp_failed:
+	arm_trbe_remove_coresight(drvdata);
+probe_failed:
+	arm_trbe_remove_irq(drvdata);
+irq_failed:
+	kfree(pdata);
+	kfree(drvdata);
+	return ret;
+}
+
+static int arm_trbe_device_remove(struct platform_device *pdev)
+{
+	struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
+	struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
+
+	arm_trbe_remove_coresight(drvdata);
+	arm_trbe_remove_cpuhp(drvdata);
+	arm_trbe_remove_irq(drvdata);
+	/* pdata and drvdata are devm-managed, so the driver core frees
+	 * them; calling kfree() on them here would be a double free. */
+	return 0;
+}
+
+#ifdef CONFIG_PM
+static int arm_trbe_runtime_suspend(struct device *dev)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
+
+	if (drvdata && !IS_ERR(drvdata->atclk))
+		clk_disable_unprepare(drvdata->atclk);
+
+	return 0;
+}
+
+static int arm_trbe_runtime_resume(struct device *dev)
+{
+	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
+
+	if (drvdata && !IS_ERR(drvdata->atclk))
+		clk_prepare_enable(drvdata->atclk);
+
+	return 0;
+}
+#endif
+
+static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
+	SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
+};
+
+static const struct of_device_id arm_trbe_of_match[] = {
+	{ .compatible = "arm,arm-trbe",	.data = (void *)1 },
+	{},
+};
+MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
+
+static const struct platform_device_id arm_trbe_match[] = {
+	{ "arm,trbe", 0},
+	{ }
+};
+MODULE_DEVICE_TABLE(platform, arm_trbe_match);
+
+static struct platform_driver arm_trbe_driver = {
+	.id_table = arm_trbe_match,
+	.driver	= {
+		.name = DRVNAME,
+		.of_match_table = of_match_ptr(arm_trbe_of_match),
+		.pm = &arm_trbe_dev_pm_ops,
+		.suppress_bind_attrs = true,
+	},
+	.probe	= arm_trbe_device_probe,
+	.remove	= arm_trbe_device_remove,
+};
+builtin_platform_driver(arm_trbe_driver)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
new file mode 100644
index 0000000..82ffbfc
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trbe.h
@@ -0,0 +1,525 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This contains all required hardware related helper functions for
+ * Trace Buffer Extension (TRBE) driver in the coresight framework.
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#include <linux/coresight.h>
+#include <linux/device.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/smp.h>
+
+#include "coresight-etm-perf.h"
+
+static inline bool is_trbe_available(void)
+{
+	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
+	int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
+
+	return trbe >= 0b0001;
+}
+
+static inline bool is_ete_available(void)
+{
+	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
+	int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
+
+	return (tracever != 0b0000);
+}
+
+static inline bool is_trbe_enabled(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return trblimitr & TRBLIMITR_ENABLE;
+}
+
+enum trbe_ec {
+	TRBE_EC_OTHERS		= 0,
+	TRBE_EC_STAGE1_ABORT	= 36,
+	TRBE_EC_STAGE2_ABORT	= 37,
+};
+
+static const char *const trbe_ec_str[] = {
+	[TRBE_EC_OTHERS]	= "Maintenance exception",
+	[TRBE_EC_STAGE1_ABORT]	= "Stage-1 exception",
+	[TRBE_EC_STAGE2_ABORT]	= "Stage-2 exception",
+};
+
+static inline enum trbe_ec get_trbe_ec(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
+}
+
+static inline void clr_trbe_ec(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+enum trbe_bsc {
+	TRBE_BSC_NOT_STOPPED	= 0,
+	TRBE_BSC_FILLED		= 1,
+	TRBE_BSC_TRIGGERED	= 2,
+};
+
+static const char *const trbe_bsc_str[] = {
+	[TRBE_BSC_NOT_STOPPED]	= "TRBE collection not stopped",
+	[TRBE_BSC_FILLED]	= "TRBE filled",
+	[TRBE_BSC_TRIGGERED]	= "TRBE triggered",
+};
+
+static inline enum trbe_bsc get_trbe_bsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
+}
+
+static inline void clr_trbe_bsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+enum trbe_fsc {
+	TRBE_FSC_ASF_LEVEL0	= 0,
+	TRBE_FSC_ASF_LEVEL1	= 1,
+	TRBE_FSC_ASF_LEVEL2	= 2,
+	TRBE_FSC_ASF_LEVEL3	= 3,
+	TRBE_FSC_TF_LEVEL0	= 4,
+	TRBE_FSC_TF_LEVEL1	= 5,
+	TRBE_FSC_TF_LEVEL2	= 6,
+	TRBE_FSC_TF_LEVEL3	= 7,
+	TRBE_FSC_AFF_LEVEL0	= 8,
+	TRBE_FSC_AFF_LEVEL1	= 9,
+	TRBE_FSC_AFF_LEVEL2	= 10,
+	TRBE_FSC_AFF_LEVEL3	= 11,
+	TRBE_FSC_PF_LEVEL0	= 12,
+	TRBE_FSC_PF_LEVEL1	= 13,
+	TRBE_FSC_PF_LEVEL2	= 14,
+	TRBE_FSC_PF_LEVEL3	= 15,
+	TRBE_FSC_SEA_WRITE	= 16,
+	TRBE_FSC_ASEA_WRITE	= 17,
+	TRBE_FSC_SEA_LEVEL0	= 20,
+	TRBE_FSC_SEA_LEVEL1	= 21,
+	TRBE_FSC_SEA_LEVEL2	= 22,
+	TRBE_FSC_SEA_LEVEL3	= 23,
+	TRBE_FSC_ALIGN_FAULT	= 33,
+	TRBE_FSC_TLB_FAULT	= 48,
+	TRBE_FSC_ATOMIC_FAULT	= 49,
+};
+
+static const char *const trbe_fsc_str[] = {
+	[TRBE_FSC_ASF_LEVEL0]	= "Address size fault - level 0",
+	[TRBE_FSC_ASF_LEVEL1]	= "Address size fault - level 1",
+	[TRBE_FSC_ASF_LEVEL2]	= "Address size fault - level 2",
+	[TRBE_FSC_ASF_LEVEL3]	= "Address size fault - level 3",
+	[TRBE_FSC_TF_LEVEL0]	= "Translation fault - level 0",
+	[TRBE_FSC_TF_LEVEL1]	= "Translation fault - level 1",
+	[TRBE_FSC_TF_LEVEL2]	= "Translation fault - level 2",
+	[TRBE_FSC_TF_LEVEL3]	= "Translation fault - level 3",
+	[TRBE_FSC_AFF_LEVEL0]	= "Access flag fault - level 0",
+	[TRBE_FSC_AFF_LEVEL1]	= "Access flag fault - level 1",
+	[TRBE_FSC_AFF_LEVEL2]	= "Access flag fault - level 2",
+	[TRBE_FSC_AFF_LEVEL3]	= "Access flag fault - level 3",
+	[TRBE_FSC_PF_LEVEL0]	= "Permission fault - level 0",
+	[TRBE_FSC_PF_LEVEL1]	= "Permission fault - level 1",
+	[TRBE_FSC_PF_LEVEL2]	= "Permission fault - level 2",
+	[TRBE_FSC_PF_LEVEL3]	= "Permission fault - level 3",
+	[TRBE_FSC_SEA_WRITE]	= "Synchronous external abort on write",
+	[TRBE_FSC_ASEA_WRITE]	= "Asynchronous external abort on write",
+	[TRBE_FSC_SEA_LEVEL0]	= "Synchronous external abort on table walk - level 0",
+	[TRBE_FSC_SEA_LEVEL1]	= "Synchronous external abort on table walk - level 1",
+	[TRBE_FSC_SEA_LEVEL2]	= "Synchronous external abort on table walk - level 2",
+	[TRBE_FSC_SEA_LEVEL3]	= "Synchronous external abort on table walk - level 3",
+	[TRBE_FSC_ALIGN_FAULT]	= "Alignment fault",
+	[TRBE_FSC_TLB_FAULT]	= "TLB conflict fault",
+	[TRBE_FSC_ATOMIC_FAULT]	= "Atomic fault",
+};
+
+static inline enum trbe_fsc get_trbe_fsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return (trbsr >> TRBSR_FSC_SHIFT) & TRBSR_FSC_MASK;
+}
+
+static inline void clr_trbe_fsc(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~(TRBSR_FSC_MASK << TRBSR_FSC_SHIFT);
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_irq(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_IRQ;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_irq(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~TRBSR_IRQ;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_trg(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_TRG;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_trg(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr &= ~TRBSR_TRG;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_wrap(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_WRAP;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_wrap(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr &= ~TRBSR_WRAP;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void set_trbe_abort(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr |= TRBSR_ABORT;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline void clr_trbe_abort(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	trbsr &= ~TRBSR_ABORT;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+static inline bool is_trbe_irq(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_IRQ;
+}
+
+static inline bool is_trbe_trg(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_TRG;
+}
+
+static inline bool is_trbe_wrap(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_WRAP;
+}
+
+static inline bool is_trbe_abort(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return trbsr & TRBSR_ABORT;
+}
+
+static inline bool is_trbe_running(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	return !(trbsr & TRBSR_STOP);
+}
+
+static inline void set_trbe_running(void)
+{
+	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
+
+	trbsr &= ~TRBSR_STOP;
+	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
+}
+
+enum trbe_address_mode {
+	TRBE_ADDRESS_VIRTUAL,
+	TRBE_ADDRESS_PHYSICAL,
+};
+
+static const char *const trbe_address_mode_str[] = {
+	[TRBE_ADDRESS_VIRTUAL]	= "Address mode - virtual",
+	[TRBE_ADDRESS_PHYSICAL]	= "Address mode - physical",
+};
+
+static inline bool is_trbe_virtual_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return !(trblimitr & TRBLIMITR_NVM);
+}
+
+static inline bool is_trbe_physical_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return trblimitr & TRBLIMITR_NVM;
+}
+
+static inline void set_trbe_virtual_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~TRBLIMITR_NVM;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline void set_trbe_physical_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr |= TRBLIMITR_NVM;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+enum trbe_trig_mode {
+	TRBE_TRIGGER_STOP	= 0,
+	TRBE_TRIGGER_IRQ	= 1,
+	TRBE_TRIGGER_IGNORE	= 3,
+};
+
+static const char *const trbe_trig_mode_str[] = {
+	[TRBE_TRIGGER_STOP]	= "Trigger mode - stop",
+	[TRBE_TRIGGER_IRQ]	= "Trigger mode - irq",
+	[TRBE_TRIGGER_IGNORE]	= "Trigger mode - ignore",
+};
+
+static inline enum trbe_trig_mode get_trbe_trig_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
+}
+
+static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
+	trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+enum trbe_fill_mode {
+	TRBE_FILL_STOP		= 0,
+	TRBE_FILL_WRAP		= 1,
+	TRBE_FILL_CIRCULAR	= 3,
+};
+
+static const char *const trbe_fill_mode_str[] = {
+	[TRBE_FILL_STOP]	= "Buffer mode - stop",
+	[TRBE_FILL_WRAP]	= "Buffer mode - wrap",
+	[TRBE_FILL_CIRCULAR]	= "Buffer mode - circular",
+};
+
+static inline enum trbe_fill_mode get_trbe_fill_mode(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
+}
+
+static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
+	trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline void set_trbe_disabled(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr &= ~TRBLIMITR_ENABLE;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline void set_trbe_enabled(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	trblimitr |= TRBLIMITR_ENABLE;
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline bool get_trbe_flag_update(void)
+{
+	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+
+	return trbidr & TRBIDR_FLAG;
+}
+
+static inline bool is_trbe_programmable(void)
+{
+	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+
+	return !(trbidr & TRBIDR_PROG);
+}
+
+enum trbe_buffer_align {
+	TRBE_BUFFER_BYTE,
+	TRBE_BUFFER_HALF_WORD,
+	TRBE_BUFFER_WORD,
+	TRBE_BUFFER_DOUBLE_WORD,
+	TRBE_BUFFER_16_BYTES,
+	TRBE_BUFFER_32_BYTES,
+	TRBE_BUFFER_64_BYTES,
+	TRBE_BUFFER_128_BYTES,
+	TRBE_BUFFER_256_BYTES,
+	TRBE_BUFFER_512_BYTES,
+	TRBE_BUFFER_1K_BYTES,
+	TRBE_BUFFER_2K_BYTES,
+};
+
+static const char *const trbe_buffer_align_str[] = {
+	[TRBE_BUFFER_BYTE]		= "Byte",
+	[TRBE_BUFFER_HALF_WORD]		= "Half word",
+	[TRBE_BUFFER_WORD]		= "Word",
+	[TRBE_BUFFER_DOUBLE_WORD]	= "Double word",
+	[TRBE_BUFFER_16_BYTES]		= "16 bytes",
+	[TRBE_BUFFER_32_BYTES]		= "32 bytes",
+	[TRBE_BUFFER_64_BYTES]		= "64 bytes",
+	[TRBE_BUFFER_128_BYTES]		= "128 bytes",
+	[TRBE_BUFFER_256_BYTES]		= "256 bytes",
+	[TRBE_BUFFER_512_BYTES]		= "512 bytes",
+	[TRBE_BUFFER_1K_BYTES]		= "1K bytes",
+	[TRBE_BUFFER_2K_BYTES]		= "2K bytes",
+};
+
+static inline enum trbe_buffer_align get_trbe_address_align(void)
+{
+	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
+
+	return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
+}
+
+static inline void assert_trbe_address_mode(unsigned long addr)
+{
+	bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
+	bool virt_mode = is_trbe_virtual_mode();
+
+	WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
+}
+
+static inline void assert_trbe_address_align(unsigned long addr)
+{
+	unsigned long nr_bytes = 1ULL << get_trbe_address_align();
+
+	WARN_ON(addr & (nr_bytes - 1));
+}
+
+static inline unsigned long get_trbe_write_pointer(void)
+{
+	u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
+	unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
+
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	return addr;
+}
+
+static inline void set_trbe_write_pointer(unsigned long addr)
+{
+	WARN_ON(is_trbe_enabled());
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
+	write_sysreg_s(addr, SYS_TRBPTR_EL1);
+}
+
+static inline unsigned long get_trbe_limit_pointer(void)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+	unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK;
+	unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
+
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	return addr;
+}
+
+static inline void set_trbe_limit_pointer(unsigned long addr)
+{
+	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
+
+	WARN_ON(is_trbe_enabled());
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
+	trblimitr |= (addr & PAGE_MASK);
+	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
+}
+
+static inline unsigned long get_trbe_base_pointer(void)
+{
+	u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
+	unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK;
+
+	addr = addr << TRBBASER_BASE_SHIFT;
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	return addr;
+}
+
+static inline void set_trbe_base_pointer(unsigned long addr)
+{
+	WARN_ON(is_trbe_enabled());
+	assert_trbe_address_mode(addr);
+	assert_trbe_address_align(addr);
+	WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
+	WARN_ON(addr & (PAGE_SIZE - 1));
+	write_sysreg_s(addr, SYS_TRBBASER_EL1);
+}
-- 
2.7.4
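
The TRBSR/TRBLIMITR accessors in the header above all follow the same
read-modify-write shape, and the pointer setters check alignment against
the log2 encoding read from TRBIDR_EL1. Below is a minimal userspace
sketch of both ideas. Everything prefixed demo_ or DEMO_ is hypothetical:
the register is a plain variable instead of a system register, and the
shift and mask values are placeholders, not the ones the patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder layout: a 6-bit field at bit 26. Not taken from the patch. */
#define DEMO_EC_SHIFT	26
#define DEMO_EC_MASK	0x3fULL

/* Stand-in for a system register; the driver uses read/write_sysreg_s(). */
static uint64_t demo_reg;

/* Extract the field: shift it down first, then mask. */
static inline uint64_t demo_get_ec(void)
{
	return (demo_reg >> DEMO_EC_SHIFT) & DEMO_EC_MASK;
}

/* Clear only the field, leaving every other bit untouched. */
static inline void demo_clr_ec(void)
{
	demo_reg &= ~(DEMO_EC_MASK << DEMO_EC_SHIFT);
}

/*
 * Same test as assert_trbe_address_align(): 'align' is the log2
 * encoding (0 = byte, 1 = half word, ... 11 = 2K bytes).
 */
static inline bool demo_is_aligned(uint64_t addr, unsigned int align)
{
	uint64_t nr_bytes = 1ULL << align;

	return (addr & (nr_bytes - 1)) == 0;
}
```

Shifting the mask up before inverting it is what keeps the clear helper
from touching neighbouring fields of the same register.
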


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC 08/11] coresight: etm-perf: Truncate the perf record if handle has no space
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

While starting off the etm event, just abort and truncate the perf record
if the perf handle has no space left. This avoids configuring both source
and sink devices in case the data cannot be consumed in perf.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
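The early exit described above can be modelled in userspace as follows.
This is only a sketch under stated assumptions: struct demo_handle,
struct demo_session and demo_event_start() are hypothetical stand-ins,
not the perf or coresight types, and the only behaviour taken from the
patch is "no space in the handle means nothing gets configured".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the size member of the perf output handle. */
struct demo_handle {
	size_t size;	/* bytes still free in the AUX buffer */
};

/* Hypothetical record of what the start path has configured so far. */
struct demo_session {
	bool sink_enabled;
	bool source_enabled;
	bool truncated;
};

/* Returns true when tracing started, false when the record was truncated. */
static bool demo_event_start(struct demo_session *s, const struct demo_handle *h)
{
	if (!h->size) {
		/* No room: flag truncation and leave source and sink untouched. */
		s->truncated = true;
		return false;
	}
	s->sink_enabled = true;
	s->source_enabled = true;
	return true;
}
```

Checking the size before anything else means there is no partially
configured source or sink to unwind on this path.
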
 drivers/hwtracing/coresight/coresight-etm-perf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index ea73cfa..534e205 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -347,6 +347,9 @@ static void etm_event_start(struct perf_event *event, int flags)
 	if (!event_data)
 		goto fail;
 
+	if (!handle->size)
+		goto fail_end_stop;
+
 	/*
 	 * Check if this ETM is allowed to trace, as decided
 	 * at etm_setup_aux(). This could be due to an unreachable
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

The perf handle structure needs to be shared with the TRBE IRQ handler for
capturing trace data and restarting the handle. There is a possibility of
a crash caused by an invalid reference when an etm event is being stopped
while a TRBE IRQ is also being processed, because perf_aux_output_end()
releases the perf handle. Stop the sinks via the link before releasing
the handle, which ensures that a simultaneous TRBE IRQ cannot happen.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This might cause problems with traditional sink devices, which can be
operated in both sysfs and perf mode. This needs to be addressed
correctly. One option would be to move the update_buffer callback
into the respective sink devices, e.g. into disable().
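
The ordering argument can be illustrated with a small single-threaded
model. This is a sketch only: the demo_ types and functions are
hypothetical, the "IRQ" is an ordinary function call, and the model
assumes that a disabled sink cannot raise its interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_handle {
	bool live;			/* false once the handle is released */
};

struct demo_sink {
	bool enabled;			/* the sink raises IRQs only while enabled */
	struct demo_handle *handle;	/* shared with the IRQ handler */
};

/* Returns true if the handler touched a released handle (the bug). */
static bool demo_irq(struct demo_sink *sink)
{
	if (!sink->enabled)
		return false;		/* disabled sink: no IRQ is delivered */
	return !sink->handle->live;
}

/* Ordering proposed here: disable the path, then end the handle. */
static void demo_stop_fixed(struct demo_sink *sink)
{
	sink->enabled = false;
	sink->handle->live = false;
}

/* Previous ordering, up to the point where an IRQ can still arrive. */
static void demo_release_only(struct demo_sink *sink)
{
	sink->handle->live = false;
	/* the sink is still enabled here, so an IRQ sees a dead handle */
}
```

With demo_release_only() an interrupt arriving before the path is
disabled observes the released handle, while demo_stop_fixed() closes
that window.
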

 drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 534e205..1a37991 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
 
 		size = sink_ops(sink)->update_buffer(sink, handle,
 					      event_data->snk_config);
+		coresight_disable_path(path);
 		perf_aux_output_end(handle, size);
+		return;
 	}
 
 	/* Disabling the path make its elements available to other sessions */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC 10/11] coresight: etm-perf: Connect TRBE sink with ETE source
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

Unlike traditional sink devices, individual TRBE instances are not detected
via DT or ACPI nodes. Instead, TRBE instances are detected during the CPU
online process. Hence a path connecting ETE and TRBE on a given CPU would
not have been established until then. This adds two coresight helpers that
modify the outward connections of a source device to establish and
terminate a path to a given sink device. This method might not be optimal
and may be reworked later.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
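One possible direction for the rework mentioned above, sketched in
userspace: grow the connection array by exactly one slot on connect, so
that the new index is always in bounds even when the source already has
connections. This is not what the patch below does (it allocates only
when conns is NULL). All demo_ names are hypothetical, and realloc()
stands in for whatever allocator the kernel code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_dev;

struct demo_conn {
	struct demo_dev *child_dev;
};

struct demo_pdata {
	int nr_outport;
	struct demo_conn *conns;
};

struct demo_dev {
	struct demo_pdata pdata;
	struct demo_dev *def_sink;
};

/* Append 'sink' as the last outward connection of 'src'. Returns 0 or -1. */
static int demo_connect(struct demo_dev *src, struct demo_dev *sink)
{
	int n = src->pdata.nr_outport;
	struct demo_conn *conns;

	/* Grow by one slot; realloc(NULL, ...) behaves like malloc(). */
	conns = realloc(src->pdata.conns, (size_t)(n + 1) * sizeof(*conns));
	if (!conns)
		return -1;
	conns[n].child_dev = sink;
	src->pdata.conns = conns;
	src->pdata.nr_outport = n + 1;
	src->def_sink = sink;
	return 0;
}

/* Undo demo_connect(): drop the last connection and the default sink. */
static void demo_remove(struct demo_dev *src)
{
	if (src->pdata.nr_outport <= 0)
		return;
	src->pdata.nr_outport--;
	src->pdata.conns[src->pdata.nr_outport].child_dev = NULL;
	src->def_sink = NULL;
}
```

The port count is only bumped after the allocation succeeds, so a failed
connect leaves the source device exactly as it was.
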
 drivers/hwtracing/coresight/coresight-etm-perf.c | 30 ++++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-etm-perf.h |  4 ++++
 drivers/hwtracing/coresight/coresight-platform.c |  3 ++-
 drivers/hwtracing/coresight/coresight-trbe.c     |  2 ++
 include/linux/coresight.h                        |  2 ++
 5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 1a37991..b4ab1d4 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -664,3 +664,33 @@ void __exit etm_perf_exit(void)
 {
 	perf_pmu_unregister(&etm_pmu);
 }
+
+#ifdef CONFIG_CORESIGHT_TRBE
+void coresight_trbe_connect_ete(struct coresight_device *csdev_trbe, int cpu)
+{
+	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
+
+	if (!csdev_ete) {
+		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
+		return;
+	}
+	csdev_ete->def_sink = csdev_trbe;
+	csdev_ete->pdata->nr_outport++;
+	if (!csdev_ete->pdata->conns)
+		coresight_alloc_conns(&csdev_ete->dev, csdev_ete->pdata);
+	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = csdev_trbe;
+}
+
+void coresight_trbe_remove_ete(struct coresight_device *csdev_trbe, int cpu)
+{
+	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
+
+	if (!csdev_ete) {
+		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
+		return;
+	}
+	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = NULL;
+	csdev_ete->def_sink = NULL;
+	csdev_ete->pdata->nr_outport--;
+}
+#endif
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.h b/drivers/hwtracing/coresight/coresight-etm-perf.h
index 3e4f2ad..20386cf 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.h
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.h
@@ -85,4 +85,8 @@ static inline void *etm_perf_sink_config(struct perf_output_handle *handle)
 int __init etm_perf_init(void);
 void __exit etm_perf_exit(void);
 
+#ifdef CONFIG_CORESIGHT_TRBE
+void coresight_trbe_connect_ete(struct coresight_device *csdev, int cpu);
+void coresight_trbe_remove_ete(struct coresight_device *csdev, int cpu);
+#endif
 #endif
diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c
index c594f45..8fa7406 100644
--- a/drivers/hwtracing/coresight/coresight-platform.c
+++ b/drivers/hwtracing/coresight/coresight-platform.c
@@ -23,7 +23,7 @@
  * coresight_alloc_conns: Allocate connections record for each output
  * port from the device.
  */
-static int coresight_alloc_conns(struct device *dev,
+int coresight_alloc_conns(struct device *dev,
 				 struct coresight_platform_data *pdata)
 {
 	if (pdata->nr_outport) {
@@ -35,6 +35,7 @@ static int coresight_alloc_conns(struct device *dev,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(coresight_alloc_conns);
 
 static struct device *
 coresight_find_device_by_fwnode(struct fwnode_handle *fwnode)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 48a8ec3..afd1a1c 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -507,6 +507,7 @@ static void arm_trbe_probe_coresight_cpu(void *info)
 	if (IS_ERR(cpudata->csdev))
 		goto cpu_clear;
 
+	coresight_trbe_connect_ete(cpudata->csdev, cpudata->cpu);
 	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
 	cpudata->trbe_dbm = get_trbe_flag_update();
 	cpudata->trbe_align = 1ULL << get_trbe_address_align();
@@ -586,6 +587,7 @@ static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 
 	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
 		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		coresight_trbe_remove_ete(cpudata->csdev, cpu);
 		if (cpudata->csdev) {
 			coresight_unregister(cpudata->csdev);
 			cpudata->drvdata = NULL;
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index c2d0a2a..c657813 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -496,6 +496,8 @@ void coresight_relaxed_write64(struct coresight_device *csdev,
 			       u64 val, u32 offset);
 void coresight_write64(struct coresight_device *csdev, u64 val, u32 offset);
 
+int coresight_alloc_conns(struct device *dev,
+			  struct coresight_platform_data *pdata);
 
 #else
 static inline struct coresight_device *
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC 10/11] coresgith: etm-perf: Connect TRBE sink with ETE source
@ 2020-11-10 12:45   ` Anshuman Khandual
  0 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: mike.leach, Anshuman Khandual, linux-kernel, mathieu.poirier,
	suzuki.poulose

Unlike traditional sink devices, individual TRBE instances are not detected
via DT or ACPI nodes. Instead TRBE instances are detected during CPU online
process. Hence a path connecting ETE and TRBE on a given CPU would not have
been established until then. This adds two coresight helpers that will help
modify outward connections from a source device to establish and terminate
path to a given sink device. But this method might not be optimal and would
be reworked later.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 30 ++++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-etm-perf.h |  4 ++++
 drivers/hwtracing/coresight/coresight-platform.c |  3 ++-
 drivers/hwtracing/coresight/coresight-trbe.c     |  2 ++
 include/linux/coresight.h                        |  2 ++
 5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 1a37991..b4ab1d4 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -664,3 +664,33 @@ void __exit etm_perf_exit(void)
 {
 	perf_pmu_unregister(&etm_pmu);
 }
+
+#ifdef CONFIG_CORESIGHT_TRBE
+void coresight_trbe_connect_ete(struct coresight_device *csdev_trbe, int cpu)
+{
+	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
+
+	if (!csdev_ete) {
+		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
+		return;
+	}
+	csdev_ete->def_sink = csdev_trbe;
+	csdev_ete->pdata->nr_outport++;
+	if (!csdev_ete->pdata->conns)
+		coresight_alloc_conns(&csdev_ete->dev, csdev_ete->pdata);
+	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = csdev_trbe;
+}
+
+void coresight_trbe_remove_ete(struct coresight_device *csdev_trbe, int cpu)
+{
+	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
+
+	if (!csdev_ete) {
+		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
+		return;
+	}
+	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = NULL;
+	csdev_ete->def_sink = NULL;
+	csdev_ete->pdata->nr_outport--;
+}
+#endif
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.h b/drivers/hwtracing/coresight/coresight-etm-perf.h
index 3e4f2ad..20386cf 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.h
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.h
@@ -85,4 +85,8 @@ static inline void *etm_perf_sink_config(struct perf_output_handle *handle)
 int __init etm_perf_init(void);
 void __exit etm_perf_exit(void);
 
+#ifdef CONFIG_CORESIGHT_TRBE
+void coresight_trbe_connect_ete(struct coresight_device *csdev, int cpu);
+void coresight_trbe_remove_ete(struct coresight_device *csdev, int cpu);
+#endif
 #endif
diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c
index c594f45..8fa7406 100644
--- a/drivers/hwtracing/coresight/coresight-platform.c
+++ b/drivers/hwtracing/coresight/coresight-platform.c
@@ -23,7 +23,7 @@
  * coresight_alloc_conns: Allocate connections record for each output
  * port from the device.
  */
-static int coresight_alloc_conns(struct device *dev,
+int coresight_alloc_conns(struct device *dev,
 				 struct coresight_platform_data *pdata)
 {
 	if (pdata->nr_outport) {
@@ -35,6 +35,7 @@ static int coresight_alloc_conns(struct device *dev,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(coresight_alloc_conns);
 
 static struct device *
 coresight_find_device_by_fwnode(struct fwnode_handle *fwnode)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 48a8ec3..afd1a1c 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -507,6 +507,7 @@ static void arm_trbe_probe_coresight_cpu(void *info)
 	if (IS_ERR(cpudata->csdev))
 		goto cpu_clear;
 
+	coresight_trbe_connect_ete(cpudata->csdev, cpudata->cpu);
 	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
 	cpudata->trbe_dbm = get_trbe_flag_update();
 	cpudata->trbe_align = 1ULL << get_trbe_address_align();
@@ -586,6 +587,7 @@ static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 
 	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
 		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
+		coresight_trbe_remove_ete(cpudata->csdev, cpu);
 		if (cpudata->csdev) {
 			coresight_unregister(cpudata->csdev);
 			cpudata->drvdata = NULL;
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index c2d0a2a..c657813 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -496,6 +496,8 @@ void coresight_relaxed_write64(struct coresight_device *csdev,
 			       u64 val, u32 offset);
 void coresight_write64(struct coresight_device *csdev, u64 val, u32 offset);
 
+int coresight_alloc_conns(struct device *dev,
+			  struct coresight_platform_data *pdata);
 
 #else
 static inline struct coresight_device *
-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 72+ messages in thread
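
A note on the connect/remove helpers in the patch above: as posted, coresight_trbe_connect_ete() ignores the return value of coresight_alloc_conns() and increments nr_outport before checking whether conns already exists, so if conns had been allocated for the old count, the new index would appear to fall outside it. The following is only a minimal user-space sketch of the same bookkeeping with the allocation checked. All sketch_* names and the simplified structs are hypothetical stand-ins, not the kernel definitions, and realloc() stands in for whatever allocator the driver would actually use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct sketch_device;

struct sketch_connection {
	struct sketch_device *child_dev;
};

struct sketch_pdata {
	int nr_outport;
	struct sketch_connection *conns;
};

struct sketch_device {
	struct sketch_pdata pdata;
	struct sketch_device *def_sink;
};

/* Attach the sink as one extra output connection of the source. */
static int sketch_connect(struct sketch_device *src, struct sketch_device *sink)
{
	struct sketch_connection *conns;
	int nr;

	if (!src || !sink)
		return -ENODEV;

	nr = src->pdata.nr_outport + 1;
	/* Grow the table first; publish the new count only on success. */
	conns = realloc(src->pdata.conns, (size_t)nr * sizeof(*conns));
	if (!conns)
		return -ENOMEM;

	conns[nr - 1].child_dev = sink;
	src->pdata.conns = conns;
	src->pdata.nr_outport = nr;
	src->def_sink = sink;
	return 0;
}

/* Undo sketch_connect(); only acts if the sink is the last connection. */
static int sketch_disconnect(struct sketch_device *src, struct sketch_device *sink)
{
	int last;

	if (!src || !sink)
		return -ENODEV;

	last = src->pdata.nr_outport - 1;
	if (last < 0 || src->pdata.conns[last].child_dev != sink)
		return -EINVAL;

	src->pdata.conns[last].child_dev = NULL;
	src->def_sink = NULL;
	src->pdata.nr_outport = last;
	return 0;
}
```

Returning an error code lets the caller (the TRBE probe path, in the patch) decide whether to continue without the ETE link, and the disconnect side refuses to touch a slot it did not set up.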

* [RFC 11/11] dts: bindings: Document device tree binding for Arm TRBE
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 12:45   ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-10 12:45 UTC (permalink / raw)
  To: linux-arm-kernel, coresight
  Cc: linux-kernel, suzuki.poulose, mathieu.poirier, mike.leach,
	Anshuman Khandual

This patch documents the device tree binding in use for Arm TRBE.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 Documentation/devicetree/bindings/arm/trbe.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt

diff --git a/Documentation/devicetree/bindings/arm/trbe.txt b/Documentation/devicetree/bindings/arm/trbe.txt
new file mode 100644
index 0000000..4bb5b09
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/trbe.txt
@@ -0,0 +1,20 @@
+* Trace Buffer Extension (TRBE)
+
+Trace Buffer Extension (TRBE) is used for collecting trace data generated
+by a corresponding trace unit (ETE) using an in-memory trace buffer.
+
+** TRBE Required properties:
+
+- compatible : should be one of:
+	       "arm,arm-trbe"
+
+- interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
+	       TRBE is only supported on a subset of the CPUs, please consult
+	       the arm,gic-v3 binding for details on describing a PPI partition.
+
+** Example:
+
+trbe {
+	compatible = "arm,arm-trbe";
+	interrupts = <GIC_PPI 15 IRQ_TYPE_LEVEL_HIGH>;
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-10 18:25   ` Mathieu Poirier
  -1 siblings, 0 replies; 72+ messages in thread
From: Mathieu Poirier @ 2020-11-10 18:25 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Coresight ML, Linux Kernel Mailing List,
	Suzuki K. Poulose, Mike Leach

Hi Anshuman,

On Tue, 10 Nov 2020 at 05:45, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> This series enables future IP trace features Embedded Trace Extension (ETE)
> and Trace Buffer Extension (TRBE). This series depends on the ETM system
> register instruction support series [0] and the v8.4 Self hosted tracing
> support series (Jonathan Zhou) [1]. The tree is available here [2] for
> quick access.
>
> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
> extensions. ETE overlaps with the ETMv4 architecture, with additions to
> support the newer architecture features and some restrictions on the
> supported features w.r.t ETMv4. The ETE support is added by extending the
> ETMv4 driver to recognise the ETE and handle the features as exposed by the
> TRCIDRx registers. ETE only supports system instructions access from the
> host CPU. The ETE could be integrated with a TRBE (see below), or with the
> legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
> description as the ETMs and requires a node per instance.
>
> Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
> accessible via the system registers and can be combined with the ETE to
> provide a 1x1 configuration of source & sink. TRBE is being represented
> here as a CoreSight sink. Primary reason is that the ETE source could work
> with other traditional CoreSight sink devices. As TRBE captures the trace
> data which is produced by ETE, it cannot work alone.
>
> TRBE representation here have some distinct deviations from a traditional
> CoreSight sink device. Coresight path between ETE and TRBE are not built
> during boot looking at respective DT or ACPI entries. Instead TRBE gets
> checked on each available CPU, when found gets connected with respective
> ETE source device on the same CPU, after altering its outward connections.
> ETE TRBE path connection lasts only till the CPU is online. But ETE-TRBE
> coupling/decoupling method implemented here is not optimal and would be
> reworked later on.
>
> Unlike traditional sinks, TRBE can generate interrupts to signal including
> many other things, buffer got filled. The interrupt is a PPI and should be
> communicated from the platform. DT or ACPI entry representing TRBE should
> have the PPI number for a given platform. During perf session, the TRBE IRQ
> handler should capture trace for perf auxiliary buffer before restarting it
> back. System registers being used here to configure ETE and TRBE could be
> referred in the link below.
>
> https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
>
> This adds another change where CoreSight sink device needs to be disabled
> before capturing the trace data for perf in order to avoid race condition
> with another simultaneous TRBE IRQ handling. This might cause problem with
> traditional sink devices which can be operated in both sysfs and perf mode.
> This needs to be addressed correctly. One option would be to move the
> update_buffer callback into the respective sink devices. e.g, disable().
>
> This series is primarily looking from some early feed back both on proposed
> design and its implementation. It acknowledges, that it might be incomplete
> and will have scopes for improvement.
>
> Things todo:
> - Improve ETE-TRBE coupling and decoupling method
> - Improve TRBE IRQ handling for all possible corner cases
> - Implement sysfs based trace sessions
>
> [0] https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
> [1] https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/
> [2] https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted
>
> Anshuman Khandual (6):
>   arm64: Add TRBE definitions
>   coresight: sink: Add TRBE driver
>   coresight: etm-perf: Truncate the perf record if handle has no space
>   coresight: etm-perf: Disable the path before capturing the trace data
>   coresgith: etm-perf: Connect TRBE sink with ETE source
>   dts: bindings: Document device tree binding for Arm TRBE
>
> Suzuki K Poulose (5):
>   coresight: etm-perf: Allow an event to use different sinks
>   coresight: Do not scan for graph if none is present
>   coresight: etm4x: Add support for PE OS lock
>   coresight: ete: Add support for sysreg support
>   coresight: ete: Detect ETE as one of the supported ETMs
>
>  .../devicetree/bindings/arm/coresight.txt          |   3 +
>  Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
>  Documentation/trace/coresight/coresight-trbe.rst   |  36 +
>  arch/arm64/include/asm/sysreg.h                    |  51 ++
>  drivers/hwtracing/coresight/Kconfig                |  11 +
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
>  drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
>  drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
>  drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
>  drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
>  drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
>  include/linux/coresight.h                          |   2 +
>  14 files changed, 1680 insertions(+), 43 deletions(-)

This is to confirm that I have received your work and that it is now on
my list of patchsets to review.  However, I likely won't get to it for a
couple of weeks because of the patchsets already in the queue.  I will
touch base with you again if there are further delays.

Thanks,
Mathieu

>  create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 02/11] coresight: etm-perf: Allow an event to use different sinks
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-11-12  9:21     ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-12  9:21 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach, Linu Cherian, Linu Cherian

Hi Linu,

Could you please test this slightly modified version and give us
a Tested-by tag if you are happy with the results?

Suzuki


On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> When there are multiple sinks on the system, in the absence
> of a specified sink, it is quite possible that a default sink
> for an ETM could be different from that of another ETM. However
> we do not support having multiple sinks for an event yet. This
> patch allows the event to use the default sinks on the ETMs
> where they are scheduled as long as the sinks are of the same
> type.
> 
> e.g, if we have 1x1 topology with per-CPU ETRs, the event can
> use the per-CPU ETR for the session. However, if the sinks
> are of different type, e.g TMC-ETR on one and a custom sink
> on another, the event will only trace on the first detected
> sink.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>   drivers/hwtracing/coresight/coresight-etm-perf.c | 50 ++++++++++++++++++------
>   1 file changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index c2c9b12..ea73cfa 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -204,14 +204,22 @@ static void etm_free_aux(void *data)
>   	schedule_work(&event_data->work);
>   }
>   
> +static bool sinks_match(struct coresight_device *a, struct coresight_device *b)
> +{
> +	if (!a || !b)
> +		return false;
> +	return (sink_ops(a) == sink_ops(b));
> +}
> +
>   static void *etm_setup_aux(struct perf_event *event, void **pages,
>   			   int nr_pages, bool overwrite)
>   {
>   	u32 id;
>   	int cpu = event->cpu;
>   	cpumask_t *mask;
> -	struct coresight_device *sink;
> +	struct coresight_device *sink = NULL;
>   	struct etm_event_data *event_data = NULL;
> +	bool sink_forced = false;
>   
>   	event_data = alloc_event_data(cpu);
>   	if (!event_data)
> @@ -222,6 +230,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>   	if (event->attr.config2) {
>   		id = (u32)event->attr.config2;
>   		sink = coresight_get_sink_by_id(id);
> +		sink_forced = true;
>   	}
>   
>   	mask = &event_data->mask;
> @@ -235,7 +244,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>   	 */
>   	for_each_cpu(cpu, mask) {
>   		struct list_head *path;
> -		struct coresight_device *csdev;
> +		struct coresight_device *csdev, *new_sink;
>   
>   		csdev = per_cpu(csdev_src, cpu);
>   		/*
> @@ -249,21 +258,35 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>   		}
>   
>   		/*
> -		 * No sink provided - look for a default sink for one of the
> -		 * devices. At present we only support topology where all CPUs
> -		 * use the same sink [N:1], so only need to find one sink. The
> -		 * coresight_build_path later will remove any CPU that does not
> -		 * attach to the sink, or if we have not found a sink.
> +		 * No sink provided - look for a default sink for all the devices.
> +		 * We only support multiple sinks, only if all the default sinks
> +		 * are of the same type, so that the sink buffer can be shared
> +		 * as the event moves around. We don't trace on a CPU if it can't
> +		 *
>   		 */
> -		if (!sink)
> -			sink = coresight_find_default_sink(csdev);
> +		if (!sink_forced) {
> +			new_sink = coresight_find_default_sink(csdev);
> +			if (!new_sink) {
> +				cpumask_clear_cpu(cpu, mask);
> +				continue;
> +			}
> +			/* Skip checks for the first sink */
> +			if (!sink) {
> +				sink = new_sink;
> +			} else if (!sinks_match(new_sink, sink)) {
> +				cpumask_clear_cpu(cpu, mask);
> +				continue;
> +			}
> +		} else {
> +			new_sink = sink;
> +		}
>   
>   		/*
>   		 * Building a path doesn't enable it, it simply builds a
>   		 * list of devices from source to sink that can be
>   		 * referenced later when the path is actually needed.
>   		 */
> -		path = coresight_build_path(csdev, sink);
> +		path = coresight_build_path(csdev, new_sink);
>   		if (IS_ERR(path)) {
>   			cpumask_clear_cpu(cpu, mask);
>   			continue;
> @@ -284,7 +307,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>   	if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
>   		goto err;
>   
> -	/* Allocate the sink buffer for this session */
> +	/*
> +	 * Allocate the sink buffer for this session. All the sinks
> +	 * where this event can be scheduled are ensured to be of the
> +	 * same type. Thus the same sink configuration is used by the
> +	 * sinks.
> +	 */
>   	event_data->snk_config =
>   			sink_ops(sink)->alloc_buffer(sink, event, pages,
>   						     nr_pages, overwrite);
> 


^ permalink raw reply	[flat|nested] 72+ messages in thread
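
The default-sink selection described in the quoted patch can be illustrated outside the kernel. This is a minimal sketch, assuming a fixed four-CPU system and a bitmask in place of cpumask_t; the sketch_* names and the `ops` pointer (standing in for what sink_ops() returns) are hypothetical. It covers only the case where no sink was forced through config2, and it leaves out path building.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

/* Stand-in for a sink device; 'ops' plays the role of sink_ops(). */
struct sketch_sink {
	const void *ops;
};

/* Two sinks are compatible when they share the same operations. */
static bool sketch_sinks_match(const struct sketch_sink *a,
			       const struct sketch_sink *b)
{
	if (!a || !b)
		return false;
	return a->ops == b->ops;
}

/*
 * default_sink[cpu] is the default sink of that CPU's source, or NULL.
 * Clears the bit in *mask for every CPU that has no default sink or
 * whose sink type differs from the first one found, and returns that
 * first (reference) sink.
 */
static const struct sketch_sink *
sketch_pick_sinks(const struct sketch_sink *const *default_sink,
		  unsigned int *mask)
{
	const struct sketch_sink *sink = NULL;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		const struct sketch_sink *new_sink;

		if (!(*mask & (1u << cpu)))
			continue;

		new_sink = default_sink[cpu];
		if (!new_sink) {
			*mask &= ~(1u << cpu);
			continue;
		}

		if (!sink)
			sink = new_sink;	/* first sink sets the type */
		else if (!sketch_sinks_match(new_sink, sink))
			*mask &= ~(1u << cpu);
	}

	return sink;
}
```

With per-CPU sinks of one type every CPU stays in the mask; a CPU whose default sink is of another type is dropped, which matches the "trace only on the first detected sink type" behaviour described in the commit message.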

* Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-11-12  9:27     ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-12  9:27 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach

On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> perf handle structure needs to be shared with the TRBE IRQ handler for
> capturing trace data and restarting the handle. There is a probability
> of an undefined reference based crash when etm event is being stopped
> while a TRBE IRQ also getting processed. This happens due the release
> of perf handle via perf_aux_output_end(). This stops the sinks via the
> link before releasing the handle, which will ensure that a simultaneous
> TRBE IRQ could not happen.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This might cause problem with traditional sink devices which can be
> operated in both sysfs and perf mode. This needs to be addressed
> correctly. One option would be to move the update_buffer callback
> into the respective sink devices. e.g, disable().
> 
>   drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 534e205..1a37991 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
>   
>   		size = sink_ops(sink)->update_buffer(sink, handle,
>   					      event_data->snk_config);
> +		coresight_disable_path(path);
>   		perf_aux_output_end(handle, size);
> +		return;
>   	}

As you mentioned, this is not ideal: another session could be started on
the sink from a different ETM (when the sink is not per-CPU), in a different
mode, before you collect the buffer. I believe the best option is to leave the
update_buffer() call to disable_hw. This would require passing the "handle"
down to the disable_path.

That way the races can be handled inside the sinks. It also aligns the
perf mode of the sinks with the sysfs mode.

Suzuki

^ permalink raw reply	[flat|nested] 72+ messages in thread
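
One possible reading of the suggestion above, as a user-space sketch only: the sink's disable step takes the perf handle, stops the hardware first, and then drains the buffer, so nothing can race with the collection. Every sketch_* name here is hypothetical; the real callbacks, their signatures, and how the handle would reach them are not settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the perf output handle. */
struct sketch_handle {
	size_t collected;	/* bytes handed over to perf so far */
	bool ended;		/* set once the handle has been released */
};

/* Stand-in for a sink with some trace data still in its buffer. */
struct sketch_sink {
	bool hw_enabled;
	size_t pending;		/* bytes sitting in the hardware buffer */
};

/* Sink-side disable: stop the hardware first, then drain into the handle. */
static size_t sketch_sink_disable(struct sketch_sink *sink,
				  struct sketch_handle *handle)
{
	size_t size;

	sink->hw_enabled = false;	/* no buffer interrupt after this point */

	size = sink->pending;
	sink->pending = 0;
	if (handle)			/* sysfs mode would pass no handle */
		handle->collected += size;

	return size;
}

/* Event stop: disable (which also collects), then release the handle. */
static void sketch_event_stop(struct sketch_sink *sink,
			      struct sketch_handle *handle)
{
	sketch_sink_disable(sink, handle);
	handle->ended = true;		/* stands in for perf_aux_output_end() */
}
```

Because collection happens inside the sink's own disable step, each sink can apply its own locking around it, and the perf and sysfs modes go through the same call.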

* Re: [RFC 10/11] coresgith: etm-perf: Connect TRBE sink with ETE source
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-11-12  9:31     ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-12  9:31 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach

Hi Anshuman,
On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> Unlike traditional sink devices, individual TRBE instances are not detected
> via DT or ACPI nodes. Instead TRBE instances are detected during CPU online
> process. Hence a path connecting ETE and TRBE on a given CPU would not have
> been established until then. This adds two coresight helpers that will help
> modify outward connections from a source device to establish and terminate
> path to a given sink device. But this method might not be optimal and would
> be reworked later.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

Instead of this, could we come up with something like a percpu_sink concept ?
That way, the TRBE driver could register the percpu_sink for the corresponding
CPU and we don't have to worry about the order in which the ETE will be probed
on a hotplugged CPU (i.e., if the TRBE is probed before the ETE, the approach
in this patch would fail to register the sink).

And the default sink can be initialized when the ETE instance first starts
looking for it.
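
A rough user-space sketch of the idea, assuming a simple per-CPU table. The
names (toy_percpu_sink, toy_set_percpu_sink, toy_get_default_sink) are
hypothetical and not an existing coresight API. The source resolves its sink
lazily, so the probe order of the two devices stops mattering.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

struct toy_sink { int cpu; };
struct toy_source { int cpu; struct toy_sink *def_sink; };

/* One slot per CPU, filled in by the sink driver when it probes. */
static struct toy_sink *toy_percpu_sink[TOY_NR_CPUS];

static void toy_set_percpu_sink(int cpu, struct toy_sink *sink)
{
	toy_percpu_sink[cpu] = sink;
}

/* The source looks up its sink the first time it actually needs one. */
static struct toy_sink *toy_get_default_sink(struct toy_source *src)
{
	if (!src->def_sink)
		src->def_sink = toy_percpu_sink[src->cpu];
	return src->def_sink;
}
```

If the lookup happens before the sink has registered, it returns NULL and is
simply retried on the next call.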

Suzuki

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 07/11] coresight: sink: Add TRBE driver
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-11-12 10:13     ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-12 10:13 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach

On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
> accessible via the system registers. The TRBE supports different addressing
> modes including CPU virtual address and buffer modes including the circular
> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
> an write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
> access to the trace buffer could be prohibited by a higher exception level
> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
> private interrupt (PPI) on address translation errors and when the buffer
> is full. Overall implementation here is inspired from the Arm SPE driver.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>   Documentation/trace/coresight/coresight-trbe.rst |  36 ++
>   arch/arm64/include/asm/sysreg.h                  |   2 +
>   drivers/hwtracing/coresight/Kconfig              |  11 +
>   drivers/hwtracing/coresight/Makefile             |   1 +
>   drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
>   drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
>   6 files changed, 1341 insertions(+)
>   create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> 
> diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
> new file mode 100644
> index 0000000..4320a8b
> --- /dev/null
> +++ b/Documentation/trace/coresight/coresight-trbe.rst
> @@ -0,0 +1,36 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============================
> +Trace Buffer Extension (TRBE).
> +==============================
> +
> +    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
> +    :Date:     November 2020
> +
> +Hardware Description
> +--------------------
> +
> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system
> +memory, CPU traces generated from a corresponding percpu tracing unit. This
> +gets plugged in as a coresight sink device because the corresponding trace
> +genarators (ETE), are plugged in as source device.
> +
> +Sysfs files and directories
> +---------------------------
> +
> +The TRBE devices appear on the existing coresight bus alongside the other
> +coresight devices::
> +
> +	>$ ls /sys/bus/coresight/devices
> +	trbe0  trbe1  trbe2 trbe3
> +
> +The ``trbe<N>`` named TRBEs are associated with a CPU.::
> +
> +	>$ ls /sys/bus/coresight/devices/trbe0/
> +	irq align dbm
> +
> +*Key file items are:-*
> +   * ``irq``: TRBE maintenance interrupt number
> +   * ``align``: TRBE write pointer alignment
> +   * ``dbm``: TRBE updates memory with access and dirty flags
> +
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 14cb156..61136f6 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -97,6 +97,7 @@
>   #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
>   #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
>   #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> +#define TSB_CSYNC			__emit_inst(0xd503225f)
>   
>   #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
>   	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
> @@ -865,6 +866,7 @@
>   #define ID_AA64MMFR2_CNP_SHIFT		0
>   
>   /* id_aa64dfr0 */
> +#define ID_AA64DFR0_TRBE_SHIFT		44
>   #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
>   #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
>   #define ID_AA64DFR0_PMSVER_SHIFT	32
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index c119824..0f5e101 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -156,6 +156,17 @@ config CORESIGHT_CTI
>   	  To compile this driver as a module, choose M here: the
>   	  module will be called coresight-cti.
>   
> +config CORESIGHT_TRBE
> +	bool "Trace Buffer Extension (TRBE) driver"
> +	depends on ARM64
> +	help
> +	  This driver provides support for percpu Trace Buffer Extension (TRBE).
> +	  TRBE always needs to be used along with it's corresponding percpu ETE
> +	  component. ETE generates trace data which is then captured with TRBE.
> +	  Unlike traditional sink devices, TRBE is a CPU feature accessible via
> +	  system registers. But it's explicit dependency with trace unit (ETE)
> +	  requires it to be plugged in as a coresight sink device.
> +
>   config CORESIGHT_CTI_INTEGRATION_REGS
>   	bool "Access CTI CoreSight Integration Registers"
>   	depends on CORESIGHT_CTI
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index f20e357..d608165 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>   obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
>   obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
>   obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
> +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
>   coresight-cti-y := coresight-cti-core.o	coresight-cti-platform.o \
>   		   coresight-cti-sysfs.o
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> new file mode 100644
> index 0000000..48a8ec3
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -0,0 +1,766 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
> + * sink device could then pair with an appropriate per-cpu coresight source
> + * device (ETE) thus generating required trace data. Trace can be enabled
> + * via the perf framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#define DRVNAME "arm_trbe"
> +
> +#define pr_fmt(fmt) DRVNAME ": " fmt
> +
> +#include "coresight-trbe.h"
> +
> +#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
> +
> +#define ETE_IGNORE_PACKET 0x70

Add a comment here, on what this means to the decoder.
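
As a hedged illustration of what such a comment would describe: the byte
value is taken from the patch, while the helpers below (toy_pad,
toy_skip_padding) are hypothetical and only model a decoder that steps over
padding at a packet boundary. A real decoder parses whole packets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_IGNORE_PACKET 0x70	/* value taken from the patch */

/* Fill a gap the hardware cannot use with padding bytes. */
static void toy_pad(unsigned char *buf, size_t off, size_t len)
{
	memset(buf + off, TOY_IGNORE_PACKET, len);
}

/* Toy decoder step: return the offset of the first non-padding byte. */
static size_t toy_skip_padding(const unsigned char *buf, size_t off, size_t size)
{
	while (off < size && buf[off] == TOY_IGNORE_PACKET)
		off++;
	return off;
}
```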

> +
> +static const char trbe_name[] = "trbe";

Why not

#define DEVNAME	"trbe"


> +
> +enum trbe_fault_action {
> +	TRBE_FAULT_ACT_WRAP,
> +	TRBE_FAULT_ACT_SPURIOUS,
> +	TRBE_FAULT_ACT_FATAL,
> +};
> +
> +struct trbe_perf {

Please rename this to trbe_buf. This will be used for sysfs mode as well.

> +	unsigned long trbe_base;
> +	unsigned long trbe_limit;
> +	unsigned long trbe_write;
> +	pid_t pid;

Why do we need this ? This seems unused and moreover, there cannot
be multiple tracers into TRBE. So, we don't need to share the sink
unlike the traditional ones.

> +	int nr_pages;
> +	void **pages;
> +	bool snapshot;
> +	struct trbe_cpudata *cpudata;
> +};
> +
> +struct trbe_cpudata {
> +	struct coresight_device	*csdev;
> +	bool trbe_dbm;

Why do we need this ?

> +	u64 trbe_align;
> +	int cpu;
> +	enum cs_mode mode;
> +	struct trbe_perf *perf;
> +	struct trbe_drvdata *drvdata;
> +};
> +
> +struct trbe_drvdata {
> +	struct trbe_cpudata __percpu *cpudata;
> +	struct perf_output_handle __percpu *handle;

Shouldn't this be :

	struct perf_output_handle __percpu **handle ?

as we get a handle from the etm-perf and is not controlled by
the TRBE ?

> +	struct hlist_node hotplug_node;
> +	int irq;
> +	cpumask_t supported_cpus;
> +	enum cpuhp_state trbe_online;
> +	struct platform_device *pdev;
> +	struct clk *atclk;

We don't have any clocks for the TRBE instance. Please remove.

> +};
> +
> +static int trbe_alloc_node(struct perf_event *event)
> +{
> +	if (event->cpu == -1)
> +		return NUMA_NO_NODE;
> +	return cpu_to_node(event->cpu);
> +}
> +
> +static void trbe_disable_and_drain_local(void)
> +{
> +	write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> +	isb();
> +	dsb(nsh);
> +	asm(TSB_CSYNC);
> +}
> +
> +static void trbe_reset_local(void)
> +{
> +	trbe_disable_and_drain_local();
> +	write_sysreg_s(0, SYS_TRBPTR_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBBASER_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBSR_EL1);
> +	isb();
> +}
> +
> +static void trbe_pad_buf(struct perf_output_handle *handle, int len)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +
> +	memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
> +	if (!perf->snapshot)
> +		perf_aux_output_skip(handle, len);
> +}
> +
> +static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +	u64 limit = perf->nr_pages * PAGE_SIZE;
> +

So we are using half of the buffer for snapshot mode to avoid a case where the
analyzer is unable to decode the trace in case of an overflow.

> +	if (head < limit >> 1)
> +		limit >>= 1;

Also this needs to be thought out. We may not need this restriction. The trace decoder
will be able to walk forward, find a synchronization packet and then continue
decoding from there. So, we could use the entire buffer for TRBE.


> +
> +	return limit;
> +}
> +
> +static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	const u64 bufsize = perf->nr_pages * PAGE_SIZE;
> +	u64 limit = bufsize;
> +	u64 head, tail, wakeup;
> +

Please add comments explaining the steps below.

> +	head = PERF_IDX2OFF(handle->head, perf);
> +	if (!IS_ALIGNED(head, cpudata->trbe_align)) {
> +		unsigned long delta = roundup(head, cpudata->trbe_align) - head;
> +
> +		delta = min(delta, handle->size);
> +		trbe_pad_buf(handle, delta);
> +		head = PERF_IDX2OFF(handle->head, perf);
> +	}
> +
> +	if (!handle->size) {
> +		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +		return 0;
> +	}
> +
> +	tail = PERF_IDX2OFF(handle->head + handle->size, perf);
> +	wakeup = PERF_IDX2OFF(handle->wakeup, perf);
> +

> +	if (head < tail)

Please add a comment here.

> +		limit = round_down(tail, PAGE_SIZE);
> +
> +	if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
> +		limit = min(limit, round_up(wakeup, PAGE_SIZE));

Please add a comment here. Also, do we need an alignment to PAGE_SIZE ?

> +
> +	if (limit > head)
> +		return limit;
> +
> +	trbe_pad_buf(handle, handle->size);
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	return 0;
> +}
> +
> +static unsigned long get_trbe_limit(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset;
> +
> +	if (perf->snapshot)
> +		offset = trbe_snapshot_offset(handle);
> +	else
> +		offset = trbe_normal_offset(handle);
> +	return perf->trbe_base + offset;
> +}
> +
> +static void trbe_enable_hw(struct trbe_perf *perf)
> +{
> +	WARN_ON(perf->trbe_write < perf->trbe_base);
> +	WARN_ON(perf->trbe_write >= perf->trbe_limit);
> +	set_trbe_disabled();
> +	clr_trbe_irq();
> +	clr_trbe_wrap();
> +	clr_trbe_abort();
> +	clr_trbe_ec();
> +	clr_trbe_bsc();
> +	clr_trbe_fsc();

Please merge all of these field updates into a single register update,
unless separate writes are mandated by the architecture.
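
A minimal user-space sketch of a single read-modify-write, with a plain
variable standing in for the status register. The TOY_SR_* bit positions are
illustrative only and must be taken from the architecture manual; the
toy_read_sr()/toy_write_sr() accessors are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; take the real layout from the Arm ARM. */
#define TOY_SR_IRQ	(UINT64_C(1) << 22)
#define TOY_SR_WRAP	(UINT64_C(1) << 20)
#define TOY_SR_ABORT	(UINT64_C(1) << 18)
#define TOY_SR_EC	(UINT64_C(0x3f) << 26)

static uint64_t toy_trbsr;	/* stands in for the status register */
static unsigned int toy_writes;	/* counts register writes */

static uint64_t toy_read_sr(void)
{
	return toy_trbsr;
}

static void toy_write_sr(uint64_t val)
{
	toy_trbsr = val;
	toy_writes++;
}

/* Clear every status field with one read and one write. */
static void toy_clear_status(void)
{
	uint64_t val = toy_read_sr();

	val &= ~(TOY_SR_IRQ | TOY_SR_WRAP | TOY_SR_ABORT | TOY_SR_EC);
	toy_write_sr(val);
}
```

Compared with one helper per field, this costs a single register write and
leaves the unrelated bits untouched.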

> +	set_trbe_virtual_mode();
> +	set_trbe_fill_mode(TRBE_FILL_STOP);
> +	set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);

Same here ^^

> +	isb();
> +	set_trbe_base_pointer(perf->trbe_base);
> +	set_trbe_limit_pointer(perf->trbe_limit);
> +	set_trbe_write_pointer(perf->trbe_write);
> +	isb();
> +	dsb(ishst);
> +	flush_tlb_all();

Why is this needed ?

> +	set_trbe_running();
> +	set_trbe_enabled();
> +	asm(TSB_CSYNC);
> +}
> +
> +static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> +				   struct perf_event *event, void **pages,
> +				   int nr_pages, bool snapshot)
> +{
> +	struct trbe_perf *perf;
> +	struct page **pglist;
> +	int i;
> +
> +	if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))

We may be able to remove the restriction on snapshot mode, see my comment
above.

> +		return NULL;
> +
> +	perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
> +	if (IS_ERR(perf))
> +		return ERR_PTR(-ENOMEM);
> +
> +	pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
> +	if (IS_ERR(pglist)) {
> +		kfree(perf);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	for (i = 0; i < nr_pages; i++)
> +		pglist[i] = virt_to_page(pages[i]);
> +
> +	perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> +	if (IS_ERR((void *) perf->trbe_base)) {
> +		kfree(pglist);
> +		kfree(perf);
> +		return ERR_PTR(perf->trbe_base);
> +	}
> +	perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
> +	perf->trbe_write = perf->trbe_base;
> +	perf->pid = task_pid_nr(event->owner);
> +	perf->snapshot = snapshot;
> +	perf->nr_pages = nr_pages;
> +	perf->pages = pages;
> +	kfree(pglist);
> +	return perf;
> +}
> +
> +void arm_trbe_free_buffer(void *config)
> +{
> +	struct trbe_perf *perf = config;
> +
> +	vunmap((void *) perf->trbe_base);
> +	kfree(perf);
> +}
> +
> +static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
> +					    struct perf_output_handle *handle,
> +					    void *config)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = config;
> +	unsigned long size, offset;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	offset = get_trbe_write_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head += size;
> +	return size;
> +}
> +
> +static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct perf_output_handle *handle = data;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(mode != CS_MODE_PERF);

Why WARN_ON ? Simply return -EINVAL ? Also you need a check to make sure
the mode is DISABLED (when you get to sysfs mode).
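
Something along these lines, as a user-space sketch. The toy_* names are
hypothetical, and returning -EBUSY for a sink that is already in use is an
assumption of this sketch rather than something decided in this thread.

```c
#include <assert.h>
#include <errno.h>

enum toy_mode { TOY_MODE_DISABLED, TOY_MODE_SYSFS, TOY_MODE_PERF };

struct toy_cpudata { enum toy_mode mode; };

static int toy_trbe_enable(struct toy_cpudata *cpudata, enum toy_mode mode)
{
	if (mode != TOY_MODE_PERF)
		return -EINVAL;		/* unsupported mode: refuse, no WARN */
	if (cpudata->mode != TOY_MODE_DISABLED)
		return -EBUSY;		/* assumed error code for "already in use" */
	cpudata->mode = mode;
	return 0;
}
```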

> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	*this_cpu_ptr(drvdata->handle) = *handle;

That is wrong. Storing a local copy of a generic perf structure is asking
for trouble, as it assumes that the original structure doesn't change
beneath us. Please store the handle pointer.
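
A small user-space sketch of the difference, using a hypothetical toy_handle
with a single field and one slot instead of a per-CPU variable. The copy goes
stale as soon as the owner updates the handle, while the pointer keeps
following it.

```c
#include <assert.h>

struct toy_handle { unsigned long head; };

/* One slot per CPU in the real driver; a single slot is enough here. */
static struct toy_handle toy_copy;	/* what the RFC stores: a snapshot */
static struct toy_handle *toy_ptr;	/* what is asked for: a pointer */

static void toy_enable(struct toy_handle *handle)
{
	toy_copy = *handle;	/* frozen at enable time */
	toy_ptr = handle;	/* always sees the owner's current state */
}
```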

> +	cpudata->perf = perf;
> +	cpudata->mode = mode;
> +	perf->cpudata = cpudata;
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return 0;
> +	}
> +	trbe_enable_hw(perf);
> +	return 0;
> +}
> +
> +static int arm_trbe_disable(struct coresight_device *csdev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = cpudata->perf;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	trbe_disable_and_drain_local();
> +	perf->cpudata = NULL;
> +	cpudata->perf = NULL;
> +	cpudata->mode = CS_MODE_DISABLED;
> +	return 0;
> +}
> +
> +static void trbe_handle_fatal(struct perf_output_handle *handle)
> +{
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	perf_aux_output_end(handle, 0);
> +	trbe_disable_and_drain_local();
> +}
> +
> +static void trbe_handle_spurious(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	trbe_enable_hw(perf);
> +}
> +
> +static void trbe_handle_overflow(struct perf_output_handle *handle)
> +{
> +	struct perf_event *event = handle->event;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset, size;
> +	struct etm_event_data *event_data;
> +
> +	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head = offset;

Is this correct ? Or was this supposed to mean :
		handle->head += offset;


> +	perf_aux_output_end(handle, size);
> +
> +	event_data = perf_aux_output_begin(handle, event);
> +	if (!event_data) {
> +		event->hw.state |= PERF_HES_STOPPED;
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	perf->trbe_write = perf->trbe_base;
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	*this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
> +	trbe_enable_hw(perf);
> +}
> +
> +static bool is_perf_trbe(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	struct trbe_drvdata *drvdata = cpudata->drvdata;

Can you trust the cpudata ptr here as we are still verifying
if this was legitimate ?

> +	int cpu = smp_processor_id();
> +
> +	WARN_ON(perf->trbe_base != get_trbe_base_pointer());
> +	WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
> +
> +	if (cpudata->mode != CS_MODE_PERF)
> +		return false;
> +
> +	if (cpudata->cpu != cpu)
> +		return false;
> +
> +	if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
> +		return false;
> +
> +	return true;
> +}
> +
> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
> +{
> +	enum trbe_ec ec = get_trbe_ec();
> +	enum trbe_bsc bsc = get_trbe_bsc();
> +
> +	WARN_ON(is_trbe_running());
> +	asm(TSB_CSYNC);
> +	dsb(nsh);
> +	isb();
> +
> +	if (is_trbe_trg() || is_trbe_abort())
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
> +		if (get_trbe_write_pointer() == get_trbe_base_pointer())
> +			return TRBE_FAULT_ACT_WRAP;
> +	}
> +	return TRBE_FAULT_ACT_SPURIOUS;
> +}
> +
> +static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
> +{
> +	struct perf_output_handle *handle = dev;
> +	enum trbe_fault_action act;
> +
> +	WARN_ON(!is_trbe_irq());
> +	clr_trbe_irq();
> +
> +	if (!perf_get_aux(handle))
> +		return IRQ_NONE;
> +
> +	if (!is_perf_trbe(handle))
> +		return IRQ_NONE;
> +
> +	irq_work_run();
> +
> +	act = trbe_get_fault_act(handle);
> +	switch (act) {
> +	case TRBE_FAULT_ACT_WRAP:
> +		trbe_handle_overflow(handle);
> +		break;
> +	case TRBE_FAULT_ACT_SPURIOUS:
> +		trbe_handle_spurious(handle);
> +		break;
> +	case TRBE_FAULT_ACT_FATAL:
> +		trbe_handle_fatal(handle);
> +		break;
> +	}
> +	return IRQ_HANDLED;
> +}
> +


> +static void arm_trbe_probe_coresight_cpu(void *info)
> +{
> +	struct trbe_cpudata *cpudata = info;
> +	struct device *dev = &cpudata->drvdata->pdev->dev;
> +	struct coresight_desc desc = { 0 };
> +
> +	if (WARN_ON(!cpudata))
> +		goto cpu_clear;
> +
> +	if (!is_trbe_available()) {
> +		pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +
> +	if (!is_trbe_programmable()) {
> +		pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
> +	if (IS_ERR(desc.name))
> +		goto cpu_clear;
> +
> +	desc.type = CORESIGHT_DEV_TYPE_SINK;
> +	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;

Maybe we should add a new subtype to make this a higher priority than the
normal ETR. Something like :

	CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM
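
A user-space sketch of how such an ordering could drive default sink
selection. It assumes that a larger enum value means a more preferred sink;
the toy_* names and the selection helper are hypothetical, not the existing
coresight code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed ordering: a larger value means a more preferred sink. */
enum toy_sink_subtype {
	TOY_SINK_BUFFER,
	TOY_SINK_SYSMEM,		/* e.g. an ETR */
	TOY_SINK_PERCPU_SYSMEM,		/* e.g. a TRBE */
};

struct toy_sink {
	const char *name;
	enum toy_sink_subtype subtype;
};

/* Return the most preferred sink, or NULL if there is none. */
static const struct toy_sink *toy_pick_default(const struct toy_sink *sinks,
					       size_t n)
{
	const struct toy_sink *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!best || sinks[i].subtype > best->subtype)
			best = &sinks[i];
	return best;
}
```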

> +	desc.ops = &arm_trbe_cs_ops;
> +	desc.pdata = dev_get_platdata(dev);
> +	desc.groups = arm_trbe_groups;
> +	desc.dev = dev;
> +	cpudata->csdev = coresight_register(&desc);
> +	if (IS_ERR(cpudata->csdev))
> +		goto cpu_clear;
> +
> +	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
> +	cpudata->trbe_dbm = get_trbe_flag_update();
> +	cpudata->trbe_align = 1ULL << get_trbe_address_align();
> +	if (cpudata->trbe_align > SZ_2K) {
> +		pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	return;
> +cpu_clear:
> +	cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
> +}
> +
> +static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
> +	if (IS_ERR(drvdata->cpudata))
> +		return PTR_ERR(drvdata->cpudata);
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		cpudata->cpu = cpu;
> +		cpudata->drvdata = drvdata;
> +		smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);

Could we batch this and run it on all CPUs at the same time ? Also it would be
better to let each CPU fill in its own per_cpu area, to avoid racing.


> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_coresight_cpu(void *info)
> +{
> +	struct trbe_drvdata *drvdata = info;
> +
> +	disable_percpu_irq(drvdata->irq);
> +}
> +
> +static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}

Please leave this part to the CPU itself.

> +	}
> +	free_percpu(drvdata->cpudata);
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (!cpudata->csdev) {
> +			cpudata->drvdata = drvdata;
> +			smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);

Why do we need smp_call here ? We are already on the CPU.

> +		}
> +		trbe_reset_local();
> +		enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}
> +		disable_percpu_irq(drvdata->irq);
> +		trbe_reset_local();
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	enum cpuhp_state trbe_online;
> +
> +	trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
> +					arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
> +	if (trbe_online < 0)
> +		return -EINVAL;
> +
> +	if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
> +		return -EINVAL;
> +
> +	drvdata->trbe_online = trbe_online;
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	cpuhp_remove_multi_state(drvdata->trbe_online);
> +}
> +
> +static int arm_trbe_probe_irq(struct platform_device *pdev,
> +			      struct trbe_drvdata *drvdata)
> +{
> +	drvdata->irq = platform_get_irq(pdev, 0);
> +	if (!drvdata->irq) {
> +		pr_err("IRQ not found for the platform device\n");
> +		return -ENXIO;
> +	}
> +
> +	if (!irq_is_percpu(drvdata->irq)) {
> +		pr_err("IRQ is not a PPI\n");
> +		return -EINVAL;
> +	}
> +
> +	if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
> +		return -EINVAL;
> +
> +	drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
> +	if (!drvdata->handle)
> +		return -ENOMEM;
> +
> +	if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
> +		free_percpu(drvdata->handle);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
> +{
> +	free_percpu_irq(drvdata->irq, drvdata->handle);
> +	free_percpu(drvdata->handle);
> +}
> +
> +static int arm_trbe_device_probe(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata;
> +	struct trbe_drvdata *drvdata;
> +	struct device *dev = &pdev->dev;
> +	int ret;
> +
> +	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> +	if (IS_ERR(drvdata))
> +		return -ENOMEM;
> +
> +	pdata = coresight_get_platform_data(dev);
> +	if (IS_ERR(pdata)) {
> +		kfree(drvdata);
> +		return -ENOMEM;
> +	}


> +
> +	drvdata->atclk = devm_clk_get(dev, "atclk");
> +	if (!IS_ERR(drvdata->atclk)) {
> +		ret = clk_prepare_enable(drvdata->atclk);
> +		if (ret)
> +			return ret;
> +	}

Please drop the clocks, we don't have any

> +	dev_set_drvdata(dev, drvdata);
> +	dev->platform_data = pdata;
> +	drvdata->pdev = pdev;
> +	ret = arm_trbe_probe_irq(pdev, drvdata);
> +	if (ret)
> +		goto irq_failed;
> +
> +	ret = arm_trbe_probe_coresight(drvdata);
> +	if (ret)
> +		goto probe_failed;
> +
> +	ret = arm_trbe_probe_cpuhp(drvdata);
> +	if (ret)
> +		goto cpuhp_failed;
> +
> +	return 0;
> +cpuhp_failed:
> +	arm_trbe_remove_coresight(drvdata);
> +probe_failed:
> +	arm_trbe_remove_irq(drvdata);
> +irq_failed:
> +	kfree(pdata);
> +	kfree(drvdata);
> +	return ret;
> +}
> +
> +static int arm_trbe_device_remove(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
> +	struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
> +
> +	arm_trbe_remove_coresight(drvdata);
> +	arm_trbe_remove_cpuhp(drvdata);
> +	arm_trbe_remove_irq(drvdata);
> +	kfree(pdata);
> +	kfree(drvdata);
> +	return 0;
> +}
> +
> +#ifdef CONFIG_PM
> +static int arm_trbe_runtime_suspend(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_disable_unprepare(drvdata->atclk);
> +

Remove. We may need to save/restore the TRBE ptrs, depending on the
TRBE.

> +	return 0;
> +}
> +
> +static int arm_trbe_runtime_resume(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_prepare_enable(drvdata->atclk);

Remove. See above.

> +
> +	return 0;
> +}
> +#endif
> +
> +static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
> +	SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
> +};
> +
> +static const struct of_device_id arm_trbe_of_match[] = {
> +	{ .compatible = "arm,arm-trbe",	.data = (void *)1 },
> +	{},
> +};

I think it is better to spell this out, we have too many acronyms ;-)

	"arm,trace-buffer-extension"

> +MODULE_DEVICE_TABLE(of, arm_trbe_of_match);

> +
> +static const struct platform_device_id arm_trbe_match[] = {
> +	{ "arm,trbe", 0},
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(platform, arm_trbe_match);

Please remove. The ACPI part can be added when we get to it.

> +
> +static struct platform_driver arm_trbe_driver = {
> +	.id_table = arm_trbe_match,
> +	.driver	= {
> +		.name = DRVNAME,
> +		.of_match_table = of_match_ptr(arm_trbe_of_match),
> +		.pm = &arm_trbe_dev_pm_ops,
> +		.suppress_bind_attrs = true,
> +	},
> +	.probe	= arm_trbe_device_probe,
> +	.remove	= arm_trbe_device_remove,
> +};
> +builtin_platform_driver(arm_trbe_driver)

Please make this modular.


> diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
> new file mode 100644
> index 0000000..82ffbfc
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.h
> @@ -0,0 +1,525 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This contains all required hardware related helper functions for
> + * Trace Buffer Extension (TRBE) driver in the coresight framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#include <linux/coresight.h>
> +#include <linux/device.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/smp.h>
> +
> +#include "coresight-etm-perf.h"
> +
> +static inline bool is_trbe_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
> +
> +	return trbe >= 0b0001;
> +}
> +
> +static inline bool is_ete_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
> +
> +	return (tracever != 0b0000);

Why is this needed ?

> +}
> +
> +static inline bool is_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_ENABLE;
> +}
> +
> +enum trbe_ec {
> +	TRBE_EC_OTHERS		= 0,
> +	TRBE_EC_STAGE1_ABORT	= 36,
> +	TRBE_EC_STAGE2_ABORT	= 37,
> +};
> +
> +static const char *const trbe_ec_str[] = {
> +	[TRBE_EC_OTHERS]	= "Maintenance exception",
> +	[TRBE_EC_STAGE1_ABORT]	= "Stage-1 exception",
> +	[TRBE_EC_STAGE2_ABORT]	= "Stage-2 exception",
> +};
> +

Please remove the definitions that are not used by the driver.

> +static inline enum trbe_ec get_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
> +}
> +
> +static inline void clr_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_bsc {
> +	TRBE_BSC_NOT_STOPPED	= 0,
> +	TRBE_BSC_FILLED		= 1,
> +	TRBE_BSC_TRIGGERED	= 2,
> +};
> +
> +static const char *const trbe_bsc_str[] = {
> +	[TRBE_BSC_NOT_STOPPED]	= "TRBE collection not stopped",
> +	[TRBE_BSC_FILLED]	= "TRBE filled",
> +	[TRBE_BSC_TRIGGERED]	= "TRBE triggered",
> +};
> +
> +static inline enum trbe_bsc get_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
> +}
> +
> +static inline void clr_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_fsc {
> +	TRBE_FSC_ASF_LEVEL0	= 0,
> +	TRBE_FSC_ASF_LEVEL1	= 1,
> +	TRBE_FSC_ASF_LEVEL2	= 2,
> +	TRBE_FSC_ASF_LEVEL3	= 3,
> +	TRBE_FSC_TF_LEVEL0	= 4,
> +	TRBE_FSC_TF_LEVEL1	= 5,
> +	TRBE_FSC_TF_LEVEL2	= 6,
> +	TRBE_FSC_TF_LEVEL3	= 7,
> +	TRBE_FSC_AFF_LEVEL0	= 8,
> +	TRBE_FSC_AFF_LEVEL1	= 9,
> +	TRBE_FSC_AFF_LEVEL2	= 10,
> +	TRBE_FSC_AFF_LEVEL3	= 11,
> +	TRBE_FSC_PF_LEVEL0	= 12,
> +	TRBE_FSC_PF_LEVEL1	= 13,
> +	TRBE_FSC_PF_LEVEL2	= 14,
> +	TRBE_FSC_PF_LEVEL3	= 15,
> +	TRBE_FSC_SEA_WRITE	= 16,
> +	TRBE_FSC_ASEA_WRITE	= 17,
> +	TRBE_FSC_SEA_LEVEL0	= 20,
> +	TRBE_FSC_SEA_LEVEL1	= 21,
> +	TRBE_FSC_SEA_LEVEL2	= 22,
> +	TRBE_FSC_SEA_LEVEL3	= 23,
> +	TRBE_FSC_ALIGN_FAULT	= 33,
> +	TRBE_FSC_TLB_FAULT	= 48,
> +	TRBE_FSC_ATOMIC_FAULT	= 49,
> +};

Please remove ^^^

> +
> +static const char *const trbe_fsc_str[] = {
> +	[TRBE_FSC_ASF_LEVEL0]	= "Address size fault - level 0",
> +	[TRBE_FSC_ASF_LEVEL1]	= "Address size fault - level 1",
> +	[TRBE_FSC_ASF_LEVEL2]	= "Address size fault - level 2",
> +	[TRBE_FSC_ASF_LEVEL3]	= "Address size fault - level 3",
> +	[TRBE_FSC_TF_LEVEL0]	= "Translation fault - level 0",
> +	[TRBE_FSC_TF_LEVEL1]	= "Translation fault - level 1",
> +	[TRBE_FSC_TF_LEVEL2]	= "Translation fault - level 2",
> +	[TRBE_FSC_TF_LEVEL3]	= "Translation fault - level 3",
> +	[TRBE_FSC_AFF_LEVEL0]	= "Access flag fault - level 0",
> +	[TRBE_FSC_AFF_LEVEL1]	= "Access flag fault - level 1",
> +	[TRBE_FSC_AFF_LEVEL2]	= "Access flag fault - level 2",
> +	[TRBE_FSC_AFF_LEVEL3]	= "Access flag fault - level 3",
> +	[TRBE_FSC_PF_LEVEL0]	= "Permission fault - level 0",
> +	[TRBE_FSC_PF_LEVEL1]	= "Permission fault - level 1",
> +	[TRBE_FSC_PF_LEVEL2]	= "Permission fault - level 2",
> +	[TRBE_FSC_PF_LEVEL3]	= "Permission fault - level 3",
> +	[TRBE_FSC_SEA_WRITE]	= "Synchronous external abort on write",
> +	[TRBE_FSC_ASEA_WRITE]	= "Asynchronous external abort on write",
> +	[TRBE_FSC_SEA_LEVEL0]	= "Synchronous external abort on table walk - level 0",
> +	[TRBE_FSC_SEA_LEVEL1]	= "Synchronous external abort on table walk - level 1",
> +	[TRBE_FSC_SEA_LEVEL2]	= "Synchronous external abort on table walk - level 2",
> +	[TRBE_FSC_SEA_LEVEL3]	= "Synchronous external abort on table walk - level 3",
> +	[TRBE_FSC_ALIGN_FAULT]	= "Alignment fault",
> +	[TRBE_FSC_TLB_FAULT]	= "TLB conflict fault",
> +	[TRBE_FSC_ATOMIC_FAULT]	= "Atomic fault",
> +};
> 

Please remove ^^^

>

> +enum trbe_address_mode {
> +	TRBE_ADDRESS_VIRTUAL,
> +	TRBE_ADDRESS_PHYSICAL,
> +};

#define please.

> +
> +static const char *const trbe_address_mode_str[] = {
> +	[TRBE_ADDRESS_VIRTUAL]	= "Address mode - virtual",
> +	[TRBE_ADDRESS_PHYSICAL]	= "Address mode - physical",
> +};

Do we need this ? We always use virtual.

> +
> +static inline bool is_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return !(trblimitr & TRBLIMITR_NVM);
> +}
> +

Remove

> +static inline bool is_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_NVM;
> +}

Remove

> +
> +static inline void set_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +

> +static inline void set_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}

Remove

> +
> +enum trbe_trig_mode {
> +	TRBE_TRIGGER_STOP	= 0,
> +	TRBE_TRIGGER_IRQ	= 1,
> +	TRBE_TRIGGER_IGNORE	= 3,
> +};
> +
> +static const char *const trbe_trig_mode_str[] = {
> +	[TRBE_TRIGGER_STOP]	= "Trigger mode - stop",
> +	[TRBE_TRIGGER_IRQ]	= "Trigger mode - irq",
> +	[TRBE_TRIGGER_IGNORE]	= "Trigger mode - ignore",
> +};
> +
> +static inline enum trbe_trig_mode get_trbe_trig_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
> +}
> +
> +static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +enum trbe_fill_mode {
> +	TRBE_FILL_STOP		= 0,
> +	TRBE_FILL_WRAP		= 1,
> +	TRBE_FILL_CIRCULAR	= 3,
> +};
> +

Please use #define

> +static const char *const trbe_fill_mode_str[] = {
> +	[TRBE_FILL_STOP]	= "Buffer mode - stop",
> +	[TRBE_FILL_WRAP]	= "Buffer mode - wrap",
> +	[TRBE_FILL_CIRCULAR]	= "Buffer mode - circular",
> +};
> +
> +static inline enum trbe_fill_mode get_trbe_fill_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
> +}
> +
> +static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_disabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline bool get_trbe_flag_update(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return trbidr & TRBIDR_FLAG;
> +}
> +
> +static inline bool is_trbe_programmable(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return !(trbidr & TRBIDR_PROG);
> +}
> +#
> +enum trbe_buffer_align {
> +	TRBE_BUFFER_BYTE,
> +	TRBE_BUFFER_HALF_WORD,
> +	TRBE_BUFFER_WORD,
> +	TRBE_BUFFER_DOUBLE_WORD,
> +	TRBE_BUFFER_16_BYTES,
> +	TRBE_BUFFER_32_BYTES,
> +	TRBE_BUFFER_64_BYTES,
> +	TRBE_BUFFER_128_BYTES,
> +	TRBE_BUFFER_256_BYTES,
> +	TRBE_BUFFER_512_BYTES,
> +	TRBE_BUFFER_1K_BYTES,
> +	TRBE_BUFFER_2K_BYTES,
> +};
> +

Remove ^^

> +static const char *const trbe_buffer_align_str[] = {
> +	[TRBE_BUFFER_BYTE]		= "Byte",
> +	[TRBE_BUFFER_HALF_WORD]		= "Half word",
> +	[TRBE_BUFFER_WORD]		= "Word",
> +	[TRBE_BUFFER_DOUBLE_WORD]	= "Double word",
> +	[TRBE_BUFFER_16_BYTES]		= "16 bytes",
> +	[TRBE_BUFFER_32_BYTES]		= "32 bytes",
> +	[TRBE_BUFFER_64_BYTES]		= "64 bytes",
> +	[TRBE_BUFFER_128_BYTES]		= "128 bytes",
> +	[TRBE_BUFFER_256_BYTES]		= "256 bytes",
> +	[TRBE_BUFFER_512_BYTES]		= "512 bytes",
> +	[TRBE_BUFFER_1K_BYTES]		= "1K bytes",
> +	[TRBE_BUFFER_2K_BYTES]		= "2K bytes",
> +};

We don't need any of this. We could simply "<<" and get the
size.
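
To illustrate, a minimal userspace sketch (not driver code). The enum above
runs from 0 (byte) to 11 (2K bytes), so the field value is the log2 of the
alignment and the byte count falls out of a shift. The field position
(bits [3:0]) and the helper name trbe_align_bytes() are assumptions made for
this example; the patch's own TRBIDR_ALIGN_SHIFT/MASK are not quoted here.

```c
#include <stdint.h>

/* Assumed layout for illustration: Align occupies TRBIDR_EL1 bits [3:0]. */
#define TRBIDR_ALIGN_SHIFT	0
#define TRBIDR_ALIGN_MASK	0xfULL

/* Hypothetical helper: minimum write pointer alignment, in bytes. */
static inline uint64_t trbe_align_bytes(uint64_t trbidr)
{
	unsigned int align = (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;

	/* The field holds log2(alignment), so no lookup table is needed. */
	return 1ULL << align;
}
```

With that, an alignment check reduces to (addr & (trbe_align_bytes(trbidr) - 1)),
which is what assert_trbe_address_align() already does.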


> +
> +static inline enum trbe_buffer_align get_trbe_address_align(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
> +}
> +
> +static inline void assert_trbe_address_mode(unsigned long addr)
> +{
> +	bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
> +	bool virt_mode = is_trbe_virtual_mode();
> +
> +	WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
> +}

I am not sure if this is really helpful. You have to trust the kernel vmalloc().

> +
> +static inline void assert_trbe_address_align(unsigned long addr)
> +{
> +	unsigned long nr_bytes = 1ULL << get_trbe_address_align();
> +
> +	WARN_ON(addr & (nr_bytes - 1));
> +}
> +
> +static inline unsigned long get_trbe_write_pointer(void)
> +{
> +	u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
> +	unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_write_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +	write_sysreg_s(addr, SYS_TRBPTR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_limit_pointer(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +	unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK;
> +	unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
> +
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_limit_pointer(unsigned long addr)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
> +	trblimitr |= (addr & PAGE_MASK);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_base_pointer(void)
> +{
> +	u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
> +	unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK;
> +
> +	addr = addr << TRBBASER_BASE_SHIFT;
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_base_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	write_sysreg_s(addr, SYS_TRBBASER_EL1);
> +}
> 

Suzuki


* Re: [RFC 07/11] coresight: sink: Add TRBE driver
@ 2020-11-12 10:13     ` Suzuki K Poulose
  0 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-12 10:13 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach

On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
> accessible via the system registers. The TRBE supports different addressing
> modes including CPU virtual address and buffer modes including the circular
> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
> a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
> access to the trace buffer could be prohibited by a higher exception level
> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
> private interrupt (PPI) on address translation errors and when the buffer
> is full. Overall implementation here is inspired from the Arm SPE driver.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>   Documentation/trace/coresight/coresight-trbe.rst |  36 ++
>   arch/arm64/include/asm/sysreg.h                  |   2 +
>   drivers/hwtracing/coresight/Kconfig              |  11 +
>   drivers/hwtracing/coresight/Makefile             |   1 +
>   drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
>   drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
>   6 files changed, 1341 insertions(+)
>   create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> 
> diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
> new file mode 100644
> index 0000000..4320a8b
> --- /dev/null
> +++ b/Documentation/trace/coresight/coresight-trbe.rst
> @@ -0,0 +1,36 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============================
> +Trace Buffer Extension (TRBE).
> +==============================
> +
> +    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
> +    :Date:     November 2020
> +
> +Hardware Description
> +--------------------
> +
> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system
> +memory, CPU traces generated from a corresponding percpu tracing unit. This
> +gets plugged in as a coresight sink device because the corresponding trace
> +generators (ETE) are plugged in as source devices.
> +
> +Sysfs files and directories
> +---------------------------
> +
> +The TRBE devices appear on the existing coresight bus alongside the other
> +coresight devices::
> +
> +	>$ ls /sys/bus/coresight/devices
> +	trbe0  trbe1  trbe2 trbe3
> +
> +The ``trbe<N>`` named TRBEs are associated with a CPU.::
> +
> +	>$ ls /sys/bus/coresight/devices/trbe0/
> +	irq align dbm
> +
> +*Key file items are:-*
> +   * ``irq``: TRBE maintenance interrupt number
> +   * ``align``: TRBE write pointer alignment
> +   * ``dbm``: TRBE updates memory with access and dirty flags
> +
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 14cb156..61136f6 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -97,6 +97,7 @@
>   #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
>   #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
>   #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> +#define TSB_CSYNC			__emit_inst(0xd503225f)
>   
>   #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
>   	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
> @@ -865,6 +866,7 @@
>   #define ID_AA64MMFR2_CNP_SHIFT		0
>   
>   /* id_aa64dfr0 */
> +#define ID_AA64DFR0_TRBE_SHIFT		44
>   #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
>   #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
>   #define ID_AA64DFR0_PMSVER_SHIFT	32
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index c119824..0f5e101 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -156,6 +156,17 @@ config CORESIGHT_CTI
>   	  To compile this driver as a module, choose M here: the
>   	  module will be called coresight-cti.
>   
> +config CORESIGHT_TRBE
> +	bool "Trace Buffer Extension (TRBE) driver"
> +	depends on ARM64
> +	help
> +	  This driver provides support for percpu Trace Buffer Extension (TRBE).
> +	  TRBE always needs to be used along with its corresponding percpu ETE
> +	  component. ETE generates trace data which is then captured with TRBE.
> +	  Unlike traditional sink devices, TRBE is a CPU feature accessible via
> +	  system registers. But its explicit dependency on the trace unit (ETE)
> +	  requires it to be plugged in as a coresight sink device.
> +
>   config CORESIGHT_CTI_INTEGRATION_REGS
>   	bool "Access CTI CoreSight Integration Registers"
>   	depends on CORESIGHT_CTI
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index f20e357..d608165 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>   obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
>   obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
>   obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
> +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
>   coresight-cti-y := coresight-cti-core.o	coresight-cti-platform.o \
>   		   coresight-cti-sysfs.o
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> new file mode 100644
> index 0000000..48a8ec3
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -0,0 +1,766 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
> + * sink device, which can then pair with an appropriate per-cpu coresight source
> + * device (ETE) thus generating required trace data. Trace can be enabled
> + * via the perf framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#define DRVNAME "arm_trbe"
> +
> +#define pr_fmt(fmt) DRVNAME ": " fmt
> +
> +#include "coresight-trbe.h"
> +
> +#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
> +
> +#define ETE_IGNORE_PACKET 0x70

Add a comment here, on what this means to the decoder.
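
As a userspace sketch of what the padding amounts to, assuming (as the name
suggests) that 0x70 is a single-byte packet which the trace decoder skips, so
a padded region decodes as nothing rather than as garbage. pad_with_ignore()
is a hypothetical stand-in for trbe_pad_buf() further down, not the driver
function itself.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a one-byte packet that the decoder discards. */
#define ETE_IGNORE_PACKET	0x70

/* Hypothetical stand-in for trbe_pad_buf(): fill [head, head + len). */
static void pad_with_ignore(unsigned char *buf, size_t head, size_t len)
{
	memset(buf + head, ETE_IGNORE_PACKET, len);
}
```

A comment along those lines next to the #define would tell the reader why
memset() with this value is safe.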

> +
> +static const char trbe_name[] = "trbe";

Why not

#define DEVNAME	"trbe"


> +
> +enum trbe_fault_action {
> +	TRBE_FAULT_ACT_WRAP,
> +	TRBE_FAULT_ACT_SPURIOUS,
> +	TRBE_FAULT_ACT_FATAL,
> +};
> +
> +struct trbe_perf {

Please rename this to trbe_buf. This will be used for sysfs mode as well.

> +	unsigned long trbe_base;
> +	unsigned long trbe_limit;
> +	unsigned long trbe_write;
> +	pid_t pid;

Why do we need this ? This seems unused and moreover, there cannot
be multiple tracers into TRBE. So, we don't need to share the sink
unlike the traditional ones.

> +	int nr_pages;
> +	void **pages;
> +	bool snapshot;
> +	struct trbe_cpudata *cpudata;
> +};
> +
> +struct trbe_cpudata {
> +	struct coresight_device	*csdev;
> +	bool trbe_dbm;

Why do we need this ?

> +	u64 trbe_align;
> +	int cpu;
> +	enum cs_mode mode;
> +	struct trbe_perf *perf;
> +	struct trbe_drvdata *drvdata;
> +};
> +
> +struct trbe_drvdata {
> +	struct trbe_cpudata __percpu *cpudata;
> +	struct perf_output_handle __percpu *handle;

Shouldn't this be :

	struct perf_output_handle __percpu **handle ?

as we get a handle from the etm-perf and is not controlled by
the TRBE ?

> +	struct hlist_node hotplug_node;
> +	int irq;
> +	cpumask_t supported_cpus;
> +	enum cpuhp_state trbe_online;
> +	struct platform_device *pdev;
> +	struct clk *atclk;

We don't have any clocks for the TRBE instance. Please remove.

> +};
> +
> +static int trbe_alloc_node(struct perf_event *event)
> +{
> +	if (event->cpu == -1)
> +		return NUMA_NO_NODE;
> +	return cpu_to_node(event->cpu);
> +}
> +
> +static void trbe_disable_and_drain_local(void)
> +{
> +	write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> +	isb();
> +	dsb(nsh);
> +	asm(TSB_CSYNC);
> +}
> +
> +static void trbe_reset_local(void)
> +{
> +	trbe_disable_and_drain_local();
> +	write_sysreg_s(0, SYS_TRBPTR_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBBASER_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBSR_EL1);
> +	isb();
> +}
> +
> +static void trbe_pad_buf(struct perf_output_handle *handle, int len)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +
> +	memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
> +	if (!perf->snapshot)
> +		perf_aux_output_skip(handle, len);
> +}
> +
> +static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +	u64 limit = perf->nr_pages * PAGE_SIZE;
> +

So we are using half of the buffer for snapshot mode to avoid a case where the
analyzer is unable to decode the trace in case of an overflow.

> +	if (head < limit >> 1)
> +		limit >>= 1;

Also this needs to be thought out. We may not need this restriction. The trace decoder
will be able to walk forward and then find a synchronization packet and then continue
the tracing from there. So, we could use the entire buffer for TRBE.


> +
> +	return limit;
> +}
> +
> +static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	const u64 bufsize = perf->nr_pages * PAGE_SIZE;
> +	u64 limit = bufsize;
> +	u64 head, tail, wakeup;
> +

Commentary please.
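
One possible reading of the logic below, written as a commented userspace
model so it can be checked in isolation. The function name normal_limit(),
the fixed 4K page size and the flat argument list are assumptions for this
sketch; the head is taken to be already aligned, and the padding and
perf_aux_output_flag() calls are left out.

```c
#include <stdint.h>

#define PAGE_SIZE	4096ULL	/* assumed page size for the example */

/*
 * Hypothetical model of the limit computation in trbe_normal_offset().
 * head_idx and wakeup_idx are free-running perf AUX indices, size is the
 * free space. Returns an offset into the buffer, or 0 when nothing usable
 * is left.
 */
static uint64_t normal_limit(uint64_t head_idx, uint64_t size,
			     uint64_t wakeup_idx, uint64_t bufsize)
{
	uint64_t head = head_idx % bufsize;
	uint64_t tail, wakeup, limit = bufsize;

	/* No room at all: the caller flags the record as truncated. */
	if (!size)
		return 0;

	tail = (head_idx + size) % bufsize;
	wakeup = wakeup_idx % bufsize;

	/*
	 * Free space does not wrap: stop at the last page boundary before
	 * the tail, so that unread data is not overwritten.
	 */
	if (head < tail)
		limit = tail & ~(PAGE_SIZE - 1);

	/*
	 * A wakeup is due within the free space and ahead of the head: stop
	 * at the page boundary after it, so the interrupt can wake perf.
	 */
	if (wakeup_idx < head_idx + size && head <= wakeup) {
		uint64_t up = (wakeup + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

		if (up < limit)
			limit = up;
	}

	/* A limit at or behind the head leaves nothing to write into. */
	return limit > head ? limit : 0;
}
```

For a 4 page buffer with head at 0, all of it free and a wakeup due at 8192,
the model stops at 8192 rather than at the end of the buffer.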

> +	head = PERF_IDX2OFF(handle->head, perf);
> +	if (!IS_ALIGNED(head, cpudata->trbe_align)) {
> +		unsigned long delta = roundup(head, cpudata->trbe_align) - head;
> +
> +		delta = min(delta, handle->size);
> +		trbe_pad_buf(handle, delta);
> +		head = PERF_IDX2OFF(handle->head, perf);
> +	}
> +
> +	if (!handle->size) {
> +		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +		return 0;
> +	}
> +
> +	tail = PERF_IDX2OFF(handle->head + handle->size, perf);
> +	wakeup = PERF_IDX2OFF(handle->wakeup, perf);
> +

> +	if (head < tail)

Comment please.

> +		limit = round_down(tail, PAGE_SIZE);
> +
> +	if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
> +		limit = min(limit, round_up(wakeup, PAGE_SIZE));

Comment please. Also, do we need an alignment to PAGE_SIZE ?

> +
> +	if (limit > head)
> +		return limit;
> +
> +	trbe_pad_buf(handle, handle->size);
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	return 0;
> +}
> +
> +static unsigned long get_trbe_limit(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset;
> +
> +	if (perf->snapshot)
> +		offset = trbe_snapshot_offset(handle);
> +	else
> +		offset = trbe_normal_offset(handle);
> +	return perf->trbe_base + offset;
> +}
> +
> +static void trbe_enable_hw(struct trbe_perf *perf)
> +{
> +	WARN_ON(perf->trbe_write < perf->trbe_base);
> +	WARN_ON(perf->trbe_write >= perf->trbe_limit);
> +	set_trbe_disabled();
> +	clr_trbe_irq();
> +	clr_trbe_wrap();
> +	clr_trbe_abort();
> +	clr_trbe_ec();
> +	clr_trbe_bsc();
> +	clr_trbe_fsc();

Please merge all of these field updates to single register update
unless mandated by the architecture.
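
A rough userspace sketch of what a merged update could look like for the
status register: build one mask and apply it once, instead of six
read-modify-write cycles. The bit positions and the helper name
trbsr_clear_status() are assumptions for illustration; the patch's own TRBSR_*
definitions are not quoted here.

```c
#include <stdint.h>

/* Assumed TRBSR_EL1 layout, for illustration only. */
#define TRBSR_EC_SHIFT		26
#define TRBSR_EC_MASK		0x3fULL
#define TRBSR_IRQ		(1ULL << 22)
#define TRBSR_WRAP		(1ULL << 20)
#define TRBSR_ABORT		(1ULL << 18)
#define TRBSR_BSC_SHIFT		0
#define TRBSR_BSC_MASK		0x3fULL	/* FSC assumed to share these bits */

/* Hypothetical helper: clear every status field with a single mask. */
static inline uint64_t trbsr_clear_status(uint64_t trbsr)
{
	trbsr &= ~(TRBSR_IRQ | TRBSR_WRAP | TRBSR_ABORT |
		   (TRBSR_EC_MASK << TRBSR_EC_SHIFT) |
		   (TRBSR_BSC_MASK << TRBSR_BSC_SHIFT));
	return trbsr;
}
```

In the driver this would sit between one read_sysreg_s() and one
write_sysreg_s() on SYS_TRBSR_EL1.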

> +	set_trbe_virtual_mode();
> +	set_trbe_fill_mode(TRBE_FILL_STOP);
> +	set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);

Same here ^^
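
Along the same lines, a userspace sketch that composes the whole TRBLIMITR
value (limit, fill mode, trigger mode, enable) up front, so the register only
needs to be written once. The bit positions and the helper name
trblimitr_compose() are assumptions for illustration. Since the enable bit
lives in the same register, that single write would have to come after the
base and write pointers are programmed, as the patch's ordering already
implies.

```c
#include <stdint.h>

/* Assumed TRBLIMITR_EL1 layout, for illustration only. */
#define TRBLIMITR_ENABLE		(1ULL << 0)
#define TRBLIMITR_FILL_MODE_SHIFT	1
#define TRBLIMITR_FILL_MODE_MASK	0x3ULL
#define TRBLIMITR_TRIG_MODE_SHIFT	3
#define TRBLIMITR_TRIG_MODE_MASK	0x3ULL
#define TRBLIMITR_NVM			(1ULL << 5)
#define TRBLIMITR_LIMIT_SHIFT		12

/* Values taken from the enums in the patch. */
#define TRBE_FILL_STOP			0
#define TRBE_TRIGGER_IGNORE		3

/* Hypothetical helper: build the full register value in one go. */
static inline uint64_t trblimitr_compose(uint64_t limit, unsigned int fill,
					 unsigned int trig)
{
	/* Keep only the limit address bits, the low bits hold the controls. */
	uint64_t val = limit & ~((1ULL << TRBLIMITR_LIMIT_SHIFT) - 1);

	val |= ((uint64_t)fill & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT;
	val |= ((uint64_t)trig & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT;
	/* TRBLIMITR_NVM stays clear: virtual addressing. */
	val |= TRBLIMITR_ENABLE;
	return val;
}
```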

> +	isb();
> +	set_trbe_base_pointer(perf->trbe_base);
> +	set_trbe_limit_pointer(perf->trbe_limit);
> +	set_trbe_write_pointer(perf->trbe_write);
> +	isb();
> +	dsb(ishst);
> +	flush_tlb_all();

Why is this needed ?

> +	set_trbe_running();
> +	set_trbe_enabled();
> +	asm(TSB_CSYNC);
> +}
> +
> +static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> +				   struct perf_event *event, void **pages,
> +				   int nr_pages, bool snapshot)
> +{
> +	struct trbe_perf *perf;
> +	struct page **pglist;
> +	int i;
> +
> +	if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))

We may be able to remove the restriction on snapshot mode, see my comment
above.

> +		return NULL;
> +
> +	perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
> +	if (IS_ERR(perf))
> +		return ERR_PTR(-ENOMEM);
> +
> +	pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
> +	if (IS_ERR(pglist)) {
> +		kfree(perf);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	for (i = 0; i < nr_pages; i++)
> +		pglist[i] = virt_to_page(pages[i]);
> +
> +	perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> +	if (IS_ERR((void *) perf->trbe_base)) {
> +		kfree(pglist);
> +		kfree(perf);
> +		return ERR_PTR(perf->trbe_base);
> +	}
> +	perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
> +	perf->trbe_write = perf->trbe_base;
> +	perf->pid = task_pid_nr(event->owner);
> +	perf->snapshot = snapshot;
> +	perf->nr_pages = nr_pages;
> +	perf->pages = pages;
> +	kfree(pglist);
> +	return perf;
> +}
> +
> +void arm_trbe_free_buffer(void *config)
> +{
> +	struct trbe_perf *perf = config;
> +
> +	vunmap((void *) perf->trbe_base);
> +	kfree(perf);
> +}
> +
> +static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
> +					    struct perf_output_handle *handle,
> +					    void *config)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = config;
> +	unsigned long size, offset;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	offset = get_trbe_write_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head += size;
> +	return size;
> +}
> +
> +static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct perf_output_handle *handle = data;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(mode != CS_MODE_PERF);

Why WARN_ON ? Simply return -EINVAL ? Also you need a check to make sure
the mode is DISABLED (when you get to sysfs mode).
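
Roughly what is meant, as a standalone sketch. The struct is a trimmed-down,
hypothetical stand-in for trbe_cpudata, the enum is redefined so the example
builds on its own, and -EBUSY for the "already in use" case is a choice made
for this sketch, not something the patch or the framework dictates.

```c
#include <errno.h>

/* Mirrors the coresight mode names; redefined so the sketch stands alone. */
enum cs_mode { CS_MODE_DISABLED, CS_MODE_SYSFS, CS_MODE_PERF };

/* Hypothetical, trimmed-down cpudata. */
struct trbe_cpudata_model {
	enum cs_mode mode;
};

static int trbe_enable_checked(struct trbe_cpudata_model *cpudata,
			       enum cs_mode mode)
{
	/* Only perf mode is supported for now: refuse instead of warning. */
	if (mode != CS_MODE_PERF)
		return -EINVAL;

	/* The sink must be idle before it is claimed. */
	if (cpudata->mode != CS_MODE_DISABLED)
		return -EBUSY;

	cpudata->mode = mode;
	return 0;
}
```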

> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	*this_cpu_ptr(drvdata->handle) = *handle;

That is wrong. Storing a local copy of a global perf generic structure
is calling for trouble, assuming that the global structure doesn't change
beneath us. Please store handle ptr.
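
A small userspace model of the difference. A plain array stands in for the
per-cpu storage, and handle_model, trbe_set_handle() and trbe_handle_head()
are hypothetical names for this sketch; in the driver the slot would be the
per-cpu pointer suggested earlier for struct trbe_drvdata.

```c
#define NR_CPUS_MODEL	4	/* hypothetical CPU count for the model */

/* Stand-in for struct perf_output_handle. */
struct handle_model {
	unsigned long head;
};

/* One slot per CPU, holding a pointer rather than a copy. */
static struct handle_model *trbe_handle[NR_CPUS_MODEL];

static void trbe_set_handle(int cpu, struct handle_model *handle)
{
	trbe_handle[cpu] = handle;
}

/* Reads through the pointer, so it always sees the owner's current state. */
static unsigned long trbe_handle_head(int cpu)
{
	return trbe_handle[cpu] ? trbe_handle[cpu]->head : 0;
}
```

If the owner advances head after the handle was stored, the pointer slot sees
the new value, while a struct copy taken at enable time would still hold the
old one.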

> +	cpudata->perf = perf;
> +	cpudata->mode = mode;
> +	perf->cpudata = cpudata;
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return 0;
> +	}
> +	trbe_enable_hw(perf);
> +	return 0;
> +}
> +
> +static int arm_trbe_disable(struct coresight_device *csdev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = cpudata->perf;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	trbe_disable_and_drain_local();
> +	perf->cpudata = NULL;
> +	cpudata->perf = NULL;
> +	cpudata->mode = CS_MODE_DISABLED;
> +	return 0;
> +}
> +
> +static void trbe_handle_fatal(struct perf_output_handle *handle)
> +{
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	perf_aux_output_end(handle, 0);
> +	trbe_disable_and_drain_local();
> +}
> +
> +static void trbe_handle_spurious(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	trbe_enable_hw(perf);
> +}
> +
> +static void trbe_handle_overflow(struct perf_output_handle *handle)
> +{
> +	struct perf_event *event = handle->event;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset, size;
> +	struct etm_event_data *event_data;
> +
> +	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head = offset;

Is this correct ? Or was this supposed to mean :
		handle->head += offset;


> +	perf_aux_output_end(handle, size);
> +
> +	event_data = perf_aux_output_begin(handle, event);
> +	if (!event_data) {
> +		event->hw.state |= PERF_HES_STOPPED;
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	perf->trbe_write = perf->trbe_base;
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	*this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
> +	trbe_enable_hw(perf);
> +}
> +
> +static bool is_perf_trbe(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	struct trbe_drvdata *drvdata = cpudata->drvdata;

Can you trust the cpudata ptr here as we are still verifying
if this was legitimate ?
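
One possible shape for this, as a user-space sketch only: check each
pointer before following it, so that a handle whose perf->cpudata has been
cleared fails early instead of being dereferenced. The ex_* structures are
simplified, hypothetical stand-ins for the ones in the patch, and NULL
checks like these only catch cleared pointers, not dangling ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ex_mode { EX_MODE_DISABLED, EX_MODE_PERF };

struct ex_drvdata {
	uint64_t supported_cpus;	/* one bit per CPU, illustrative only */
};

struct ex_cpudata {
	int cpu;
	enum ex_mode mode;
	struct ex_drvdata *drvdata;
};

struct ex_perf {
	struct ex_cpudata *cpudata;
};

/* Validate every level of the chain before dereferencing the next one. */
static bool ex_is_perf_trbe(const struct ex_perf *perf, int cpu)
{
	const struct ex_cpudata *cpudata;

	assert(cpu >= 0 && cpu < 64);

	if (!perf || !perf->cpudata)
		return false;

	cpudata = perf->cpudata;
	if (cpudata->mode != EX_MODE_PERF || cpudata->cpu != cpu)
		return false;

	if (!cpudata->drvdata)
		return false;

	return (cpudata->drvdata->supported_cpus >> cpu) & 1ULL;
}
```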

> +	int cpu = smp_processor_id();
> +
> +	WARN_ON(perf->trbe_base != get_trbe_base_pointer());
> +	WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
> +
> +	if (cpudata->mode != CS_MODE_PERF)
> +		return false;
> +
> +	if (cpudata->cpu != cpu)
> +		return false;
> +
> +	if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
> +		return false;
> +
> +	return true;
> +}
> +
> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
> +{
> +	enum trbe_ec ec = get_trbe_ec();
> +	enum trbe_bsc bsc = get_trbe_bsc();
> +
> +	WARN_ON(is_trbe_running());
> +	asm(TSB_CSYNC);
> +	dsb(nsh);
> +	isb();
> +
> +	if (is_trbe_trg() || is_trbe_abort())
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
> +		if (get_trbe_write_pointer() == get_trbe_base_pointer())
> +			return TRBE_FAULT_ACT_WRAP;
> +	}
> +	return TRBE_FAULT_ACT_SPURIOUS;
> +}
> +
> +static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
> +{
> +	struct perf_output_handle *handle = dev;
> +	enum trbe_fault_action act;
> +
> +	WARN_ON(!is_trbe_irq());
> +	clr_trbe_irq();
> +
> +	if (!perf_get_aux(handle))
> +		return IRQ_NONE;
> +
> +	if (!is_perf_trbe(handle))
> +		return IRQ_NONE;
> +
> +	irq_work_run();
> +
> +	act = trbe_get_fault_act(handle);
> +	switch (act) {
> +	case TRBE_FAULT_ACT_WRAP:
> +		trbe_handle_overflow(handle);
> +		break;
> +	case TRBE_FAULT_ACT_SPURIOUS:
> +		trbe_handle_spurious(handle);
> +		break;
> +	case TRBE_FAULT_ACT_FATAL:
> +		trbe_handle_fatal(handle);
> +		break;
> +	}
> +	return IRQ_HANDLED;
> +}
> +


> +static void arm_trbe_probe_coresight_cpu(void *info)
> +{
> +	struct trbe_cpudata *cpudata = info;
> +	struct device *dev = &cpudata->drvdata->pdev->dev;
> +	struct coresight_desc desc = { 0 };
> +
> +	if (WARN_ON(!cpudata))
> +		goto cpu_clear;
> +
> +	if (!is_trbe_available()) {
> +		pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +
> +	if (!is_trbe_programmable()) {
> +		pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
> +	if (IS_ERR(desc.name))
> +		goto cpu_clear;
> +
> +	desc.type = CORESIGHT_DEV_TYPE_SINK;
> +	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;

Maybe we should add a new subtype to make this higher priority than the normal ETR.
Something like:

	CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM
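
A rough sketch of how such a subtype could win the default-sink choice,
assuming the selection compares subtype values numerically so that a
larger value is preferred. That ordering is an assumption about the core
code, and the ex_* enum and helper below are hypothetical, written for
user space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering: a larger value means a more preferred sink. */
enum ex_sink_subtype {
	EX_SINK_PORT,
	EX_SINK_BUFFER,
	EX_SINK_SYSMEM,
	EX_SINK_PERCPU_SYSMEM,	/* proposed: ranks above a shared ETR */
};

static_assert(EX_SINK_PERCPU_SYSMEM > EX_SINK_SYSMEM,
	      "per-CPU sink must outrank the shared system-memory sink");

struct ex_sink {
	const char *name;
	enum ex_sink_subtype subtype;
};

/* Return whichever candidate has the higher-ranked subtype. */
static const struct ex_sink *ex_pick_default_sink(const struct ex_sink *cur,
						   const struct ex_sink *cand)
{
	if (!cur)
		return cand;
	if (!cand)
		return cur;
	return cand->subtype > cur->subtype ? cand : cur;
}
```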

> +	desc.ops = &arm_trbe_cs_ops;
> +	desc.pdata = dev_get_platdata(dev);
> +	desc.groups = arm_trbe_groups;
> +	desc.dev = dev;
> +	cpudata->csdev = coresight_register(&desc);
> +	if (IS_ERR(cpudata->csdev))
> +		goto cpu_clear;
> +
> +	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
> +	cpudata->trbe_dbm = get_trbe_flag_update();
> +	cpudata->trbe_align = 1ULL << get_trbe_address_align();
> +	if (cpudata->trbe_align > SZ_2K) {
> +		pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	return;
> +cpu_clear:
> +	cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
> +}
> +
> +static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
> +	if (IS_ERR(drvdata->cpudata))
> +		return PTR_ERR(drvdata->cpudata);
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		cpudata->cpu = cpu;
> +		cpudata->drvdata = drvdata;
> +		smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);

Could we batch this and run it on all CPUs at the same time ? Also, it would be better to
let each CPU fill in its own per_cpu area, to avoid racing.


> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_coresight_cpu(void *info)
> +{
> +	struct trbe_drvdata *drvdata = info;
> +
> +	disable_percpu_irq(drvdata->irq);
> +}
> +
> +static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}

Please leave this part for the CPU itself to do.

> +	}
> +	free_percpu(drvdata->cpudata);
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (!cpudata->csdev) {
> +			cpudata->drvdata = drvdata;
> +			smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);

Why do we need smp_call here ? We are already on the CPU.

> +		}
> +		trbe_reset_local();
> +		enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}
> +		disable_percpu_irq(drvdata->irq);
> +		trbe_reset_local();
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	enum cpuhp_state trbe_online;
> +
> +	trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
> +					arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
> +	if (trbe_online < 0)
> +		return -EINVAL;
> +
> +	if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
> +		return -EINVAL;
> +
> +	drvdata->trbe_online = trbe_online;
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	cpuhp_remove_multi_state(drvdata->trbe_online);
> +}
> +
> +static int arm_trbe_probe_irq(struct platform_device *pdev,
> +			      struct trbe_drvdata *drvdata)
> +{
> +	drvdata->irq = platform_get_irq(pdev, 0);
> +	if (!drvdata->irq) {
> +		pr_err("IRQ not found for the platform device\n");
> +		return -ENXIO;
> +	}
> +
> +	if (!irq_is_percpu(drvdata->irq)) {
> +		pr_err("IRQ is not a PPI\n");
> +		return -EINVAL;
> +	}
> +
> +	if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
> +		return -EINVAL;
> +
> +	drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
> +	if (!drvdata->handle)
> +		return -ENOMEM;
> +
> +	if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
> +		free_percpu(drvdata->handle);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
> +{
> +	free_percpu_irq(drvdata->irq, drvdata->handle);
> +	free_percpu(drvdata->handle);
> +}
> +
> +static int arm_trbe_device_probe(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata;
> +	struct trbe_drvdata *drvdata;
> +	struct device *dev = &pdev->dev;
> +	int ret;
> +
> +	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> +	if (IS_ERR(drvdata))
> +		return -ENOMEM;
> +
> +	pdata = coresight_get_platform_data(dev);
> +	if (IS_ERR(pdata)) {
> +		kfree(drvdata);
> +		return -ENOMEM;
> +	}


> +
> +	drvdata->atclk = devm_clk_get(dev, "atclk");
> +	if (!IS_ERR(drvdata->atclk)) {
> +		ret = clk_prepare_enable(drvdata->atclk);
> +		if (ret)
> +			return ret;
> +	}

Please drop the clocks; we don't have any.

> +	dev_set_drvdata(dev, drvdata);
> +	dev->platform_data = pdata;
> +	drvdata->pdev = pdev;
> +	ret = arm_trbe_probe_irq(pdev, drvdata);
> +	if (ret)
> +		goto irq_failed;
> +
> +	ret = arm_trbe_probe_coresight(drvdata);
> +	if (ret)
> +		goto probe_failed;
> +
> +	ret = arm_trbe_probe_cpuhp(drvdata);
> +	if (ret)
> +		goto cpuhp_failed;
> +
> +	return 0;
> +cpuhp_failed:
> +	arm_trbe_remove_coresight(drvdata);
> +probe_failed:
> +	arm_trbe_remove_irq(drvdata);
> +irq_failed:
> +	kfree(pdata);
> +	kfree(drvdata);
> +	return ret;
> +}
> +
> +static int arm_trbe_device_remove(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
> +	struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
> +
> +	arm_trbe_remove_coresight(drvdata);
> +	arm_trbe_remove_cpuhp(drvdata);
> +	arm_trbe_remove_irq(drvdata);
> +	kfree(pdata);
> +	kfree(drvdata);
> +	return 0;
> +}
> +
> +#ifdef CONFIG_PM
> +static int arm_trbe_runtime_suspend(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_disable_unprepare(drvdata->atclk);
> +

Remove. We may need to save/restore the TRBE ptrs, depending on the
TRBE.

> +	return 0;
> +}
> +
> +static int arm_trbe_runtime_resume(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_prepare_enable(drvdata->atclk);

Remove. See above.

> +
> +	return 0;
> +}
> +#endif
> +
> +static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
> +	SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
> +};
> +
> +static const struct of_device_id arm_trbe_of_match[] = {
> +	{ .compatible = "arm,arm-trbe",	.data = (void *)1 },
> +	{},
> +};

I think it is better to name this as below, since we have too many acronyms ;-)

	"arm,trace-buffer-extension"

> +MODULE_DEVICE_TABLE(of, arm_trbe_of_match);

> +
> +static const struct platform_device_id arm_trbe_match[] = {
> +	{ "arm,trbe", 0},
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(platform, arm_trbe_match);

Please remove. The ACPI part can be added when we get to it.

> +
> +static struct platform_driver arm_trbe_driver = {
> +	.id_table = arm_trbe_match,
> +	.driver	= {
> +		.name = DRVNAME,
> +		.of_match_table = of_match_ptr(arm_trbe_of_match),
> +		.pm = &arm_trbe_dev_pm_ops,
> +		.suppress_bind_attrs = true,
> +	},
> +	.probe	= arm_trbe_device_probe,
> +	.remove	= arm_trbe_device_remove,
> +};
> +builtin_platform_driver(arm_trbe_driver)

Please make this modular.


> diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
> new file mode 100644
> index 0000000..82ffbfc
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.h
> @@ -0,0 +1,525 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This contains all required hardware related helper functions for
> + * Trace Buffer Extension (TRBE) driver in the coresight framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#include <linux/coresight.h>
> +#include <linux/device.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/smp.h>
> +
> +#include "coresight-etm-perf.h"
> +
> +static inline bool is_trbe_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
> +
> +	return trbe >= 0b0001;
> +}
> +
> +static inline bool is_ete_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
> +
> +	return (tracever != 0b0000);

Why is this needed ?

> +}
> +
> +static inline bool is_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_ENABLE;
> +}
> +
> +enum trbe_ec {
> +	TRBE_EC_OTHERS		= 0,
> +	TRBE_EC_STAGE1_ABORT	= 36,
> +	TRBE_EC_STAGE2_ABORT	= 37,
> +};
> +
> +static const char *const trbe_ec_str[] = {
> +	[TRBE_EC_OTHERS]	= "Maintenance exception",
> +	[TRBE_EC_STAGE1_ABORT]	= "Stage-1 exception",
> +	[TRBE_EC_STAGE2_ABORT]	= "Stage-2 exception",
> +};
> +

Please remove the definitions that are not used by the driver.

> +static inline enum trbe_ec get_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
> +}
> +
> +static inline void clr_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_bsc {
> +	TRBE_BSC_NOT_STOPPED	= 0,
> +	TRBE_BSC_FILLED		= 1,
> +	TRBE_BSC_TRIGGERED	= 2,
> +};
> +
> +static const char *const trbe_bsc_str[] = {
> +	[TRBE_BSC_NOT_STOPPED]	= "TRBE collection not stopped",
> +	[TRBE_BSC_FILLED]	= "TRBE filled",
> +	[TRBE_BSC_TRIGGERED]	= "TRBE triggered",
> +};
> +
> +static inline enum trbe_bsc get_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
> +}
> +
> +static inline void clr_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_fsc {
> +	TRBE_FSC_ASF_LEVEL0	= 0,
> +	TRBE_FSC_ASF_LEVEL1	= 1,
> +	TRBE_FSC_ASF_LEVEL2	= 2,
> +	TRBE_FSC_ASF_LEVEL3	= 3,
> +	TRBE_FSC_TF_LEVEL0	= 4,
> +	TRBE_FSC_TF_LEVEL1	= 5,
> +	TRBE_FSC_TF_LEVEL2	= 6,
> +	TRBE_FSC_TF_LEVEL3	= 7,
> +	TRBE_FSC_AFF_LEVEL0	= 8,
> +	TRBE_FSC_AFF_LEVEL1	= 9,
> +	TRBE_FSC_AFF_LEVEL2	= 10,
> +	TRBE_FSC_AFF_LEVEL3	= 11,
> +	TRBE_FSC_PF_LEVEL0	= 12,
> +	TRBE_FSC_PF_LEVEL1	= 13,
> +	TRBE_FSC_PF_LEVEL2	= 14,
> +	TRBE_FSC_PF_LEVEL3	= 15,
> +	TRBE_FSC_SEA_WRITE	= 16,
> +	TRBE_FSC_ASEA_WRITE	= 17,
> +	TRBE_FSC_SEA_LEVEL0	= 20,
> +	TRBE_FSC_SEA_LEVEL1	= 21,
> +	TRBE_FSC_SEA_LEVEL2	= 22,
> +	TRBE_FSC_SEA_LEVEL3	= 23,
> +	TRBE_FSC_ALIGN_FAULT	= 33,
> +	TRBE_FSC_TLB_FAULT	= 48,
> +	TRBE_FSC_ATOMIC_FAULT	= 49,
> +};

Please remove ^^^

> +
> +static const char *const trbe_fsc_str[] = {
> +	[TRBE_FSC_ASF_LEVEL0]	= "Address size fault - level 0",
> +	[TRBE_FSC_ASF_LEVEL1]	= "Address size fault - level 1",
> +	[TRBE_FSC_ASF_LEVEL2]	= "Address size fault - level 2",
> +	[TRBE_FSC_ASF_LEVEL3]	= "Address size fault - level 3",
> +	[TRBE_FSC_TF_LEVEL0]	= "Translation fault - level 0",
> +	[TRBE_FSC_TF_LEVEL1]	= "Translation fault - level 1",
> +	[TRBE_FSC_TF_LEVEL2]	= "Translation fault - level 2",
> +	[TRBE_FSC_TF_LEVEL3]	= "Translation fault - level 3",
> +	[TRBE_FSC_AFF_LEVEL0]	= "Access flag fault - level 0",
> +	[TRBE_FSC_AFF_LEVEL1]	= "Access flag fault - level 1",
> +	[TRBE_FSC_AFF_LEVEL2]	= "Access flag fault - level 2",
> +	[TRBE_FSC_AFF_LEVEL3]	= "Access flag fault - level 3",
> +	[TRBE_FSC_PF_LEVEL0]	= "Permission fault - level 0",
> +	[TRBE_FSC_PF_LEVEL1]	= "Permission fault - level 1",
> +	[TRBE_FSC_PF_LEVEL2]	= "Permission fault - level 2",
> +	[TRBE_FSC_PF_LEVEL3]	= "Permission fault - level 3",
> +	[TRBE_FSC_SEA_WRITE]	= "Synchronous external abort on write",
> +	[TRBE_FSC_ASEA_WRITE]	= "Asynchronous external abort on write",
> +	[TRBE_FSC_SEA_LEVEL0]	= "Syncrhonous external abort on table walk - level 0",
> +	[TRBE_FSC_SEA_LEVEL1]	= "Syncrhonous external abort on table walk - level 1",
> +	[TRBE_FSC_SEA_LEVEL2]	= "Syncrhonous external abort on table walk - level 2",
> +	[TRBE_FSC_SEA_LEVEL3]	= "Syncrhonous external abort on table walk - level 3",
> +	[TRBE_FSC_ALIGN_FAULT]	= "Alignment fault",
> +	[TRBE_FSC_TLB_FAULT]	= "TLB conflict fault",
> +	[TRBE_FSC_ATOMIC_FAULT]	= "Atmoc fault",
> +};
> 

Please remove ^^^

>

> +enum trbe_address_mode {
> +	TRBE_ADDRESS_VIRTUAL,
> +	TRBE_ADDRESS_PHYSICAL,
> +};

#define please.

> +
> +static const char *const trbe_address_mode_str[] = {
> +	[TRBE_ADDRESS_VIRTUAL]	= "Address mode - virtual",
> +	[TRBE_ADDRESS_PHYSICAL]	= "Address mode - physical",
> +};

Do we need this ? We always use virtual.

> +
> +static inline bool is_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return !(trblimitr & TRBLIMITR_NVM);
> +}
> +

Remove

> +static inline bool is_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_NVM;
> +}

Remove

> +
> +static inline void set_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +

> +static inline void set_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}

Remove

> +
> +enum trbe_trig_mode {
> +	TRBE_TRIGGER_STOP	= 0,
> +	TRBE_TRIGGER_IRQ	= 1,
> +	TRBE_TRIGGER_IGNORE	= 3,
> +};
> +
> +static const char *const trbe_trig_mode_str[] = {
> +	[TRBE_TRIGGER_STOP]	= "Trigger mode - stop",
> +	[TRBE_TRIGGER_IRQ]	= "Trigger mode - irq",
> +	[TRBE_TRIGGER_IGNORE]	= "Trigger mode - ignore",
> +};
> +
> +static inline enum trbe_trig_mode get_trbe_trig_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
> +}
> +
> +static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +enum trbe_fill_mode {
> +	TRBE_FILL_STOP		= 0,
> +	TRBE_FILL_WRAP		= 1,
> +	TRBE_FILL_CIRCULAR	= 3,
> +};
> +

Please use #define
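
Roughly what the #define form could look like, as a user-space sketch. The
mode values are the ones from the quoted enum; the field shift and mask,
and all ex_* names, are assumed here for illustration and are not taken
from the patch. The helpers operate on a plain value instead of the system
register so that they can be exercised anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in the quoted enum. */
#define EX_TRBE_FILL_STOP	0ULL
#define EX_TRBE_FILL_WRAP	1ULL
#define EX_TRBE_FILL_CIRCULAR	3ULL

/* Field position assumed for illustration. */
#define EX_FILL_MODE_SHIFT	1
#define EX_FILL_MODE_MASK	0x3ULL

/* Return the register value with the fill-mode field replaced. */
static inline uint64_t ex_set_fill_mode(uint64_t trblimitr, uint64_t mode)
{
	assert((mode & ~EX_FILL_MODE_MASK) == 0);
	trblimitr &= ~(EX_FILL_MODE_MASK << EX_FILL_MODE_SHIFT);
	trblimitr |= (mode & EX_FILL_MODE_MASK) << EX_FILL_MODE_SHIFT;
	return trblimitr;
}

/* Extract the fill-mode field from a register value. */
static inline uint64_t ex_get_fill_mode(uint64_t trblimitr)
{
	return (trblimitr >> EX_FILL_MODE_SHIFT) & EX_FILL_MODE_MASK;
}
```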

> +static const char *const trbe_fill_mode_str[] = {
> +	[TRBE_FILL_STOP]	= "Buffer mode - stop",
> +	[TRBE_FILL_WRAP]	= "Buffer mode - wrap",
> +	[TRBE_FILL_CIRCULAR]	= "Buffer mode - circular",
> +};
> +
> +static inline enum trbe_fill_mode get_trbe_fill_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
> +}
> +
> +static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_disabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline bool get_trbe_flag_update(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return trbidr & TRBIDR_FLAG;
> +}
> +
> +static inline bool is_trbe_programmable(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return !(trbidr & TRBIDR_PROG);
> +}
> +#
> +enum trbe_buffer_align {
> +	TRBE_BUFFER_BYTE,
> +	TRBE_BUFFER_HALF_WORD,
> +	TRBE_BUFFER_WORD,
> +	TRBE_BUFFER_DOUBLE_WORD,
> +	TRBE_BUFFER_16_BYTES,
> +	TRBE_BUFFER_32_BYTES,
> +	TRBE_BUFFER_64_BYTES,
> +	TRBE_BUFFER_128_BYTES,
> +	TRBE_BUFFER_256_BYTES,
> +	TRBE_BUFFER_512_BYTES,
> +	TRBE_BUFFER_1K_BYTES,
> +	TRBE_BUFFER_2K_BYTES,
> +};
> +

Remove ^^

> +static const char *const trbe_buffer_align_str[] = {
> +	[TRBE_BUFFER_BYTE]		= "Byte",
> +	[TRBE_BUFFER_HALF_WORD]		= "Half word",
> +	[TRBE_BUFFER_WORD]		= "Word",
> +	[TRBE_BUFFER_DOUBLE_WORD]	= "Double word",
> +	[TRBE_BUFFER_16_BYTES]		= "16 bytes",
> +	[TRBE_BUFFER_32_BYTES]		= "32 bytes",
> +	[TRBE_BUFFER_64_BYTES]		= "64 bytes",
> +	[TRBE_BUFFER_128_BYTES]		= "128 bytes",
> +	[TRBE_BUFFER_256_BYTES]		= "256 bytes",
> +	[TRBE_BUFFER_512_BYTES]		= "512 bytes",
> +	[TRBE_BUFFER_1K_BYTES]		= "1K bytes",
> +	[TRBE_BUFFER_2K_BYTES]		= "2K bytes",
> +};

We don't need any of this. We could simply use "<<" on the alignment
field and get the size.
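
As a rough user-space illustration of deriving the size with a shift, in
place of the enum and string table: the field position, the ex_* names and
the helper split are assumptions made for this sketch only. The 2K bound
mirrors the check in the probe path above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field position is assumed for illustration, not taken from the patch. */
#define EX_TRBIDR_ALIGN_SHIFT	0
#define EX_TRBIDR_ALIGN_MASK	0xfULL
#define EX_SZ_2K		2048ULL

static_assert(EX_SZ_2K == (1ULL << 11), "2K corresponds to a field value of 11");

/* Decode the alignment field and turn it straight into a byte count. */
static inline uint64_t ex_trbe_align_bytes(uint64_t trbidr)
{
	uint64_t align = (trbidr >> EX_TRBIDR_ALIGN_SHIFT) & EX_TRBIDR_ALIGN_MASK;

	return 1ULL << align;
}

/* Mirrors the probe-time check that rejects alignments above 2K. */
static inline bool ex_trbe_align_supported(uint64_t trbidr)
{
	return ex_trbe_align_bytes(trbidr) <= EX_SZ_2K;
}

/* True when addr is a multiple of the decoded alignment. */
static inline bool ex_trbe_addr_aligned(uint64_t trbidr, uint64_t addr)
{
	return (addr & (ex_trbe_align_bytes(trbidr) - 1)) == 0;
}
```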


> +
> +static inline enum trbe_buffer_align get_trbe_address_align(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
> +}
> +
> +static inline void assert_trbe_address_mode(unsigned long addr)
> +{
> +	bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
> +	bool virt_mode = is_trbe_virtual_mode();
> +
> +	WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
> +}

I am not sure if this is really helpful. You have to trust the kernel vmalloc().

> +
> +static inline void assert_trbe_address_align(unsigned long addr)
> +{
> +	unsigned long nr_bytes = 1ULL << get_trbe_address_align();
> +
> +	WARN_ON(addr & (nr_bytes - 1));
> +}
> +
> +static inline unsigned long get_trbe_write_pointer(void)
> +{
> +	u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
> +	unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_write_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +	write_sysreg_s(addr, SYS_TRBPTR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_limit_pointer(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +	unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK;
> +	unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
> +
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_limit_pointer(unsigned long addr)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
> +	trblimitr |= (addr & PAGE_MASK);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_base_pointer(void)
> +{
> +	u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
> +	unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK;
> +
> +	addr = addr << TRBBASER_BASE_SHIFT;
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_base_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	write_sysreg_s(addr, SYS_TRBBASER_EL1);
> +}
> 

Suzuki


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 02/11] coresight: etm-perf: Allow an event to use different sinks
  2020-11-12  9:21     ` Suzuki K Poulose
@ 2020-11-12 10:37       ` Linu Cherian
  -1 siblings, 0 replies; 72+ messages in thread
From: Linu Cherian @ 2020-11-12 10:37 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Anshuman Khandual, linux-arm-kernel, Coresight ML, linux-kernel,
	Mathieu Poirier, Mike Leach, Linu Cherian

Hi Suzuki,

On Thu, Nov 12, 2020 at 2:51 PM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
> Hi Linu,
>
> Please could you test this slightly modified version and give us
> a Tested-by tag if you are happy with the results ?
>
> Suzuki
>
>
> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> > From: Suzuki K Poulose <suzuki.poulose@arm.com>
> >
> > When there are multiple sinks on the system, in the absence
> > of a specified sink, it is quite possible that a default sink
> > for an ETM could be different from that of another ETM. However
> > we do not support having multiple sinks for an event yet. This
> > patch allows the event to use the default sinks on the ETMs
> > where they are scheduled as long as the sinks are of the same
> > type.
> >
> > e.g, if we have 1x1 topology with per-CPU ETRs, the event can
> > use the per-CPU ETR for the session. However, if the sinks
> > are of different type, e.g TMC-ETR on one and a custom sink
> > on another, the event will only trace on the first detected
> > sink.
> >
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > ---
> >   drivers/hwtracing/coresight/coresight-etm-perf.c | 50 ++++++++++++++++++------
> >   1 file changed, 39 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > index c2c9b12..ea73cfa 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > @@ -204,14 +204,22 @@ static void etm_free_aux(void *data)
> >       schedule_work(&event_data->work);
> >   }
> >
> > +static bool sinks_match(struct coresight_device *a, struct coresight_device *b)
> > +{
> > +     if (!a || !b)
> > +             return false;
> > +     return (sink_ops(a) == sink_ops(b));
> > +}
> > +
> >   static void *etm_setup_aux(struct perf_event *event, void **pages,
> >                          int nr_pages, bool overwrite)
> >   {
> >       u32 id;
> >       int cpu = event->cpu;
> >       cpumask_t *mask;
> > -     struct coresight_device *sink;
> > +     struct coresight_device *sink = NULL;
> >       struct etm_event_data *event_data = NULL;
> > +     bool sink_forced = false;
> >
> >       event_data = alloc_event_data(cpu);
> >       if (!event_data)
> > @@ -222,6 +230,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >       if (event->attr.config2) {
> >               id = (u32)event->attr.config2;
> >               sink = coresight_get_sink_by_id(id);
> > +             sink_forced = true;
> >       }
> >
> >       mask = &event_data->mask;
> > @@ -235,7 +244,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >        */
> >       for_each_cpu(cpu, mask) {
> >               struct list_head *path;
> > -             struct coresight_device *csdev;
> > +             struct coresight_device *csdev, *new_sink;
> >
> >               csdev = per_cpu(csdev_src, cpu);
> >               /*
> > @@ -249,21 +258,35 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >               }
> >
> >               /*
> > -              * No sink provided - look for a default sink for one of the
> > -              * devices. At present we only support topology where all CPUs
> > -              * use the same sink [N:1], so only need to find one sink. The
> > -              * coresight_build_path later will remove any CPU that does not
> > -              * attach to the sink, or if we have not found a sink.
> > +              * No sink provided - look for a default sink for all the devices.
> > +              * We only support multiple sinks, only if all the default sinks
> > +              * are of the same type, so that the sink buffer can be shared
> > +              * as the event moves around. We don't trace on a CPU if it can't
> > +              *
> >                */
> > -             if (!sink)
> > -                     sink = coresight_find_default_sink(csdev);
> > +             if (!sink_forced) {
> > +                     new_sink = coresight_find_default_sink(csdev);
> > +                     if (!new_sink) {
> > +                             cpumask_clear_cpu(cpu, mask);
> > +                             continue;
> > +                     }
> > +                     /* Skip checks for the first sink */
> > +                     if (!sink) {
> > +                             sink = new_sink;
> > +                     } else if (!sinks_match(new_sink, sink)) {
> > +                             cpumask_clear_cpu(cpu, mask);
> > +                             continue;
> > +                     }
> > +             } else {
> > +                     new_sink = sink;
> > +             }
> >
> >               /*
> >                * Building a path doesn't enable it, it simply builds a
> >                * list of devices from source to sink that can be
> >                * referenced later when the path is actually needed.
> >                */
> > -             path = coresight_build_path(csdev, sink);
> > +             path = coresight_build_path(csdev, new_sink);
> >               if (IS_ERR(path)) {
> >                       cpumask_clear_cpu(cpu, mask);
> >                       continue;
> > @@ -284,7 +307,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >       if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
> >               goto err;
> >
> > -     /* Allocate the sink buffer for this session */
> > +     /*
> > +      * Allocate the sink buffer for this session. All the sinks
> > +      * where this event can be scheduled are ensured to be of the
> > +      * same type. Thus the same sink configuration is used by the
> > +      * sinks.
> > +      */
> >       event_data->snk_config =
> >                       sink_ops(sink)->alloc_buffer(sink, event, pages,
> >                                                    nr_pages, overwrite);
> >
>

Perf record and report worked fine with this as well, with formatting
related opencsd hacks.
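
For reference, the per-CPU sink selection in the quoted hunk can be
modelled in user space roughly as below. This is a sketch of the
no-forced-sink case only: the ex_* types, the fixed CPU count and the
bitmask are hypothetical, and path building is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NR_CPUS 4

/* Hypothetical stand-in: "ops" identifies the sink type, as in sinks_match(). */
struct ex_sink {
	const void *ops;
};

/*
 * Walk the CPUs set in *mask. The first default sink found becomes the
 * reference; CPUs with no default sink, or with a sink of another type,
 * are cleared from the mask. Returns the reference sink, or NULL.
 */
static const struct ex_sink *
ex_select_sinks(const struct ex_sink *const *def_sink, uint32_t *mask)
{
	const struct ex_sink *sink = NULL;
	int cpu;

	assert(mask != NULL);
	for (cpu = 0; cpu < EX_NR_CPUS; cpu++) {
		const struct ex_sink *new_sink = def_sink[cpu];

		if (!(*mask & (1u << cpu)))
			continue;
		if (!new_sink) {
			*mask &= ~(1u << cpu);
			continue;
		}
		if (!sink)
			sink = new_sink;	/* first sink: nothing to compare */
		else if (new_sink->ops != sink->ops)
			*mask &= ~(1u << cpu);
	}
	return sink;
}
```

With two CPUs sharing one sink type, a third using a different type and a
fourth with no default sink, only the first two stay in the mask.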

Tested-by : Linu Cherian <lcherian@marvell.com>

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 02/11] coresight: etm-perf: Allow an event to use different sinks
@ 2020-11-12 10:37       ` Linu Cherian
  0 siblings, 0 replies; 72+ messages in thread
From: Linu Cherian @ 2020-11-12 10:37 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Mathieu Poirier, Anshuman Khandual, Coresight ML, linux-kernel,
	Linu Cherian, linux-arm-kernel, Mike Leach

Hi Suzuki,

On Thu, Nov 12, 2020 at 2:51 PM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
> Hi Linu,
>
> Please could you test this slightly modified version and give us
> a Tested-by tag if you are happy with the results ?
>
> Suzuki
>
>
> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> > From: Suzuki K Poulose <suzuki.poulose@arm.com>
> >
> > When there are multiple sinks on the system and no sink is
> > specified, the default sink for one ETM may well differ from
> > that of another ETM. However, we do not yet support multiple
> > sinks for an event. This patch allows the event to use the
> > default sinks of the ETMs where it is scheduled, as long as
> > the sinks are of the same type.
> >
> > For example, with a 1x1 topology of per-CPU ETRs, the event can
> > use the per-CPU ETR for the session. However, if the sinks
> > are of different types, e.g. a TMC-ETR on one CPU and a custom
> > sink on another, the event will only trace on the first
> > detected sink.
> >
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > ---
> >   drivers/hwtracing/coresight/coresight-etm-perf.c | 50 ++++++++++++++++++------
> >   1 file changed, 39 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > index c2c9b12..ea73cfa 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > @@ -204,14 +204,22 @@ static void etm_free_aux(void *data)
> >       schedule_work(&event_data->work);
> >   }
> >
> > +static bool sinks_match(struct coresight_device *a, struct coresight_device *b)
> > +{
> > +     if (!a || !b)
> > +             return false;
> > +     return (sink_ops(a) == sink_ops(b));
> > +}
> > +
> >   static void *etm_setup_aux(struct perf_event *event, void **pages,
> >                          int nr_pages, bool overwrite)
> >   {
> >       u32 id;
> >       int cpu = event->cpu;
> >       cpumask_t *mask;
> > -     struct coresight_device *sink;
> > +     struct coresight_device *sink = NULL;
> >       struct etm_event_data *event_data = NULL;
> > +     bool sink_forced = false;
> >
> >       event_data = alloc_event_data(cpu);
> >       if (!event_data)
> > @@ -222,6 +230,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >       if (event->attr.config2) {
> >               id = (u32)event->attr.config2;
> >               sink = coresight_get_sink_by_id(id);
> > +             sink_forced = true;
> >       }
> >
> >       mask = &event_data->mask;
> > @@ -235,7 +244,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >        */
> >       for_each_cpu(cpu, mask) {
> >               struct list_head *path;
> > -             struct coresight_device *csdev;
> > +             struct coresight_device *csdev, *new_sink;
> >
> >               csdev = per_cpu(csdev_src, cpu);
> >               /*
> > @@ -249,21 +258,35 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >               }
> >
> >               /*
> > -              * No sink provided - look for a default sink for one of the
> > -              * devices. At present we only support topology where all CPUs
> > -              * use the same sink [N:1], so only need to find one sink. The
> > -              * coresight_build_path later will remove any CPU that does not
> > -              * attach to the sink, or if we have not found a sink.
> > +              * No sink provided - look for a default sink for all the devices.
> > +              * We support multiple sinks only if all the default sinks are
> > +              * of the same type, so that the sink buffer can be shared as
> > +              * the event moves around. We don't trace on a CPU if it has no
> > +              * default sink or if its default sink is of a different type.
> >                */
> > -             if (!sink)
> > -                     sink = coresight_find_default_sink(csdev);
> > +             if (!sink_forced) {
> > +                     new_sink = coresight_find_default_sink(csdev);
> > +                     if (!new_sink) {
> > +                             cpumask_clear_cpu(cpu, mask);
> > +                             continue;
> > +                     }
> > +                     /* Skip checks for the first sink */
> > +                     if (!sink) {
> > +                             sink = new_sink;
> > +                     } else if (!sinks_match(new_sink, sink)) {
> > +                             cpumask_clear_cpu(cpu, mask);
> > +                             continue;
> > +                     }
> > +             } else {
> > +                     new_sink = sink;
> > +             }
> >
> >               /*
> >                * Building a path doesn't enable it, it simply builds a
> >                * list of devices from source to sink that can be
> >                * referenced later when the path is actually needed.
> >                */
> > -             path = coresight_build_path(csdev, sink);
> > +             path = coresight_build_path(csdev, new_sink);
> >               if (IS_ERR(path)) {
> >                       cpumask_clear_cpu(cpu, mask);
> >                       continue;
> > @@ -284,7 +307,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >       if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
> >               goto err;
> >
> > -     /* Allocate the sink buffer for this session */
> > +     /*
> > +      * Allocate the sink buffer for this session. All the sinks
> > +      * where this event can be scheduled are ensured to be of the
> > +      * same type. Thus the same sink configuration is used by the
> > +      * sinks.
> > +      */
> >       event_data->snk_config =
> >                       sink_ops(sink)->alloc_buffer(sink, event, pages,
> >                                                    nr_pages, overwrite);
> >
>
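
As a reading aid for the hunk above, here is a minimal user-space sketch of the
default-sink selection. It is not the kernel code: struct sink_ops, struct
sink_dev, select_sinks() and the plain bitmask are hypothetical stand-ins for
sink_ops(), struct coresight_device, the loop in etm_setup_aux() and the
cpumask, and the forced-sink case (config2) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sink_ops { const char *type; };
struct sink_dev { const struct sink_ops *ops; };

/* Two sinks are "of the same type" when they share one ops table. */
static bool sinks_match(const struct sink_dev *a, const struct sink_dev *b)
{
	if (!a || !b)
		return false;
	return a->ops == b->ops;
}

/*
 * Walk the CPUs set in *mask. default_sink[cpu] plays the role of
 * coresight_find_default_sink() for that CPU's source. A CPU is cleared
 * from the mask if it has no default sink, or if its sink type differs
 * from the first sink found. Returns the first sink found (or NULL).
 */
static const struct sink_dev *
select_sinks(const struct sink_dev *const *default_sink, int nr_cpus,
	     unsigned int *mask)
{
	const struct sink_dev *sink = NULL;

	assert(default_sink && mask);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		const struct sink_dev *new_sink;

		if (!(*mask & (1u << cpu)))
			continue;

		new_sink = default_sink[cpu];
		if (!new_sink) {
			*mask &= ~(1u << cpu);
			continue;
		}

		/* The first sink found sets the type for the session. */
		if (!sink)
			sink = new_sink;
		else if (!sinks_match(new_sink, sink))
			*mask &= ~(1u << cpu);
	}

	return sink;
}
```

With per-CPU default sinks {ETR, ETR, custom, none} and all four CPUs in the
mask, only CPU0 and CPU1 stay set, so the event traces only where the sink
type matches the first sink found.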

Perf record and report worked fine with this as well, with formatting-related
OpenCSD hacks.

Tested-by: Linu Cherian <lcherian@marvell.com>

Thanks.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 02/11] coresight: etm-perf: Allow an event to use different sinks
  2020-11-12 10:37       ` Linu Cherian
@ 2020-11-12 11:09         ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-12 11:09 UTC (permalink / raw)
  To: Linu Cherian
  Cc: Anshuman Khandual, linux-arm-kernel, Coresight ML, linux-kernel,
	Mathieu Poirier, Mike Leach, Linu Cherian

On 11/12/20 10:37 AM, Linu Cherian wrote:
> Hi Suzuki,
> 
> On Thu, Nov 12, 2020 at 2:51 PM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>
>> Hi Linu,
>>
>> Please could you test this slightly modified version and give us
>> a Tested-by tag if you are happy with the results?
>>
>> Suzuki
>>
>>
>> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>>> From: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>
>>> When there are multiple sinks on the system and no sink is
>>> specified, the default sink for one ETM may well differ from
>>> that of another ETM. However, we do not yet support multiple
>>> sinks for an event. This patch allows the event to use the
>>> default sinks of the ETMs where it is scheduled, as long as
>>> the sinks are of the same type.
>>>
>>> For example, with a 1x1 topology of per-CPU ETRs, the event can
>>> use the per-CPU ETR for the session. However, if the sinks
>>> are of different types, e.g. a TMC-ETR on one CPU and a custom
>>> sink on another, the event will only trace on the first
>>> detected sink.
>>>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---


>>> @@ -284,7 +307,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>>>        if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
>>>                goto err;
>>>
>>> -     /* Allocate the sink buffer for this session */
>>> +     /*
>>> +      * Allocate the sink buffer for this session. All the sinks
>>> +      * where this event can be scheduled are ensured to be of the
>>> +      * same type. Thus the same sink configuration is used by the
>>> +      * sinks.
>>> +      */
>>>        event_data->snk_config =
>>>                        sink_ops(sink)->alloc_buffer(sink, event, pages,
>>>                                                     nr_pages, overwrite);
>>>
>>
> 
> Perf record and report worked fine with this as well, with formatting-related
> OpenCSD hacks.
> 
> Tested-by: Linu Cherian <lcherian@marvell.com>

Thanks Linu, much appreciated.

Suzuki

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
  2020-11-10 12:44 ` Anshuman Khandual
@ 2020-11-14  5:17   ` Tingwei Zhang
  -1 siblings, 0 replies; 72+ messages in thread
From: Tingwei Zhang @ 2020-11-14  5:17 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-arm-kernel, coresight, mike.leach, linux-kernel

Hi Anshuman,

On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
> This series enables future IP trace features Embedded Trace Extension (ETE)
> and Trace Buffer Extension (TRBE). This series depends on the ETM system
> register instruction support series [0] and the v8.4 Self hosted tracing
> support series (Jonathan Zhou) [1]. The tree is available here [2] for
> quick access.
> 
> ETE is the PE (CPU) trace unit for CPUs implementing future architecture
> extensions. ETE overlaps with the ETMv4 architecture, with additions to
> support the newer architecture features and some restrictions on the
> supported features w.r.t. ETMv4. ETE support is added by extending the
> ETMv4 driver to recognise the ETE and handle the features as exposed by the
> TRCIDRx registers. ETE only supports system instruction access from the
> host CPU. The ETE could be integrated with a TRBE (see below), or with the
> legacy CoreSight trace bus (e.g. ETRs). Thus the ETE follows the same firmware
> description as the ETMs and requires a node per instance.
> 
> Trace Buffer Extension (TRBE) implements a per-CPU trace buffer, which is
> accessible via the system registers and can be combined with the ETE to
> provide a 1x1 configuration of source and sink. TRBE is represented
> here as a CoreSight sink. The primary reason is that the ETE source could work
> with other traditional CoreSight sink devices. As TRBE captures the trace
> data produced by ETE, it cannot work alone.
> 
> The TRBE representation here has some distinct deviations from a traditional
> CoreSight sink device. The CoreSight path between ETE and TRBE is not built
> during boot by looking at the respective DT or ACPI entries. Instead, TRBE is
> checked for on each available CPU and, when found, gets connected to the
> respective ETE source device on the same CPU, after altering its outward
> connections. The ETE-TRBE path connection lasts only as long as the CPU is
> online. The ETE-TRBE coupling/decoupling method implemented here is not
> optimal and will be reworked later on.

Only perf mode is supported for TRBE in the current patches. Will you consider
supporting sysfs mode as well in a following patch set?

Thanks,
Tingwei

> 
> Unlike traditional sinks, TRBE can generate interrupts to signal, among
> many other things, that the buffer has filled. The interrupt is a PPI and
> should be communicated from the platform. The DT or ACPI entry representing
> TRBE should have the PPI number for a given platform. During a perf session,
> the TRBE IRQ handler should capture trace into the perf auxiliary buffer
> before restarting the TRBE. The system registers used here to configure ETE
> and TRBE are described at the link below.
> 
> https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
> 
> This adds another change where the CoreSight sink device needs to be disabled
> before capturing the trace data for perf, in order to avoid a race condition
> with simultaneous TRBE IRQ handling. This might cause problems with
> traditional sink devices, which can be operated in both sysfs and perf mode.
> This needs to be addressed correctly. One option would be to move the
> update_buffer callback into the respective sink devices, e.g. into disable().
> 
> This series is primarily looking for some early feedback on both the proposed
> design and its implementation. It acknowledges that it might be incomplete
> and will have scope for improvement.
> 
> Things to do:
> - Improve ETE-TRBE coupling and decoupling method
> - Improve TRBE IRQ handling for all possible corner cases
> - Implement sysfs based trace sessions
> 
> [0] https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
> [1] https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/
> [2] https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted
> 
> Anshuman Khandual (6):
>   arm64: Add TRBE definitions
>   coresight: sink: Add TRBE driver
>   coresight: etm-perf: Truncate the perf record if handle has no space
>   coresight: etm-perf: Disable the path before capturing the trace data
>   coresight: etm-perf: Connect TRBE sink with ETE source
>   dts: bindings: Document device tree binding for Arm TRBE
> 
> Suzuki K Poulose (5):
>   coresight: etm-perf: Allow an event to use different sinks
>   coresight: Do not scan for graph if none is present
>   coresight: etm4x: Add support for PE OS lock
>   coresight: ete: Add support for sysreg support
>   coresight: ete: Detect ETE as one of the supported ETMs
> 
>  .../devicetree/bindings/arm/coresight.txt          |   3 +
>  Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
>  Documentation/trace/coresight/coresight-trbe.rst   |  36 +
>  arch/arm64/include/asm/sysreg.h                    |  51 ++
>  drivers/hwtracing/coresight/Kconfig                |  11 +
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
>  drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
>  drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
>  drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
>  drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
>  drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
>  include/linux/coresight.h                          |   2 +
>  14 files changed, 1680 insertions(+), 43 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> 
> -- 
> 2.7.4
> 
> _______________________________________________
> CoreSight mailing list
> CoreSight@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/coresight

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 06/11] coresight: ete: Detect ETE as one of the supported ETMs
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-11-14  5:36     ` Tingwei Zhang
  -1 siblings, 0 replies; 72+ messages in thread
From: Tingwei Zhang @ 2020-11-14  5:36 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-arm-kernel, coresight, mike.leach, linux-kernel

Hi Anshuman,

On Tue, Nov 10, 2020 at 08:45:04PM +0800, Anshuman Khandual wrote:
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> Add ETE as one of the device types supported by the ETM4x
> driver. The devices are named following the existing
> convention, as ete<N>.
> 
> ETE mandates that the trace resource status register is programmed
> before tracing is turned on. For the moment, simply write to
> it indicating TraceActive.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  .../devicetree/bindings/arm/coresight.txt          |  3 ++
>  drivers/hwtracing/coresight/coresight-etm4x-core.c | 55 +++++++++++++++++-----
>  drivers/hwtracing/coresight/coresight-etm4x.h      |  7 +++
>  3 files changed, 52 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
> index bff96a5..784cc1b 100644
> --- a/Documentation/devicetree/bindings/arm/coresight.txt
> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
> @@ -40,6 +40,9 @@ its hardware characteristcs.
>  		- Embedded Trace Macrocell with system register access only.
>  			"arm,coresight-etm-sysreg";
> 
> +		- Embedded Trace Extensions.
> +			"arm,ete"
> +
>  		- Coresight programmable Replicator :
>  			"arm,coresight-dynamic-replicator", "arm,primecell";
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index 15b6e94..0fea349 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -331,6 +331,13 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
>  		etm4x_relaxed_write32(csa, trcpdcr | TRCPDCR_PU, TRCPDCR);
>  	}
> 
> +	/*
> +	 * ETE mandates that the TRCRSR is written to before
> +	 * enabling it.
> +	 */
> +	if (drvdata->arch >= ETM_ARCH_ETE)
> +		etm4x_relaxed_write32(csa, TRCRSR_TA, TRCRSR);
> +
>  	/* Enable the trace unit */
>  	etm4x_relaxed_write32(csa, 1, TRCPRGCTLR);
> 
> @@ -763,13 +770,24 @@ static bool etm_init_sysreg_access(struct etmv4_drvdata *drvdata,
>  	 * ETMs implementing sysreg access must implement TRCDEVARCH.
>  	 */
>  	devarch = read_etm4x_sysreg_const_offset(TRCDEVARCH);
> -	if ((devarch & ETM_DEVARCH_ID_MASK) != ETM_DEVARCH_ETMv4x_ARCH)
> +	switch (devarch & ETM_DEVARCH_ID_MASK) {
> +	case ETM_DEVARCH_ETMv4x_ARCH:
> +		*csa = (struct csdev_access) {
> +			.io_mem	= false,
> +			.read	= etm4x_sysreg_read,
> +			.write	= etm4x_sysreg_write,
> +		};
> +		break;
> +	case ETM_DEVARCH_ETE_ARCH:
> +		*csa = (struct csdev_access) {
> +			.io_mem	= false,
> +			.read	= ete_sysreg_read,
> +			.write	= ete_sysreg_write,
> +		};
> +		break;
> +	default:
>  		return false;
> -	*csa = (struct csdev_access) {
> -		.io_mem	= false,
> -		.read	= etm4x_sysreg_read,
> -		.write	= etm4x_sysreg_write,
> -	};
> +	}
> 
>  	drvdata->arch = etm_devarch_to_arch(devarch);
>  	return true;
> @@ -1698,6 +1716,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)
>  	struct etmv4_drvdata *drvdata;
>  	struct coresight_desc desc = { 0 };
>  	struct etm_init_arg init_arg = { 0 };
> +	u8 major, minor;
> +	char *type_name;
> 
>  	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
>  	if (!drvdata)
> @@ -1724,10 +1744,6 @@ static int etm4_probe(struct device *dev, void __iomem *base)
>  	if (drvdata->cpu < 0)
>  		return drvdata->cpu;
> 
> -	desc.name = devm_kasprintf(dev, GFP_KERNEL, "etm%d", drvdata->cpu);
> -	if (!desc.name)
> -		return -ENOMEM;
> -
>  	init_arg.drvdata = drvdata;
>  	init_arg.csa = &desc.access;
> 
> @@ -1742,6 +1758,19 @@ static int etm4_probe(struct device *dev, void __iomem *base)
>  	if (!desc.access.io_mem ||
>  	    fwnode_property_present(dev_fwnode(dev), "qcom,skip-power-up"))
>  		drvdata->skip_power_up = true;
> +	major = ETM_ARCH_MAJOR_VERSION(drvdata->arch);
> +	minor = ETM_ARCH_MINOR_VERSION(drvdata->arch);
> +	if (drvdata->arch >= ETM_ARCH_ETE) {
> +		type_name = "ete";
> +		major -= 4;
> +	} else {
> +		type_name = "etm";
> +	}
> +
When a trace unit supports ETE, could it still be compatible with ETMv4.4?
Can we selectively use it as an ETM instead of an ETE?

Thanks,
Tingwei
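
For reference, the naming and version arithmetic in the hunk above can be
sketched in user space as follows. This is only an illustration under
assumptions: ETM_ARCH_MINOR_VERSION() and ETM_ARCH_ETE are taken from the
patch, while the definitions of ETM_ARCH_VERSION() and
ETM_ARCH_MAJOR_VERSION() (major in the high nibble) are assumed, and
format_trace_unit() is a hypothetical helper rather than a driver function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed encoding: major version in bits [7:4], minor in bits [3:0]. */
#define ETM_ARCH_VERSION(major, minor) \
	((((major) & 0xfU) << 4) | ((minor) & 0xfU))
#define ETM_ARCH_MAJOR_VERSION(arch)	(((arch) >> 4) & 0xfU)
#define ETM_ARCH_MINOR_VERSION(arch)	((arch) & 0xfU)

#define ETM_ARCH_ETE	ETM_ARCH_VERSION(5, 0)

/*
 * Hypothetical helper: build "<type><cpu>: <type> v<major>.<minor>" the way
 * the probe path picks the device name and the reported version. An ETE is
 * encoded with major version 5 and up, and is reported as "ete" with 4
 * subtracted from the major version.
 */
static int format_trace_unit(char *buf, size_t len, unsigned int arch, int cpu)
{
	unsigned int major = ETM_ARCH_MAJOR_VERSION(arch);
	unsigned int minor = ETM_ARCH_MINOR_VERSION(arch);
	const char *type_name = "etm";

	if (arch >= ETM_ARCH_ETE) {
		type_name = "ete";
		major -= 4;
	}

	return snprintf(buf, len, "%s%d: %s v%u.%u",
			type_name, cpu, type_name, major, minor);
}
```

Under these assumptions an ETMv4.4 unit on CPU2 is named etm2 and reported as
v4.4, while an ETE with arch value ETM_ARCH_ETE on CPU0 is named ete0 and
reported as v1.0. The choice is driven purely by the arch value, so the sketch
has no way to treat an ETE as an ETM.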

> +	desc.name = devm_kasprintf(dev, GFP_KERNEL,
> +				   "%s%d", type_name, drvdata->cpu);
> +	if (!desc.name)
> +		return -ENOMEM;
> 
>  	etm4_init_trace_id(drvdata);
>  	etm4_set_default(&drvdata->config);
> @@ -1770,9 +1799,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)
> 
>  	etmdrvdata[drvdata->cpu] = drvdata;
> 
> -	dev_info(&drvdata->csdev->dev, "CPU%d: ETM v%d.%d initialized\n",
> -		 drvdata->cpu, ETM_ARCH_MAJOR_VERSION(drvdata->arch),
> -		 ETM_ARCH_MINOR_VERSION(drvdata->arch));
> +	dev_info(&drvdata->csdev->dev, "CPU%d: %s v%d.%d initialized\n",
> +		 drvdata->cpu, type_name, major, minor);
> 
>  	if (boot_enable) {
>  		coresight_enable(drvdata->csdev);
> @@ -1892,6 +1920,7 @@ static struct amba_driver etm4x_amba_driver = {
> 
>  static const struct of_device_id etm_sysreg_match[] = {
>  	{ .compatible	= "arm,coresight-etm-sysreg" },
> +	{ .compatible	= "arm,ete" },
>  	{}
>  };
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
> index 00c0367..05fd0e5 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> @@ -127,6 +127,8 @@
>  #define TRCCIDR2			0xFF8
>  #define TRCCIDR3			0xFFC
> 
> +#define TRCRSR_TA			BIT(12)
> +
>  /*
>   * System instructions to access ETM registers.
>   * See ETMv4.4 spec ARM IHI0064F section 4.3.6 System instructions
> @@ -570,11 +572,14 @@
>  	((ETM_DEVARCH_MAKE_ARCHID_ARCH_VER(major)) | ETM_DEVARCH_ARCHID_ARCH_PART(0xA13))
> 
>  #define ETM_DEVARCH_ARCHID_ETMv4x		ETM_DEVARCH_MAKE_ARCHID(0x4)
> +#define ETM_DEVARCH_ARCHID_ETE			ETM_DEVARCH_MAKE_ARCHID(0x5)
> 
>  #define ETM_DEVARCH_ID_MASK						\
>  	(ETM_DEVARCH_ARCHITECT_MASK | ETM_DEVARCH_ARCHID_MASK | ETM_DEVARCH_PRESENT)
>  #define ETM_DEVARCH_ETMv4x_ARCH						\
>  	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETMv4x | ETM_DEVARCH_PRESENT)
> +#define ETM_DEVARCH_ETE_ARCH						\
> +	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETE | ETM_DEVARCH_PRESENT)
> 
>  #define TRCSTATR_IDLE_BIT		0
>  #define TRCSTATR_PMSTABLE_BIT		1
> @@ -661,6 +666,8 @@
>  #define ETM_ARCH_MINOR_VERSION(arch)	((arch) & 0xfU)
> 
>  #define ETM_ARCH_V4	ETM_ARCH_VERSION(4, 0)
> +#define ETM_ARCH_ETE	ETM_ARCH_VERSION(5, 0)
> +
>  /* Interpretation of resource numbers change at ETM v4.3 architecture */
>  #define ETM_ARCH_V4_3	ETM_ARCH_VERSION(4, 3)
> 
> -- 
> 2.7.4
> 
> _______________________________________________
> CoreSight mailing list
> CoreSight@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/coresight
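
The TRCDEVARCH check in etm_init_sysreg_access() above can be illustrated with
a small stand-alone sketch. The bit layout used here (ARCHITECT in bits
[31:21], PRESENT in bit 20, REVISION in bits [19:16], ARCHID in bits [15:0]
with the architecture version in [15:12] and part 0xA13 in [11:0]) and the Arm
architect code 0x23b are assumptions stated for the example, and the DEVARCH_*
names, enum trace_unit and classify_devarch() are hypothetical, not the
driver's identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DEVARCH field layout; names are local to this sketch. */
#define DEVARCH_ARCHITECT_MASK	(0x7ffu << 21)	/* bits [31:21] */
#define DEVARCH_ARCHITECT_ARM	(0x23bu << 21)
#define DEVARCH_PRESENT		(1u << 20)
#define DEVARCH_ARCHID_MASK	0xffffu		/* bits [15:0] */
#define DEVARCH_MAKE_ARCHID(ver) ((((ver) & 0xfu) << 12) | 0xa13u)

/* The revision field, bits [19:16], is deliberately left out of the mask. */
#define DEVARCH_ID_MASK \
	(DEVARCH_ARCHITECT_MASK | DEVARCH_ARCHID_MASK | DEVARCH_PRESENT)
#define DEVARCH_ETMV4X \
	(DEVARCH_ARCHITECT_ARM | DEVARCH_MAKE_ARCHID(0x4) | DEVARCH_PRESENT)
#define DEVARCH_ETE \
	(DEVARCH_ARCHITECT_ARM | DEVARCH_MAKE_ARCHID(0x5) | DEVARCH_PRESENT)

static_assert(DEVARCH_ETMV4X == 0x47704a13u, "unexpected ETMv4x DEVARCH ID");
static_assert(DEVARCH_ETE == 0x47705a13u, "unexpected ETE DEVARCH ID");

enum trace_unit {
	TRACE_UNIT_UNKNOWN,
	TRACE_UNIT_ETMV4X,
	TRACE_UNIT_ETE,
};

/* Classify a raw DEVARCH value the way the switch in the patch does. */
static enum trace_unit classify_devarch(uint32_t devarch)
{
	switch (devarch & DEVARCH_ID_MASK) {
	case DEVARCH_ETMV4X:
		return TRACE_UNIT_ETMV4X;
	case DEVARCH_ETE:
		return TRACE_UNIT_ETE;
	default:
		return TRACE_UNIT_UNKNOWN;
	}
}
```

Because the revision bits are masked out before the comparison, any revision
of an ETE part is classified as ETE in this sketch, and a value with the
PRESENT bit clear is rejected.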

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 06/11] coresight: ete: Detect ETE as one of the supported ETMs
@ 2020-11-14  5:36     ` Tingwei Zhang
  0 siblings, 0 replies; 72+ messages in thread
From: Tingwei Zhang @ 2020-11-14  5:36 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: coresight, linux-kernel, linux-arm-kernel, mike.leach

Hi Anshuman,

On Tue, Nov 10, 2020 at 08:45:04PM +0800, Anshuman Khandual wrote:
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> Add ETE as one of the device types supported by the ETM4x
> driver. The devices are named following the existing
> convention, as ete<N>.
> 
> ETE mandates that the trace resource status register is programmed
> before tracing is turned on. For the moment, simply write to
> it indicating TraceActive.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  .../devicetree/bindings/arm/coresight.txt          |  3 ++
>  drivers/hwtracing/coresight/coresight-etm4x-core.c | 55 +++++++++++++++++-----
>  drivers/hwtracing/coresight/coresight-etm4x.h      |  7 +++
>  3 files changed, 52 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
> index bff96a5..784cc1b 100644
> --- a/Documentation/devicetree/bindings/arm/coresight.txt
> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
> @@ -40,6 +40,9 @@ its hardware characteristcs.
>  		- Embedded Trace Macrocell with system register access only.
>  			"arm,coresight-etm-sysreg";
> 
> +		- Embedded Trace Extensions.
> +			"arm,ete"
> +
>  		- Coresight programmable Replicator :
>  			"arm,coresight-dynamic-replicator", "arm,primecell";
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index 15b6e94..0fea349 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -331,6 +331,13 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
>  		etm4x_relaxed_write32(csa, trcpdcr | TRCPDCR_PU, TRCPDCR);
>  	}
> 
> +	/*
> +	 * ETE mandates that the TRCRSR is written to before
> +	 * enabling it.
> +	 */
> +	if (drvdata->arch >= ETM_ARCH_ETE)
> +		etm4x_relaxed_write32(csa, TRCRSR_TA, TRCRSR);
> +
>  	/* Enable the trace unit */
>  	etm4x_relaxed_write32(csa, 1, TRCPRGCTLR);
> 
> @@ -763,13 +770,24 @@ static bool etm_init_sysreg_access(struct etmv4_drvdata *drvdata,
>  	 * ETMs implementing sysreg access must implement TRCDEVARCH.
>  	 */
>  	devarch = read_etm4x_sysreg_const_offset(TRCDEVARCH);
> -	if ((devarch & ETM_DEVARCH_ID_MASK) != ETM_DEVARCH_ETMv4x_ARCH)
> +	switch (devarch & ETM_DEVARCH_ID_MASK) {
> +	case ETM_DEVARCH_ETMv4x_ARCH:
> +		*csa = (struct csdev_access) {
> +			.io_mem	= false,
> +			.read	= etm4x_sysreg_read,
> +			.write	= etm4x_sysreg_write,
> +		};
> +		break;
> +	case ETM_DEVARCH_ETE_ARCH:
> +		*csa = (struct csdev_access) {
> +			.io_mem	= false,
> +			.read	= ete_sysreg_read,
> +			.write	= ete_sysreg_write,
> +		};
> +		break;
> +	default:
>  		return false;
> -	*csa = (struct csdev_access) {
> -		.io_mem	= false,
> -		.read	= etm4x_sysreg_read,
> -		.write	= etm4x_sysreg_write,
> -	};
> +	}
> 
>  	drvdata->arch = etm_devarch_to_arch(devarch);
>  	return true;
> @@ -1698,6 +1716,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)
>  	struct etmv4_drvdata *drvdata;
>  	struct coresight_desc desc = { 0 };
>  	struct etm_init_arg init_arg = { 0 };
> +	u8 major, minor;
> +	char *type_name;
> 
>  	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
>  	if (!drvdata)
> @@ -1724,10 +1744,6 @@ static int etm4_probe(struct device *dev, void __iomem *base)
>  	if (drvdata->cpu < 0)
>  		return drvdata->cpu;
> 
> -	desc.name = devm_kasprintf(dev, GFP_KERNEL, "etm%d", drvdata->cpu);
> -	if (!desc.name)
> -		return -ENOMEM;
> -
>  	init_arg.drvdata = drvdata;
>  	init_arg.csa = &desc.access;
> 
> @@ -1742,6 +1758,19 @@ static int etm4_probe(struct device *dev, void __iomem *base)
>  	if (!desc.access.io_mem ||
>  	    fwnode_property_present(dev_fwnode(dev), "qcom,skip-power-up"))
>  		drvdata->skip_power_up = true;
> +	major = ETM_ARCH_MAJOR_VERSION(drvdata->arch);
> +	minor = ETM_ARCH_MINOR_VERSION(drvdata->arch);
> +	if (drvdata->arch >= ETM_ARCH_ETE) {
> +		type_name = "ete";
> +		major -= 4;
> +	} else {
> +		type_name = "etm";
> +	}
> +
When a trace unit supports ETE, could it still be compatible with ETMv4.4?
Can we selectively use it as an ETM instead of an ETE?

Thanks,
Tingwei
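
For reference, the naming logic in the quoted etm4_probe() hunk can be
modelled in user space. This is only a sketch: it assumes the architecture
version is packed with the major number in bits [7:4] and the minor number
in bits [3:0] (the quoted header only shows the minor-version mask), and
arch_version(), name_trace_unit() and struct unit_name are made-up names,
not driver symbols.

```c
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
struct unit_name {
	const char *type;	/* "etm" or "ete" */
	unsigned int major;
	unsigned int minor;
};

/* Assumed packing: major in bits [7:4], minor in bits [3:0]. */
uint8_t arch_version(unsigned int major, unsigned int minor)
{
	return (uint8_t)(((major & 0xfU) << 4) | (minor & 0xfU));
}

/*
 * Mirror the quoted hunk: anything at or above version 5.0 is reported
 * as ETE, with the major number rebased so that 5.x is shown as 1.x.
 */
struct unit_name name_trace_unit(uint8_t arch)
{
	struct unit_name n;

	n.major = (arch >> 4) & 0xfU;
	n.minor = arch & 0xfU;
	if (arch >= arch_version(5, 0)) {
		n.type = "ete";
		n.major -= 4;
	} else {
		n.type = "etm";
	}
	return n;
}
```

Under these assumptions a 4.3 unit keeps the "etm" prefix and is reported
as v4.3, while a 5.0 unit gets the "ete" prefix and is reported as v1.0.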

> +	desc.name = devm_kasprintf(dev, GFP_KERNEL,
> +				   "%s%d", type_name, drvdata->cpu);
> +	if (!desc.name)
> +		return -ENOMEM;
> 
>  	etm4_init_trace_id(drvdata);
>  	etm4_set_default(&drvdata->config);
> @@ -1770,9 +1799,8 @@ static int etm4_probe(struct device *dev, void __iomem *base)
> 
>  	etmdrvdata[drvdata->cpu] = drvdata;
> 
> -	dev_info(&drvdata->csdev->dev, "CPU%d: ETM v%d.%d initialized\n",
> -		 drvdata->cpu, ETM_ARCH_MAJOR_VERSION(drvdata->arch),
> -		 ETM_ARCH_MINOR_VERSION(drvdata->arch));
> +	dev_info(&drvdata->csdev->dev, "CPU%d: %s v%d.%d initialized\n",
> +		 drvdata->cpu, type_name, major, minor);
> 
>  	if (boot_enable) {
>  		coresight_enable(drvdata->csdev);
> @@ -1892,6 +1920,7 @@ static struct amba_driver etm4x_amba_driver = {
> 
>  static const struct of_device_id etm_sysreg_match[] = {
>  	{ .compatible	= "arm,coresight-etm-sysreg" },
> +	{ .compatible	= "arm,ete" },
>  	{}
>  };
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
> index 00c0367..05fd0e5 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> @@ -127,6 +127,8 @@
>  #define TRCCIDR2			0xFF8
>  #define TRCCIDR3			0xFFC
> 
> +#define TRCRSR_TA			BIT(12)
> +
>  /*
>   * System instructions to access ETM registers.
>   * See ETMv4.4 spec ARM IHI0064F section 4.3.6 System instructions
> @@ -570,11 +572,14 @@
>  	((ETM_DEVARCH_MAKE_ARCHID_ARCH_VER(major)) | ETM_DEVARCH_ARCHID_ARCH_PART(0xA13))
> 
>  #define ETM_DEVARCH_ARCHID_ETMv4x		ETM_DEVARCH_MAKE_ARCHID(0x4)
> +#define ETM_DEVARCH_ARCHID_ETE			ETM_DEVARCH_MAKE_ARCHID(0x5)
> 
>  #define ETM_DEVARCH_ID_MASK						\
>  	(ETM_DEVARCH_ARCHITECT_MASK | ETM_DEVARCH_ARCHID_MASK | ETM_DEVARCH_PRESENT)
>  #define ETM_DEVARCH_ETMv4x_ARCH						\
>  	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETMv4x | ETM_DEVARCH_PRESENT)
> +#define ETM_DEVARCH_ETE_ARCH						\
> +	(ETM_DEVARCH_ARCHITECT_ARM | ETM_DEVARCH_ARCHID_ETE | ETM_DEVARCH_PRESENT)
> 
>  #define TRCSTATR_IDLE_BIT		0
>  #define TRCSTATR_PMSTABLE_BIT		1
> @@ -661,6 +666,8 @@
>  #define ETM_ARCH_MINOR_VERSION(arch)	((arch) & 0xfU)
> 
>  #define ETM_ARCH_V4	ETM_ARCH_VERSION(4, 0)
> +#define ETM_ARCH_ETE	ETM_ARCH_VERSION(5, 0)
> +
>  /* Interpretation of resource numbers change at ETM v4.3 architecture */
>  #define ETM_ARCH_V4_3	ETM_ARCH_VERSION(4, 3)
> 
> -- 
> 2.7.4
> 
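
As an aside on the TRCDEVARCH switch quoted above, the matching can be
sketched in plain C. The bit positions below are assumptions taken from my
reading of the DEVARCH layout (architect in bits [31:21], PRESENT in bit
20, revision in bits [19:16], architecture ID in bits [15:0]); only the
0xA13 part number and the 0x4/0x5 major versions appear in the quoted
header. classify_devarch() and enum trace_unit are hypothetical names.

```c
#include <stdint.h>

/* Assumed field layout, see the note above. */
#define DEVARCH_ARCHITECT_MASK	UINT32_C(0xFFE00000)	/* bits [31:21] */
#define DEVARCH_ARCHITECT_ARM	UINT32_C(0x47600000)
#define DEVARCH_PRESENT		UINT32_C(0x00100000)	/* bit 20 */
#define DEVARCH_ARCHID_MASK	UINT32_C(0x0000FFFF)	/* bits [15:0] */

/* Architecture ID: major version in bits [15:12], part number 0xA13. */
#define DEVARCH_MAKE_ARCHID(major) \
	(((UINT32_C(major) << 12) & UINT32_C(0xF000)) | UINT32_C(0xA13))

/* The revision field is deliberately left out of the comparison mask. */
#define DEVARCH_ID_MASK \
	(DEVARCH_ARCHITECT_MASK | DEVARCH_ARCHID_MASK | DEVARCH_PRESENT)
#define DEVARCH_ETMV4_ARCH \
	(DEVARCH_ARCHITECT_ARM | DEVARCH_MAKE_ARCHID(0x4) | DEVARCH_PRESENT)
#define DEVARCH_ETE_ARCH \
	(DEVARCH_ARCHITECT_ARM | DEVARCH_MAKE_ARCHID(0x5) | DEVARCH_PRESENT)

enum trace_unit { UNIT_UNKNOWN, UNIT_ETMV4, UNIT_ETE };

/* Same shape as the quoted switch: mask first, then compare exactly. */
enum trace_unit classify_devarch(uint32_t devarch)
{
	switch (devarch & DEVARCH_ID_MASK) {
	case DEVARCH_ETMV4_ARCH:
		return UNIT_ETMV4;
	case DEVARCH_ETE_ARCH:
		return UNIT_ETE;
	default:
		return UNIT_UNKNOWN;
	}
}
```

Because the revision bits are masked out, any revision of an ETMv4 or ETE
unit matches, while a value with the PRESENT bit clear is rejected.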

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 07/11] coresight: sink: Add TRBE driver
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-11-14  5:38     ` Tingwei Zhang
  -1 siblings, 0 replies; 72+ messages in thread
From: Tingwei Zhang @ 2020-11-14  5:38 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-arm-kernel, coresight, mike.leach, linux-kernel

Hi Anshuman,

On Tue, Nov 10, 2020 at 08:45:05PM +0800, Anshuman Khandual wrote:
> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
> accessible via the system registers. The TRBE supports different addressing
> modes including CPU virtual address and buffer modes including the circular
> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
> a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
> access to the trace buffer could be prohibited by a higher exception level
> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
> private interrupt (PPI) on address translation errors and when the buffer
> is full. The overall implementation here is inspired by the Arm SPE driver.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  Documentation/trace/coresight/coresight-trbe.rst |  36 ++
>  arch/arm64/include/asm/sysreg.h                  |   2 +
>  drivers/hwtracing/coresight/Kconfig              |  11 +
>  drivers/hwtracing/coresight/Makefile             |   1 +
>  drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
>  6 files changed, 1341 insertions(+)
>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> 
> diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
> new file mode 100644
> index 0000000..4320a8b
> --- /dev/null
> +++ b/Documentation/trace/coresight/coresight-trbe.rst
> @@ -0,0 +1,36 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============================
> +Trace Buffer Extension (TRBE).
> +==============================
> +
> +    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
> +    :Date:     November 2020
> +
> +Hardware Description
> +--------------------
> +
> +Trace Buffer Extension (TRBE) is per-CPU hardware which captures, in system
> +memory, the CPU traces generated by a corresponding per-CPU tracing unit. It
> +gets plugged in as a coresight sink device because the corresponding trace
> +generators (ETE) are plugged in as source devices.
> +
> +Sysfs files and directories
> +---------------------------
> +
> +The TRBE devices appear on the existing coresight bus alongside the other
> +coresight devices::
> +
> +	>$ ls /sys/bus/coresight/devices
> +	trbe0  trbe1  trbe2 trbe3
> +
> +The ``trbe<N>`` named TRBEs are associated with a CPU.::
> +
> +	>$ ls /sys/bus/coresight/devices/trbe0/
> +	irq align dbm
> +
> +*Key file items are:-*
> +   * ``irq``: TRBE maintenance interrupt number
> +   * ``align``: TRBE write pointer alignment
> +   * ``dbm``: TRBE updates memory with access and dirty flags
> +
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 14cb156..61136f6 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -97,6 +97,7 @@
>  #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
>  #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
>  #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> +#define TSB_CSYNC			__emit_inst(0xd503225f)
> 
>  #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
>  	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
> @@ -865,6 +866,7 @@
>  #define ID_AA64MMFR2_CNP_SHIFT		0
> 
>  /* id_aa64dfr0 */
> +#define ID_AA64DFR0_TRBE_SHIFT		44
>  #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
>  #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
>  #define ID_AA64DFR0_PMSVER_SHIFT	32
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index c119824..0f5e101 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -156,6 +156,17 @@ config CORESIGHT_CTI
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called coresight-cti.
> 
> +config CORESIGHT_TRBE
> +	bool "Trace Buffer Extension (TRBE) driver"

Could you consider supporting TRBE as a loadable module, since all coresight
drivers can now be built as loadable modules?

Thanks
Tingwei
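
Separately, for anyone following the buffer arithmetic in the quoted
coresight-trbe.c: PERF_IDX2OFF() reduces the ever-growing perf AUX head
index to an offset inside the buffer, and trbe_snapshot_offset() picks the
limit from that offset. A minimal user-space model, assuming 4K pages
(PAGE_SHIFT of 12) and using the made-up names idx_to_offset() and
snapshot_limit_offset(), looks like this:

```c
#include <stdint.h>

/* Assumption for this sketch only: 4K pages. */
#define SKETCH_PAGE_SHIFT	12

/*
 * Model of PERF_IDX2OFF(): wrap a monotonically increasing index into
 * a byte offset within a buffer of nr_pages pages. The caller must pass
 * a non-zero nr_pages (the quoted allocator rejects fewer than 2).
 */
uint64_t idx_to_offset(uint64_t idx, unsigned int nr_pages)
{
	return idx % ((uint64_t)nr_pages << SKETCH_PAGE_SHIFT);
}

/*
 * Model of trbe_snapshot_offset(): if the head sits in the first half
 * of the buffer, stop at the midpoint; otherwise stop at the end.
 */
uint64_t snapshot_limit_offset(uint64_t head_idx, unsigned int nr_pages)
{
	uint64_t head = idx_to_offset(head_idx, nr_pages);
	uint64_t limit = (uint64_t)nr_pages << SKETCH_PAGE_SHIFT;

	if (head < (limit >> 1))
		limit >>= 1;

	return limit;
}
```

With a four-page buffer in this model, a head at offset 0 gives a limit at
the midpoint and a head at the midpoint gives a limit at the end of the
buffer.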

> +	depends on ARM64
> +	help
> +	  This driver provides support for percpu Trace Buffer Extension (TRBE).
> +	  TRBE always needs to be used along with its corresponding percpu ETE
> +	  component. ETE generates trace data which is then captured with TRBE.
> +	  Unlike traditional sink devices, TRBE is a CPU feature accessible via
> +	  system registers. But its explicit dependency on the trace unit (ETE)
> +	  requires it to be plugged in as a coresight sink device.
> +
>  config CORESIGHT_CTI_INTEGRATION_REGS
>  	bool "Access CTI CoreSight Integration Registers"
>  	depends on CORESIGHT_CTI
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index f20e357..d608165 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>  obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
>  obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
>  obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
> +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
>  coresight-cti-y := coresight-cti-core.o	coresight-cti-platform.o \
>  		   coresight-cti-sysfs.o
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> new file mode 100644
> index 0000000..48a8ec3
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -0,0 +1,766 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
> + * sink device, which can then pair with an appropriate per-cpu coresight
> + * source device (ETE) to capture the generated trace data. Trace can be
> + * enabled via the perf framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#define DRVNAME "arm_trbe"
> +
> +#define pr_fmt(fmt) DRVNAME ": " fmt
> +
> +#include "coresight-trbe.h"
> +
> +#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
> +
> +#define ETE_IGNORE_PACKET 0x70
> +
> +static const char trbe_name[] = "trbe";
> +
> +enum trbe_fault_action {
> +	TRBE_FAULT_ACT_WRAP,
> +	TRBE_FAULT_ACT_SPURIOUS,
> +	TRBE_FAULT_ACT_FATAL,
> +};
> +
> +struct trbe_perf {
> +	unsigned long trbe_base;
> +	unsigned long trbe_limit;
> +	unsigned long trbe_write;
> +	pid_t pid;
> +	int nr_pages;
> +	void **pages;
> +	bool snapshot;
> +	struct trbe_cpudata *cpudata;
> +};
> +
> +struct trbe_cpudata {
> +	struct coresight_device	*csdev;
> +	bool trbe_dbm;
> +	u64 trbe_align;
> +	int cpu;
> +	enum cs_mode mode;
> +	struct trbe_perf *perf;
> +	struct trbe_drvdata *drvdata;
> +};
> +
> +struct trbe_drvdata {
> +	struct trbe_cpudata __percpu *cpudata;
> +	struct perf_output_handle __percpu *handle;
> +	struct hlist_node hotplug_node;
> +	int irq;
> +	cpumask_t supported_cpus;
> +	enum cpuhp_state trbe_online;
> +	struct platform_device *pdev;
> +	struct clk *atclk;
> +};
> +
> +static int trbe_alloc_node(struct perf_event *event)
> +{
> +	if (event->cpu == -1)
> +		return NUMA_NO_NODE;
> +	return cpu_to_node(event->cpu);
> +}
> +
> +static void trbe_disable_and_drain_local(void)
> +{
> +	write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> +	isb();
> +	dsb(nsh);
> +	asm(TSB_CSYNC);
> +}
> +
> +static void trbe_reset_local(void)
> +{
> +	trbe_disable_and_drain_local();
> +	write_sysreg_s(0, SYS_TRBPTR_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBBASER_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBSR_EL1);
> +	isb();
> +}
> +
> +static void trbe_pad_buf(struct perf_output_handle *handle, int len)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +
> +	memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
> +	if (!perf->snapshot)
> +		perf_aux_output_skip(handle, len);
> +}
> +
> +static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +	u64 limit = perf->nr_pages * PAGE_SIZE;
> +
> +	if (head < limit >> 1)
> +		limit >>= 1;
> +
> +	return limit;
> +}
> +
> +static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	const u64 bufsize = perf->nr_pages * PAGE_SIZE;
> +	u64 limit = bufsize;
> +	u64 head, tail, wakeup;
> +
> +	head = PERF_IDX2OFF(handle->head, perf);
> +	if (!IS_ALIGNED(head, cpudata->trbe_align)) {
> +		unsigned long delta = roundup(head, cpudata->trbe_align) - head;
> +
> +		delta = min(delta, handle->size);
> +		trbe_pad_buf(handle, delta);
> +		head = PERF_IDX2OFF(handle->head, perf);
> +	}
> +
> +	if (!handle->size) {
> +		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +		return 0;
> +	}
> +
> +	tail = PERF_IDX2OFF(handle->head + handle->size, perf);
> +	wakeup = PERF_IDX2OFF(handle->wakeup, perf);
> +
> +	if (head < tail)
> +		limit = round_down(tail, PAGE_SIZE);
> +
> +	if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
> +		limit = min(limit, round_up(wakeup, PAGE_SIZE));
> +
> +	if (limit > head)
> +		return limit;
> +
> +	trbe_pad_buf(handle, handle->size);
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	return 0;
> +}
> +
> +static unsigned long get_trbe_limit(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset;
> +
> +	if (perf->snapshot)
> +		offset = trbe_snapshot_offset(handle);
> +	else
> +		offset = trbe_normal_offset(handle);
> +	return perf->trbe_base + offset;
> +}
> +
> +static void trbe_enable_hw(struct trbe_perf *perf)
> +{
> +	WARN_ON(perf->trbe_write < perf->trbe_base);
> +	WARN_ON(perf->trbe_write >= perf->trbe_limit);
> +	set_trbe_disabled();
> +	clr_trbe_irq();
> +	clr_trbe_wrap();
> +	clr_trbe_abort();
> +	clr_trbe_ec();
> +	clr_trbe_bsc();
> +	clr_trbe_fsc();
> +	set_trbe_virtual_mode();
> +	set_trbe_fill_mode(TRBE_FILL_STOP);
> +	set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
> +	isb();
> +	set_trbe_base_pointer(perf->trbe_base);
> +	set_trbe_limit_pointer(perf->trbe_limit);
> +	set_trbe_write_pointer(perf->trbe_write);
> +	isb();
> +	dsb(ishst);
> +	flush_tlb_all();
> +	set_trbe_running();
> +	set_trbe_enabled();
> +	asm(TSB_CSYNC);
> +}
> +
> +static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> +				   struct perf_event *event, void **pages,
> +				   int nr_pages, bool snapshot)
> +{
> +	struct trbe_perf *perf;
> +	struct page **pglist;
> +	int i;
> +
> +	if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
> +		return NULL;
> +
> +	perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
> +	if (IS_ERR(perf))
> +		return ERR_PTR(-ENOMEM);
> +
> +	pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
> +	if (IS_ERR(pglist)) {
> +		kfree(perf);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	for (i = 0; i < nr_pages; i++)
> +		pglist[i] = virt_to_page(pages[i]);
> +
> +	perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> +	if (IS_ERR((void *) perf->trbe_base)) {
> +		kfree(pglist);
> +		kfree(perf);
> +		return ERR_PTR(perf->trbe_base);
> +	}
> +	perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
> +	perf->trbe_write = perf->trbe_base;
> +	perf->pid = task_pid_nr(event->owner);
> +	perf->snapshot = snapshot;
> +	perf->nr_pages = nr_pages;
> +	perf->pages = pages;
> +	kfree(pglist);
> +	return perf;
> +}
> +
> +void arm_trbe_free_buffer(void *config)
> +{
> +	struct trbe_perf *perf = config;
> +
> +	vunmap((void *) perf->trbe_base);
> +	kfree(perf);
> +}
> +
> +static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
> +					    struct perf_output_handle *handle,
> +					    void *config)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = config;
> +	unsigned long size, offset;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	offset = get_trbe_write_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head += size;
> +	return size;
> +}
> +
> +static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct perf_output_handle *handle = data;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	*this_cpu_ptr(drvdata->handle) = *handle;
> +	cpudata->perf = perf;
> +	cpudata->mode = mode;
> +	perf->cpudata = cpudata;
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return 0;
> +	}
> +	trbe_enable_hw(perf);
> +	return 0;
> +}
> +
> +static int arm_trbe_disable(struct coresight_device *csdev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = cpudata->perf;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	trbe_disable_and_drain_local();
> +	perf->cpudata = NULL;
> +	cpudata->perf = NULL;
> +	cpudata->mode = CS_MODE_DISABLED;
> +	return 0;
> +}
> +
> +static void trbe_handle_fatal(struct perf_output_handle *handle)
> +{
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	perf_aux_output_end(handle, 0);
> +	trbe_disable_and_drain_local();
> +}
> +
> +static void trbe_handle_spurious(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	trbe_enable_hw(perf);
> +}
> +
> +static void trbe_handle_overflow(struct perf_output_handle *handle)
> +{
> +	struct perf_event *event = handle->event;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset, size;
> +	struct etm_event_data *event_data;
> +
> +	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head = offset;
> +	perf_aux_output_end(handle, size);
> +
> +	event_data = perf_aux_output_begin(handle, event);
> +	if (!event_data) {
> +		event->hw.state |= PERF_HES_STOPPED;
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	perf->trbe_write = perf->trbe_base;
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	*this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
> +	trbe_enable_hw(perf);
> +}
> +
> +static bool is_perf_trbe(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	struct trbe_drvdata *drvdata = cpudata->drvdata;
> +	int cpu = smp_processor_id();
> +
> +	WARN_ON(perf->trbe_base != get_trbe_base_pointer());
> +	WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
> +
> +	if (cpudata->mode != CS_MODE_PERF)
> +		return false;
> +
> +	if (cpudata->cpu != cpu)
> +		return false;
> +
> +	if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
> +		return false;
> +
> +	return true;
> +}
> +
> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
> +{
> +	enum trbe_ec ec = get_trbe_ec();
> +	enum trbe_bsc bsc = get_trbe_bsc();
> +
> +	WARN_ON(is_trbe_running());
> +	asm(TSB_CSYNC);
> +	dsb(nsh);
> +	isb();
> +
> +	if (is_trbe_trg() || is_trbe_abort())
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
> +		if (get_trbe_write_pointer() == get_trbe_base_pointer())
> +			return TRBE_FAULT_ACT_WRAP;
> +	}
> +	return TRBE_FAULT_ACT_SPURIOUS;
> +}
> +
> +static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
> +{
> +	struct perf_output_handle *handle = dev;
> +	enum trbe_fault_action act;
> +
> +	WARN_ON(!is_trbe_irq());
> +	clr_trbe_irq();
> +
> +	if (!perf_get_aux(handle))
> +		return IRQ_NONE;
> +
> +	if (!is_perf_trbe(handle))
> +		return IRQ_NONE;
> +
> +	irq_work_run();
> +
> +	act = trbe_get_fault_act(handle);
> +	switch (act) {
> +	case TRBE_FAULT_ACT_WRAP:
> +		trbe_handle_overflow(handle);
> +		break;
> +	case TRBE_FAULT_ACT_SPURIOUS:
> +		trbe_handle_spurious(handle);
> +		break;
> +	case TRBE_FAULT_ACT_FATAL:
> +		trbe_handle_fatal(handle);
> +		break;
> +	}
> +	return IRQ_HANDLED;
> +}
> +
> +static const struct coresight_ops_sink arm_trbe_sink_ops = {
> +	.enable		= arm_trbe_enable,
> +	.disable	= arm_trbe_disable,
> +	.alloc_buffer	= arm_trbe_alloc_buffer,
> +	.free_buffer	= arm_trbe_free_buffer,
> +	.update_buffer	= arm_trbe_update_buffer,
> +};
> +
> +static const struct coresight_ops arm_trbe_cs_ops = {
> +	.sink_ops	= &arm_trbe_sink_ops,
> +};
> +
> +static ssize_t irq_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	return sprintf(buf, "%d\n", drvdata->irq);
> +}
> +static DEVICE_ATTR_RO(irq);
> +
> +static ssize_t align_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
> +
> +	return sprintf(buf, "%s\n", trbe_buffer_align_str[ilog2(cpudata->trbe_align)]);
> +}
> +static DEVICE_ATTR_RO(align);
> +
> +static ssize_t dbm_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
> +
> +	return sprintf(buf, "%d\n", cpudata->trbe_dbm);
> +}
> +static DEVICE_ATTR_RO(dbm);
> +
> +static struct attribute *arm_trbe_attrs[] = {
> +	&dev_attr_align.attr,
> +	&dev_attr_irq.attr,
> +	&dev_attr_dbm.attr,
> +	NULL,
> +};
> +
> +static const struct attribute_group arm_trbe_group = {
> +	.attrs = arm_trbe_attrs,
> +};
> +
> +static const struct attribute_group *arm_trbe_groups[] = {
> +	&arm_trbe_group,
> +	NULL,
> +};
> +
> +static void arm_trbe_probe_coresight_cpu(void *info)
> +{
> +	struct trbe_cpudata *cpudata = info;
> +	struct device *dev = &cpudata->drvdata->pdev->dev;
> +	struct coresight_desc desc = { 0 };
> +
> +	if (WARN_ON(!cpudata))
> +		goto cpu_clear;
> +
> +	if (!is_trbe_available()) {
> +		pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +
> +	if (!is_trbe_programmable()) {
> +		pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
> +	if (IS_ERR(desc.name))
> +		goto cpu_clear;
> +
> +	desc.type = CORESIGHT_DEV_TYPE_SINK;
> +	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
> +	desc.ops = &arm_trbe_cs_ops;
> +	desc.pdata = dev_get_platdata(dev);
> +	desc.groups = arm_trbe_groups;
> +	desc.dev = dev;
> +	cpudata->csdev = coresight_register(&desc);
> +	if (IS_ERR(cpudata->csdev))
> +		goto cpu_clear;
> +
> +	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
> +	cpudata->trbe_dbm = get_trbe_flag_update();
> +	cpudata->trbe_align = 1ULL << get_trbe_address_align();
> +	if (cpudata->trbe_align > SZ_2K) {
> +		pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	return;
> +cpu_clear:
> +	cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
> +}
> +
> +static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
> +	if (IS_ERR(drvdata->cpudata))
> +		return PTR_ERR(drvdata->cpudata);
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		cpudata->cpu = cpu;
> +		cpudata->drvdata = drvdata;
> +		smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_coresight_cpu(void *info)
> +{
> +	struct trbe_drvdata *drvdata = info;
> +
> +	disable_percpu_irq(drvdata->irq);
> +}
> +
> +static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}
> +	}
> +	free_percpu(drvdata->cpudata);
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (!cpudata->csdev) {
> +			cpudata->drvdata = drvdata;
> +			smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> +		}
> +		trbe_reset_local();
> +		enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}
> +		disable_percpu_irq(drvdata->irq);
> +		trbe_reset_local();
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	enum cpuhp_state trbe_online;
> +
> +	trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
> +					arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
> +	if (trbe_online < 0)
> +		return -EINVAL;
> +
> +	if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
> +		return -EINVAL;
> +
> +	drvdata->trbe_online = trbe_online;
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	cpuhp_remove_multi_state(drvdata->trbe_online);
> +}
> +
> +static int arm_trbe_probe_irq(struct platform_device *pdev,
> +			      struct trbe_drvdata *drvdata)
> +{
> +	drvdata->irq = platform_get_irq(pdev, 0);
> +	if (!drvdata->irq) {
> +		pr_err("IRQ not found for the platform device\n");
> +		return -ENXIO;
> +	}
> +
> +	if (!irq_is_percpu(drvdata->irq)) {
> +		pr_err("IRQ is not a PPI\n");
> +		return -EINVAL;
> +	}
> +
> +	if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
> +		return -EINVAL;
> +
> +	drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
> +	if (!drvdata->handle)
> +		return -ENOMEM;
> +
> +	if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
> +		free_percpu(drvdata->handle);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
> +{
> +	free_percpu_irq(drvdata->irq, drvdata->handle);
> +	free_percpu(drvdata->handle);
> +}
> +
> +static int arm_trbe_device_probe(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata;
> +	struct trbe_drvdata *drvdata;
> +	struct device *dev = &pdev->dev;
> +	int ret;
> +
> +	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> +	if (IS_ERR(drvdata))
> +		return -ENOMEM;
> +
> +	pdata = coresight_get_platform_data(dev);
> +	if (IS_ERR(pdata)) {
> +		kfree(drvdata);
> +		return -ENOMEM;
> +	}
> +
> +	drvdata->atclk = devm_clk_get(dev, "atclk");
> +	if (!IS_ERR(drvdata->atclk)) {
> +		ret = clk_prepare_enable(drvdata->atclk);
> +		if (ret)
> +			return ret;
> +	}
> +	dev_set_drvdata(dev, drvdata);
> +	dev->platform_data = pdata;
> +	drvdata->pdev = pdev;
> +	ret = arm_trbe_probe_irq(pdev, drvdata);
> +	if (ret)
> +		goto irq_failed;
> +
> +	ret = arm_trbe_probe_coresight(drvdata);
> +	if (ret)
> +		goto probe_failed;
> +
> +	ret = arm_trbe_probe_cpuhp(drvdata);
> +	if (ret)
> +		goto cpuhp_failed;
> +
> +	return 0;
> +cpuhp_failed:
> +	arm_trbe_remove_coresight(drvdata);
> +probe_failed:
> +	arm_trbe_remove_irq(drvdata);
> +irq_failed:
> +	/* drvdata and pdata are devm-managed; nothing to free here */
> +	return ret;
> +}
> +
> +static int arm_trbe_device_remove(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
> +	struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
> +
> +	arm_trbe_remove_coresight(drvdata);
> +	arm_trbe_remove_cpuhp(drvdata);
> +	arm_trbe_remove_irq(drvdata);
> +	/* drvdata and pdata are devm-managed and released by the driver core */
> +	return 0;
> +}
> +
> +#ifdef CONFIG_PM
> +static int arm_trbe_runtime_suspend(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_disable_unprepare(drvdata->atclk);
> +
> +	return 0;
> +}
> +
> +static int arm_trbe_runtime_resume(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_prepare_enable(drvdata->atclk);
> +
> +	return 0;
> +}
> +#endif
> +
> +static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
> +	SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
> +};
> +
> +static const struct of_device_id arm_trbe_of_match[] = {
> +	{ .compatible = "arm,arm-trbe",	.data = (void *)1 },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
> +
> +static const struct platform_device_id arm_trbe_match[] = {
> +	{ "arm,trbe", 0},
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(platform, arm_trbe_match);
> +
> +static struct platform_driver arm_trbe_driver = {
> +	.id_table = arm_trbe_match,
> +	.driver	= {
> +		.name = DRVNAME,
> +		.of_match_table = of_match_ptr(arm_trbe_of_match),
> +		.pm = &arm_trbe_dev_pm_ops,
> +		.suppress_bind_attrs = true,
> +	},
> +	.probe	= arm_trbe_device_probe,
> +	.remove	= arm_trbe_device_remove,
> +};
> +builtin_platform_driver(arm_trbe_driver)
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
> new file mode 100644
> index 0000000..82ffbfc
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.h
> @@ -0,0 +1,525 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This contains all required hardware related helper functions for
> + * Trace Buffer Extension (TRBE) driver in the coresight framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#include <linux/coresight.h>
> +#include <linux/device.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/smp.h>
> +
> +#include "coresight-etm-perf.h"
> +
> +static inline bool is_trbe_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
> +
> +	return trbe >= 0b0001;
> +}
> +
> +static inline bool is_ete_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
> +
> +	return (tracever != 0b0000);
> +}
> +
> +static inline bool is_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_ENABLE;
> +}
> +
> +enum trbe_ec {
> +	TRBE_EC_OTHERS		= 0,
> +	TRBE_EC_STAGE1_ABORT	= 36,
> +	TRBE_EC_STAGE2_ABORT	= 37,
> +};
> +
> +static const char *const trbe_ec_str[] = {
> +	[TRBE_EC_OTHERS]	= "Maintenance exception",
> +	[TRBE_EC_STAGE1_ABORT]	= "Stage-1 exception",
> +	[TRBE_EC_STAGE2_ABORT]	= "Stage-2 exception",
> +};
> +
> +static inline enum trbe_ec get_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
> +}
> +
> +static inline void clr_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_bsc {
> +	TRBE_BSC_NOT_STOPPED	= 0,
> +	TRBE_BSC_FILLED		= 1,
> +	TRBE_BSC_TRIGGERED	= 2,
> +};
> +
> +static const char *const trbe_bsc_str[] = {
> +	[TRBE_BSC_NOT_STOPPED]	= "TRBE collection not stopped",
> +	[TRBE_BSC_FILLED]	= "TRBE filled",
> +	[TRBE_BSC_TRIGGERED]	= "TRBE triggered",
> +};
> +
> +static inline enum trbe_bsc get_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
> +}
> +
> +static inline void clr_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_fsc {
> +	TRBE_FSC_ASF_LEVEL0	= 0,
> +	TRBE_FSC_ASF_LEVEL1	= 1,
> +	TRBE_FSC_ASF_LEVEL2	= 2,
> +	TRBE_FSC_ASF_LEVEL3	= 3,
> +	TRBE_FSC_TF_LEVEL0	= 4,
> +	TRBE_FSC_TF_LEVEL1	= 5,
> +	TRBE_FSC_TF_LEVEL2	= 6,
> +	TRBE_FSC_TF_LEVEL3	= 7,
> +	TRBE_FSC_AFF_LEVEL0	= 8,
> +	TRBE_FSC_AFF_LEVEL1	= 9,
> +	TRBE_FSC_AFF_LEVEL2	= 10,
> +	TRBE_FSC_AFF_LEVEL3	= 11,
> +	TRBE_FSC_PF_LEVEL0	= 12,
> +	TRBE_FSC_PF_LEVEL1	= 13,
> +	TRBE_FSC_PF_LEVEL2	= 14,
> +	TRBE_FSC_PF_LEVEL3	= 15,
> +	TRBE_FSC_SEA_WRITE	= 16,
> +	TRBE_FSC_ASEA_WRITE	= 17,
> +	TRBE_FSC_SEA_LEVEL0	= 20,
> +	TRBE_FSC_SEA_LEVEL1	= 21,
> +	TRBE_FSC_SEA_LEVEL2	= 22,
> +	TRBE_FSC_SEA_LEVEL3	= 23,
> +	TRBE_FSC_ALIGN_FAULT	= 33,
> +	TRBE_FSC_TLB_FAULT	= 48,
> +	TRBE_FSC_ATOMIC_FAULT	= 49,
> +};
> +
> +static const char *const trbe_fsc_str[] = {
> +	[TRBE_FSC_ASF_LEVEL0]	= "Address size fault - level 0",
> +	[TRBE_FSC_ASF_LEVEL1]	= "Address size fault - level 1",
> +	[TRBE_FSC_ASF_LEVEL2]	= "Address size fault - level 2",
> +	[TRBE_FSC_ASF_LEVEL3]	= "Address size fault - level 3",
> +	[TRBE_FSC_TF_LEVEL0]	= "Translation fault - level 0",
> +	[TRBE_FSC_TF_LEVEL1]	= "Translation fault - level 1",
> +	[TRBE_FSC_TF_LEVEL2]	= "Translation fault - level 2",
> +	[TRBE_FSC_TF_LEVEL3]	= "Translation fault - level 3",
> +	[TRBE_FSC_AFF_LEVEL0]	= "Access flag fault - level 0",
> +	[TRBE_FSC_AFF_LEVEL1]	= "Access flag fault - level 1",
> +	[TRBE_FSC_AFF_LEVEL2]	= "Access flag fault - level 2",
> +	[TRBE_FSC_AFF_LEVEL3]	= "Access flag fault - level 3",
> +	[TRBE_FSC_PF_LEVEL0]	= "Permission fault - level 0",
> +	[TRBE_FSC_PF_LEVEL1]	= "Permission fault - level 1",
> +	[TRBE_FSC_PF_LEVEL2]	= "Permission fault - level 2",
> +	[TRBE_FSC_PF_LEVEL3]	= "Permission fault - level 3",
> +	[TRBE_FSC_SEA_WRITE]	= "Synchronous external abort on write",
> +	[TRBE_FSC_ASEA_WRITE]	= "Asynchronous external abort on write",
> +	[TRBE_FSC_SEA_LEVEL0]	= "Synchronous external abort on table walk - level 0",
> +	[TRBE_FSC_SEA_LEVEL1]	= "Synchronous external abort on table walk - level 1",
> +	[TRBE_FSC_SEA_LEVEL2]	= "Synchronous external abort on table walk - level 2",
> +	[TRBE_FSC_SEA_LEVEL3]	= "Synchronous external abort on table walk - level 3",
> +	[TRBE_FSC_ALIGN_FAULT]	= "Alignment fault",
> +	[TRBE_FSC_TLB_FAULT]	= "TLB conflict fault",
> +	[TRBE_FSC_ATOMIC_FAULT]	= "Atomic fault",
> +};
> +
> +static inline enum trbe_fsc get_trbe_fsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_FSC_SHIFT) & TRBSR_FSC_MASK;
> +}
> +
> +static inline void clr_trbe_fsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_FSC_MASK << TRBSR_FSC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_irq(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_IRQ;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_irq(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~TRBSR_IRQ;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_trg(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_TRG;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_trg(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr &= ~TRBSR_TRG;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_wrap(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_WRAP;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_wrap(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr &= ~TRBSR_WRAP;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_abort(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_ABORT;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_abort(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr &= ~TRBSR_ABORT;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline bool is_trbe_irq(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_IRQ;
> +}
> +
> +static inline bool is_trbe_trg(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_TRG;
> +}
> +
> +static inline bool is_trbe_wrap(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_WRAP;
> +}
> +
> +static inline bool is_trbe_abort(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_ABORT;
> +}
> +
> +static inline bool is_trbe_running(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return !(trbsr & TRBSR_STOP);
> +}
> +
> +static inline void set_trbe_running(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~TRBSR_STOP;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_address_mode {
> +	TRBE_ADDRESS_VIRTUAL,
> +	TRBE_ADDRESS_PHYSICAL,
> +};
> +
> +static const char *const trbe_address_mode_str[] = {
> +	[TRBE_ADDRESS_VIRTUAL]	= "Address mode - virtual",
> +	[TRBE_ADDRESS_PHYSICAL]	= "Address mode - physical",
> +};
> +
> +static inline bool is_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return !(trblimitr & TRBLIMITR_NVM);
> +}
> +
> +static inline bool is_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_NVM;
> +}
> +
> +static inline void set_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +enum trbe_trig_mode {
> +	TRBE_TRIGGER_STOP	= 0,
> +	TRBE_TRIGGER_IRQ	= 1,
> +	TRBE_TRIGGER_IGNORE	= 3,
> +};
> +
> +static const char *const trbe_trig_mode_str[] = {
> +	[TRBE_TRIGGER_STOP]	= "Trigger mode - stop",
> +	[TRBE_TRIGGER_IRQ]	= "Trigger mode - irq",
> +	[TRBE_TRIGGER_IGNORE]	= "Trigger mode - ignore",
> +};
> +
> +static inline enum trbe_trig_mode get_trbe_trig_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
> +}
> +
> +static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +enum trbe_fill_mode {
> +	TRBE_FILL_STOP		= 0,
> +	TRBE_FILL_WRAP		= 1,
> +	TRBE_FILL_CIRCULAR	= 3,
> +};
> +
> +static const char *const trbe_fill_mode_str[] = {
> +	[TRBE_FILL_STOP]	= "Buffer mode - stop",
> +	[TRBE_FILL_WRAP]	= "Buffer mode - wrap",
> +	[TRBE_FILL_CIRCULAR]	= "Buffer mode - circular",
> +};
> +
> +static inline enum trbe_fill_mode get_trbe_fill_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
> +}
> +
> +static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_disabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline bool get_trbe_flag_update(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return trbidr & TRBIDR_FLAG;
> +}
> +
> +static inline bool is_trbe_programmable(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return !(trbidr & TRBIDR_PROG);
> +}
> +
> +enum trbe_buffer_align {
> +	TRBE_BUFFER_BYTE,
> +	TRBE_BUFFER_HALF_WORD,
> +	TRBE_BUFFER_WORD,
> +	TRBE_BUFFER_DOUBLE_WORD,
> +	TRBE_BUFFER_16_BYTES,
> +	TRBE_BUFFER_32_BYTES,
> +	TRBE_BUFFER_64_BYTES,
> +	TRBE_BUFFER_128_BYTES,
> +	TRBE_BUFFER_256_BYTES,
> +	TRBE_BUFFER_512_BYTES,
> +	TRBE_BUFFER_1K_BYTES,
> +	TRBE_BUFFER_2K_BYTES,
> +};
> +
> +static const char *const trbe_buffer_align_str[] = {
> +	[TRBE_BUFFER_BYTE]		= "Byte",
> +	[TRBE_BUFFER_HALF_WORD]		= "Half word",
> +	[TRBE_BUFFER_WORD]		= "Word",
> +	[TRBE_BUFFER_DOUBLE_WORD]	= "Double word",
> +	[TRBE_BUFFER_16_BYTES]		= "16 bytes",
> +	[TRBE_BUFFER_32_BYTES]		= "32 bytes",
> +	[TRBE_BUFFER_64_BYTES]		= "64 bytes",
> +	[TRBE_BUFFER_128_BYTES]		= "128 bytes",
> +	[TRBE_BUFFER_256_BYTES]		= "256 bytes",
> +	[TRBE_BUFFER_512_BYTES]		= "512 bytes",
> +	[TRBE_BUFFER_1K_BYTES]		= "1K bytes",
> +	[TRBE_BUFFER_2K_BYTES]		= "2K bytes",
> +};
> +
> +static inline enum trbe_buffer_align get_trbe_address_align(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
> +}
> +
> +static inline void assert_trbe_address_mode(unsigned long addr)
> +{
> +	bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
> +	bool virt_mode = is_trbe_virtual_mode();
> +
> +	WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
> +}
> +
> +static inline void assert_trbe_address_align(unsigned long addr)
> +{
> +	unsigned long nr_bytes = 1ULL << get_trbe_address_align();
> +
> +	WARN_ON(addr & (nr_bytes - 1));
> +}
> +
> +static inline unsigned long get_trbe_write_pointer(void)
> +{
> +	u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
> +	unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_write_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +	write_sysreg_s(addr, SYS_TRBPTR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_limit_pointer(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +	unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK;
> +	unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
> +
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_limit_pointer(unsigned long addr)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
> +	trblimitr |= (addr & PAGE_MASK);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_base_pointer(void)
> +{
> +	u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
> +	unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK;
> +
> +	addr = addr << TRBBASER_BASE_SHIFT;
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_base_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	write_sysreg_s(addr, SYS_TRBBASER_EL1);
> +}
> -- 
> 2.7.4
> 
> _______________________________________________
> CoreSight mailing list
> CoreSight@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/coresight

* Re: [RFC 07/11] coresight: sink: Add TRBE driver
@ 2020-11-14  5:38     ` Tingwei Zhang
  0 siblings, 0 replies; 72+ messages in thread
From: Tingwei Zhang @ 2020-11-14  5:38 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: coresight, linux-kernel, linux-arm-kernel, mike.leach

Hi Anshuman,

On Tue, Nov 10, 2020 at 08:45:05PM +0800, Anshuman Khandual wrote:
> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
> accessible via the system registers. The TRBE supports different addressing
> modes including CPU virtual address and buffer modes including the circular
> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
> a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
> access to the trace buffer could be prohibited by a higher exception level
> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
> private interrupt (PPI) on address translation errors and when the buffer
> is full. Overall implementation here is inspired from the Arm SPE driver.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  Documentation/trace/coresight/coresight-trbe.rst |  36 ++
>  arch/arm64/include/asm/sysreg.h                  |   2 +
>  drivers/hwtracing/coresight/Kconfig              |  11 +
>  drivers/hwtracing/coresight/Makefile             |   1 +
>  drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
>  6 files changed, 1341 insertions(+)
>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> 
> diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
> new file mode 100644
> index 0000000..4320a8b
> --- /dev/null
> +++ b/Documentation/trace/coresight/coresight-trbe.rst
> @@ -0,0 +1,36 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============================
> +Trace Buffer Extension (TRBE).
> +==============================
> +
> +    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
> +    :Date:     November 2020
> +
> +Hardware Description
> +--------------------
> +
> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system
> +memory, CPU traces generated from a corresponding percpu tracing unit. This
> +gets plugged in as a coresight sink device because the corresponding trace
> +generators (ETE), are plugged in as source device.
> +
> +Sysfs files and directories
> +---------------------------
> +
> +The TRBE devices appear on the existing coresight bus alongside the other
> +coresight devices::
> +
> +	>$ ls /sys/bus/coresight/devices
> +	trbe0  trbe1  trbe2 trbe3
> +
> +The ``trbe<N>`` named TRBEs are associated with a CPU.::
> +
> +	>$ ls /sys/bus/coresight/devices/trbe0/
> +	irq align dbm
> +
> +*Key file items are:-*
> +   * ``irq``: TRBE maintenance interrupt number
> +   * ``align``: TRBE write pointer alignment
> +   * ``dbm``: TRBE updates memory with access and dirty flags
> +
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 14cb156..61136f6 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -97,6 +97,7 @@
>  #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
>  #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
>  #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> +#define TSB_CSYNC			__emit_inst(0xd503225f)
> 
>  #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
>  	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
> @@ -865,6 +866,7 @@
>  #define ID_AA64MMFR2_CNP_SHIFT		0
> 
>  /* id_aa64dfr0 */
> +#define ID_AA64DFR0_TRBE_SHIFT		44
>  #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
>  #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
>  #define ID_AA64DFR0_PMSVER_SHIFT	32
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index c119824..0f5e101 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -156,6 +156,17 @@ config CORESIGHT_CTI
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called coresight-cti.
> 
> +config CORESIGHT_TRBE
> +	bool "Trace Buffer Extension (TRBE) driver"

Could you consider supporting TRBE as a loadable module, since all the
coresight drivers can now be built as loadable modules?
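
For illustration only, a rough sketch of what the Kconfig side of that might
look like. It assumes nothing in the driver is strictly built-in only, and the
last two help lines are an assumption modelled on the CORESIGHT_CTI entry just
above:

```
config CORESIGHT_TRBE
	tristate "Trace Buffer Extension (TRBE) driver"
	depends on ARM64
	help
	  This driver provides support for percpu Trace Buffer Extension (TRBE).
	  TRBE always needs to be used along with its corresponding percpu ETE
	  component. ETE generates trace data which is then captured with TRBE.

	  To compile this driver as a module, choose M here: the
	  module will be called coresight-trbe.
```

On the driver side, builtin_platform_driver() would presumably have to become
module_platform_driver(), together with MODULE_LICENSE() and friends, and the
remove path would need to be reliable enough for an actual unload.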

Thanks
Tingwei

> +	depends on ARM64
> +	help
> +	  This driver provides support for percpu Trace Buffer Extension (TRBE).
> +	  TRBE always needs to be used along with its corresponding percpu ETE
> +	  component. ETE generates trace data which is then captured with TRBE.
> +	  Unlike traditional sink devices, TRBE is a CPU feature accessible via
> +	  system registers. But its explicit dependency on the trace unit (ETE)
> +	  requires it to be plugged in as a coresight sink device.
> +
>  config CORESIGHT_CTI_INTEGRATION_REGS
>  	bool "Access CTI CoreSight Integration Registers"
>  	depends on CORESIGHT_CTI
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index f20e357..d608165 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>  obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
>  obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
>  obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
> +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
>  coresight-cti-y := coresight-cti-core.o	coresight-cti-platform.o \
>  		   coresight-cti-sysfs.o
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> new file mode 100644
> index 0000000..48a8ec3
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -0,0 +1,766 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
> + * sink device, which can then pair with an appropriate per-cpu coresight
> + * source device (ETE) that generates the required trace data. Trace can be
> + * enabled via the perf framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#define DRVNAME "arm_trbe"
> +
> +#define pr_fmt(fmt) DRVNAME ": " fmt
> +
> +#include "coresight-trbe.h"
> +
> +#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
> +
> +#define ETE_IGNORE_PACKET 0x70
> +
> +static const char trbe_name[] = "trbe";
> +
> +enum trbe_fault_action {
> +	TRBE_FAULT_ACT_WRAP,
> +	TRBE_FAULT_ACT_SPURIOUS,
> +	TRBE_FAULT_ACT_FATAL,
> +};
> +
> +struct trbe_perf {
> +	unsigned long trbe_base;
> +	unsigned long trbe_limit;
> +	unsigned long trbe_write;
> +	pid_t pid;
> +	int nr_pages;
> +	void **pages;
> +	bool snapshot;
> +	struct trbe_cpudata *cpudata;
> +};
> +
> +struct trbe_cpudata {
> +	struct coresight_device	*csdev;
> +	bool trbe_dbm;
> +	u64 trbe_align;
> +	int cpu;
> +	enum cs_mode mode;
> +	struct trbe_perf *perf;
> +	struct trbe_drvdata *drvdata;
> +};
> +
> +struct trbe_drvdata {
> +	struct trbe_cpudata __percpu *cpudata;
> +	struct perf_output_handle __percpu *handle;
> +	struct hlist_node hotplug_node;
> +	int irq;
> +	cpumask_t supported_cpus;
> +	enum cpuhp_state trbe_online;
> +	struct platform_device *pdev;
> +	struct clk *atclk;
> +};
> +
> +static int trbe_alloc_node(struct perf_event *event)
> +{
> +	if (event->cpu == -1)
> +		return NUMA_NO_NODE;
> +	return cpu_to_node(event->cpu);
> +}
> +
> +static void trbe_disable_and_drain_local(void)
> +{
> +	write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> +	isb();
> +	dsb(nsh);
> +	asm(TSB_CSYNC);
> +}
> +
> +static void trbe_reset_local(void)
> +{
> +	trbe_disable_and_drain_local();
> +	write_sysreg_s(0, SYS_TRBPTR_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBBASER_EL1);
> +	isb();
> +
> +	write_sysreg_s(0, SYS_TRBSR_EL1);
> +	isb();
> +}
> +
> +static void trbe_pad_buf(struct perf_output_handle *handle, int len)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +
> +	memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
> +	if (!perf->snapshot)
> +		perf_aux_output_skip(handle, len);
> +}
> +
> +static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	u64 head = PERF_IDX2OFF(handle->head, perf);
> +	u64 limit = perf->nr_pages * PAGE_SIZE;
> +
> +	if (head < limit >> 1)
> +		limit >>= 1;
> +
> +	return limit;
> +}
> +
> +static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	const u64 bufsize = perf->nr_pages * PAGE_SIZE;
> +	u64 limit = bufsize;
> +	u64 head, tail, wakeup;
> +
> +	head = PERF_IDX2OFF(handle->head, perf);
> +	if (!IS_ALIGNED(head, cpudata->trbe_align)) {
> +		unsigned long delta = roundup(head, cpudata->trbe_align) - head;
> +
> +		delta = min(delta, handle->size);
> +		trbe_pad_buf(handle, delta);
> +		head = PERF_IDX2OFF(handle->head, perf);
> +	}
> +
> +	if (!handle->size) {
> +		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +		return 0;
> +	}
> +
> +	tail = PERF_IDX2OFF(handle->head + handle->size, perf);
> +	wakeup = PERF_IDX2OFF(handle->wakeup, perf);
> +
> +	if (head < tail)
> +		limit = round_down(tail, PAGE_SIZE);
> +
> +	if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
> +		limit = min(limit, round_up(wakeup, PAGE_SIZE));
> +
> +	if (limit > head)
> +		return limit;
> +
> +	trbe_pad_buf(handle, handle->size);
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	return 0;
> +}
> +
> +static unsigned long get_trbe_limit(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset;
> +
> +	if (perf->snapshot)
> +		offset = trbe_snapshot_offset(handle);
> +	else
> +		offset = trbe_normal_offset(handle);
> +	return perf->trbe_base + offset;
> +}
> +
> +static void trbe_enable_hw(struct trbe_perf *perf)
> +{
> +	WARN_ON(perf->trbe_write < perf->trbe_base);
> +	WARN_ON(perf->trbe_write >= perf->trbe_limit);
> +	set_trbe_disabled();
> +	clr_trbe_irq();
> +	clr_trbe_wrap();
> +	clr_trbe_abort();
> +	clr_trbe_ec();
> +	clr_trbe_bsc();
> +	clr_trbe_fsc();
> +	set_trbe_virtual_mode();
> +	set_trbe_fill_mode(TRBE_FILL_STOP);
> +	set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
> +	isb();
> +	set_trbe_base_pointer(perf->trbe_base);
> +	set_trbe_limit_pointer(perf->trbe_limit);
> +	set_trbe_write_pointer(perf->trbe_write);
> +	isb();
> +	dsb(ishst);
> +	flush_tlb_all();
> +	set_trbe_running();
> +	set_trbe_enabled();
> +	asm(TSB_CSYNC);
> +}
> +
> +static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> +				   struct perf_event *event, void **pages,
> +				   int nr_pages, bool snapshot)
> +{
> +	struct trbe_perf *perf;
> +	struct page **pglist;
> +	int i;
> +
> +	if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
> +		return NULL;
> +
> +	perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
> +	if (!perf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
> +	if (!pglist) {
> +		kfree(perf);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	for (i = 0; i < nr_pages; i++)
> +		pglist[i] = virt_to_page(pages[i]);
> +
> +	perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> +	if (!perf->trbe_base) {
> +		kfree(pglist);
> +		kfree(perf);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +	perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
> +	perf->trbe_write = perf->trbe_base;
> +	perf->pid = task_pid_nr(event->owner);
> +	perf->snapshot = snapshot;
> +	perf->nr_pages = nr_pages;
> +	perf->pages = pages;
> +	kfree(pglist);
> +	return perf;
> +}
> +
> +void arm_trbe_free_buffer(void *config)
> +{
> +	struct trbe_perf *perf = config;
> +
> +	vunmap((void *) perf->trbe_base);
> +	kfree(perf);
> +}
> +
> +static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
> +					    struct perf_output_handle *handle,
> +					    void *config)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = config;
> +	unsigned long size, offset;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	offset = get_trbe_write_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head += size;
> +	return size;
> +}
> +
> +static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct perf_output_handle *handle = data;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	*this_cpu_ptr(drvdata->handle) = *handle;
> +	cpudata->perf = perf;
> +	cpudata->mode = mode;
> +	perf->cpudata = cpudata;
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return 0;
> +	}
> +	trbe_enable_hw(perf);
> +	return 0;
> +}
> +
> +static int arm_trbe_disable(struct coresight_device *csdev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> +	struct trbe_perf *perf = cpudata->perf;
> +
> +	WARN_ON(perf->cpudata != cpudata);
> +	WARN_ON(cpudata->cpu != smp_processor_id());
> +	WARN_ON(cpudata->mode != CS_MODE_PERF);
> +	WARN_ON(cpudata->drvdata != drvdata);
> +
> +	trbe_disable_and_drain_local();
> +	perf->cpudata = NULL;
> +	cpudata->perf = NULL;
> +	cpudata->mode = CS_MODE_DISABLED;
> +	return 0;
> +}
> +
> +static void trbe_handle_fatal(struct perf_output_handle *handle)
> +{
> +	perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	perf_aux_output_end(handle, 0);
> +	trbe_disable_and_drain_local();
> +}
> +
> +static void trbe_handle_spurious(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +
> +	perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	trbe_enable_hw(perf);
> +}
> +
> +static void trbe_handle_overflow(struct perf_output_handle *handle)
> +{
> +	struct perf_event *event = handle->event;
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	unsigned long offset, size;
> +	struct etm_event_data *event_data;
> +
> +	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
> +	size = offset - PERF_IDX2OFF(handle->head, perf);
> +	if (perf->snapshot)
> +		handle->head = offset;
> +	perf_aux_output_end(handle, size);
> +
> +	event_data = perf_aux_output_begin(handle, event);
> +	if (!event_data) {
> +		event->hw.state |= PERF_HES_STOPPED;
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	perf->trbe_write = perf->trbe_base;
> +	perf->trbe_limit = get_trbe_limit(handle);
> +	if (perf->trbe_limit == perf->trbe_base) {
> +		trbe_disable_and_drain_local();
> +		return;
> +	}
> +	*this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
> +	trbe_enable_hw(perf);
> +}
> +
> +static bool is_perf_trbe(struct perf_output_handle *handle)
> +{
> +	struct trbe_perf *perf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = perf->cpudata;
> +	struct trbe_drvdata *drvdata = cpudata->drvdata;
> +	int cpu = smp_processor_id();
> +
> +	WARN_ON(perf->trbe_base != get_trbe_base_pointer());
> +	WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
> +
> +	if (cpudata->mode != CS_MODE_PERF)
> +		return false;
> +
> +	if (cpudata->cpu != cpu)
> +		return false;
> +
> +	if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
> +		return false;
> +
> +	return true;
> +}
> +
> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
> +{
> +	enum trbe_ec ec = get_trbe_ec();
> +	enum trbe_bsc bsc = get_trbe_bsc();
> +
> +	WARN_ON(is_trbe_running());
> +	asm(TSB_CSYNC);
> +	dsb(nsh);
> +	isb();
> +
> +	if (is_trbe_trg() || is_trbe_abort())
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
> +		return TRBE_FAULT_ACT_FATAL;
> +
> +	if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
> +		if (get_trbe_write_pointer() == get_trbe_base_pointer())
> +			return TRBE_FAULT_ACT_WRAP;
> +	}
> +	return TRBE_FAULT_ACT_SPURIOUS;
> +}
> +
> +static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
> +{
> +	struct perf_output_handle *handle = dev;
> +	enum trbe_fault_action act;
> +
> +	WARN_ON(!is_trbe_irq());
> +	clr_trbe_irq();
> +
> +	if (!perf_get_aux(handle))
> +		return IRQ_NONE;
> +
> +	if (!is_perf_trbe(handle))
> +		return IRQ_NONE;
> +
> +	irq_work_run();
> +
> +	act = trbe_get_fault_act(handle);
> +	switch (act) {
> +	case TRBE_FAULT_ACT_WRAP:
> +		trbe_handle_overflow(handle);
> +		break;
> +	case TRBE_FAULT_ACT_SPURIOUS:
> +		trbe_handle_spurious(handle);
> +		break;
> +	case TRBE_FAULT_ACT_FATAL:
> +		trbe_handle_fatal(handle);
> +		break;
> +	}
> +	return IRQ_HANDLED;
> +}
> +
> +static const struct coresight_ops_sink arm_trbe_sink_ops = {
> +	.enable		= arm_trbe_enable,
> +	.disable	= arm_trbe_disable,
> +	.alloc_buffer	= arm_trbe_alloc_buffer,
> +	.free_buffer	= arm_trbe_free_buffer,
> +	.update_buffer	= arm_trbe_update_buffer,
> +};
> +
> +static const struct coresight_ops arm_trbe_cs_ops = {
> +	.sink_ops	= &arm_trbe_sink_ops,
> +};
> +
> +static ssize_t irq_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	return sprintf(buf, "%d\n", drvdata->irq);
> +}
> +static DEVICE_ATTR_RO(irq);
> +
> +static ssize_t align_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
> +
> +	return sprintf(buf, "%s\n", trbe_buffer_align_str[ilog2(cpudata->trbe_align)]);
> +}
> +static DEVICE_ATTR_RO(align);
> +
> +static ssize_t dbm_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
> +
> +	return sprintf(buf, "%d\n", cpudata->trbe_dbm);
> +}
> +static DEVICE_ATTR_RO(dbm);
> +
> +static struct attribute *arm_trbe_attrs[] = {
> +	&dev_attr_align.attr,
> +	&dev_attr_irq.attr,
> +	&dev_attr_dbm.attr,
> +	NULL,
> +};
> +
> +static const struct attribute_group arm_trbe_group = {
> +	.attrs = arm_trbe_attrs,
> +};
> +
> +static const struct attribute_group *arm_trbe_groups[] = {
> +	&arm_trbe_group,
> +	NULL,
> +};
> +
> +static void arm_trbe_probe_coresight_cpu(void *info)
> +{
> +	struct trbe_cpudata *cpudata = info;
> +	struct device *dev = &cpudata->drvdata->pdev->dev;
> +	struct coresight_desc desc = { 0 };
> +
> +	if (WARN_ON(!cpudata))
> +		goto cpu_clear;
> +
> +	if (!is_trbe_available()) {
> +		pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +
> +	if (!is_trbe_programmable()) {
> +		pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
> +	if (IS_ERR(desc.name))
> +		goto cpu_clear;
> +
> +	desc.type = CORESIGHT_DEV_TYPE_SINK;
> +	desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
> +	desc.ops = &arm_trbe_cs_ops;
> +	desc.pdata = dev_get_platdata(dev);
> +	desc.groups = arm_trbe_groups;
> +	desc.dev = dev;
> +	cpudata->csdev = coresight_register(&desc);
> +	if (IS_ERR(cpudata->csdev))
> +		goto cpu_clear;
> +
> +	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
> +	cpudata->trbe_dbm = get_trbe_flag_update();
> +	cpudata->trbe_align = 1ULL << get_trbe_address_align();
> +	if (cpudata->trbe_align > SZ_2K) {
> +		pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
> +		goto cpu_clear;
> +	}
> +	return;
> +cpu_clear:
> +	cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
> +}
> +
> +static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
> +	if (IS_ERR(drvdata->cpudata))
> +		return PTR_ERR(drvdata->cpudata);
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		cpudata->cpu = cpu;
> +		cpudata->drvdata = drvdata;
> +		smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_coresight_cpu(void *info)
> +{
> +	struct trbe_drvdata *drvdata = info;
> +
> +	disable_percpu_irq(drvdata->irq);
> +}
> +
> +static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
> +{
> +	struct trbe_cpudata *cpudata;
> +	int cpu;
> +
> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
> +		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}
> +	}
> +	free_percpu(drvdata->cpudata);
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (!cpudata->csdev) {
> +			cpudata->drvdata = drvdata;
> +			smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> +		}
> +		trbe_reset_local();
> +		enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
> +	struct trbe_cpudata *cpudata;
> +
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> +		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		if (cpudata->csdev) {
> +			coresight_unregister(cpudata->csdev);
> +			cpudata->drvdata = NULL;
> +			cpudata->csdev = NULL;
> +		}
> +		disable_percpu_irq(drvdata->irq);
> +		trbe_reset_local();
> +	}
> +	return 0;
> +}
> +
> +static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	enum cpuhp_state trbe_online;
> +
> +	trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
> +					arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
> +	if (trbe_online < 0)
> +		return -EINVAL;
> +
> +	if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
> +		return -EINVAL;
> +
> +	drvdata->trbe_online = trbe_online;
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
> +{
> +	cpuhp_remove_multi_state(drvdata->trbe_online);
> +}
> +
> +static int arm_trbe_probe_irq(struct platform_device *pdev,
> +			      struct trbe_drvdata *drvdata)
> +{
> +	drvdata->irq = platform_get_irq(pdev, 0);
> +	if (!drvdata->irq) {
> +		pr_err("IRQ not found for the platform device\n");
> +		return -ENXIO;
> +	}
> +
> +	if (!irq_is_percpu(drvdata->irq)) {
> +		pr_err("IRQ is not a PPI\n");
> +		return -EINVAL;
> +	}
> +
> +	if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
> +		return -EINVAL;
> +
> +	drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
> +	if (!drvdata->handle)
> +		return -ENOMEM;
> +
> +	if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
> +		free_percpu(drvdata->handle);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
> +{
> +	free_percpu_irq(drvdata->irq, drvdata->handle);
> +	free_percpu(drvdata->handle);
> +}
> +
> +static int arm_trbe_device_probe(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata;
> +	struct trbe_drvdata *drvdata;
> +	struct device *dev = &pdev->dev;
> +	int ret;
> +
> +	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> +	if (IS_ERR(drvdata))
> +		return -ENOMEM;
> +
> +	pdata = coresight_get_platform_data(dev);
> +	if (IS_ERR(pdata)) {
> +		kfree(drvdata);
> +		return -ENOMEM;
> +	}
> +
> +	drvdata->atclk = devm_clk_get(dev, "atclk");
> +	if (!IS_ERR(drvdata->atclk)) {
> +		ret = clk_prepare_enable(drvdata->atclk);
> +		if (ret)
> +			return ret;
> +	}
> +	dev_set_drvdata(dev, drvdata);
> +	dev->platform_data = pdata;
> +	drvdata->pdev = pdev;
> +	ret = arm_trbe_probe_irq(pdev, drvdata);
> +	if (ret)
> +		goto irq_failed;
> +
> +	ret = arm_trbe_probe_coresight(drvdata);
> +	if (ret)
> +		goto probe_failed;
> +
> +	ret = arm_trbe_probe_cpuhp(drvdata);
> +	if (ret)
> +		goto cpuhp_failed;
> +
> +	return 0;
> +cpuhp_failed:
> +	arm_trbe_remove_coresight(drvdata);
> +probe_failed:
> +	arm_trbe_remove_irq(drvdata);
> +irq_failed:
> +	kfree(pdata);
> +	kfree(drvdata);
> +	return ret;
> +}
> +
> +static int arm_trbe_device_remove(struct platform_device *pdev)
> +{
> +	struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
> +	struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
> +
> +	arm_trbe_remove_coresight(drvdata);
> +	arm_trbe_remove_cpuhp(drvdata);
> +	arm_trbe_remove_irq(drvdata);
> +	kfree(pdata);
> +	kfree(drvdata);
> +	return 0;
> +}
> +
> +#ifdef CONFIG_PM
> +static int arm_trbe_runtime_suspend(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_disable_unprepare(drvdata->atclk);
> +
> +	return 0;
> +}
> +
> +static int arm_trbe_runtime_resume(struct device *dev)
> +{
> +	struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
> +
> +	if (drvdata && !IS_ERR(drvdata->atclk))
> +		clk_prepare_enable(drvdata->atclk);
> +
> +	return 0;
> +}
> +#endif
> +
> +static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
> +	SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
> +};
> +
> +static const struct of_device_id arm_trbe_of_match[] = {
> +	{ .compatible = "arm,arm-trbe",	.data = (void *)1 },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
> +
> +static const struct platform_device_id arm_trbe_match[] = {
> +	{ "arm,trbe", 0},
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(platform, arm_trbe_match);
> +
> +static struct platform_driver arm_trbe_driver = {
> +	.id_table = arm_trbe_match,
> +	.driver	= {
> +		.name = DRVNAME,
> +		.of_match_table = of_match_ptr(arm_trbe_of_match),
> +		.pm = &arm_trbe_dev_pm_ops,
> +		.suppress_bind_attrs = true,
> +	},
> +	.probe	= arm_trbe_device_probe,
> +	.remove	= arm_trbe_device_remove,
> +};
> +builtin_platform_driver(arm_trbe_driver)
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
> new file mode 100644
> index 0000000..82ffbfc
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trbe.h
> @@ -0,0 +1,525 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This contains all required hardware related helper functions for
> + * Trace Buffer Extension (TRBE) driver in the coresight framework.
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
> + */
> +#include <linux/coresight.h>
> +#include <linux/device.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/smp.h>
> +
> +#include "coresight-etm-perf.h"
> +
> +static inline bool is_trbe_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
> +
> +	return trbe >= 0b0001;
> +}
> +
> +static inline bool is_ete_available(void)
> +{
> +	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
> +	int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
> +
> +	return (tracever != 0b0000);
> +}
> +
> +static inline bool is_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_ENABLE;
> +}
> +
> +enum trbe_ec {
> +	TRBE_EC_OTHERS		= 0,
> +	TRBE_EC_STAGE1_ABORT	= 36,
> +	TRBE_EC_STAGE2_ABORT	= 37,
> +};
> +
> +static const char *const trbe_ec_str[] = {
> +	[TRBE_EC_OTHERS]	= "Maintenance exception",
> +	[TRBE_EC_STAGE1_ABORT]	= "Stage-1 exception",
> +	[TRBE_EC_STAGE2_ABORT]	= "Stage-2 exception",
> +};
> +
> +static inline enum trbe_ec get_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
> +}
> +
> +static inline void clr_trbe_ec(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_bsc {
> +	TRBE_BSC_NOT_STOPPED	= 0,
> +	TRBE_BSC_FILLED		= 1,
> +	TRBE_BSC_TRIGGERED	= 2,
> +};
> +
> +static const char *const trbe_bsc_str[] = {
> +	[TRBE_BSC_NOT_STOPPED]	= "TRBE collection not stopped",
> +	[TRBE_BSC_FILLED]	= "TRBE filled",
> +	[TRBE_BSC_TRIGGERED]	= "TRBE triggered",
> +};
> +
> +static inline enum trbe_bsc get_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
> +}
> +
> +static inline void clr_trbe_bsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_fsc {
> +	TRBE_FSC_ASF_LEVEL0	= 0,
> +	TRBE_FSC_ASF_LEVEL1	= 1,
> +	TRBE_FSC_ASF_LEVEL2	= 2,
> +	TRBE_FSC_ASF_LEVEL3	= 3,
> +	TRBE_FSC_TF_LEVEL0	= 4,
> +	TRBE_FSC_TF_LEVEL1	= 5,
> +	TRBE_FSC_TF_LEVEL2	= 6,
> +	TRBE_FSC_TF_LEVEL3	= 7,
> +	TRBE_FSC_AFF_LEVEL0	= 8,
> +	TRBE_FSC_AFF_LEVEL1	= 9,
> +	TRBE_FSC_AFF_LEVEL2	= 10,
> +	TRBE_FSC_AFF_LEVEL3	= 11,
> +	TRBE_FSC_PF_LEVEL0	= 12,
> +	TRBE_FSC_PF_LEVEL1	= 13,
> +	TRBE_FSC_PF_LEVEL2	= 14,
> +	TRBE_FSC_PF_LEVEL3	= 15,
> +	TRBE_FSC_SEA_WRITE	= 16,
> +	TRBE_FSC_ASEA_WRITE	= 17,
> +	TRBE_FSC_SEA_LEVEL0	= 20,
> +	TRBE_FSC_SEA_LEVEL1	= 21,
> +	TRBE_FSC_SEA_LEVEL2	= 22,
> +	TRBE_FSC_SEA_LEVEL3	= 23,
> +	TRBE_FSC_ALIGN_FAULT	= 33,
> +	TRBE_FSC_TLB_FAULT	= 48,
> +	TRBE_FSC_ATOMIC_FAULT	= 49,
> +};
> +
> +static const char *const trbe_fsc_str[] = {
> +	[TRBE_FSC_ASF_LEVEL0]	= "Address size fault - level 0",
> +	[TRBE_FSC_ASF_LEVEL1]	= "Address size fault - level 1",
> +	[TRBE_FSC_ASF_LEVEL2]	= "Address size fault - level 2",
> +	[TRBE_FSC_ASF_LEVEL3]	= "Address size fault - level 3",
> +	[TRBE_FSC_TF_LEVEL0]	= "Translation fault - level 0",
> +	[TRBE_FSC_TF_LEVEL1]	= "Translation fault - level 1",
> +	[TRBE_FSC_TF_LEVEL2]	= "Translation fault - level 2",
> +	[TRBE_FSC_TF_LEVEL3]	= "Translation fault - level 3",
> +	[TRBE_FSC_AFF_LEVEL0]	= "Access flag fault - level 0",
> +	[TRBE_FSC_AFF_LEVEL1]	= "Access flag fault - level 1",
> +	[TRBE_FSC_AFF_LEVEL2]	= "Access flag fault - level 2",
> +	[TRBE_FSC_AFF_LEVEL3]	= "Access flag fault - level 3",
> +	[TRBE_FSC_PF_LEVEL0]	= "Permission fault - level 0",
> +	[TRBE_FSC_PF_LEVEL1]	= "Permission fault - level 1",
> +	[TRBE_FSC_PF_LEVEL2]	= "Permission fault - level 2",
> +	[TRBE_FSC_PF_LEVEL3]	= "Permission fault - level 3",
> +	[TRBE_FSC_SEA_WRITE]	= "Synchronous external abort on write",
> +	[TRBE_FSC_ASEA_WRITE]	= "Asynchronous external abort on write",
> +	[TRBE_FSC_SEA_LEVEL0]	= "Synchronous external abort on table walk - level 0",
> +	[TRBE_FSC_SEA_LEVEL1]	= "Synchronous external abort on table walk - level 1",
> +	[TRBE_FSC_SEA_LEVEL2]	= "Synchronous external abort on table walk - level 2",
> +	[TRBE_FSC_SEA_LEVEL3]	= "Synchronous external abort on table walk - level 3",
> +	[TRBE_FSC_ALIGN_FAULT]	= "Alignment fault",
> +	[TRBE_FSC_TLB_FAULT]	= "TLB conflict fault",
> +	[TRBE_FSC_ATOMIC_FAULT]	= "Atomic fault",
> +};
> +
> +static inline enum trbe_fsc get_trbe_fsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return (trbsr >> TRBSR_FSC_SHIFT) & TRBSR_FSC_MASK;
> +}
> +
> +static inline void clr_trbe_fsc(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~(TRBSR_FSC_MASK << TRBSR_FSC_SHIFT);
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_irq(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_IRQ;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_irq(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~TRBSR_IRQ;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_trg(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_TRG;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_trg(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr &= ~TRBSR_TRG;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_wrap(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_WRAP;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_wrap(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr &= ~TRBSR_WRAP;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void set_trbe_abort(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr |= TRBSR_ABORT;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline void clr_trbe_abort(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	trbsr &= ~TRBSR_ABORT;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +static inline bool is_trbe_irq(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_IRQ;
> +}
> +
> +static inline bool is_trbe_trg(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_TRG;
> +}
> +
> +static inline bool is_trbe_wrap(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_WRAP;
> +}
> +
> +static inline bool is_trbe_abort(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return trbsr & TRBSR_ABORT;
> +}
> +
> +static inline bool is_trbe_running(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	return !(trbsr & TRBSR_STOP);
> +}
> +
> +static inline void set_trbe_running(void)
> +{
> +	u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
> +
> +	trbsr &= ~TRBSR_STOP;
> +	write_sysreg_s(trbsr, SYS_TRBSR_EL1);
> +}
> +
> +enum trbe_address_mode {
> +	TRBE_ADDRESS_VIRTUAL,
> +	TRBE_ADDRESS_PHYSICAL,
> +};
> +
> +static const char *const trbe_address_mode_str[] = {
> +	[TRBE_ADDRESS_VIRTUAL]	= "Address mode - virtual",
> +	[TRBE_ADDRESS_PHYSICAL]	= "Address mode - physical",
> +};
> +
> +static inline bool is_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return !(trblimitr & TRBLIMITR_NVM);
> +}
> +
> +static inline bool is_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return trblimitr & TRBLIMITR_NVM;
> +}
> +
> +static inline void set_trbe_virtual_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_physical_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_NVM;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +enum trbe_trig_mode {
> +	TRBE_TRIGGER_STOP	= 0,
> +	TRBE_TRIGGER_IRQ	= 1,
> +	TRBE_TRIGGER_IGNORE	= 3,
> +};
> +
> +static const char *const trbe_trig_mode_str[] = {
> +	[TRBE_TRIGGER_STOP]	= "Trigger mode - stop",
> +	[TRBE_TRIGGER_IRQ]	= "Trigger mode - irq",
> +	[TRBE_TRIGGER_IGNORE]	= "Trigger mode - ignore",
> +};
> +
> +static inline enum trbe_trig_mode get_trbe_trig_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
> +}
> +
> +static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +enum trbe_fill_mode {
> +	TRBE_FILL_STOP		= 0,
> +	TRBE_FILL_WRAP		= 1,
> +	TRBE_FILL_CIRCULAR	= 3,
> +};
> +
> +static const char *const trbe_fill_mode_str[] = {
> +	[TRBE_FILL_STOP]	= "Buffer mode - stop",
> +	[TRBE_FILL_WRAP]	= "Buffer mode - wrap",
> +	[TRBE_FILL_CIRCULAR]	= "Buffer mode - circular",
> +};
> +
> +static inline enum trbe_fill_mode get_trbe_fill_mode(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
> +}
> +
> +static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
> +	trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_disabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr &= ~TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline void set_trbe_enabled(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	trblimitr |= TRBLIMITR_ENABLE;
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline bool get_trbe_flag_update(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return trbidr & TRBIDR_FLAG;
> +}
> +
> +static inline bool is_trbe_programmable(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return !(trbidr & TRBIDR_PROG);
> +}
> +
> +enum trbe_buffer_align {
> +	TRBE_BUFFER_BYTE,
> +	TRBE_BUFFER_HALF_WORD,
> +	TRBE_BUFFER_WORD,
> +	TRBE_BUFFER_DOUBLE_WORD,
> +	TRBE_BUFFER_16_BYTES,
> +	TRBE_BUFFER_32_BYTES,
> +	TRBE_BUFFER_64_BYTES,
> +	TRBE_BUFFER_128_BYTES,
> +	TRBE_BUFFER_256_BYTES,
> +	TRBE_BUFFER_512_BYTES,
> +	TRBE_BUFFER_1K_BYTES,
> +	TRBE_BUFFER_2K_BYTES,
> +};
> +
> +static const char *const trbe_buffer_align_str[] = {
> +	[TRBE_BUFFER_BYTE]		= "Byte",
> +	[TRBE_BUFFER_HALF_WORD]		= "Half word",
> +	[TRBE_BUFFER_WORD]		= "Word",
> +	[TRBE_BUFFER_DOUBLE_WORD]	= "Double word",
> +	[TRBE_BUFFER_16_BYTES]		= "16 bytes",
> +	[TRBE_BUFFER_32_BYTES]		= "32 bytes",
> +	[TRBE_BUFFER_64_BYTES]		= "64 bytes",
> +	[TRBE_BUFFER_128_BYTES]		= "128 bytes",
> +	[TRBE_BUFFER_256_BYTES]		= "256 bytes",
> +	[TRBE_BUFFER_512_BYTES]		= "512 bytes",
> +	[TRBE_BUFFER_1K_BYTES]		= "1K bytes",
> +	[TRBE_BUFFER_2K_BYTES]		= "2K bytes",
> +};
> +
> +static inline enum trbe_buffer_align get_trbe_address_align(void)
> +{
> +	u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
> +
> +	return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
> +}
> +
> +static inline void assert_trbe_address_mode(unsigned long addr)
> +{
> +	bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
> +	bool virt_mode = is_trbe_virtual_mode();
> +
> +	WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
> +}
> +
> +static inline void assert_trbe_address_align(unsigned long addr)
> +{
> +	unsigned long nr_bytes = 1ULL << get_trbe_address_align();
> +
> +	WARN_ON(addr & (nr_bytes - 1));
> +}
> +
> +static inline unsigned long get_trbe_write_pointer(void)
> +{
> +	u64 trbptr = read_sysreg_s(SYS_TRBPTR_EL1);
> +	unsigned long addr = (trbptr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_write_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	addr = (addr >> TRBPTR_PTR_SHIFT) & TRBPTR_PTR_MASK;
> +	write_sysreg_s(addr, SYS_TRBPTR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_limit_pointer(void)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +	unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & TRBLIMITR_LIMIT_MASK;
> +	unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
> +
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_limit_pointer(unsigned long addr)
> +{
> +	u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBLIMITR_LIMIT_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	trblimitr &= ~(TRBLIMITR_LIMIT_MASK << TRBLIMITR_LIMIT_SHIFT);
> +	trblimitr |= (addr & PAGE_MASK);
> +	write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
> +}
> +
> +static inline unsigned long get_trbe_base_pointer(void)
> +{
> +	u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
> +	unsigned long addr = (trbbaser >> TRBBASER_BASE_SHIFT) & TRBBASER_BASE_MASK;
> +
> +	addr = addr << TRBBASER_BASE_SHIFT;
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	return addr;
> +}
> +
> +static inline void set_trbe_base_pointer(unsigned long addr)
> +{
> +	WARN_ON(is_trbe_enabled());
> +	assert_trbe_address_mode(addr);
> +	assert_trbe_address_align(addr);
> +	WARN_ON(addr & ((1UL << TRBBASER_BASE_SHIFT) - 1));
> +	WARN_ON(addr & (PAGE_SIZE - 1));
> +	write_sysreg_s(addr, SYS_TRBBASER_EL1);
> +}
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
  2020-11-14  5:17   ` Tingwei Zhang
@ 2020-11-16 15:00     ` Mike Leach
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Leach @ 2020-11-16 15:00 UTC (permalink / raw)
  To: Tingwei Zhang
  Cc: Anshuman Khandual, Coresight ML, Linux Kernel Mailing List,
	linux-arm-kernel

Hi Anshuman,

I've not looked in detail at this set yet, but having skimmed through
it, I do have an initial question about the handling of wrapped data
buffers.

With the ETR/ETB we found an issue with the way perf concatenated data
captured from the hardware buffer into a single contiguous data
block. The issue occurs when a wrapped buffer appears after another
buffer in the data file. In a typical session perf would stop trace
and copy the hardware buffer multiple times into the auxtrace buffer.

e.g.

For ETR/ETB we have a fixed-length hardware data buffer, and no way
of detecting buffer wraps using interrupts while tracing is in
progress.

If the buffer is not full at the point that perf transfers it then the
data will look like this:-
1) <async><synced trace data>
easy to decode, we can see the async at the start of the data - which
would be the async issued at the start of trace.

If the buffer wraps we see this:-

2) <unsynced trace data><async><synced trace data>

Again no real issue, the decoder will skip to the async and trace from
there - we lose the unsynced data.

Now the problem occurs when multiple transfers of data occur. We can
see the following appearing as contiguous trace in the auxtrace
buffer:-

3) <async><synced trace data><unsynced trace data><async><synced trace data>

Now the decoder cannot spot the point where the synced data from the
first capture ends and the unsynced data from the second capture
begins. This means it will continue to decode into the unsynced data,
which will result in incorrect trace or outright errors. To get round
this for ETR/ETB, the driver will insert barrier packets into the data
file if a wrap event is detected.

4) <async><synced trace data><barrier><unsynced trace data><async><synced trace data>

This <barrier> has the effect of resetting the decoder into the
unsynced state so that the invalid trace is not decoded. This is a
workaround we have to do to handle the limitations of the ETR / ETB
trace hardware.
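
As a rough illustration of data pattern 4), here is a minimal userspace
sketch of appending successive hardware captures to one contiguous
buffer, with a barrier written in front of any capture that wrapped.
This is not the ETR/ETB driver code: struct aux_buf,
aux_append_capture() and the barrier_pkt[] contents are hypothetical
names and values chosen for the example, and the real drivers may place
the barrier differently (for instance by overwriting the start of the
wrapped data rather than growing the output).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative barrier pattern (assumption, not taken from this thread). */
static const uint32_t barrier_pkt[4] = {
	0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff,
};

/* Hypothetical stand-in for the contiguous auxtrace buffer. */
struct aux_buf {
	uint8_t *data;
	size_t len;	/* bytes already used */
	size_t cap;	/* total capacity in bytes */
};

/*
 * Append one hardware capture. When the hardware buffer wrapped, the
 * capture starts with unsynced data, so a barrier goes in first to push
 * the decoder back to its unsynced state.
 * Returns the number of bytes appended, or 0 if the capture does not fit.
 */
static size_t aux_append_capture(struct aux_buf *aux, const uint8_t *hw,
				 size_t hw_len, int wrapped)
{
	size_t need = hw_len + (wrapped ? sizeof(barrier_pkt) : 0);

	if (need > aux->cap - aux->len)
		return 0;

	if (wrapped) {
		memcpy(aux->data + aux->len, barrier_pkt, sizeof(barrier_pkt));
		aux->len += sizeof(barrier_pkt);
	}
	memcpy(aux->data + aux->len, hw, hw_len);
	aux->len += hw_len;
	return need;
}
```

With this layout, a decoder that treats the barrier as "drop sync and
search for the next async" would never run from the first capture's
synced data straight into the second capture's unsynced data.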

For TRBE we do have interrupts, so it should be possible to prevent
the buffer wrapping in most cases - but I did see in the code that
there are handlers for the TRBE buffer wrap management event. Are
there other factors in play that will prevent data pattern 3) from
appearing in the auxtrace buffer?

Regards

Mike





On Sat, 14 Nov 2020 at 05:17, Tingwei Zhang <tingweiz@codeaurora.org> wrote:
>
> Hi Anshuman,
>
> On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
> > This series enables future IP trace features Embedded Trace Extension (ETE)
> > and Trace Buffer Extension (TRBE). This series depends on the ETM system
> > register instruction support series [0] and the v8.4 Self hosted tracing
> > support series (Jonathan Zhou) [1]. The tree is available here [2] for
> > quick access.
> >
> > ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
> > extensions. ETE overlaps with the ETMv4 architecture, with additions to
> > support the newer architecture features and some restrictions on the
> > supported features w.r.t ETMv4. The ETE support is added by extending the
> > ETMv4 driver to recognise the ETE and handle the features as exposed by the
> > TRCIDRx registers. ETE only supports system instructions access from the
> > host CPU. The ETE could be integrated with a TRBE (see below), or with the
> > legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
> > description as the ETMs and requires a node per instance.
> >
> > Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
> > accessible via the system registers and can be combined with the ETE to
> > provide a 1x1 configuration of source & sink. TRBE is being represented
> > here as a CoreSight sink. Primary reason is that the ETE source could work
> > with other traditional CoreSight sink devices. As TRBE captures the trace
> > data which is produced by ETE, it cannot work alone.
> >
> > TRBE representation here have some distinct deviations from a traditional
> > CoreSight sink device. Coresight path between ETE and TRBE are not built
> > during boot looking at respective DT or ACPI entries. Instead TRBE gets
> > checked on each available CPU, when found gets connected with respective
> > ETE source device on the same CPU, after altering its outward connections.
> > ETE TRBE path connection lasts only till the CPU is online. But ETE-TRBE
> > coupling/decoupling method implemented here is not optimal and would be
> > reworked later on.
>
> Only perf mode is supported in TRBE in current path. Will you consider
> support sysfs mode as well in following patch sets?
>
> Thanks,
> Tingwei
>
> >
> > Unlike traditional sinks, TRBE can generate interrupts to signal including
> > many other things, buffer got filled. The interrupt is a PPI and should be
> > communicated from the platform. DT or ACPI entry representing TRBE should
> > have the PPI number for a given platform. During perf session, the TRBE IRQ
> > handler should capture trace for perf auxiliary buffer before restarting it
> > back. System registers being used here to configure ETE and TRBE could be
> > referred in the link below.
> >
> > https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
> >
> > This adds another change where CoreSight sink device needs to be disabled
> > before capturing the trace data for perf in order to avoid race condition
> > with another simultaneous TRBE IRQ handling. This might cause problem with
> > traditional sink devices which can be operated in both sysfs and perf mode.
> > This needs to be addressed correctly. One option would be to move the
> > update_buffer callback into the respective sink devices. e.g, disable().
> >
> > This series is primarily looking from some early feed back both on proposed
> > design and its implementation. It acknowledges, that it might be incomplete
> > and will have scopes for improvement.
> >
> > Things todo:
> > - Improve ETE-TRBE coupling and decoupling method
> > - Improve TRBE IRQ handling for all possible corner cases
> > - Implement sysfs based trace sessions
> >
> > [0]
> > https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
> > [1]
> > https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/
> > [2]
> > https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted
> >
> > Anshuman Khandual (6):
> >   arm64: Add TRBE definitions
> >   coresight: sink: Add TRBE driver
> >   coresight: etm-perf: Truncate the perf record if handle has no space
> >   coresight: etm-perf: Disable the path before capturing the trace data
> >   coresgith: etm-perf: Connect TRBE sink with ETE source
> >   dts: bindings: Document device tree binding for Arm TRBE
> >
> > Suzuki K Poulose (5):
> >   coresight: etm-perf: Allow an event to use different sinks
> >   coresight: Do not scan for graph if none is present
> >   coresight: etm4x: Add support for PE OS lock
> >   coresight: ete: Add support for sysreg support
> >   coresight: ete: Detect ETE as one of the supported ETMs
> >
> >  .../devicetree/bindings/arm/coresight.txt          |   3 +
> >  Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
> >  Documentation/trace/coresight/coresight-trbe.rst   |  36 +
> >  arch/arm64/include/asm/sysreg.h                    |  51 ++
> >  drivers/hwtracing/coresight/Kconfig                |  11 +
> >  drivers/hwtracing/coresight/Makefile               |   1 +
> >  drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
> >  drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
> >  drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
> >  drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
> >  drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
> >  drivers/hwtracing/coresight/coresight-trbe.c       | 768
> > +++++++++++++++++++++
> >  drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
> >  include/linux/coresight.h                          |   2 +
> >  14 files changed, 1680 insertions(+), 43 deletions(-)
> >  create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
> >  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
> >  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
> >  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> >
> > --
> > 2.7.4
> >



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
  2020-11-14  5:17   ` Tingwei Zhang
@ 2020-11-23  2:43     ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-23  2:43 UTC (permalink / raw)
  To: Tingwei Zhang; +Cc: linux-arm-kernel, coresight, mike.leach, linux-kernel

Hello Tingwei,

On 11/14/20 10:47 AM, Tingwei Zhang wrote:
> Hi Anshuman,
> 
> On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
>> This series enables future IP trace features Embedded Trace Extension (ETE)
>> and Trace Buffer Extension (TRBE). This series depends on the ETM system
>> register instruction support series [0] and the v8.4 Self hosted tracing
>> support series (Jonathan Zhou) [1]. The tree is available here [2] for
>> quick access.
>>
>> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
>> extensions. ETE overlaps with the ETMv4 architecture, with additions to
>> support the newer architecture features and some restrictions on the
>> supported features w.r.t ETMv4. The ETE support is added by extending the
>> ETMv4 driver to recognise the ETE and handle the features as exposed by the
>> TRCIDRx registers. ETE only supports system instructions access from the
>> host CPU. The ETE could be integrated with a TRBE (see below), or with the
>> legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
>> description as the ETMs and requires a node per instance.
>>
>> Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
>> accessible via the system registers and can be combined with the ETE to
>> provide a 1x1 configuration of source & sink. TRBE is being represented
>> here as a CoreSight sink. Primary reason is that the ETE source could work
>> with other traditional CoreSight sink devices. As TRBE captures the trace
>> data which is produced by ETE, it cannot work alone.
>>
>> TRBE representation here have some distinct deviations from a traditional
>> CoreSight sink device. Coresight path between ETE and TRBE are not built
>> during boot looking at respective DT or ACPI entries. Instead TRBE gets
>> checked on each available CPU, when found gets connected with respective
>> ETE source device on the same CPU, after altering its outward connections.
>> ETE TRBE path connection lasts only till the CPU is online. But ETE-TRBE
>> coupling/decoupling method implemented here is not optimal and would be
>> reworked later on.
> Only perf mode is supported in TRBE in current path. Will you consider
> support sysfs mode as well in following patch sets?

Yes, either in subsequent versions or later on, after first getting the perf
based functionality enabled. Nonetheless, sysfs is also on the todo list as
mentioned in the cover letter.

- Anshuman

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
  2020-11-16 15:00     ` Mike Leach
@ 2020-11-23  3:40       ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-23  3:40 UTC (permalink / raw)
  To: Mike Leach, Tingwei Zhang
  Cc: Coresight ML, Linux Kernel Mailing List, linux-arm-kernel

Hello Mike,

On 11/16/20 8:30 PM, Mike Leach wrote:
> Hi Anshuman,
> 
> I've not looked in detail at this set yet, but having skimmed through
> it  I do have an initial question about the handling of wrapped data
> buffers.
> 
> With the ETR/ETB we found an issue with the way perf concatenated data
> captured from the hardware buffer into a single contiguous data
> block. The issue occurs when a wrapped buffer appears after another
> buffer in the data file. In a typical session perf would stop trace
> and copy the hardware buffer multiple times into the auxtrace buffer.

The hardware buffer and the perf aux trace buffer are the same for TRBE,
hence there is no actual copy involved. Trace data gets pushed to user
space via perf_aux_output_end(), reached either from etm_event_stop() or
from the IRQ handler, i.e. arm_trbe_irq_handler(). Data transfer to user
space happens via updates to the perf aux buffer indices, i.e. head, tail
and wakeup. But logically, the data will appear as a stream of records to
user space while parsing the perf.data file.
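
A rough sketch of that index-only hand-off, as a user-space model in C.
This is an illustration under assumptions, not the perf core or the TRBE
driver: struct aux_ring and the aux_ring_*() helpers are hypothetical
names, and the model assumes free-running head/tail counters and a
power-of-two buffer size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a buffer shared by a hardware writer and a
 * user space reader. Data is written in place by the "hardware", so
 * handing it over only moves head; nothing is copied.
 */
struct aux_ring {
	unsigned char *base;	/* start of the shared buffer */
	size_t size;		/* buffer size, assumed power of two */
	size_t head;		/* free-running count of published bytes */
	size_t tail;		/* free-running count of consumed bytes */
};

/* Bytes published but not yet consumed by the reader. */
static size_t aux_ring_pending(const struct aux_ring *r)
{
	return r->head - r->tail;
}

/* Bytes the writer may still fill without overwriting unread data. */
static size_t aux_ring_free(const struct aux_ring *r)
{
	return r->size - aux_ring_pending(r);
}

/*
 * Publish n bytes that are already in the buffer. Returns 0 on success,
 * or -1 if that would overrun data the reader has not consumed yet.
 */
static int aux_ring_publish(struct aux_ring *r, size_t n)
{
	if (n > aux_ring_free(r))
		return -1;
	r->head += n;
	return 0;
}

/* Reader side: mark n pending bytes as consumed. */
static void aux_ring_consume(struct aux_ring *r, size_t n)
{
	assert(n <= aux_ring_pending(r));
	r->tail += n;
}

/* Offset inside the buffer where the writer continues. */
static size_t aux_ring_write_offset(const struct aux_ring *r)
{
	return r->head & (r->size - 1);
}
```

With a 64 byte buffer, publishing 48 bytes, consuming 40 and publishing
another 32 leaves the writer at offset 16, having wrapped past the end
of the buffer without any data being moved.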

> 
> e.g.
> 
> For ETR/ETB we have a fixed length hardware data buffer - and no way
> of detecting buffer wraps using interrupts as the tracing is in
> progress.

TRBE has an interrupt. Hence there will be an opportunity to insert any
additional packets, if required, to demarcate the pre-IRQ and post-IRQ
trace data streams.

> 
> If the buffer is not full at the point that perf transfers it then the
> data will look like this:-
> 1) <async><synced trace data>
> easy to decode, we can see the async at the start of the data - which
> would be the async issued at the start of trace.

Just curious, what makes the tracer generate the <async> trace packet?
Is there an explicit instruction, or is that how the tracer starts when
enabled?

> 
> If the buffer wraps we see this:-
> 
> 2) <unsynced trace data><async><synced trace data>
> 
> Again no real issue, the decoder will skip to the async and trace from
> there - we lose the unsynced data.

Could you please elaborate more on the difference between sync and async
trace data ?

> 
> Now the problem occurs when multiple transfers of data occur. We can
> see the following appearing as contiguous trace in the auxtrace
> buffer:-
> 
> 3) < async><synced trace data><unsynced trace data><async><synced trace data>

So there is a wrap-around event between <synced trace data> and
<unsynced trace data>? Are there any other situations where this
might happen?

> 
> Now the decoder cannot spot the point that the synced data from the
> first capture ends, and the unsynced data from the second capture
> begins.

Got it.

> This means it will continue to decode into the unsynced data - which
> will result in incorrect trace / outright errors. To get round this
> for ETR/ETB the driver will insert barrier packets into the datafile
> if a wrap event is detected.

But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event
even detected?

> 
> 4) <async><synced trace data><barrier><unsynced trace
> data><async><synced trace data>
> 
> This <barrier> has the effect of resetting the decoder into the
> unsynced state so that the invalid trace is not decoded. This is a
> workaround we have to do to handle the limitations of the ETR / ETB
> trace hardware.
Got it.

> 
> For TRBE we do have interrupts, so it should be possible to prevent
> the buffer wrapping in most cases - but I did see in the code that
> there are handlers for the TRBE buffer wrap management event. Are
> there other factors in play that will prevent data pattern 3) from
> appearing in the auxtrace buffer ?

On TRBE, the buffer wrapping cannot happen without generating an IRQ. I
would assume that ETE will then start again with an <async> data packet
first when the handler returns. Otherwise we might also have to insert
a similar barrier packet for the user space tool to reset. As trace data
should not get lost during a wrap event, ETE should complete the packet
after the handler returns, hence the aux buffer should still have a
logically contiguous stream of <synced trace data> to decode. I am not
sure right now, but will look into this.

- Anshuman

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 07/11] coresight: sink: Add TRBE driver
  2020-11-14  5:38     ` Tingwei Zhang
@ 2020-11-23  3:51       ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-23  3:51 UTC (permalink / raw)
  To: Tingwei Zhang; +Cc: linux-arm-kernel, coresight, mike.leach, linux-kernel



On 11/14/20 11:08 AM, Tingwei Zhang wrote:
> Hi Anshuman,
> 
> On Tue, Nov 10, 2020 at 08:45:05PM +0800, Anshuman Khandual wrote:
>> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
>> accessible via the system registers. The TRBE supports different addressing
>> modes including CPU virtual address and buffer modes including the circular
>> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
>> a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
>> access to the trace buffer could be prohibited by a higher exception level
>> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
>> private interrupt (PPI) on address translation errors and when the buffer
>> is full. Overall implementation here is inspired from the Arm SPE driver.
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  Documentation/trace/coresight/coresight-trbe.rst |  36 ++
>>  arch/arm64/include/asm/sysreg.h                  |   2 +
>>  drivers/hwtracing/coresight/Kconfig              |  11 +
>>  drivers/hwtracing/coresight/Makefile             |   1 +
>>  drivers/hwtracing/coresight/coresight-trbe.c     | 766 
>> +++++++++++++++++++++++
>>  drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
>>  6 files changed, 1341 insertions(+)
>>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
>>
>> diff --git a/Documentation/trace/coresight/coresight-trbe.rst 
>> b/Documentation/trace/coresight/coresight-trbe.rst
>> new file mode 100644
>> index 0000000..4320a8b
>> --- /dev/null
>> +++ b/Documentation/trace/coresight/coresight-trbe.rst
>> @@ -0,0 +1,36 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============================
>> +Trace Buffer Extension (TRBE).
>> +==============================
>> +
>> +    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
>> +    :Date:     November 2020
>> +
>> +Hardware Description
>> +--------------------
>> +
>> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system
>> +memory, CPU traces generated from a corresponding percpu tracing unit. This
>> +gets plugged in as a coresight sink device because the corresponding trace
>> +genarators (ETE), are plugged in as source device.
>> +
>> +Sysfs files and directories
>> +---------------------------
>> +
>> +The TRBE devices appear on the existing coresight bus alongside the other
>> +coresight devices::
>> +
>> +	>$ ls /sys/bus/coresight/devices
>> +	trbe0  trbe1  trbe2 trbe3
>> +
>> +The ``trbe<N>`` named TRBEs are associated with a CPU.::
>> +
>> +	>$ ls /sys/bus/coresight/devices/trbe0/
>> +	irq align dbm
>> +
>> +*Key file items are:-*
>> +   * ``irq``: TRBE maintenance interrupt number
>> +   * ``align``: TRBE write pointer alignment
>> +   * ``dbm``: TRBE updates memory with access and dirty flags
>> +
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index 14cb156..61136f6 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -97,6 +97,7 @@
>>  #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << 
>> PSTATE_Imm_shift))
>>  #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) 
>> << PSTATE_Imm_shift))
>>  #define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << 
>> PSTATE_Imm_shift))
>> +#define TSB_CSYNC			__emit_inst(0xd503225f)
>>
>>  #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
>>  	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
>> @@ -865,6 +866,7 @@
>>  #define ID_AA64MMFR2_CNP_SHIFT		0
>>
>>  /* id_aa64dfr0 */
>> +#define ID_AA64DFR0_TRBE_SHIFT		44
>>  #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
>>  #define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
>>  #define ID_AA64DFR0_PMSVER_SHIFT	32
>> diff --git a/drivers/hwtracing/coresight/Kconfig 
>> b/drivers/hwtracing/coresight/Kconfig
>> index c119824..0f5e101 100644
>> --- a/drivers/hwtracing/coresight/Kconfig
>> +++ b/drivers/hwtracing/coresight/Kconfig
>> @@ -156,6 +156,17 @@ config CORESIGHT_CTI
>>  	  To compile this driver as a module, choose M here: the
>>  	  module will be called coresight-cti.
>>
>> +config CORESIGHT_TRBE
>> +	bool "Trace Buffer Extension (TRBE) driver"
> 
> Can you consider to support TRBE as loadable module since all coresight
> drivers support loadable module now.

The TRBE driver is being reworked, and making it a loadable module is part
of that rework.

- Anshuman

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 10/11] coresgith: etm-perf: Connect TRBE sink with ETE source
  2020-11-12  9:31     ` Suzuki K Poulose
@ 2020-11-23  5:37       ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-23  5:37 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach



On 11/12/20 3:01 PM, Suzuki K Poulose wrote:
> Hi Anshuman,
> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>> Unlike traditional sink devices, individual TRBE instances are not detected
>> via DT or ACPI nodes. Instead TRBE instances are detected during CPU online
>> process. Hence a path connecting ETE and TRBE on a given CPU would not have
>> been established until then. This adds two coresight helpers that will help
>> modify outward connections from a source device to establish and terminate
>> path to a given sink device. But this method might not be optimal and would
>> be reworked later.
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> 
> Instead of this, could we come up something like a percpu_sink concept ? That
> way, the TRBE driver could register the percpu_sink for the corresponding CPU
> and we don't have to worry about the order in which the ETE will be probed
> on a hotplugged CPU. (i.e, if the TRBE is probed before the ETE, the following
> approach would fail to register the sink).

Right, it won't work.

We already have a per cpu csdev sink. The current mechanism expects all ETEs
to have been established and the TRBEs just get plugged in during their init
while probing each individual cpu. During cpu hotplug in or out, a TRBE-ETE
link either gets created or destroyed. But it assumes that an ETE is always
present for the TRBE to get plugged into or torn down from. The csdev for the
TRBE sink too gets released during the cpu hot remove path.

Are you suggesting that there should be a percpu static csdev array defined
for all potential TRBEs, so that the ETE-TRBE links can be permanently
established, given that the ETEs are permanent and never really go away with
a cpu hot remove event (my assumption)? TRBE csdevs would then just get
enabled or disabled without really being destroyed during cpu hotplug, so
that the corresponding TRBE-ETE connection remains in place.
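
To make the question concrete, here is a minimal user-space sketch of what
such a permanent per-cpu slot could look like. Every name in it is
hypothetical and none of it is existing coresight code; it only shows the
slot surviving hotplug while the enabled state flips:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical; stands in for the possible cpu count */

/* Hypothetical stand-in for a TRBE coresight device. */
struct sketch_sink {
	int cpu;
	bool enabled;
};

/* One permanent slot per cpu; NULL until the TRBE on that cpu is probed. */
static struct sketch_sink *sketch_percpu_sink[SKETCH_NR_CPUS];

/* Called once from the TRBE probe path; does not depend on the ETE. */
static void sketch_register_percpu_sink(struct sketch_sink *sink)
{
	assert(sink->cpu >= 0 && sink->cpu < SKETCH_NR_CPUS);
	sink->enabled = false;
	sketch_percpu_sink[sink->cpu] = sink;
}

/* cpu hotplug only flips the state; the slot (and so the link) stays. */
static void sketch_cpu_online(int cpu)
{
	if (sketch_percpu_sink[cpu])
		sketch_percpu_sink[cpu]->enabled = true;
}

static void sketch_cpu_offline(int cpu)
{
	if (sketch_percpu_sink[cpu])
		sketch_percpu_sink[cpu]->enabled = false;
}

/* The ETE looks the sink up when it builds a path, so probe order is moot. */
static struct sketch_sink *sketch_get_percpu_sink(int cpu)
{
	struct sketch_sink *sink = sketch_percpu_sink[cpu];

	return (sink && sink->enabled) ? sink : NULL;
}
```

With this shape the source never holds a pointer that can go stale on cpu
hot remove; it just gets NULL from the lookup while the cpu is offline.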

> 
> And the default sink can be initialized when the ETE instance first starts
> looking for it.

IIUC def_sink is the sink which will be selected by default for a source device
while creating a path, in case there is no clear preference from the user. To
keep things simple, the ETE's default sink should be fixed (its TRBE), hence
assigning it during the connection expansion procedure does make sense. But it
can get more complex where the 'default' sink for an ETE is scenario specific
and may not always be its TRBE.
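
The selection order I have in mind could be sketched like this. It is a
user-space illustration only: the struct layout, the function name and the
'fallback' argument (standing in for whatever traditional sink the connection
graph would yield) are all assumptions, not the existing coresight code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for coresight devices. */
struct sketch_sink {
	const char *name;
};

struct sketch_source {
	struct sketch_sink *percpu_sink;	/* e.g. the TRBE on the same cpu, may be NULL */
	struct sketch_sink *def_sink;		/* cached default, filled on first lookup */
};

/*
 * Pick the sink for a new path:
 *  1. an explicit user choice always wins and is not cached,
 *  2. otherwise the default is resolved lazily, preferring the per-cpu sink
 *     and falling back to a traditional sink.
 */
static struct sketch_sink *sketch_pick_sink(struct sketch_source *src,
					    struct sketch_sink *user_sink,
					    struct sketch_sink *fallback)
{
	assert(src);

	if (user_sink)
		return user_sink;

	if (!src->def_sink)
		src->def_sink = src->percpu_sink ? src->percpu_sink : fallback;

	return src->def_sink;
}
```

A scenario specific default would then only need to change how the cached
def_sink gets filled in, not the callers.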

Expanding the connections fits into a scenario where the ETE is present with
all its other traditional sinks and the TRBE is the one which comes in or goes
out with the cpu.

If the ETE also comes in and goes out with individual cpu hotplug, which is
ideally preferred, we would also need to:

1. Co-ordinate with TRBE bring up and connection creation to avoid race
2. Rediscover traditional sinks which were attached to the ETE before -
   go back, rescan the DT/ACPI entries for sinks with whom a path can
   be established etc.

Basically there are three choices we have here:

1. ETE is permanent, TRBE and the ETE-TRBE path get created or destroyed with hotplug (current proposal)
2. ETE, TRBE and the ETE-TRBE path are all permanent, ETE and TRBE get enabled or disabled with hotplug
3. ETE, TRBE and the ETE-TRBE path all get created, enabled and destroyed with hotplug in sync

- Anshuman

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
  2020-11-12  9:27     ` Suzuki K Poulose
@ 2020-11-23  6:08       ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-23  6:08 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach



On 11/12/20 2:57 PM, Suzuki K Poulose wrote:
> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>> perf handle structure needs to be shared with the TRBE IRQ handler for
>> capturing trace data and restarting the handle. There is a probability
>> of an undefined reference based crash when etm event is being stopped
>> while a TRBE IRQ also getting processed. This happens due the release
>> of perf handle via perf_aux_output_end(). This stops the sinks via the
>> link before releasing the handle, which will ensure that a simultaneous
>> TRBE IRQ could not happen.
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> This might cause problem with traditional sink devices which can be
>> operated in both sysfs and perf mode. This needs to be addressed
>> correctly. One option would be to move the update_buffer callback
>> into the respective sink devices. e.g, disable().
>>
>>   drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> index 534e205..1a37991 100644
>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
>>             size = sink_ops(sink)->update_buffer(sink, handle,
>>                             event_data->snk_config);
>> +        coresight_disable_path(path);
>>           perf_aux_output_end(handle, size);
>> +        return;
>>       }
> 
> As you mentioned, this is not ideal where another session could be triggered on
> the sink from a different ETM (not for per-CPU sink) in a different mode before
> you collect the buffer. I believe the best option is to leave the
> update_buffer() to disable_hw. This would need to pass on the "handle" to the
> disable_path.

Passing 'handle' into coresight_ops_sink->disable() would enable pushing
updated trace data into the perf aux buffer. But do you propose to drop the
update_buffer() callback completely, or just move it into the disable()
callback (along with the PERF_EF_UPDATE mode check) for all individual sinks
for now? Maybe later it can be dropped completely.

> 
> That way the races can be handled inside the sinks. Also, this aligns the
> perf mode of the sinks with that of the sysfs mode.

I did not get that, could you please elaborate?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 06/11] coresight: ete: Detect ETE as one of the supported ETMs
  2020-11-14  5:36     ` Tingwei Zhang
@ 2020-11-23  9:56       ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-23  9:56 UTC (permalink / raw)
  To: Tingwei Zhang, Anshuman Khandual
  Cc: coresight, linux-kernel, linux-arm-kernel, mike.leach

Hi Tingwei,


On 11/14/20 5:36 AM, Tingwei Zhang wrote:
> Hi Anshuman,
> 
> On Tue, Nov 10, 2020 at 08:45:04PM +0800, Anshuman Khandual wrote:
>> From: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>> Add ETE as one of the supported device types we support
>> with ETM4x driver. The devices are named following the
>> existing convention as ete<N>.
>>
>> ETE mandates that the trace resource status register is programmed
>> before the tracing is turned on. For the moment simply write to
>> it indicating TraceActive.
>>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---

>> @@ -1742,6 +1758,19 @@ static int etm4_probe(struct device *dev, void
>> __iomem *base)
>>   	if (!desc.access.io_mem ||
>>   	    fwnode_property_present(dev_fwnode(dev), "qcom,skip-power-up"))
>>   		drvdata->skip_power_up = true;
>> +	major = ETM_ARCH_MAJOR_VERSION(drvdata->arch);
>> +	minor = ETM_ARCH_MINOR_VERSION(drvdata->arch);
>> +	if (drvdata->arch >= ETM_ARCH_ETE) {
>> +		type_name = "ete";
>> +		major -= 4;
>> +	} else {
>> +		type_name = "etm";
>> +	}
>> +
> When trace unit supports ETE, could it be still compatible with ETMv4.4?
> Can use selectively use it as ETM instead of ETE?

No. Even though most of the register sets are compatible, there are additional
restrictions and some new rules for the ETE. So, when you treat the ETE as an
ETMv4.4, you could be treading into "UNPREDICTABLE" behaviors.
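
As a rough illustration of the naming logic in the hunk above, here is a
user-space sketch. The SKETCH_* macros and the version encoding (major in
bits [7:4], minor in bits [3:0], ETE starting at major 5) are assumptions
made for the example and are not the driver's definitions:

```c
/* Assumed encoding: major in bits [7:4], minor in bits [3:0]. */
#define SKETCH_ARCH_VERSION(major, minor) \
	((((major) & 0xfU) << 4) | ((minor) & 0xfU))
#define SKETCH_ARCH_MAJOR(arch)	(((arch) >> 4) & 0xfU)
#define SKETCH_ARCH_MINOR(arch)	((arch) & 0xfU)
/* Assumed: anything at or above major 5 is an ETE. */
#define SKETCH_ARCH_ETE		SKETCH_ARCH_VERSION(5, 0)

struct sketch_id {
	const char *type_name;	/* "etm" or "ete" */
	unsigned int major;
	unsigned int minor;
};

/* Classify a trace unit once, from its architecture version alone. */
static struct sketch_id sketch_identify(unsigned int arch)
{
	struct sketch_id id = {
		.type_name = "etm",
		.major = SKETCH_ARCH_MAJOR(arch),
		.minor = SKETCH_ARCH_MINOR(arch),
	};

	if (arch >= SKETCH_ARCH_ETE) {
		id.type_name = "ete";
		id.major -= 4;	/* mirrors the "major -= 4" in the hunk */
	}
	return id;
}
```

The point is that the decision is taken once from the version and there is
no path that programs an ETE as if it were an ETMv4.4.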

Suzuki


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
  2020-11-23  6:08       ` Anshuman Khandual
@ 2020-11-23 10:01         ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-23 10:01 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach

On 11/23/20 6:08 AM, Anshuman Khandual wrote:
> 
> 
> On 11/12/20 2:57 PM, Suzuki K Poulose wrote:
>> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>>> perf handle structure needs to be shared with the TRBE IRQ handler for
>>> capturing trace data and restarting the handle. There is a probability
>>> of an undefined reference based crash when etm event is being stopped
>>> while a TRBE IRQ also getting processed. This happens due the release
>>> of perf handle via perf_aux_output_end(). This stops the sinks via the
>>> link before releasing the handle, which will ensure that a simultaneous
>>> TRBE IRQ could not happen.
>>>
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>> This might cause problem with traditional sink devices which can be
>>> operated in both sysfs and perf mode. This needs to be addressed
>>> correctly. One option would be to move the update_buffer callback
>>> into the respective sink devices. e.g, disable().
>>>
>>>    drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> index 534e205..1a37991 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
>>>              size = sink_ops(sink)->update_buffer(sink, handle,
>>>                              event_data->snk_config);
>>> +        coresight_disable_path(path);
>>>            perf_aux_output_end(handle, size);
>>> +        return;
>>>        }
>>
>> As you mentioned, this is not ideal where another session could be triggered on
>> the sink from a different ETM (not for per-CPU sink) in a different mode before
>> you collect the buffer. I believe the best option is to leave the
>> update_buffer() to disable_hw. This would need to pass on the "handle" to the
>> disable_path.
> 
> Passing 'handle' into coresight_ops_sink->disable() would enable pushing
> updated trace data into perf aux buffer. But do you propose to drop the
> update_buffer() call back completely or just move it into disable() call
> back (along with PERF_EF_UPDATE mode check) for all individual sinks for
> now. May be, later it can be dropped off completely.

Yes, once we update the buffer from within the sink_ops->disable(), we don't
need the update_buffer() callback anymore. It is pointless to have a function
that is provided to the external user.

> 
>>
>> That way the races can be handled inside the sinks. Also, this aligns the
>> perf mode of the sinks with that of the sysfs mode.
> 
> Did not get that, could you please elaborate ?
> 

In sysfs mode, we already do an action similar to "update buffer" for all
the sinks (e.g., see tmc_etr_sync_sysfs_buf()), i.e., update the buffer
before the sink is disabled. That is the same as what we propose above.
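
Roughly, the shape could be as below. This is only a user-space sketch of the
idea; the structs and the function are hypothetical stand-ins and do not
match the real coresight_ops_sink or perf_output_handle definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the perf AUX handle. */
struct sketch_handle {
	size_t collected;	/* bytes handed over to the perf aux buffer */
};

/* Hypothetical stand-in for a sink device. */
struct sketch_sink {
	bool hw_enabled;
	size_t pending;		/* trace bytes sitting in the sink buffer */
};

/*
 * Disable the sink. With a handle (perf mode) the pending trace is synced
 * into it first; without one (sysfs mode) the data stays in the sink's own
 * buffer for a later read. Returns the number of bytes pushed to perf.
 */
static size_t sketch_sink_disable(struct sketch_sink *sink,
				  struct sketch_handle *handle)
{
	size_t size = 0;

	assert(sink);
	if (!sink->hw_enabled)
		return 0;

	if (handle) {
		size = sink->pending;
		handle->collected += size;
		sink->pending = 0;
	}
	sink->hw_enabled = false;	/* marked disabled only after the sync */
	return size;
}
```

Since the sync and the disable happen in one call, any locking against a
concurrent IRQ or another session can live entirely inside the sink.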

Suzuki

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
@ 2020-11-23 10:01         ` Suzuki K Poulose
  0 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-23 10:01 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach

On 11/23/20 6:08 AM, Anshuman Khandual wrote:
> 
> 
> On 11/12/20 2:57 PM, Suzuki K Poulose wrote:
>> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>>> perf handle structure needs to be shared with the TRBE IRQ handler for
>>> capturing trace data and restarting the handle. There is a probability
>>> of an undefined reference based crash when etm event is being stopped
>>> while a TRBE IRQ also getting processed. This happens due the release
>>> of perf handle via perf_aux_output_end(). This stops the sinks via the
>>> link before releasing the handle, which will ensure that a simultaneous
>>> TRBE IRQ could not happen.
>>>
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>> This might cause problem with traditional sink devices which can be
>>> operated in both sysfs and perf mode. This needs to be addressed
>>> correctly. One option would be to move the update_buffer callback
>>> into the respective sink devices. e.g, disable().
>>>
>>>    drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> index 534e205..1a37991 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
>>>              size = sink_ops(sink)->update_buffer(sink, handle,
>>>                              event_data->snk_config);
>>> +        coresight_disable_path(path);
>>>            perf_aux_output_end(handle, size);
>>> +        return;
>>>        }
>>
>> As you mentioned, this is not ideal where another session could be triggered on
>> the sink from a different ETM (not for per-CPU sink) in a different mode before
>> you collect the buffer. I believe the best option is to leave the
>> update_buffer() to disable_hw. This would need to pass on the "handle" to the
>> disable_path.
> 
> Passing 'handle' into coresight_ops_sink->disable() would enable pushing
> updated trace data into the perf aux buffer. But do you propose to drop the
> update_buffer() callback completely, or just move it into the disable()
> callback (along with the PERF_EF_UPDATE mode check) for all individual
> sinks for now? Maybe it can be dropped completely later.

Yes, once we update the buffer from within sink_ops->disable(), we don't
need the update_buffer() callback anymore. It is pointless to have a
function that is provided to the external user.

> 
>>
>> That way the races can be handled inside the sinks. Also, this aligns the
>> perf mode of the sinks with that of the sysfs mode.
> 
> Did not get that, could you please elaborate ?
> 

In sysfs mode, we already do an action similar to "update buffer" for all
the sinks (e.g., see tmc_etr_sync_sysfs_buf()), i.e., update the buffer
before the sink is disabled. That is the same thing we propose above.
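
A minimal userspace sketch of that idea follows. All names here
(mock_handle, mock_sink, mock_sink_disable) are hypothetical stand-ins and
not the coresight API; the point is only that the sink stops the hardware
and accounts for the captured trace in one step, so no other session can
slip in between.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
struct mock_handle {
	unsigned long head;		/* bytes already published to perf */
};

struct mock_sink {
	int enabled;
	unsigned long hw_write_off;	/* where the hardware stopped writing */
	unsigned long collected;	/* bytes reported at the last disable */
};

/*
 * Disable the sink and, when a perf handle is supplied, account for the
 * trace captured since handle->head before anyone else can reuse the sink.
 * A NULL handle models the sysfs path, which syncs its own buffer.
 */
static unsigned long mock_sink_disable(struct mock_sink *sink,
				       struct mock_handle *handle)
{
	unsigned long size = 0;

	sink->enabled = 0;		/* stop the hardware first */
	if (handle) {
		size = sink->hw_write_off - handle->head;
		handle->head += size;
	}
	sink->collected = size;
	return size;
}
```

With this shape, the perf side would only need to pass the returned size on
to perf_aux_output_end().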

Suzuki

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
  2020-11-23  3:40       ` Anshuman Khandual
@ 2020-11-23 12:30         ` Mike Leach
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Leach @ 2020-11-23 12:30 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Tingwei Zhang, Coresight ML, Linux Kernel Mailing List, linux-arm-kernel

Hi Anshuman,

On Mon, 23 Nov 2020 at 03:40, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> Hello Mike,
>
> On 11/16/20 8:30 PM, Mike Leach wrote:
> > Hi Anshuman,
> >
> > I've not looked in detail at this set yet, but having skimmed through
> > it, I do have an initial question about the handling of wrapped data
> > buffers.
> >
> > With the ETR/ETB we found an issue with the way perf concatenated data
> > captured from the hardware buffer into a single contiguous data
> > block. The issue occurs when a wrapped buffer appears after another
> > buffer in the data file. In a typical session perf would stop trace
> > and copy the hardware buffer multiple times into the auxtrace buffer.
>
> The hardware buffer and perf aux trace buffer are the same for TRBE and
> hence there is no actual copy involved. Trace data gets pushed into the
> user space via perf_aux_output_end() either via etm_event_stop() or via
> the IRQ handler i.e arm_trbe_irq_handler(). Data transfer to user space
> happens via updates to perf aux buffer indices i.e head, tail, wake up.
> But logically, they will appear as a stream of records to the user space
> while parsing perf.data file.
>

Understood - I suspected this would use direct write to the aux trace
buffer, but the principle is the same. TRBE determines the location of
data in the buffer so even without a copy, it is possible to get
multiple TRBE "buffers" in the auxbuffer as the TRBE is stopped and
restarted. The later copy to userspace is independent of this.

> >
> > e.g.
> >
> > For ETR/ETB we have a fixed length hardware data buffer - and no way
> > of detecting buffer wraps using interrupts as the tracing is in
> > progress.
>
> TRBE has an interrupt. Hence there will be an opportunity to insert any
> additional packets if required to demarcate pre and post IRQ trace data
> streams.
>
> >
> > If the buffer is not full at the point that perf transfers it then the
> > data will look like this:-
> > 1) <async><synced trace data>
> > easy to decode, we can see the async at the start of the data - which
> > would be the async issued at the start of trace.
>
> Just curious, what makes the tracer generate the <async> trace packet?
> Is there an explicit instruction, or is that how the tracer starts when
> enabled?

ETM / ETE will generate an async at the start of trace, and then
periodically afterwards.

>
> >
> > If the buffer wraps we see this:-
> >
> > 2) <unsynced trace data><async><synced trace data>
> >
> > Again no real issue, the decoder will skip to the async and trace from
> > there - we lose the unsynced data.
>
> Could you please elaborate more on the difference between sync and async
> trace data ?
>

The decoder will start reading trace from the start of the buffer.
Unsynced trace is trace data that appears before the first async
packet. We cannot decode this as we do not know where the packet
boundaries are.
Synced trace is any data after the first async packet - the async
enables us to determine where the packet boundaries are so we can now
determine the packets and decode the trace.

For an unwrapped buffer, we always see the first async that the ETE
generated when the trace generation was started. In a wrapped buffer
we search till we find an async generated as part of the periodic
async packets.
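
As a rough illustration of that search, the sketch below scans a buffer for
the first async packet. It assumes the ETMv4-style encoding of eleven 0x00
bytes followed by 0x80; treat that encoding, and the helper name
find_first_async(), as assumptions made for the example rather than a
statement about the ETE decoder.

```c
#include <stddef.h>
#include <stdint.h>

#define ASYNC_ZEROS	11	/* assumed: 11 bytes of 0x00 ... */
#define ASYNC_TAIL	0x80	/* ... followed by a single 0x80 */

/*
 * Return the offset of the first complete async packet in buf, or -1 if
 * there is none. Everything before that offset is unsynced trace.
 */
static long find_first_async(const uint8_t *buf, size_t len)
{
	size_t zeros = 0;	/* length of the current run of 0x00 bytes */

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == 0x00) {
			zeros++;
		} else {
			if (buf[i] == ASYNC_TAIL && zeros >= ASYNC_ZEROS)
				return (long)(i - ASYNC_ZEROS);
			zeros = 0;
		}
	}
	return -1;
}
```

For an unwrapped buffer this returns 0; for a wrapped one it returns the
number of unsynced bytes that have to be skipped.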

> >
> > Now the problem occurs when multiple transfers of data occur. We can
> > see the following appearing as contiguous trace in the auxtrace
> > buffer:-
> >
> > 3) <async><synced trace data><unsynced trace data><async><synced trace data>
>
> So there is a wrap-around event between <synced trace data> and
> <unsynced trace data>? Are there any other situations where this
> might happen?

Not that I am aware of.

>
> >
> > Now the decoder cannot spot the point that the synced data from the
> > first capture ends, and the unsynced data from the second capture
> > begins.
>
> Got it.
>
> > This means it will continue to decode into the unsynced data - which
> > will result in incorrect trace / outright errors. To get round this
> > for ETR/ETB the driver will insert barrier packets into the datafile
> > if a wrap event is detected.
>
> But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event
> even detected?

A bit in the status register tells us the buffer is full - i.e. the
write pointer has wrapped around to the location it started at.
We cannot tell how far, or if multiple wraps have occurred, just that
the event has occurred.
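
A minimal sketch of how that status bit can drive the barrier insertion is
below. STS_FULL, append_capture() and the barrier pattern are illustrative
choices for this example, not the driver's actual definitions. The barrier
overwrites the first bytes of a wrapped capture, which are unsynced and
would be discarded by the decoder anyway.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STS_FULL	(1u << 0)	/* hypothetical "buffer has wrapped" bit */
#define BARRIER_LEN	16

/* Illustrative barrier pattern: four 0x7fffffff words, little endian. */
static const uint8_t barrier_pkt[BARRIER_LEN] = {
	0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
	0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
};

/*
 * Append one hardware capture to the output at out_off and return the new
 * output offset. If the status word says the hardware buffer wrapped, the
 * start of this capture is replaced by a barrier so the decoder drops back
 * to the unsynced state instead of decoding into garbage.
 */
static size_t append_capture(uint8_t *out, size_t out_off,
			     const uint8_t *hw, size_t hw_len,
			     uint32_t status)
{
	memcpy(out + out_off, hw, hw_len);
	if ((status & STS_FULL) && hw_len >= BARRIER_LEN)
		memcpy(out + out_off, barrier_pkt, BARRIER_LEN);
	return out_off + hw_len;
}
```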

>
> >
> > 4) <async><synced trace data><barrier><unsynced trace
> > data><async><synced trace data>
> >
> > This <barrier> has the effect of resetting the decoder into the
> > unsynced state so that the invalid trace is not decoded. This is a
> > workaround we have to do to handle the limitations of the ETR / ETB
> > trace hardware.
> Got it.
>
> >
> > For TRBE we do have interrupts, so it should be possible to prevent
> > the buffer wrapping in most cases - but I did see in the code that
> > there are handlers for the TRBE buffer wrap management event. Are
> > there other factors in play that will prevent data pattern 3) from
> > appearing in the auxtrace buffer ?
>
> On TRBE, the buffer wrapping cannot happen without generating an IRQ. I
> would assume that ETE will then start again with an <async> data packet
> first when the handler returns.

This would only occur if the ETE was stopped and flushed prior to the
wrap event. Does this happen? I am assuming that the sink is
independent from the ETE, as ETM are from ETR.

> Otherwise we might also have to insert
> a similar barrier packet for the user space tool to reset. As trace data
> should not get lost during a wrap event,

My understanding is that if a wrap has occurred at all, then data is already lost.


> ETE should complete the packet
> after the handler returns, hence aux buffer should still have logically
> contiguous stream of <synced trace data> to decode. I am not sure right
> now, but will look into this.
>

So you are relying on backpressure to stop ETE emitting packets? This
could result in trace being lost due to overflow if the IRQ is not
handled sufficiently quickly.

Regards

Mike

> - Anshuman


--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC 07/11] coresight: sink: Add TRBE driver
  2020-11-12 10:13     ` Suzuki K Poulose
@ 2020-11-25  5:25       ` Anshuman Khandual
  -1 siblings, 0 replies; 72+ messages in thread
From: Anshuman Khandual @ 2020-11-25  5:25 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach



On 11/12/20 3:43 PM, Suzuki K Poulose wrote:
> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
>> accessible via the system registers. The TRBE supports different addressing
>> modes including CPU virtual address and buffer modes including the circular
>> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
>> a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
>> access to the trace buffer could be prohibited by a higher exception level
>> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
>> private interrupt (PPI) on address translation errors and when the buffer
>> is full. Overall implementation here is inspired from the Arm SPE driver.
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   Documentation/trace/coresight/coresight-trbe.rst |  36 ++
>>   arch/arm64/include/asm/sysreg.h                  |   2 +
>>   drivers/hwtracing/coresight/Kconfig              |  11 +
>>   drivers/hwtracing/coresight/Makefile             |   1 +
>>   drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
>>   drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
>>   6 files changed, 1341 insertions(+)
>>   create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
>>
>> diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
>> new file mode 100644
>> index 0000000..4320a8b
>> --- /dev/null
>> +++ b/Documentation/trace/coresight/coresight-trbe.rst
>> @@ -0,0 +1,36 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============================
>> +Trace Buffer Extension (TRBE).
>> +==============================
>> +
>> +    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
>> +    :Date:     November 2020
>> +
>> +Hardware Description
>> +--------------------
>> +
>> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system
>> +memory, CPU traces generated from a corresponding percpu tracing unit. This
>> +gets plugged in as a coresight sink device because the corresponding trace
>> +generators (ETE) are plugged in as source devices.
>> +
>> +Sysfs files and directories
>> +---------------------------
>> +
>> +The TRBE devices appear on the existing coresight bus alongside the other
>> +coresight devices::
>> +
>> +    >$ ls /sys/bus/coresight/devices
>> +    trbe0  trbe1  trbe2 trbe3
>> +
>> +The ``trbe<N>`` named TRBEs are associated with a CPU.::
>> +
>> +    >$ ls /sys/bus/coresight/devices/trbe0/
>> +    irq align dbm
>> +
>> +*Key file items are:-*
>> +   * ``irq``: TRBE maintenance interrupt number
>> +   * ``align``: TRBE write pointer alignment
>> +   * ``dbm``: TRBE updates memory with access and dirty flags
>> +
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index 14cb156..61136f6 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -97,6 +97,7 @@
>>   #define SET_PSTATE_UAO(x)        __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
>>   #define SET_PSTATE_SSBS(x)        __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
>>   #define SET_PSTATE_TCO(x)        __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
>> +#define TSB_CSYNC            __emit_inst(0xd503225f)
>>     #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
>>       __emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
>> @@ -865,6 +866,7 @@
>>   #define ID_AA64MMFR2_CNP_SHIFT        0
>>     /* id_aa64dfr0 */
>> +#define ID_AA64DFR0_TRBE_SHIFT        44
>>   #define ID_AA64DFR0_TRACE_FILT_SHIFT    40
>>   #define ID_AA64DFR0_DOUBLELOCK_SHIFT    36
>>   #define ID_AA64DFR0_PMSVER_SHIFT    32
>> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
>> index c119824..0f5e101 100644
>> --- a/drivers/hwtracing/coresight/Kconfig
>> +++ b/drivers/hwtracing/coresight/Kconfig
>> @@ -156,6 +156,17 @@ config CORESIGHT_CTI
>>         To compile this driver as a module, choose M here: the
>>         module will be called coresight-cti.
>>   +config CORESIGHT_TRBE
>> +    bool "Trace Buffer Extension (TRBE) driver"
>> +    depends on ARM64
>> +    help
>> +      This driver provides support for percpu Trace Buffer Extension (TRBE).
>> +      TRBE always needs to be used along with its corresponding percpu ETE
>> +      component. ETE generates trace data which is then captured with TRBE.
>> +      Unlike traditional sink devices, TRBE is a CPU feature accessible via
>> +      system registers. But its explicit dependency with trace unit (ETE)
>> +      requires it to be plugged in as a coresight sink device.
>> +
>>   config CORESIGHT_CTI_INTEGRATION_REGS
>>       bool "Access CTI CoreSight Integration Registers"
>>       depends on CORESIGHT_CTI
>> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
>> index f20e357..d608165 100644
>> --- a/drivers/hwtracing/coresight/Makefile
>> +++ b/drivers/hwtracing/coresight/Makefile
>> @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>>   obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
>>   obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
>>   obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
>> +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
>>   coresight-cti-y := coresight-cti-core.o    coresight-cti-platform.o \
>>              coresight-cti-sysfs.o
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> new file mode 100644
>> index 0000000..48a8ec3
>> --- /dev/null
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -0,0 +1,766 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
>> + * sink device, which can then pair with an appropriate per-cpu coresight
>> + * source device (ETE) that generates the required trace data. Trace can be
>> + * enabled via the perf framework.
>> + *
>> + * Copyright (C) 2020 ARM Ltd.
>> + *
>> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
>> + */
>> +#define DRVNAME "arm_trbe"
>> +
>> +#define pr_fmt(fmt) DRVNAME ": " fmt
>> +
>> +#include "coresight-trbe.h"
>> +
>> +#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
>> +
>> +#define ETE_IGNORE_PACKET 0x70
> 
> Add a comment here, on what this means to the decoder.

Sure, will add.

> 
>> +
>> +static const char trbe_name[] = "trbe";
> 
> Why not
> 
> #define DEVNAME    "trbe"

That can be replaced but we already define DRVNAME which gets used for
naming the TRBE interrupt that shows up in /proc/interrupts. But it is
"arm_trbe" instead. Should /sys/bus/coresight/devices/ list TRBE devices
as "arm_trbeN" ? If so, DRVNAME can be used without any problem. Should
DRVNAME be changed to just "trbe" instead ? But it makes sense to have
the same name for TRBE devices and the interrupt.

> 
> 
>> +
>> +enum trbe_fault_action {
>> +    TRBE_FAULT_ACT_WRAP,
>> +    TRBE_FAULT_ACT_SPURIOUS,
>> +    TRBE_FAULT_ACT_FATAL,
>> +};
>> +
>> +struct trbe_perf {
> 
> Please rename this to trbe_buf. This will be used for sysfs mode as well.

Sure, will do.

> 
>> +    unsigned long trbe_base;
>> +    unsigned long trbe_limit;
>> +    unsigned long trbe_write;
>> +    pid_t pid;
> 
> Why do we need this ? This seems unused and moreover, there cannot
> be multiple tracers into TRBE. So, we don't need to share the sink
> unlike the traditional ones.

Sure, will drop.

> 
>> +    int nr_pages;
>> +    void **pages;
>> +    bool snapshot;
>> +    struct trbe_cpudata *cpudata;
>> +};
>> +
>> +struct trbe_cpudata {
>> +    struct coresight_device    *csdev;
>> +    bool trbe_dbm;
> 
> Why do we need this ?

This is an internal implementation characteristic which should be
presented to user space via sysfs for better understanding and
probably for debugging purposes. The current proposal does not support
the scenario where TRBE DBM is off, which we need to incorporate
later on. Hence let's just leave this as is for now.

> 
>> +    u64 trbe_align;
>> +    int cpu;
>> +    enum cs_mode mode;
>> +    struct trbe_perf *perf;
>> +    struct trbe_drvdata *drvdata;
>> +};
>> +
>> +struct trbe_drvdata {
>> +    struct trbe_cpudata __percpu *cpudata;
>> +    struct perf_output_handle __percpu *handle;
> 
> Shouldn't this be :
> 
>     struct perf_output_handle __percpu **handle ?
> 
> as we get a handle from the etm-perf and is not controlled by
> the TRBE ?

Sure, will change this.

> 
>> +    struct hlist_node hotplug_node;
>> +    int irq;
>> +    cpumask_t supported_cpus;
>> +    enum cpuhp_state trbe_online;
>> +    struct platform_device *pdev;
>> +    struct clk *atclk;
> 
> We don't have any clocks for the TRBE instance. Please remove.

Sure, will drop.

> 
>> +};
>> +
>> +static int trbe_alloc_node(struct perf_event *event)
>> +{
>> +    if (event->cpu == -1)
>> +        return NUMA_NO_NODE;
>> +    return cpu_to_node(event->cpu);
>> +}
>> +
>> +static void trbe_disable_and_drain_local(void)
>> +{
>> +    write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>> +    isb();
>> +    dsb(nsh);
>> +    asm(TSB_CSYNC);
>> +}
>> +
>> +static void trbe_reset_local(void)
>> +{
>> +    trbe_disable_and_drain_local();
>> +    write_sysreg_s(0, SYS_TRBPTR_EL1);
>> +    isb();
>> +
>> +    write_sysreg_s(0, SYS_TRBBASER_EL1);
>> +    isb();
>> +
>> +    write_sysreg_s(0, SYS_TRBSR_EL1);
>> +    isb();
>> +}
>> +
>> +static void trbe_pad_buf(struct perf_output_handle *handle, int len)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    u64 head = PERF_IDX2OFF(handle->head, perf);
>> +
>> +    memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
>> +    if (!perf->snapshot)
>> +        perf_aux_output_skip(handle, len);
>> +}
>> +
>> +static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    u64 head = PERF_IDX2OFF(handle->head, perf);
>> +    u64 limit = perf->nr_pages * PAGE_SIZE;
>> +
> 
> So we are using half of the buffer for snapshot mode to avoid a case where the
> analyzer is unable to decode the trace in case of an overflow.

Right.

> 
>> +    if (head < limit >> 1)
>> +        limit >>= 1;
> 
> Also this needs to be thought out. We may not need this restriction. The trace decoder
> will be able to walk forward and then find a synchronization packet and then continue
> the tracing from there. So, we could use the entire buffer for TRBE.

Okay. Maybe we could just go with half the TRBE buffer for now and
later on use the entire buffer, once this is better understood?
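
For reference, the current behaviour restated as a standalone userspace
function (snapshot_limit() is a name made up for this sketch; the logic
mirrors the check in trbe_snapshot_offset() above):

```c
#include <stdint.h>

/*
 * head is the current write offset inside the buffer and bufsize the total
 * buffer size in bytes. While head is still in the lower half, the limit is
 * placed at the midpoint; once head is in the upper half, the limit is the
 * end of the buffer.
 */
static uint64_t snapshot_limit(uint64_t head, uint64_t bufsize)
{
	uint64_t limit = bufsize;

	if (head < limit >> 1)
		limit >>= 1;

	return limit;
}
```

So with a 32K buffer, a head at 4K gets a limit of 16K, and a head at 20K
gets a limit of 32K.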

> 
> 
>> +
>> +    return limit;
>> +}
>> +
>> +static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    struct trbe_cpudata *cpudata = perf->cpudata;
>> +    const u64 bufsize = perf->nr_pages * PAGE_SIZE;
>> +    u64 limit = bufsize;
>> +    u64 head, tail, wakeup;
>> +
> 
> Commentary please.

Sure, will add some.

> 
>> +    head = PERF_IDX2OFF(handle->head, perf);
>> +    if (!IS_ALIGNED(head, cpudata->trbe_align)) {
>> +        unsigned long delta = roundup(head, cpudata->trbe_align) - head;
>> +
>> +        delta = min(delta, handle->size);
>> +        trbe_pad_buf(handle, delta);
>> +        head = PERF_IDX2OFF(handle->head, perf);
>> +    }
>> +
>> +    if (!handle->size) {
>> +        perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
>> +        return 0;
>> +    }
>> +
>> +    tail = PERF_IDX2OFF(handle->head + handle->size, perf);
>> +    wakeup = PERF_IDX2OFF(handle->wakeup, perf);
>> +
> 
>> +    if (head < tail)
> 
>  comment
> 
>> +        limit = round_down(tail, PAGE_SIZE);
>> +
>> +    if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
>> +        limit = min(limit, round_up(wakeup, PAGE_SIZE));
> 
> comment. Also do we need an alignment to PAGE_SIZE ?

Limit always has to be PAGE_SIZE aligned because it is eventually going
to be the TRBE limit pointer, after getting added to the TRBE base
pointer. Will add some more comments here as well.
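
To illustrate the alignment, here is a userspace restatement of the limit
computation in trbe_normal_offset(). The 4K page size, the helper names and
the wakeup_in_window flag (standing in for the handle->wakeup comparison)
are assumptions made for this sketch.

```c
#include <stdint.h>

#define PAGE_SZ	4096ull		/* assumed page size for the example */

static uint64_t page_round_down(uint64_t x)
{
	return x & ~(PAGE_SZ - 1);
}

static uint64_t page_round_up(uint64_t x)
{
	return (x + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
}

/*
 * head, tail and wakeup are offsets inside the buffer. Returns the page
 * aligned limit offset, or 0 when there is no room left to trace into.
 */
static uint64_t normal_limit(uint64_t head, uint64_t tail, uint64_t wakeup,
			     int wakeup_in_window, uint64_t bufsize)
{
	uint64_t limit = bufsize;

	/* Free space does not wrap: stop at the page holding the tail. */
	if (head < tail)
		limit = page_round_down(tail);

	/* Stop early so the wakeup can be delivered on time. */
	if (wakeup_in_window && head <= wakeup) {
		uint64_t w = page_round_up(wakeup);

		if (w < limit)
			limit = w;
	}

	return limit > head ? limit : 0;
}
```

Every non-zero value returned is a multiple of the page size, so adding it
to a page aligned base gives a page aligned limit pointer.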

> 
>> +
>> +    if (limit > head)
>> +        return limit;
>> +
>> +    trbe_pad_buf(handle, handle->size);
>> +    perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
>> +    return 0;
>> +}
>> +
>> +static unsigned long get_trbe_limit(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    unsigned long offset;
>> +
>> +    if (perf->snapshot)
>> +        offset = trbe_snapshot_offset(handle);
>> +    else
>> +        offset = trbe_normal_offset(handle);
>> +    return perf->trbe_base + offset;
>> +}
>> +
>> +static void trbe_enable_hw(struct trbe_perf *perf)
>> +{
>> +    WARN_ON(perf->trbe_write < perf->trbe_base);
>> +    WARN_ON(perf->trbe_write >= perf->trbe_limit);
>> +    set_trbe_disabled();
>> +    clr_trbe_irq();
>> +    clr_trbe_wrap();
>> +    clr_trbe_abort();
>> +    clr_trbe_ec();
>> +    clr_trbe_bsc();
>> +    clr_trbe_fsc();
> 
> Please merge all of these field updates to single register update
> unless mandated by the architecture.

Sure, will do.
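
Something along these lines, as a userspace sketch with a mocked register.
The mask names and bit positions are illustrative and need to be checked
against the architecture, and read_status()/write_status() merely stand in
for the sysreg accessors:

```c
#include <stdint.h>

/* Illustrative field layout; verify against the TRBSR_EL1 description. */
#define ST_EC_MASK	(0x3full << 26)
#define ST_IRQ		(1ull << 22)
#define ST_WRAP		(1ull << 20)
#define ST_ABORT	(1ull << 18)
#define ST_SC_MASK	(0xffffull << 0)

static uint64_t mock_trbsr;		/* mocked status register */
static unsigned int mock_writes;	/* number of register writes issued */

static uint64_t read_status(void)
{
	return mock_trbsr;
}

static void write_status(uint64_t val)
{
	mock_trbsr = val;
	mock_writes++;
}

/* Clear all the status fields with one read-modify-write. */
static void clear_status_fields(void)
{
	uint64_t val = read_status();

	val &= ~(ST_EC_MASK | ST_IRQ | ST_WRAP | ST_ABORT | ST_SC_MASK);
	write_status(val);
}
```

That is one register write instead of six, and fields outside the masks are
left untouched.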

> 
>> +    set_trbe_virtual_mode();
>> +    set_trbe_fill_mode(TRBE_FILL_STOP);
>> +    set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
> 
> Same here ^^

Sure, will do.

> 
>> +    isb();
>> +    set_trbe_base_pointer(perf->trbe_base);
>> +    set_trbe_limit_pointer(perf->trbe_limit);
>> +    set_trbe_write_pointer(perf->trbe_write);
>> +    isb();
>> +    dsb(ishst);
>> +    flush_tlb_all();
> 
> Why is this needed ?

Will drop flush_tlb_all().

> 
>> +    set_trbe_running();
>> +    set_trbe_enabled();
>> +    asm(TSB_CSYNC);
>> +}
>> +
>> +static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>> +                   struct perf_event *event, void **pages,
>> +                   int nr_pages, bool snapshot)
>> +{
>> +    struct trbe_perf *perf;
>> +    struct page **pglist;
>> +    int i;
>> +
>> +    if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
> 
> We may be able to remove the restriction on snapshot mode, see my comment
> above.

Sure, will drop when the entire buffer is used for the snapshot mode.

> 
>> +        return NULL;
>> +
>> +    perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
>> +    if (!perf)    /* kzalloc_node() returns NULL on failure */
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
>> +    if (!pglist) {    /* kcalloc() returns NULL on failure */
>> +        kfree(perf);
>> +        return ERR_PTR(-ENOMEM);
>> +    }
>> +
>> +    for (i = 0; i < nr_pages; i++)
>> +        pglist[i] = virt_to_page(pages[i]);
>> +
>> +    perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
>> +    if (!perf->trbe_base) {    /* vmap() returns NULL on failure */
>> +        kfree(pglist);
>> +        kfree(perf);
>> +        return ERR_PTR(-ENOMEM);
>> +    }
>> +    perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
>> +    perf->trbe_write = perf->trbe_base;
>> +    perf->pid = task_pid_nr(event->owner);
>> +    perf->snapshot = snapshot;
>> +    perf->nr_pages = nr_pages;
>> +    perf->pages = pages;
>> +    kfree(pglist);
>> +    return perf;
>> +}
>> +
>> +void arm_trbe_free_buffer(void *config)
>> +{
>> +    struct trbe_perf *perf = config;
>> +
>> +    vunmap((void *) perf->trbe_base);
>> +    kfree(perf);
>> +}
>> +
>> +static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>> +                        struct perf_output_handle *handle,
>> +                        void *config)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>> +    struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
>> +    struct trbe_perf *perf = config;
>> +    unsigned long size, offset;
>> +
>> +    WARN_ON(perf->cpudata != cpudata);
>> +    WARN_ON(cpudata->cpu != smp_processor_id());
>> +    WARN_ON(cpudata->mode != CS_MODE_PERF);
>> +    WARN_ON(cpudata->drvdata != drvdata);
>> +
>> +    offset = get_trbe_write_pointer() - get_trbe_base_pointer();
>> +    size = offset - PERF_IDX2OFF(handle->head, perf);
>> +    if (perf->snapshot)
>> +        handle->head += size;
>> +    return size;
>> +}
>> +
>> +static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>> +    struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
>> +    struct perf_output_handle *handle = data;
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +
>> +    WARN_ON(cpudata->cpu != smp_processor_id());
>> +    WARN_ON(mode != CS_MODE_PERF);
> 
> Why WARN_ON ? Simply return -EINVAL ? Also you need a check to make sure
> the mode is DISABLED (when you get to sysfs mode).
> 
>> +    WARN_ON(cpudata->drvdata != drvdata);
>> +
>> +    *this_cpu_ptr(drvdata->handle) = *handle;
> 
> That is wrong. Storing a local copy of a global perf generic structure
> is calling for trouble, assuming that the global structure doesn't change
> beneath us. Please store handle ptr.

Sure, will change.
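
To make the difference concrete, here is a minimal userspace sketch of
the two options. All names (demo_handle, slot_by_value, slot_by_ptr) are
hypothetical stand-ins and not the driver's structures; only the copy
versus pointer behaviour is being illustrated.

```c
/* Hypothetical stand-in for struct perf_output_handle; only 'head' matters. */
struct demo_handle {
    unsigned long head;
};

/* Per-CPU slot as in the RFC: holds a private copy taken at enable time. */
struct slot_by_value {
    struct demo_handle handle;
};

/* Per-CPU slot as suggested in review: holds a pointer to the live handle. */
struct slot_by_ptr {
    struct demo_handle *handle;
};

static void enable_by_value(struct slot_by_value *slot,
                            const struct demo_handle *h)
{
    slot->handle = *h;  /* snapshot; later updates to *h are not seen */
}

static void enable_by_ptr(struct slot_by_ptr *slot, struct demo_handle *h)
{
    slot->handle = h;   /* alias; always reflects the current state */
}

/* Returns 1 if the pointer slot tracks an update that the copy misses. */
static int copy_goes_stale(void)
{
    struct demo_handle live = { .head = 0 };
    struct slot_by_value v;
    struct slot_by_ptr p;

    enable_by_value(&v, &live);
    enable_by_ptr(&p, &live);
    live.head = 4096;   /* the owner of the handle advances it */

    return v.handle.head == 0 && p.handle->head == 4096;
}
```

The copy keeps reporting the old head after the owner has moved on, which
is the hazard pointed out above.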

> 
>> +    cpudata->perf = perf;
>> +    cpudata->mode = mode;
>> +    perf->cpudata = cpudata;
>> +    perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
>> +    perf->trbe_limit = get_trbe_limit(handle);
>> +    if (perf->trbe_limit == perf->trbe_base) {
>> +        trbe_disable_and_drain_local();
>> +        return 0;
>> +    }
>> +    trbe_enable_hw(perf);
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_disable(struct coresight_device *csdev)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>> +    struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
>> +    struct trbe_perf *perf = cpudata->perf;
>> +
>> +    WARN_ON(perf->cpudata != cpudata);
>> +    WARN_ON(cpudata->cpu != smp_processor_id());
>> +    WARN_ON(cpudata->mode != CS_MODE_PERF);
>> +    WARN_ON(cpudata->drvdata != drvdata);
>> +
>> +    trbe_disable_and_drain_local();
>> +    perf->cpudata = NULL;
>> +    cpudata->perf = NULL;
>> +    cpudata->mode = CS_MODE_DISABLED;
>> +    return 0;
>> +}
>> +
>> +static void trbe_handle_fatal(struct perf_output_handle *handle)
>> +{
>> +    perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
>> +    perf_aux_output_end(handle, 0);
>> +    trbe_disable_and_drain_local();
>> +}
>> +
>> +static void trbe_handle_spurious(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +
>> +    perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
>> +    perf->trbe_limit = get_trbe_limit(handle);
>> +    if (perf->trbe_limit == perf->trbe_base) {
>> +        trbe_disable_and_drain_local();
>> +        return;
>> +    }
>> +    trbe_enable_hw(perf);
>> +}
>> +
>> +static void trbe_handle_overflow(struct perf_output_handle *handle)
>> +{
>> +    struct perf_event *event = handle->event;
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    unsigned long offset, size;
>> +    struct etm_event_data *event_data;
>> +
>> +    offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
>> +    size = offset - PERF_IDX2OFF(handle->head, perf);
>> +    if (perf->snapshot)
>> +        handle->head = offset;
> 
> Is this correct ? Or was this supposed to mean :
>         handle->head += offset;

Hmm, not too sure about this but the SPE driver does the same in
arm_spe_perf_aux_output_end().
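
For comparing the two forms, the arithmetic can be sketched in userspace
as below. This assumes 4K pages and that PERF_IDX2OFF() wraps the index
modulo the buffer size, as the SPE driver's macro of the same name does;
demo_buf, demo_idx2off() and demo_size() are hypothetical names used only
for illustration.

```c
#define DEMO_PAGE_SHIFT 12  /* assumption: 4K pages */

/* Hypothetical stand-in for the perf buffer descriptor. */
struct demo_buf {
    unsigned long nr_pages;
};

/* Wrap a free-running index to an offset inside the buffer. */
static unsigned long demo_idx2off(unsigned long idx, const struct demo_buf *buf)
{
    return idx % (buf->nr_pages << DEMO_PAGE_SHIFT);
}

/* Bytes produced since 'head', given the hardware offset from the base. */
static unsigned long demo_size(unsigned long offset, unsigned long head,
                               const struct demo_buf *buf)
{
    return offset - demo_idx2off(head, buf);
}
```

With a 4 page buffer, head = 0x5000 and offset = 0x4000, the size comes out
as 0x3000. Assigning leaves head at 0x4000, whereas adding the offset would
give 0x9000.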

> 
> 
>> +    perf_aux_output_end(handle, size);
>> +
>> +    event_data = perf_aux_output_begin(handle, event);
>> +    if (!event_data) {
>> +        event->hw.state |= PERF_HES_STOPPED;
>> +        trbe_disable_and_drain_local();
>> +        return;
>> +    }
>> +    perf->trbe_write = perf->trbe_base;
>> +    perf->trbe_limit = get_trbe_limit(handle);
>> +    if (perf->trbe_limit == perf->trbe_base) {
>> +        trbe_disable_and_drain_local();
>> +        return;
>> +    }
>> +    *this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
>> +    trbe_enable_hw(perf);
>> +}
>> +
>> +static bool is_perf_trbe(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    struct trbe_cpudata *cpudata = perf->cpudata;
>> +    struct trbe_drvdata *drvdata = cpudata->drvdata;
> 
> Can you trust the cpudata ptr here as we are still verifying
> if this was legitimate ?

It verifies the legitimacy of the interrupt as being generated from
an active perf session on the cpu with some simple sanity checks.
But all data structure linkage should be intact. The perf handle
originates from the drvdata percpu structure which should have a
trbe_perf and everything flows from there.

> 
>> +    int cpu = smp_processor_id();
>> +
>> +    WARN_ON(perf->trbe_base != get_trbe_base_pointer());
>> +    WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
>> +
>> +    if (cpudata->mode != CS_MODE_PERF)
>> +        return false;
>> +
>> +    if (cpudata->cpu != cpu)
>> +        return false;
>> +
>> +    if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
>> +        return false;
>> +
>> +    return true;
>> +}
>> +
>> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
>> +{
>> +    enum trbe_ec ec = get_trbe_ec();
>> +    enum trbe_bsc bsc = get_trbe_bsc();
>> +
>> +    WARN_ON(is_trbe_running());
>> +    asm(TSB_CSYNC);
>> +    dsb(nsh);
>> +    isb();
>> +
>> +    if (is_trbe_trg() || is_trbe_abort())
>> +        return TRBE_FAULT_ACT_FATAL;
>> +
>> +    if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>> +        return TRBE_FAULT_ACT_FATAL;
>> +
>> +    if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>> +        if (get_trbe_write_pointer() == get_trbe_base_pointer())
>> +            return TRBE_FAULT_ACT_WRAP;
>> +    }
>> +    return TRBE_FAULT_ACT_SPURIOUS;
>> +}
>> +
>> +static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>> +{
>> +    struct perf_output_handle *handle = dev;
>> +    enum trbe_fault_action act;
>> +
>> +    WARN_ON(!is_trbe_irq());
>> +    clr_trbe_irq();
>> +
>> +    if (!perf_get_aux(handle))
>> +        return IRQ_NONE;
>> +
>> +    if (!is_perf_trbe(handle))
>> +        return IRQ_NONE;
>> +
>> +    irq_work_run();
>> +
>> +    act = trbe_get_fault_act(handle);
>> +    switch (act) {
>> +    case TRBE_FAULT_ACT_WRAP:
>> +        trbe_handle_overflow(handle);
>> +        break;
>> +    case TRBE_FAULT_ACT_SPURIOUS:
>> +        trbe_handle_spurious(handle);
>> +        break;
>> +    case TRBE_FAULT_ACT_FATAL:
>> +        trbe_handle_fatal(handle);
>> +        break;
>> +    }
>> +    return IRQ_HANDLED;
>> +}
>> +
> 
> 
>> +static void arm_trbe_probe_coresight_cpu(void *info)
>> +{
>> +    struct trbe_cpudata *cpudata = info;
>> +    struct device *dev = &cpudata->drvdata->pdev->dev;
>> +    struct coresight_desc desc = { 0 };
>> +
>> +    if (WARN_ON(!cpudata))
>> +        goto cpu_clear;
>> +
>> +    if (!is_trbe_available()) {
>> +        pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
>> +        goto cpu_clear;
>> +    }
>> +
>> +    if (!is_trbe_programmable()) {
>> +        pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
>> +        goto cpu_clear;
>> +    }
>> +    desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
>> +    if (IS_ERR(desc.name))
>> +        goto cpu_clear;
>> +
>> +    desc.type = CORESIGHT_DEV_TYPE_SINK;
>> +    desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
> 
> Maybe we should add a new subtype to make this higher priority than the normal ETR.
> Something like :
> 
>     CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM

Sure, will do.

> 
>> +    desc.ops = &arm_trbe_cs_ops;
>> +    desc.pdata = dev_get_platdata(dev);
>> +    desc.groups = arm_trbe_groups;
>> +    desc.dev = dev;
>> +    cpudata->csdev = coresight_register(&desc);
>> +    if (IS_ERR(cpudata->csdev))
>> +        goto cpu_clear;
>> +
>> +    dev_set_drvdata(&cpudata->csdev->dev, cpudata);
>> +    cpudata->trbe_dbm = get_trbe_flag_update();
>> +    cpudata->trbe_align = 1ULL << get_trbe_address_align();
>> +    if (cpudata->trbe_align > SZ_2K) {
>> +        pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
>> +        goto cpu_clear;
>> +    }
>> +    return;
>> +cpu_clear:
>> +    cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
>> +}
>> +
>> +static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
>> +{
>> +    struct trbe_cpudata *cpudata;
>> +    int cpu;
>> +
>> +    drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
>> +    if (IS_ERR(drvdata->cpudata))
>> +        return PTR_ERR(drvdata->cpudata);
>> +
>> +    for_each_cpu(cpu, &drvdata->supported_cpus) {
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        cpudata->cpu = cpu;
>> +        cpudata->drvdata = drvdata;
>> +        smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> 
> We could batch it and run it on all CPUs at the same time ? Also it would be better to
> leave the per_cpu area filled by the CPU itself, to avoid racing.

Sure, will re-organize the entire CPU probing/removal and also the CPU
online/offline path. Planning to use smp_call_function_many() instead
for a simultaneous init.

> 
> 
>> +    }
>> +    return 0;
>> +}
>> +
>> +static void arm_trbe_remove_coresight_cpu(void *info)
>> +{
>> +    struct trbe_drvdata *drvdata = info;
>> +
>> +    disable_percpu_irq(drvdata->irq);
>> +}
>> +
>> +static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
>> +{
>> +    struct trbe_cpudata *cpudata;
>> +    int cpu;
>> +
>> +    for_each_cpu(cpu, &drvdata->supported_cpus) {
>> +        smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        if (cpudata->csdev) {
>> +            coresight_unregister(cpudata->csdev);
>> +            cpudata->drvdata = NULL;
>> +            cpudata->csdev = NULL;
>> +        }
> 
> Please leave it to the CPU to do this part.

Sure, will do.

> 
>> +    }
>> +    free_percpu(drvdata->cpudata);
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
>> +{
>> +    struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
>> +    struct trbe_cpudata *cpudata;
>> +
>> +    if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        if (!cpudata->csdev) {
>> +            cpudata->drvdata = drvdata;
>> +            smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> 
> Why do we need smp_call here ? We are already on the CPU.

We don't need it, will drop.

> 
>> +        }
>> +        trbe_reset_local();
>> +        enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
>> +{
>> +    struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
>> +    struct trbe_cpudata *cpudata;
>> +
>> +    if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        if (cpudata->csdev) {
>> +            coresight_unregister(cpudata->csdev);
>> +            cpudata->drvdata = NULL;
>> +            cpudata->csdev = NULL;
>> +        }
>> +        disable_percpu_irq(drvdata->irq);
>> +        trbe_reset_local();
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
>> +{
>> +    enum cpuhp_state trbe_online;
>> +
>> +    trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
>> +                    arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
>> +    if (trbe_online < 0)
>> +        return -EINVAL;
>> +
>> +    if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
>> +        return -EINVAL;
>> +
>> +    drvdata->trbe_online = trbe_online;
>> +    return 0;
>> +}
>> +
>> +static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
>> +{
>> +    cpuhp_remove_multi_state(drvdata->trbe_online);
>> +}
>> +
>> +static int arm_trbe_probe_irq(struct platform_device *pdev,
>> +                  struct trbe_drvdata *drvdata)
>> +{
>> +    drvdata->irq = platform_get_irq(pdev, 0);
>> +    if (!drvdata->irq) {
>> +        pr_err("IRQ not found for the platform device\n");
>> +        return -ENXIO;
>> +    }
>> +
>> +    if (!irq_is_percpu(drvdata->irq)) {
>> +        pr_err("IRQ is not a PPI\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
>> +        return -EINVAL;
>> +
>> +    drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
>> +    if (!drvdata->handle)
>> +        return -ENOMEM;
>> +
>> +    if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
>> +        free_percpu(drvdata->handle);
>> +        return -EINVAL;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
>> +{
>> +    free_percpu_irq(drvdata->irq, drvdata->handle);
>> +    free_percpu(drvdata->handle);
>> +}
>> +
>> +static int arm_trbe_device_probe(struct platform_device *pdev)
>> +{
>> +    struct coresight_platform_data *pdata;
>> +    struct trbe_drvdata *drvdata;
>> +    struct device *dev = &pdev->dev;
>> +    int ret;
>> +
>> +    drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
>> +    if (IS_ERR(drvdata))
>> +        return -ENOMEM;
>> +
>> +    pdata = coresight_get_platform_data(dev);
>> +    if (IS_ERR(pdata)) {
>> +        kfree(drvdata);
>> +        return -ENOMEM;
>> +    }
> 
> 
>> +
>> +    drvdata->atclk = devm_clk_get(dev, "atclk");
>> +    if (!IS_ERR(drvdata->atclk)) {
>> +        ret = clk_prepare_enable(drvdata->atclk);
>> +        if (ret)
>> +            return ret;
>> +    }
> 
> Please drop the clocks, we don't have any

Right, will drop the clock and also the power management support
along with it.

> 
>> +    dev_set_drvdata(dev, drvdata);
>> +    dev->platform_data = pdata;
>> +    drvdata->pdev = pdev;
>> +    ret = arm_trbe_probe_irq(pdev, drvdata);
>> +    if (ret)
>> +        goto irq_failed;
>> +
>> +    ret = arm_trbe_probe_coresight(drvdata);
>> +    if (ret)
>> +        goto probe_failed;
>> +
>> +    ret = arm_trbe_probe_cpuhp(drvdata);
>> +    if (ret)
>> +        goto cpuhp_failed;
>> +
>> +    return 0;
>> +cpuhp_failed:
>> +    arm_trbe_remove_coresight(drvdata);
>> +probe_failed:
>> +    arm_trbe_remove_irq(drvdata);
>> +irq_failed:
>> +    kfree(pdata);
>> +    kfree(drvdata);
>> +    return ret;
>> +}
>> +
>> +static int arm_trbe_device_remove(struct platform_device *pdev)
>> +{
>> +    struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
>> +    struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
>> +
>> +    arm_trbe_remove_coresight(drvdata);
>> +    arm_trbe_remove_cpuhp(drvdata);
>> +    arm_trbe_remove_irq(drvdata);
>> +    kfree(pdata);
>> +    kfree(drvdata);
>> +    return 0;
>> +}
>> +
>> +#ifdef CONFIG_PM
>> +static int arm_trbe_runtime_suspend(struct device *dev)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
>> +
>> +    if (drvdata && !IS_ERR(drvdata->atclk))
>> +        clk_disable_unprepare(drvdata->atclk);
>> +
> 
> Remove. We may need to save/restore the TRBE ptrs, depending on the
> TRBE.

Will drop it for now. Could revisit this later after the base
functionality is up and running.

> 
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_runtime_resume(struct device *dev)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
>> +
>> +    if (drvdata && !IS_ERR(drvdata->atclk))
>> +        clk_prepare_enable(drvdata->atclk);
> 
> Remove. See above.
> 
>> +
>> +    return 0;
>> +}
>> +#endif
>> +
>> +static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
>> +    SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
>> +};
>> +
>> +static const struct of_device_id arm_trbe_of_match[] = {
>> +    { .compatible = "arm,arm-trbe",    .data = (void *)1 },
>> +    {},
>> +};
> 
> I think it is better to call this, we have too many acronyms ;-)
> 
>     "arm,trace-buffer-extension"

Sure, will change.

> 
>> +MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
> 
>> +
>> +static const struct platform_device_id arm_trbe_match[] = {
>> +    { "arm,trbe", 0},
>> +    { }
>> +};
>> +MODULE_DEVICE_TABLE(platform, arm_trbe_match);
> 
> Please remove. The ACPI part can be added when we get to it.

Sure, will drop for now.

> 
>> +
>> +static struct platform_driver arm_trbe_driver = {
>> +    .id_table = arm_trbe_match,
>> +    .driver    = {
>> +        .name = DRVNAME,
>> +        .of_match_table = of_match_ptr(arm_trbe_of_match),
>> +        .pm = &arm_trbe_dev_pm_ops,
>> +        .suppress_bind_attrs = true,
>> +    },
>> +    .probe    = arm_trbe_device_probe,
>> +    .remove    = arm_trbe_device_remove,
>> +};
>> +builtin_platform_driver(arm_trbe_driver)
> 
> Please make this modular.

Will do.

> 
> 
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
>> new file mode 100644
>> index 0000000..82ffbfc
>> --- /dev/null
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.h
>> @@ -0,0 +1,525 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * This contains all required hardware related helper functions for
>> + * Trace Buffer Extension (TRBE) driver in the coresight framework.
>> + *
>> + * Copyright (C) 2020 ARM Ltd.
>> + *
>> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
>> + */
>> +#include <linux/coresight.h>
>> +#include <linux/device.h>
>> +#include <linux/irq.h>
>> +#include <linux/kernel.h>
>> +#include <linux/of.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/smp.h>
>> +
>> +#include "coresight-etm-perf.h"
>> +
>> +static inline bool is_trbe_available(void)
>> +{
>> +    u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
>> +    int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
>> +
>> +    return trbe >= 0b0001;
>> +}
>> +
>> +static inline bool is_ete_available(void)
>> +{
>> +    u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
>> +    int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
>> +
>> +    return (tracever != 0b0000);
> 
> Why is this needed ?

Sure, will drop.

> 
>> +}
>> +
>> +static inline bool is_trbe_enabled(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return trblimitr & TRBLIMITR_ENABLE;
>> +}
>> +
>> +enum trbe_ec {
>> +    TRBE_EC_OTHERS        = 0,
>> +    TRBE_EC_STAGE1_ABORT    = 36,
>> +    TRBE_EC_STAGE2_ABORT    = 37,
>> +};
>> +
>> +static const char *const trbe_ec_str[] = {
>> +    [TRBE_EC_OTHERS]    = "Maintenance exception",
>> +    [TRBE_EC_STAGE1_ABORT]    = "Stage-1 exception",
>> +    [TRBE_EC_STAGE2_ABORT]    = "Stage-2 exception",
>> +};
>> +
> 
> Please remove the definitions that are not used by the driver.

Sure, will do.

> 
>> +static inline enum trbe_ec get_trbe_ec(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
>> +}
>> +
>> +static inline void clr_trbe_ec(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
>> +    write_sysreg_s(trbsr, SYS_TRBSR_EL1);
>> +}
>> +
>> +enum trbe_bsc {
>> +    TRBE_BSC_NOT_STOPPED    = 0,
>> +    TRBE_BSC_FILLED        = 1,
>> +    TRBE_BSC_TRIGGERED    = 2,
>> +};
>> +
>> +static const char *const trbe_bsc_str[] = {
>> +    [TRBE_BSC_NOT_STOPPED]    = "TRBE collection not stopped",
>> +    [TRBE_BSC_FILLED]    = "TRBE filled",
>> +    [TRBE_BSC_TRIGGERED]    = "TRBE triggered",
>> +};
>> +
>> +static inline enum trbe_bsc get_trbe_bsc(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
>> +}
>> +
>> +static inline void clr_trbe_bsc(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
>> +    write_sysreg_s(trbsr, SYS_TRBSR_EL1);
>> +}
>> +
>> +enum trbe_fsc {
>> +    TRBE_FSC_ASF_LEVEL0    = 0,
>> +    TRBE_FSC_ASF_LEVEL1    = 1,
>> +    TRBE_FSC_ASF_LEVEL2    = 2,
>> +    TRBE_FSC_ASF_LEVEL3    = 3,
>> +    TRBE_FSC_TF_LEVEL0    = 4,
>> +    TRBE_FSC_TF_LEVEL1    = 5,
>> +    TRBE_FSC_TF_LEVEL2    = 6,
>> +    TRBE_FSC_TF_LEVEL3    = 7,
>> +    TRBE_FSC_AFF_LEVEL0    = 8,
>> +    TRBE_FSC_AFF_LEVEL1    = 9,
>> +    TRBE_FSC_AFF_LEVEL2    = 10,
>> +    TRBE_FSC_AFF_LEVEL3    = 11,
>> +    TRBE_FSC_PF_LEVEL0    = 12,
>> +    TRBE_FSC_PF_LEVEL1    = 13,
>> +    TRBE_FSC_PF_LEVEL2    = 14,
>> +    TRBE_FSC_PF_LEVEL3    = 15,
>> +    TRBE_FSC_SEA_WRITE    = 16,
>> +    TRBE_FSC_ASEA_WRITE    = 17,
>> +    TRBE_FSC_SEA_LEVEL0    = 20,
>> +    TRBE_FSC_SEA_LEVEL1    = 21,
>> +    TRBE_FSC_SEA_LEVEL2    = 22,
>> +    TRBE_FSC_SEA_LEVEL3    = 23,
>> +    TRBE_FSC_ALIGN_FAULT    = 33,
>> +    TRBE_FSC_TLB_FAULT    = 48,
>> +    TRBE_FSC_ATOMIC_FAULT    = 49,
>> +};
> 
> Please remove ^^^

Sure, will do.

> 
>> +
>> +static const char *const trbe_fsc_str[] = {
>> +    [TRBE_FSC_ASF_LEVEL0]    = "Address size fault - level 0",
>> +    [TRBE_FSC_ASF_LEVEL1]    = "Address size fault - level 1",
>> +    [TRBE_FSC_ASF_LEVEL2]    = "Address size fault - level 2",
>> +    [TRBE_FSC_ASF_LEVEL3]    = "Address size fault - level 3",
>> +    [TRBE_FSC_TF_LEVEL0]    = "Translation fault - level 0",
>> +    [TRBE_FSC_TF_LEVEL1]    = "Translation fault - level 1",
>> +    [TRBE_FSC_TF_LEVEL2]    = "Translation fault - level 2",
>> +    [TRBE_FSC_TF_LEVEL3]    = "Translation fault - level 3",
>> +    [TRBE_FSC_AFF_LEVEL0]    = "Access flag fault - level 0",
>> +    [TRBE_FSC_AFF_LEVEL1]    = "Access flag fault - level 1",
>> +    [TRBE_FSC_AFF_LEVEL2]    = "Access flag fault - level 2",
>> +    [TRBE_FSC_AFF_LEVEL3]    = "Access flag fault - level 3",
>> +    [TRBE_FSC_PF_LEVEL0]    = "Permission fault - level 0",
>> +    [TRBE_FSC_PF_LEVEL1]    = "Permission fault - level 1",
>> +    [TRBE_FSC_PF_LEVEL2]    = "Permission fault - level 2",
>> +    [TRBE_FSC_PF_LEVEL3]    = "Permission fault - level 3",
>> +    [TRBE_FSC_SEA_WRITE]    = "Synchronous external abort on write",
>> +    [TRBE_FSC_ASEA_WRITE]    = "Asynchronous external abort on write",
>> +    [TRBE_FSC_SEA_LEVEL0]    = "Synchronous external abort on table walk - level 0",
>> +    [TRBE_FSC_SEA_LEVEL1]    = "Synchronous external abort on table walk - level 1",
>> +    [TRBE_FSC_SEA_LEVEL2]    = "Synchronous external abort on table walk - level 2",
>> +    [TRBE_FSC_SEA_LEVEL3]    = "Synchronous external abort on table walk - level 3",
>> +    [TRBE_FSC_ALIGN_FAULT]    = "Alignment fault",
>> +    [TRBE_FSC_TLB_FAULT]    = "TLB conflict fault",
>> +    [TRBE_FSC_ATOMIC_FAULT]    = "Atomic fault",
>> +};
>>
> 
> Please remove ^^^

Sure, will do.

> 
>>
> 
>> +enum trbe_address_mode {
>> +    TRBE_ADDRESS_VIRTUAL,
>> +    TRBE_ADDRESS_PHYSICAL,
>> +};
> 
> #define please.
> 
>> +
>> +static const char *const trbe_address_mode_str[] = {
>> +    [TRBE_ADDRESS_VIRTUAL]    = "Address mode - virtual",
>> +    [TRBE_ADDRESS_PHYSICAL]    = "Address mode - physical",
>> +};
> 
> Do we need this ? We always use virtual.
> 
>> +
>> +static inline bool is_trbe_virtual_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return !(trblimitr & TRBLIMITR_NVM);
>> +}
>> +
> 
> Remove

Sure, will do.

> 
>> +static inline bool is_trbe_physical_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return trblimitr & TRBLIMITR_NVM;
>> +}
> 
> Remove

Sure, will do.

> 
>> +
>> +static inline void set_trbe_virtual_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~TRBLIMITR_NVM;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
> 
>> +static inline void set_trbe_physical_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr |= TRBLIMITR_NVM;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
> 
> Remove

Sure, will do.

> 
>> +
>> +enum trbe_trig_mode {
>> +    TRBE_TRIGGER_STOP    = 0,
>> +    TRBE_TRIGGER_IRQ    = 1,
>> +    TRBE_TRIGGER_IGNORE    = 3,
>> +};
>> +
>> +static const char *const trbe_trig_mode_str[] = {
>> +    [TRBE_TRIGGER_STOP]    = "Trigger mode - stop",
>> +    [TRBE_TRIGGER_IRQ]    = "Trigger mode - irq",
>> +    [TRBE_TRIGGER_IGNORE]    = "Trigger mode - ignore",
>> +};
>> +
>> +static inline enum trbe_trig_mode get_trbe_trig_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
>> +}
>> +
>> +static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
>> +    trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +enum trbe_fill_mode {
>> +    TRBE_FILL_STOP        = 0,
>> +    TRBE_FILL_WRAP        = 1,
>> +    TRBE_FILL_CIRCULAR    = 3,
>> +};
>> +
> 
> Please use #define

These are predefined constrained values which kind of makes them
a set. An enumeration seems to be a better representation.
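
As a rough userspace sketch of what is meant: the enum documents the
closed set of legal encodings while the accessors still do plain mask and
shift. The DEMO_* shift and mask values are assumptions for illustration
and are not taken from this patch.

```c
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define DEMO_FILL_MODE_SHIFT 1
#define DEMO_FILL_MODE_MASK  0x3ULL

/* The only encodings the field is expected to take. */
enum demo_fill_mode {
    DEMO_FILL_STOP     = 0,
    DEMO_FILL_WRAP     = 1,
    DEMO_FILL_CIRCULAR = 3,
};

/* Replace the fill mode field in a register image, keep other bits. */
static uint64_t demo_set_fill_mode(uint64_t limitr, enum demo_fill_mode mode)
{
    limitr &= ~(DEMO_FILL_MODE_MASK << DEMO_FILL_MODE_SHIFT);
    limitr |= ((uint64_t)mode & DEMO_FILL_MODE_MASK) << DEMO_FILL_MODE_SHIFT;
    return limitr;
}

/* Extract the fill mode field from a register image. */
static enum demo_fill_mode demo_get_fill_mode(uint64_t limitr)
{
    return (enum demo_fill_mode)((limitr >> DEMO_FILL_MODE_SHIFT) &
                                 DEMO_FILL_MODE_MASK);
}
```

The typed parameter makes it visible at the call site that only one of the
three modes is expected, which a bare #define would not.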

> 
>> +static const char *const trbe_fill_mode_str[] = {
>> +    [TRBE_FILL_STOP]    = "Buffer mode - stop",
>> +    [TRBE_FILL_WRAP]    = "Buffer mode - wrap",
>> +    [TRBE_FILL_CIRCULAR]    = "Buffer mode - circular",
>> +};
>> +
>> +static inline enum trbe_fill_mode get_trbe_fill_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
>> +}
>> +
>> +static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
>> +    trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +static inline void set_trbe_disabled(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~TRBLIMITR_ENABLE;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +static inline void set_trbe_enabled(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr |= TRBLIMITR_ENABLE;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +static inline bool get_trbe_flag_update(void)
>> +{
>> +    u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
>> +
>> +    return trbidr & TRBIDR_FLAG;
>> +}
>> +
>> +static inline bool is_trbe_programmable(void)
>> +{
>> +    u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
>> +
>> +    return !(trbidr & TRBIDR_PROG);
>> +}
>> +#
>> +enum trbe_buffer_align {
>> +    TRBE_BUFFER_BYTE,
>> +    TRBE_BUFFER_HALF_WORD,
>> +    TRBE_BUFFER_WORD,
>> +    TRBE_BUFFER_DOUBLE_WORD,
>> +    TRBE_BUFFER_16_BYTES,
>> +    TRBE_BUFFER_32_BYTES,
>> +    TRBE_BUFFER_64_BYTES,
>> +    TRBE_BUFFER_128_BYTES,
>> +    TRBE_BUFFER_256_BYTES,
>> +    TRBE_BUFFER_512_BYTES,
>> +    TRBE_BUFFER_1K_BYTES,
>> +    TRBE_BUFFER_2K_BYTES,
>> +};
>> +
> 
> Remove ^^

Sure, will do.

> 
>> +static const char *const trbe_buffer_align_str[] = {
>> +    [TRBE_BUFFER_BYTE]        = "Byte",
>> +    [TRBE_BUFFER_HALF_WORD]        = "Half word",
>> +    [TRBE_BUFFER_WORD]        = "Word",
>> +    [TRBE_BUFFER_DOUBLE_WORD]    = "Double word",
>> +    [TRBE_BUFFER_16_BYTES]        = "16 bytes",
>> +    [TRBE_BUFFER_32_BYTES]        = "32 bytes",
>> +    [TRBE_BUFFER_64_BYTES]        = "64 bytes",
>> +    [TRBE_BUFFER_128_BYTES]        = "128 bytes",
>> +    [TRBE_BUFFER_256_BYTES]        = "256 bytes",
>> +    [TRBE_BUFFER_512_BYTES]        = "512 bytes",
>> +    [TRBE_BUFFER_1K_BYTES]        = "1K bytes",
>> +    [TRBE_BUFFER_2K_BYTES]        = "2K bytes",
>> +};
> 
> We don't need any of this. We could simply "<<" and get the
> size.

Dropping all of these. We will just export the hex value in sysfs rather
than a string from here.
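
Something along these lines, shown as a userspace sketch. The DEMO_ALIGN_*
shift and mask are assumed values for illustration and demo_align_bytes()
is a hypothetical helper, not a function from this patch.

```c
#include <stdint.h>

#define DEMO_ALIGN_SHIFT 0       /* assumption */
#define DEMO_ALIGN_MASK  0xfULL  /* assumption */

/* Decode the alignment field and turn it into a size in bytes. */
static uint64_t demo_align_bytes(uint64_t trbidr)
{
    uint64_t field = (trbidr >> DEMO_ALIGN_SHIFT) & DEMO_ALIGN_MASK;

    return 1ULL << field;
}
```

A field value of 0 decodes to byte alignment and 11 decodes to 2K, so the
enum and the string table carry no extra information.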

> 
> 
>> +
>> +static inline enum trbe_buffer_align get_trbe_address_align(void)
>> +{
>> +    u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
>> +
>> +    return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
>> +}
>> +
>> +static inline void assert_trbe_address_mode(unsigned long addr)
>> +{
>> +    bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
>> +    bool virt_mode = is_trbe_virtual_mode();
>> +
>> +    WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
>> +}
> 
> I am not sure if this is really helpful. You have to trust the kernel vmalloc().

Okay, dropping both address asserts, i.e. mode and alignment.


* Re: [RFC 07/11] coresight: sink: Add TRBE driver
@ 2020-11-25  5:25       ` Anshuman Khandual
From: Anshuman Khandual @ 2020-11-25  5:25 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach



On 11/12/20 3:43 PM, Suzuki K Poulose wrote:
> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
>> accessible via the system registers. The TRBE supports different addressing
>> modes including CPU virtual address and buffer modes including the circular
>> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
>> a write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
>> access to the trace buffer could be prohibited by a higher exception level
>> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
>> private interrupt (PPI) on address translation errors and when the buffer
>> is full. Overall implementation here is inspired from the Arm SPE driver.
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   Documentation/trace/coresight/coresight-trbe.rst |  36 ++
>>   arch/arm64/include/asm/sysreg.h                  |   2 +
>>   drivers/hwtracing/coresight/Kconfig              |  11 +
>>   drivers/hwtracing/coresight/Makefile             |   1 +
>>   drivers/hwtracing/coresight/coresight-trbe.c     | 766 +++++++++++++++++++++++
>>   drivers/hwtracing/coresight/coresight-trbe.h     | 525 ++++++++++++++++
>>   6 files changed, 1341 insertions(+)
>>   create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>>   create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
>>
>> diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst
>> new file mode 100644
>> index 0000000..4320a8b
>> --- /dev/null
>> +++ b/Documentation/trace/coresight/coresight-trbe.rst
>> @@ -0,0 +1,36 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============================
>> +Trace Buffer Extension (TRBE).
>> +==============================
>> +
>> +    :Author:   Anshuman Khandual <anshuman.khandual@arm.com>
>> +    :Date:     November 2020
>> +
>> +Hardware Description
>> +--------------------
>> +
>> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system
>> +memory, CPU traces generated from a corresponding percpu tracing unit. This
>> +gets plugged in as a coresight sink device because the corresponding trace
>> +generators (ETE) are plugged in as source devices.
>> +
>> +Sysfs files and directories
>> +---------------------------
>> +
>> +The TRBE devices appear on the existing coresight bus alongside the other
>> +coresight devices::
>> +
>> +    >$ ls /sys/bus/coresight/devices
>> +    trbe0  trbe1  trbe2 trbe3
>> +
>> +The ``trbe<N>`` named TRBEs are associated with a CPU::
>> +
>> +    >$ ls /sys/bus/coresight/devices/trbe0/
>> +    irq align dbm
>> +
>> +*Key file items are:-*
>> +   * ``irq``: TRBE maintenance interrupt number
>> +   * ``align``: TRBE write pointer alignment
>> +   * ``dbm``: TRBE updates memory with access and dirty flags
>> +
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index 14cb156..61136f6 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -97,6 +97,7 @@
>>   #define SET_PSTATE_UAO(x)        __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
>>   #define SET_PSTATE_SSBS(x)        __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
>>   #define SET_PSTATE_TCO(x)        __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
>> +#define TSB_CSYNC            __emit_inst(0xd503225f)
>>
>>   #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
>>       __emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
>> @@ -865,6 +866,7 @@
>>   #define ID_AA64MMFR2_CNP_SHIFT        0
>>
>>   /* id_aa64dfr0 */
>> +#define ID_AA64DFR0_TRBE_SHIFT        44
>>   #define ID_AA64DFR0_TRACE_FILT_SHIFT    40
>>   #define ID_AA64DFR0_DOUBLELOCK_SHIFT    36
>>   #define ID_AA64DFR0_PMSVER_SHIFT    32
>> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
>> index c119824..0f5e101 100644
>> --- a/drivers/hwtracing/coresight/Kconfig
>> +++ b/drivers/hwtracing/coresight/Kconfig
>> @@ -156,6 +156,17 @@ config CORESIGHT_CTI
>>         To compile this driver as a module, choose M here: the
>>         module will be called coresight-cti.
>>
>> +config CORESIGHT_TRBE
>> +    bool "Trace Buffer Extension (TRBE) driver"
>> +    depends on ARM64
>> +    help
>> +      This driver provides support for percpu Trace Buffer Extension (TRBE).
>> +      TRBE always needs to be used along with its corresponding percpu ETE
>> +      component. ETE generates trace data which is then captured with TRBE.
>> +      Unlike traditional sink devices, TRBE is a CPU feature accessible via
>> +      system registers. But its explicit dependency with trace unit (ETE)
>> +      requires it to be plugged in as a coresight sink device.
>> +
>>   config CORESIGHT_CTI_INTEGRATION_REGS
>>       bool "Access CTI CoreSight Integration Registers"
>>       depends on CORESIGHT_CTI
>> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
>> index f20e357..d608165 100644
>> --- a/drivers/hwtracing/coresight/Makefile
>> +++ b/drivers/hwtracing/coresight/Makefile
>> @@ -21,5 +21,6 @@ obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>>   obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
>>   obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
>>   obj-$(CONFIG_CORESIGHT_CTI) += coresight-cti.o
>> +obj-$(CONFIG_CORESIGHT_TRBE) += coresight-trbe.o
>>   coresight-cti-y := coresight-cti-core.o    coresight-cti-platform.o \
>>              coresight-cti-sysfs.o
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> new file mode 100644
>> index 0000000..48a8ec3
>> --- /dev/null
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -0,0 +1,766 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * This driver enables Trace Buffer Extension (TRBE) as a per-cpu coresight
>> + * sink device could then pair with an appropriate per-cpu coresight source
>> + * device (ETE) thus generating required trace data. Trace can be enabled
>> + * via the perf framework.
>> + *
>> + * Copyright (C) 2020 ARM Ltd.
>> + *
>> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
>> + */
>> +#define DRVNAME "arm_trbe"
>> +
>> +#define pr_fmt(fmt) DRVNAME ": " fmt
>> +
>> +#include "coresight-trbe.h"
>> +
>> +#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
>> +
>> +#define ETE_IGNORE_PACKET 0x70
> 
> Add a comment here, on what this means to the decoder.

Sure, will add.

> 
>> +
>> +static const char trbe_name[] = "trbe";
> 
> Why not
> 
> #define DEVNAME    "trbe"

That can be replaced, but we already define DRVNAME, which is used to name
the TRBE interrupt shown in /proc/interrupts, and its value is "arm_trbe".
Should /sys/bus/coresight/devices/ list the TRBE devices as "arm_trbeN"? If
so, DRVNAME can be reused directly. Or should DRVNAME be changed to just
"trbe"? Either way, it makes sense to have the same name for the TRBE
devices and the interrupt.

> 
> 
>> +
>> +enum trbe_fault_action {
>> +    TRBE_FAULT_ACT_WRAP,
>> +    TRBE_FAULT_ACT_SPURIOUS,
>> +    TRBE_FAULT_ACT_FATAL,
>> +};
>> +
>> +struct trbe_perf {
> 
> Please rename this to trbe_buf. This will be used for sysfs mode as well.

Sure, will do.

> 
>> +    unsigned long trbe_base;
>> +    unsigned long trbe_limit;
>> +    unsigned long trbe_write;
>> +    pid_t pid;
> 
> Why do we need this ? This seems unused and moreover, there cannot
> be multiple tracers into TRBE. So, we don't need to share the sink
> unlike the traditional ones.

Sure, will drop.

> 
>> +    int nr_pages;
>> +    void **pages;
>> +    bool snapshot;
>> +    struct trbe_cpudata *cpudata;
>> +};
>> +
>> +struct trbe_cpudata {
>> +    struct coresight_device    *csdev;
>> +    bool trbe_dbm;
> 
> Why do we need this ?

This is an internal implementation characteristic which should be
presented to user space via sysfs for better understanding and
probably for debugging. The current proposal does not support
the case where TRBE DBM is off, which we need to incorporate
later on. Hence let's just leave this as is for now.

> 
>> +    u64 trbe_align;
>> +    int cpu;
>> +    enum cs_mode mode;
>> +    struct trbe_perf *perf;
>> +    struct trbe_drvdata *drvdata;
>> +};
>> +
>> +struct trbe_drvdata {
>> +    struct trbe_cpudata __percpu *cpudata;
>> +    struct perf_output_handle __percpu *handle;
> 
> Shouldn't this be :
> 
>     struct perf_output_handle __percpu **handle ?
> 
> as we get a handle from the etm-perf and is not controlled by
> the TRBE ?

Sure, will change this.

> 
>> +    struct hlist_node hotplug_node;
>> +    int irq;
>> +    cpumask_t supported_cpus;
>> +    enum cpuhp_state trbe_online;
>> +    struct platform_device *pdev;
>> +    struct clk *atclk;
> 
> We don't have any clocks for the TRBE instance. Please remove.

Sure, will drop.

> 
>> +};
>> +
>> +static int trbe_alloc_node(struct perf_event *event)
>> +{
>> +    if (event->cpu == -1)
>> +        return NUMA_NO_NODE;
>> +    return cpu_to_node(event->cpu);
>> +}
>> +
>> +static void trbe_disable_and_drain_local(void)
>> +{
>> +    write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>> +    isb();
>> +    dsb(nsh);
>> +    asm(TSB_CSYNC);
>> +}
>> +
>> +static void trbe_reset_local(void)
>> +{
>> +    trbe_disable_and_drain_local();
>> +    write_sysreg_s(0, SYS_TRBPTR_EL1);
>> +    isb();
>> +
>> +    write_sysreg_s(0, SYS_TRBBASER_EL1);
>> +    isb();
>> +
>> +    write_sysreg_s(0, SYS_TRBSR_EL1);
>> +    isb();
>> +}
>> +
>> +static void trbe_pad_buf(struct perf_output_handle *handle, int len)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    u64 head = PERF_IDX2OFF(handle->head, perf);
>> +
>> +    memset((void *) perf->trbe_base + head, ETE_IGNORE_PACKET, len);
>> +    if (!perf->snapshot)
>> +        perf_aux_output_skip(handle, len);
>> +}
>> +
>> +static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    u64 head = PERF_IDX2OFF(handle->head, perf);
>> +    u64 limit = perf->nr_pages * PAGE_SIZE;
>> +
> 
> So we are using half of the buffer for snapshot mode to avoid a case where the
> analyzer is unable to decode the trace in case of an overflow.

Right.

> 
>> +    if (head < limit >> 1)
>> +        limit >>= 1;
> 
> Also this needs to be thought out. We may not need this restriction. The trace decoder
> will be able to walk forward and then find a synchronization packet and then continue
> the tracing from there. So, we could use the entire buffer for TRBE.

Okay. Maybe we could just go with half the TRBE buffer for now and
switch to the entire buffer later, once this is better understood?
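
For reference, the half-buffer rule discussed above can be modelled in
user space roughly as below. This is only a sketch: sk_snapshot_limit()
and SK_PAGE_SIZE are made-up names, the 4K page size is an assumption,
and it mirrors just the arithmetic in trbe_snapshot_offset(), not the
driver itself.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096ULL	/* assumed page size, illustration only */

/*
 * Return the limit offset (in bytes from the buffer base) for snapshot
 * mode: stop at the halfway point while head is still in the first
 * half, otherwise run to the end of the buffer.
 */
static uint64_t sk_snapshot_limit(uint64_t head, unsigned int nr_pages)
{
	uint64_t limit = (uint64_t)nr_pages * SK_PAGE_SIZE;

	/* An even page count keeps the halfway point page aligned. */
	assert(nr_pages >= 2 && (nr_pages & 1) == 0);

	if (head < limit / 2)
		limit /= 2;
	return limit;
}
```

This also shows why the current code rejects an odd nr_pages in snapshot
mode: with an odd count the halfway point would not be page aligned.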

> 
> 
>> +
>> +    return limit;
>> +}
>> +
>> +static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    struct trbe_cpudata *cpudata = perf->cpudata;
>> +    const u64 bufsize = perf->nr_pages * PAGE_SIZE;
>> +    u64 limit = bufsize;
>> +    u64 head, tail, wakeup;
>> +
> 
> Commentary please.

Sure, will add some.

> 
>> +    head = PERF_IDX2OFF(handle->head, perf);
>> +    if (!IS_ALIGNED(head, cpudata->trbe_align)) {
>> +        unsigned long delta = roundup(head, cpudata->trbe_align) - head;
>> +
>> +        delta = min(delta, handle->size);
>> +        trbe_pad_buf(handle, delta);
>> +        head = PERF_IDX2OFF(handle->head, perf);
>> +    }
>> +
>> +    if (!handle->size) {
>> +        perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
>> +        return 0;
>> +    }
>> +
>> +    tail = PERF_IDX2OFF(handle->head + handle->size, perf);
>> +    wakeup = PERF_IDX2OFF(handle->wakeup, perf);
>> +
> 
>> +    if (head < tail)
> 
>  comment
> 
>> +        limit = round_down(tail, PAGE_SIZE);
>> +
>> +    if (handle->wakeup < (handle->head + handle->size) && head <= wakeup)
>> +        limit = min(limit, round_up(wakeup, PAGE_SIZE));
> 
> comment. Also do we need an alignment to PAGE_SIZE ?

The limit always has to be PAGE_SIZE aligned because it is eventually going
to be the TRBE limit pointer, after being added to the TRBE base
pointer. Will add some more comments here as well.
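
To illustrate the alignment point, here is a small user-space model of
the limit computation in trbe_normal_offset(). It is a sketch under
assumptions: the sk_* names are hypothetical, the 4K page size is
assumed, head/tail/wakeup are taken as already reduced to buffer
offsets, and wakeup_pending stands in for the
"handle->wakeup < handle->head + handle->size" check.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096ULL	/* assumed page size, illustration only */

/* Round x down or up to a power-of-two boundary a. */
static uint64_t sk_round_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

static uint64_t sk_round_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Return a page-aligned limit offset, or 0 when no page-aligned limit
 * exists beyond head (the driver pads and truncates in that case).
 */
static uint64_t sk_normal_limit(uint64_t head, uint64_t tail,
				uint64_t wakeup, int wakeup_pending,
				uint64_t bufsize)
{
	uint64_t limit = bufsize;

	assert(bufsize % SK_PAGE_SIZE == 0);

	/* Free space does not wrap: stop at the last full page before tail. */
	if (head < tail)
		limit = sk_round_down(tail, SK_PAGE_SIZE);

	/* Stop early enough for the pending wakeup to be delivered. */
	if (wakeup_pending && head <= wakeup) {
		uint64_t w = sk_round_up(wakeup, SK_PAGE_SIZE);

		if (w < limit)
			limit = w;
	}

	return limit > head ? limit : 0;
}
```

Every non-zero value returned here is a multiple of the page size, so
adding it to a page-aligned base gives a valid limit pointer.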

> 
>> +
>> +    if (limit > head)
>> +        return limit;
>> +
>> +    trbe_pad_buf(handle, handle->size);
>> +    perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
>> +    return 0;
>> +}
>> +
>> +static unsigned long get_trbe_limit(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    unsigned long offset;
>> +
>> +    if (perf->snapshot)
>> +        offset = trbe_snapshot_offset(handle);
>> +    else
>> +        offset = trbe_normal_offset(handle);
>> +    return perf->trbe_base + offset;
>> +}
>> +
>> +static void trbe_enable_hw(struct trbe_perf *perf)
>> +{
>> +    WARN_ON(perf->trbe_write < perf->trbe_base);
>> +    WARN_ON(perf->trbe_write >= perf->trbe_limit);
>> +    set_trbe_disabled();
>> +    clr_trbe_irq();
>> +    clr_trbe_wrap();
>> +    clr_trbe_abort();
>> +    clr_trbe_ec();
>> +    clr_trbe_bsc();
>> +    clr_trbe_fsc();
> 
> Please merge all of these field updates to single register update
> unless mandated by the architecture.

Sure, will do.
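
Something along these lines, modelled in user space. This is a sketch
only: the sk_* helpers stand in for read_sysreg_s()/write_sysreg_s(),
and the field positions below are what I believe TRBSR_EL1 uses but
should be treated as assumptions and checked against the architecture
spec.

```c
#include <stdint.h>

/* Assumed TRBSR_EL1 layout, for illustration only. */
#define SK_TRBSR_EC_SHIFT	26
#define SK_TRBSR_EC_MASK	0x3fULL
#define SK_TRBSR_IRQ		(1ULL << 22)
#define SK_TRBSR_WRAP		(1ULL << 20)
#define SK_TRBSR_ABORT		(1ULL << 18)
#define SK_TRBSR_MSS_MASK	0x3fULL		/* BSC/FSC share the low bits */

/* Stand-in for the system register, plus a write counter. */
static uint64_t sk_trbsr;
static unsigned int sk_trbsr_writes;

static uint64_t sk_read_trbsr(void)
{
	return sk_trbsr;
}

static void sk_write_trbsr(uint64_t val)
{
	sk_trbsr = val;
	sk_trbsr_writes++;
}

/* Clear IRQ, WRAP, ABORT, EC and BSC/FSC with one read-modify-write. */
static void sk_clr_trbe_status(void)
{
	uint64_t trbsr = sk_read_trbsr();

	trbsr &= ~(SK_TRBSR_IRQ | SK_TRBSR_WRAP | SK_TRBSR_ABORT |
		   (SK_TRBSR_EC_MASK << SK_TRBSR_EC_SHIFT) |
		   SK_TRBSR_MSS_MASK);
	sk_write_trbsr(trbsr);
}
```

Compared with the six clr_trbe_*() calls in trbe_enable_hw(), this does
one register read and one register write instead of six of each.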

> 
>> +    set_trbe_virtual_mode();
>> +    set_trbe_fill_mode(TRBE_FILL_STOP);
>> +    set_trbe_trig_mode(TRBE_TRIGGER_IGNORE);
> 
> Same here ^^

Sure, will do.

> 
>> +    isb();
>> +    set_trbe_base_pointer(perf->trbe_base);
>> +    set_trbe_limit_pointer(perf->trbe_limit);
>> +    set_trbe_write_pointer(perf->trbe_write);
>> +    isb();
>> +    dsb(ishst);
>> +    flush_tlb_all();
> 
> Why is this needed ?

Will drop flush_tlb_all().

> 
>> +    set_trbe_running();
>> +    set_trbe_enabled();
>> +    asm(TSB_CSYNC);
>> +}
>> +
>> +static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>> +                   struct perf_event *event, void **pages,
>> +                   int nr_pages, bool snapshot)
>> +{
>> +    struct trbe_perf *perf;
>> +    struct page **pglist;
>> +    int i;
>> +
>> +    if ((nr_pages < 2) || (snapshot && (nr_pages & 1)))
> 
> We may be able to remove the restriction on snapshot mode, see my comment
> above.

Sure, will drop when the entire buffer is used for the snapshot mode.

> 
>> +        return NULL;
>> +
>> +    perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, trbe_alloc_node(event));
>> +    if (IS_ERR(perf))
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
>> +    if (IS_ERR(pglist)) {
>> +        kfree(perf);
>> +        return ERR_PTR(-ENOMEM);
>> +    }
>> +
>> +    for (i = 0; i < nr_pages; i++)
>> +        pglist[i] = virt_to_page(pages[i]);
>> +
>> +    perf->trbe_base = (unsigned long) vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
>> +    if (IS_ERR((void *) perf->trbe_base)) {
>> +        kfree(pglist);
>> +        kfree(perf);
>> +        return ERR_PTR(perf->trbe_base);
>> +    }
>> +    perf->trbe_limit = perf->trbe_base + nr_pages * PAGE_SIZE;
>> +    perf->trbe_write = perf->trbe_base;
>> +    perf->pid = task_pid_nr(event->owner);
>> +    perf->snapshot = snapshot;
>> +    perf->nr_pages = nr_pages;
>> +    perf->pages = pages;
>> +    kfree(pglist);
>> +    return perf;
>> +}
>> +
>> +void arm_trbe_free_buffer(void *config)
>> +{
>> +    struct trbe_perf *perf = config;
>> +
>> +    vunmap((void *) perf->trbe_base);
>> +    kfree(perf);
>> +}
>> +
>> +static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>> +                        struct perf_output_handle *handle,
>> +                        void *config)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>> +    struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
>> +    struct trbe_perf *perf = config;
>> +    unsigned long size, offset;
>> +
>> +    WARN_ON(perf->cpudata != cpudata);
>> +    WARN_ON(cpudata->cpu != smp_processor_id());
>> +    WARN_ON(cpudata->mode != CS_MODE_PERF);
>> +    WARN_ON(cpudata->drvdata != drvdata);
>> +
>> +    offset = get_trbe_write_pointer() - get_trbe_base_pointer();
>> +    size = offset - PERF_IDX2OFF(handle->head, perf);
>> +    if (perf->snapshot)
>> +        handle->head += size;
>> +    return size;
>> +}
>> +
>> +static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>> +    struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
>> +    struct perf_output_handle *handle = data;
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +
>> +    WARN_ON(cpudata->cpu != smp_processor_id());
>> +    WARN_ON(mode != CS_MODE_PERF);
> 
> Why WARN_ON ? Simply return -EINVAL ? Also you need a check to make sure
> the mode is DISABLED (when you get to sysfs mode).
> 
>> +    WARN_ON(cpudata->drvdata != drvdata);
>> +
>> +    *this_cpu_ptr(drvdata->handle) = *handle;
> 
> That is wrong. Storing a local copy of a global perf generic structure
> is calling for trouble, assuming that the global structure doesn't change
> beneath us. Please store handle ptr.

Sure, will change.
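
A minimal user-space illustration of the difference, with made-up types
(struct sk_handle is not struct perf_output_handle, and the slots are
plain structs rather than percpu data):

```c
/* Hypothetical stand-in for the perf output handle. */
struct sk_handle {
	unsigned long head;
	unsigned long size;
};

/* What the patch does today: keep a private copy of the handle. */
struct sk_slot_copy {
	struct sk_handle handle;
};

/* What is being asked for: keep a pointer to the caller's handle. */
struct sk_slot_ptr {
	struct sk_handle *handle;
};

static void sk_enable_copy(struct sk_slot_copy *slot,
			   const struct sk_handle *handle)
{
	slot->handle = *handle;		/* snapshot, goes stale on updates */
}

static void sk_enable_ptr(struct sk_slot_ptr *slot, struct sk_handle *handle)
{
	slot->handle = handle;		/* always sees the current state */
}
```

Once the owner of the handle updates head after the enable call, the
copy still holds the old value while the pointer variant sees the new
one, which is the problem with the copy in the IRQ handler path.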

> 
>> +    cpudata->perf = perf;
>> +    cpudata->mode = mode;
>> +    perf->cpudata = cpudata;
>> +    perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
>> +    perf->trbe_limit = get_trbe_limit(handle);
>> +    if (perf->trbe_limit == perf->trbe_base) {
>> +        trbe_disable_and_drain_local();
>> +        return 0;
>> +    }
>> +    trbe_enable_hw(perf);
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_disable(struct coresight_device *csdev)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>> +    struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
>> +    struct trbe_perf *perf = cpudata->perf;
>> +
>> +    WARN_ON(perf->cpudata != cpudata);
>> +    WARN_ON(cpudata->cpu != smp_processor_id());
>> +    WARN_ON(cpudata->mode != CS_MODE_PERF);
>> +    WARN_ON(cpudata->drvdata != drvdata);
>> +
>> +    trbe_disable_and_drain_local();
>> +    perf->cpudata = NULL;
>> +    cpudata->perf = NULL;
>> +    cpudata->mode = CS_MODE_DISABLED;
>> +    return 0;
>> +}
>> +
>> +static void trbe_handle_fatal(struct perf_output_handle *handle)
>> +{
>> +    perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
>> +    perf_aux_output_end(handle, 0);
>> +    trbe_disable_and_drain_local();
>> +}
>> +
>> +static void trbe_handle_spurious(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +
>> +    perf->trbe_write = perf->trbe_base + PERF_IDX2OFF(handle->head, perf);
>> +    perf->trbe_limit = get_trbe_limit(handle);
>> +    if (perf->trbe_limit == perf->trbe_base) {
>> +        trbe_disable_and_drain_local();
>> +        return;
>> +    }
>> +    trbe_enable_hw(perf);
>> +}
>> +
>> +static void trbe_handle_overflow(struct perf_output_handle *handle)
>> +{
>> +    struct perf_event *event = handle->event;
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    unsigned long offset, size;
>> +    struct etm_event_data *event_data;
>> +
>> +    offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
>> +    size = offset - PERF_IDX2OFF(handle->head, perf);
>> +    if (perf->snapshot)
>> +        handle->head = offset;
> 
> Is this correct ? Or was this supposed to mean :
>         handle->head += offset;

Hmm, not too sure about this but the SPE driver does the same in
arm_spe_perf_aux_output_end().

> 
> 
>> +    perf_aux_output_end(handle, size);
>> +
>> +    event_data = perf_aux_output_begin(handle, event);
>> +    if (!event_data) {
>> +        event->hw.state |= PERF_HES_STOPPED;
>> +        trbe_disable_and_drain_local();
>> +        return;
>> +    }
>> +    perf->trbe_write = perf->trbe_base;
>> +    perf->trbe_limit = get_trbe_limit(handle);
>> +    if (perf->trbe_limit == perf->trbe_base) {
>> +        trbe_disable_and_drain_local();
>> +        return;
>> +    }
>> +    *this_cpu_ptr(perf->cpudata->drvdata->handle) = *handle;
>> +    trbe_enable_hw(perf);
>> +}
>> +
>> +static bool is_perf_trbe(struct perf_output_handle *handle)
>> +{
>> +    struct trbe_perf *perf = etm_perf_sink_config(handle);
>> +    struct trbe_cpudata *cpudata = perf->cpudata;
>> +    struct trbe_drvdata *drvdata = cpudata->drvdata;
> 
> Can you trust the cpudata ptr here as we are still verifying
> if this was legitimate ?

It verifies the legitimacy of the interrupt as being generated from
an active perf session on the cpu with some simple sanity checks.
But all data structure linkage should be intact. The perf handle
originates from the drvdata percpu structure which should have a
trbe_perf and everything flows from there.

> 
>> +    int cpu = smp_processor_id();
>> +
>> +    WARN_ON(perf->trbe_base != get_trbe_base_pointer());
>> +    WARN_ON(perf->trbe_limit != get_trbe_limit_pointer());
>> +
>> +    if (cpudata->mode != CS_MODE_PERF)
>> +        return false;
>> +
>> +    if (cpudata->cpu != cpu)
>> +        return false;
>> +
>> +    if (!cpumask_test_cpu(cpu, &drvdata->supported_cpus))
>> +        return false;
>> +
>> +    return true;
>> +}
>> +
>> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle)
>> +{
>> +    enum trbe_ec ec = get_trbe_ec();
>> +    enum trbe_bsc bsc = get_trbe_bsc();
>> +
>> +    WARN_ON(is_trbe_running());
>> +    asm(TSB_CSYNC);
>> +    dsb(nsh);
>> +    isb();
>> +
>> +    if (is_trbe_trg() || is_trbe_abort())
>> +        return TRBE_FAULT_ACT_FATAL;
>> +
>> +    if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>> +        return TRBE_FAULT_ACT_FATAL;
>> +
>> +    if (is_trbe_wrap() && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>> +        if (get_trbe_write_pointer() == get_trbe_base_pointer())
>> +            return TRBE_FAULT_ACT_WRAP;
>> +    }
>> +    return TRBE_FAULT_ACT_SPURIOUS;
>> +}
>> +
>> +static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>> +{
>> +    struct perf_output_handle *handle = dev;
>> +    enum trbe_fault_action act;
>> +
>> +    WARN_ON(!is_trbe_irq());
>> +    clr_trbe_irq();
>> +
>> +    if (!perf_get_aux(handle))
>> +        return IRQ_NONE;
>> +
>> +    if (!is_perf_trbe(handle))
>> +        return IRQ_NONE;
>> +
>> +    irq_work_run();
>> +
>> +    act = trbe_get_fault_act(handle);
>> +    switch (act) {
>> +    case TRBE_FAULT_ACT_WRAP:
>> +        trbe_handle_overflow(handle);
>> +        break;
>> +    case TRBE_FAULT_ACT_SPURIOUS:
>> +        trbe_handle_spurious(handle);
>> +        break;
>> +    case TRBE_FAULT_ACT_FATAL:
>> +        trbe_handle_fatal(handle);
>> +        break;
>> +    }
>> +    return IRQ_HANDLED;
>> +}
>> +
> 
> 
>> +static void arm_trbe_probe_coresight_cpu(void *info)
>> +{
>> +    struct trbe_cpudata *cpudata = info;
>> +    struct device *dev = &cpudata->drvdata->pdev->dev;
>> +    struct coresight_desc desc = { 0 };
>> +
>> +    if (WARN_ON(!cpudata))
>> +        goto cpu_clear;
>> +
>> +    if (!is_trbe_available()) {
>> +        pr_err("TRBE is not implemented on cpu %d\n", cpudata->cpu);
>> +        goto cpu_clear;
>> +    }
>> +
>> +    if (!is_trbe_programmable()) {
>> +        pr_err("TRBE is owned in higher exception level on cpu %d\n", cpudata->cpu);
>> +        goto cpu_clear;
>> +    }
>> +    desc.name = devm_kasprintf(dev, GFP_KERNEL, "%s%d", trbe_name, smp_processor_id());
>> +    if (IS_ERR(desc.name))
>> +        goto cpu_clear;
>> +
>> +    desc.type = CORESIGHT_DEV_TYPE_SINK;
>> +    desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM;
> 
> Maybe we should add a new subtype to make this higher priority than the normal ETR.
> Something like :
> 
>     CORESIGHT_DEV_SUBTYPE_SINK_PERCPU_SYSMEM

Sure, will do.

> 
>> +    desc.ops = &arm_trbe_cs_ops;
>> +    desc.pdata = dev_get_platdata(dev);
>> +    desc.groups = arm_trbe_groups;
>> +    desc.dev = dev;
>> +    cpudata->csdev = coresight_register(&desc);
>> +    if (IS_ERR(cpudata->csdev))
>> +        goto cpu_clear;
>> +
>> +    dev_set_drvdata(&cpudata->csdev->dev, cpudata);
>> +    cpudata->trbe_dbm = get_trbe_flag_update();
>> +    cpudata->trbe_align = 1ULL << get_trbe_address_align();
>> +    if (cpudata->trbe_align > SZ_2K) {
>> +        pr_err("Unsupported alignment on cpu %d\n", cpudata->cpu);
>> +        goto cpu_clear;
>> +    }
>> +    return;
>> +cpu_clear:
>> +    cpumask_clear_cpu(cpudata->cpu, &cpudata->drvdata->supported_cpus);
>> +}
>> +
>> +static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
>> +{
>> +    struct trbe_cpudata *cpudata;
>> +    int cpu;
>> +
>> +    drvdata->cpudata = alloc_percpu(typeof(*drvdata->cpudata));
>> +    if (IS_ERR(drvdata->cpudata))
>> +        return PTR_ERR(drvdata->cpudata);
>> +
>> +    for_each_cpu(cpu, &drvdata->supported_cpus) {
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        cpudata->cpu = cpu;
>> +        cpudata->drvdata = drvdata;
>> +        smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> 
> We could batch it and run it on all CPUs at the same time ? Also it would be better to
> leave the per_cpu area filled by the CPU itself, to avoid racing.

Sure, will re-organize the entire CPU probing/removal and also the CPU
online/offline path. Planning to use smp_call_function_many() instead
for a simultaneous init. 

> 
> 
>> +    }
>> +    return 0;
>> +}
>> +
>> +static void arm_trbe_remove_coresight_cpu(void *info)
>> +{
>> +    struct trbe_drvdata *drvdata = info;
>> +
>> +    disable_percpu_irq(drvdata->irq);
>> +}
>> +
>> +static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
>> +{
>> +    struct trbe_cpudata *cpudata;
>> +    int cpu;
>> +
>> +    for_each_cpu(cpu, &drvdata->supported_cpus) {
>> +        smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        if (cpudata->csdev) {
>> +            coresight_unregister(cpudata->csdev);
>> +            cpudata->drvdata = NULL;
>> +            cpudata->csdev = NULL;
>> +        }
> 
> Please leave this to the CPU to do the part.

Sure, will do.

> 
>> +    }
>> +    free_percpu(drvdata->cpudata);
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
>> +{
>> +    struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
>> +    struct trbe_cpudata *cpudata;
>> +
>> +    if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        if (!cpudata->csdev) {
>> +            cpudata->drvdata = drvdata;
>> +            smp_call_function_single(cpu, arm_trbe_probe_coresight_cpu, cpudata, 1);
> 
> Why do we need smp_call here ? We are already on the CPU.

We don't need it, will drop.

> 
>> +        }
>> +        trbe_reset_local();
>> +        enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
>> +{
>> +    struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
>> +    struct trbe_cpudata *cpudata;
>> +
>> +    if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
>> +        cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> +        if (cpudata->csdev) {
>> +            coresight_unregister(cpudata->csdev);
>> +            cpudata->drvdata = NULL;
>> +            cpudata->csdev = NULL;
>> +        }
>> +        disable_percpu_irq(drvdata->irq);
>> +        trbe_reset_local();
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
>> +{
>> +    enum cpuhp_state trbe_online;
>> +
>> +    trbe_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
>> +                    arm_trbe_cpu_startup, arm_trbe_cpu_teardown);
>> +    if (trbe_online < 0)
>> +        return -EINVAL;
>> +
>> +    if (cpuhp_state_add_instance(trbe_online, &drvdata->hotplug_node))
>> +        return -EINVAL;
>> +
>> +    drvdata->trbe_online = trbe_online;
>> +    return 0;
>> +}
>> +
>> +static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
>> +{
>> +    cpuhp_remove_multi_state(drvdata->trbe_online);
>> +}
>> +
>> +static int arm_trbe_probe_irq(struct platform_device *pdev,
>> +                  struct trbe_drvdata *drvdata)
>> +{
>> +    drvdata->irq = platform_get_irq(pdev, 0);
>> +    if (!drvdata->irq) {
>> +        pr_err("IRQ not found for the platform device\n");
>> +        return -ENXIO;
>> +    }
>> +
>> +    if (!irq_is_percpu(drvdata->irq)) {
>> +        pr_err("IRQ is not a PPI\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (irq_get_percpu_devid_partition(drvdata->irq, &drvdata->supported_cpus))
>> +        return -EINVAL;
>> +
>> +    drvdata->handle = alloc_percpu(typeof(*drvdata->handle));
>> +    if (!drvdata->handle)
>> +        return -ENOMEM;
>> +
>> +    if (request_percpu_irq(drvdata->irq, arm_trbe_irq_handler, DRVNAME, drvdata->handle)) {
>> +        free_percpu(drvdata->handle);
>> +        return -EINVAL;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static void arm_trbe_remove_irq(struct trbe_drvdata *drvdata)
>> +{
>> +    free_percpu_irq(drvdata->irq, drvdata->handle);
>> +    free_percpu(drvdata->handle);
>> +}
>> +
>> +static int arm_trbe_device_probe(struct platform_device *pdev)
>> +{
>> +    struct coresight_platform_data *pdata;
>> +    struct trbe_drvdata *drvdata;
>> +    struct device *dev = &pdev->dev;
>> +    int ret;
>> +
>> +    drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
>> +    if (IS_ERR(drvdata))
>> +        return -ENOMEM;
>> +
>> +    pdata = coresight_get_platform_data(dev);
>> +    if (IS_ERR(pdata)) {
>> +        kfree(drvdata);
>> +        return -ENOMEM;
>> +    }
> 
> 
>> +
>> +    drvdata->atclk = devm_clk_get(dev, "atclk");
>> +    if (!IS_ERR(drvdata->atclk)) {
>> +        ret = clk_prepare_enable(drvdata->atclk);
>> +        if (ret)
>> +            return ret;
>> +    }
> 
> Please drop the clocks, we don't have any

Right, will drop the clock and also the power management support
along with it.

> 
>> +    dev_set_drvdata(dev, drvdata);
>> +    dev->platform_data = pdata;
>> +    drvdata->pdev = pdev;
>> +    ret = arm_trbe_probe_irq(pdev, drvdata);
>> +    if (ret)
>> +        goto irq_failed;
>> +
>> +    ret = arm_trbe_probe_coresight(drvdata);
>> +    if (ret)
>> +        goto probe_failed;
>> +
>> +    ret = arm_trbe_probe_cpuhp(drvdata);
>> +    if (ret)
>> +        goto cpuhp_failed;
>> +
>> +    return 0;
>> +cpuhp_failed:
>> +    arm_trbe_remove_coresight(drvdata);
>> +probe_failed:
>> +    arm_trbe_remove_irq(drvdata);
>> +irq_failed:
>> +    kfree(pdata);
>> +    kfree(drvdata);
>> +    return ret;
>> +}
>> +
>> +static int arm_trbe_device_remove(struct platform_device *pdev)
>> +{
>> +    struct coresight_platform_data *pdata = dev_get_platdata(&pdev->dev);
>> +    struct trbe_drvdata *drvdata = platform_get_drvdata(pdev);
>> +
>> +    arm_trbe_remove_coresight(drvdata);
>> +    arm_trbe_remove_cpuhp(drvdata);
>> +    arm_trbe_remove_irq(drvdata);
>> +    kfree(pdata);
>> +    kfree(drvdata);
>> +    return 0;
>> +}
>> +
>> +#ifdef CONFIG_PM
>> +static int arm_trbe_runtime_suspend(struct device *dev)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
>> +
>> +    if (drvdata && !IS_ERR(drvdata->atclk))
>> +        clk_disable_unprepare(drvdata->atclk);
>> +
> 
> Remove. We may need to save/restore the TRBE ptrs, depending on the
> TRBE.

Will drop it for now. Could revisit this later after the base
functionality is up and running.

> 
>> +    return 0;
>> +}
>> +
>> +static int arm_trbe_runtime_resume(struct device *dev)
>> +{
>> +    struct trbe_drvdata *drvdata = dev_get_drvdata(dev);
>> +
>> +    if (drvdata && !IS_ERR(drvdata->atclk))
>> +        clk_prepare_enable(drvdata->atclk);
> 
> Remove. See above.
> 
>> +
>> +    return 0;
>> +}
>> +#endif
>> +
>> +static const struct dev_pm_ops arm_trbe_dev_pm_ops = {
>> +    SET_RUNTIME_PM_OPS(arm_trbe_runtime_suspend, arm_trbe_runtime_resume, NULL)
>> +};
>> +
>> +static const struct of_device_id arm_trbe_of_match[] = {
>> +    { .compatible = "arm,arm-trbe",    .data = (void *)1 },
>> +    {},
>> +};
> 
> I think it is better to call this, we have too many acronyms ;-)
> 
>     "arm,trace-buffer-extension"

Sure, will change.

> 
>> +MODULE_DEVICE_TABLE(of, arm_trbe_of_match);
> 
>> +
>> +static const struct platform_device_id arm_trbe_match[] = {
>> +    { "arm,trbe", 0},
>> +    { }
>> +};
>> +MODULE_DEVICE_TABLE(platform, arm_trbe_match);
> 
> Please remove. The ACPI part can be added when we get to it.

Sure, will drop for now.

> 
>> +
>> +static struct platform_driver arm_trbe_driver = {
>> +    .id_table = arm_trbe_match,
>> +    .driver    = {
>> +        .name = DRVNAME,
>> +        .of_match_table = of_match_ptr(arm_trbe_of_match),
>> +        .pm = &arm_trbe_dev_pm_ops,
>> +        .suppress_bind_attrs = true,
>> +    },
>> +    .probe    = arm_trbe_device_probe,
>> +    .remove    = arm_trbe_device_remove,
>> +};
>> +builtin_platform_driver(arm_trbe_driver)
> 
> Please make this modular.

Will do.

> 
> 
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.h b/drivers/hwtracing/coresight/coresight-trbe.h
>> new file mode 100644
>> index 0000000..82ffbfc
>> --- /dev/null
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.h
>> @@ -0,0 +1,525 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * This contains all required hardware related helper functions for
>> + * Trace Buffer Extension (TRBE) driver in the coresight framework.
>> + *
>> + * Copyright (C) 2020 ARM Ltd.
>> + *
>> + * Author: Anshuman Khandual <anshuman.khandual@arm.com>
>> + */
>> +#include <linux/coresight.h>
>> +#include <linux/device.h>
>> +#include <linux/irq.h>
>> +#include <linux/kernel.h>
>> +#include <linux/of.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/smp.h>
>> +
>> +#include "coresight-etm-perf.h"
>> +
>> +static inline bool is_trbe_available(void)
>> +{
>> +    u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
>> +    int trbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRBE_SHIFT);
>> +
>> +    return trbe >= 0b0001;
>> +}
>> +
>> +static inline bool is_ete_available(void)
>> +{
>> +    u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
>> +    int tracever = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_TRACEVER_SHIFT);
>> +
>> +    return (tracever != 0b0000);
> 
> Why is this needed ?

Sure, will drop.

> 
>> +}
>> +
>> +static inline bool is_trbe_enabled(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return trblimitr & TRBLIMITR_ENABLE;
>> +}
>> +
>> +enum trbe_ec {
>> +    TRBE_EC_OTHERS        = 0,
>> +    TRBE_EC_STAGE1_ABORT    = 36,
>> +    TRBE_EC_STAGE2_ABORT    = 37,
>> +};
>> +
>> +static const char *const trbe_ec_str[] = {
>> +    [TRBE_EC_OTHERS]    = "Maintenance exception",
>> +    [TRBE_EC_STAGE1_ABORT]    = "Stage-1 exception",
>> +    [TRBE_EC_STAGE2_ABORT]    = "Stage-2 exception",
>> +};
>> +
> 
> Please remove the definitions that are not used by the driver.

Sure, will do.

> 
>> +static inline enum trbe_ec get_trbe_ec(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    return (trbsr >> TRBSR_EC_SHIFT) & TRBSR_EC_MASK;
>> +}
>> +
>> +static inline void clr_trbe_ec(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    trbsr &= ~(TRBSR_EC_MASK << TRBSR_EC_SHIFT);
>> +    write_sysreg_s(trbsr, SYS_TRBSR_EL1);
>> +}
>> +
>> +enum trbe_bsc {
>> +    TRBE_BSC_NOT_STOPPED    = 0,
>> +    TRBE_BSC_FILLED        = 1,
>> +    TRBE_BSC_TRIGGERED    = 2,
>> +};
>> +
>> +static const char *const trbe_bsc_str[] = {
>> +    [TRBE_BSC_NOT_STOPPED]    = "TRBE collection not stopped",
>> +    [TRBE_BSC_FILLED]    = "TRBE filled",
>> +    [TRBE_BSC_TRIGGERED]    = "TRBE triggered",
>> +};
>> +
>> +static inline enum trbe_bsc get_trbe_bsc(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    return (trbsr >> TRBSR_BSC_SHIFT) & TRBSR_BSC_MASK;
>> +}
>> +
>> +static inline void clr_trbe_bsc(void)
>> +{
>> +    u64 trbsr = read_sysreg_s(SYS_TRBSR_EL1);
>> +
>> +    trbsr &= ~(TRBSR_BSC_MASK << TRBSR_BSC_SHIFT);
>> +    write_sysreg_s(trbsr, SYS_TRBSR_EL1);
>> +}
>> +
>> +enum trbe_fsc {
>> +    TRBE_FSC_ASF_LEVEL0    = 0,
>> +    TRBE_FSC_ASF_LEVEL1    = 1,
>> +    TRBE_FSC_ASF_LEVEL2    = 2,
>> +    TRBE_FSC_ASF_LEVEL3    = 3,
>> +    TRBE_FSC_TF_LEVEL0    = 4,
>> +    TRBE_FSC_TF_LEVEL1    = 5,
>> +    TRBE_FSC_TF_LEVEL2    = 6,
>> +    TRBE_FSC_TF_LEVEL3    = 7,
>> +    TRBE_FSC_AFF_LEVEL0    = 8,
>> +    TRBE_FSC_AFF_LEVEL1    = 9,
>> +    TRBE_FSC_AFF_LEVEL2    = 10,
>> +    TRBE_FSC_AFF_LEVEL3    = 11,
>> +    TRBE_FSC_PF_LEVEL0    = 12,
>> +    TRBE_FSC_PF_LEVEL1    = 13,
>> +    TRBE_FSC_PF_LEVEL2    = 14,
>> +    TRBE_FSC_PF_LEVEL3    = 15,
>> +    TRBE_FSC_SEA_WRITE    = 16,
>> +    TRBE_FSC_ASEA_WRITE    = 17,
>> +    TRBE_FSC_SEA_LEVEL0    = 20,
>> +    TRBE_FSC_SEA_LEVEL1    = 21,
>> +    TRBE_FSC_SEA_LEVEL2    = 22,
>> +    TRBE_FSC_SEA_LEVEL3    = 23,
>> +    TRBE_FSC_ALIGN_FAULT    = 33,
>> +    TRBE_FSC_TLB_FAULT    = 48,
>> +    TRBE_FSC_ATOMIC_FAULT    = 49,
>> +};
> 
> Please remove ^^^

Sure, will do.

> 
>> +
>> +static const char *const trbe_fsc_str[] = {
>> +    [TRBE_FSC_ASF_LEVEL0]    = "Address size fault - level 0",
>> +    [TRBE_FSC_ASF_LEVEL1]    = "Address size fault - level 1",
>> +    [TRBE_FSC_ASF_LEVEL2]    = "Address size fault - level 2",
>> +    [TRBE_FSC_ASF_LEVEL3]    = "Address size fault - level 3",
>> +    [TRBE_FSC_TF_LEVEL0]    = "Translation fault - level 0",
>> +    [TRBE_FSC_TF_LEVEL1]    = "Translation fault - level 1",
>> +    [TRBE_FSC_TF_LEVEL2]    = "Translation fault - level 2",
>> +    [TRBE_FSC_TF_LEVEL3]    = "Translation fault - level 3",
>> +    [TRBE_FSC_AFF_LEVEL0]    = "Access flag fault - level 0",
>> +    [TRBE_FSC_AFF_LEVEL1]    = "Access flag fault - level 1",
>> +    [TRBE_FSC_AFF_LEVEL2]    = "Access flag fault - level 2",
>> +    [TRBE_FSC_AFF_LEVEL3]    = "Access flag fault - level 3",
>> +    [TRBE_FSC_PF_LEVEL0]    = "Permission fault - level 0",
>> +    [TRBE_FSC_PF_LEVEL1]    = "Permission fault - level 1",
>> +    [TRBE_FSC_PF_LEVEL2]    = "Permission fault - level 2",
>> +    [TRBE_FSC_PF_LEVEL3]    = "Permission fault - level 3",
>> +    [TRBE_FSC_SEA_WRITE]    = "Synchronous external abort on write",
>> +    [TRBE_FSC_ASEA_WRITE]    = "Asynchronous external abort on write",
>> +    [TRBE_FSC_SEA_LEVEL0]    = "Synchronous external abort on table walk - level 0",
>> +    [TRBE_FSC_SEA_LEVEL1]    = "Synchronous external abort on table walk - level 1",
>> +    [TRBE_FSC_SEA_LEVEL2]    = "Synchronous external abort on table walk - level 2",
>> +    [TRBE_FSC_SEA_LEVEL3]    = "Synchronous external abort on table walk - level 3",
>> +    [TRBE_FSC_ALIGN_FAULT]    = "Alignment fault",
>> +    [TRBE_FSC_TLB_FAULT]    = "TLB conflict fault",
>> +    [TRBE_FSC_ATOMIC_FAULT]    = "Atomic fault",
>> +};
>>
> 
> Please remove ^^^

Sure, will do.

> 
>>
> 
>> +enum trbe_address_mode {
>> +    TRBE_ADDRESS_VIRTUAL,
>> +    TRBE_ADDRESS_PHYSICAL,
>> +};
> 
> #define please.
> 
>> +
>> +static const char *const trbe_address_mode_str[] = {
>> +    [TRBE_ADDRESS_VIRTUAL]    = "Address mode - virtual",
>> +    [TRBE_ADDRESS_PHYSICAL]    = "Address mode - physical",
>> +};
> 
> Do we need this ? We always use virtual.
> 
>> +
>> +static inline bool is_trbe_virtual_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return !(trblimitr & TRBLIMITR_NVM);
>> +}
>> +
> 
> Remove

Sure, will do.

> 
>> +static inline bool is_trbe_physical_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return trblimitr & TRBLIMITR_NVM;
>> +}
> 
> Remove

Sure, will do.

> 
>> +
>> +static inline void set_trbe_virtual_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~TRBLIMITR_NVM;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
> 
>> +static inline void set_trbe_physical_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr |= TRBLIMITR_NVM;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
> 
> Remove

Sure, will do.

> 
>> +
>> +enum trbe_trig_mode {
>> +    TRBE_TRIGGER_STOP    = 0,
>> +    TRBE_TRIGGER_IRQ    = 1,
>> +    TRBE_TRIGGER_IGNORE    = 3,
>> +};
>> +
>> +static const char *const trbe_trig_mode_str[] = {
>> +    [TRBE_TRIGGER_STOP]    = "Trigger mode - stop",
>> +    [TRBE_TRIGGER_IRQ]    = "Trigger mode - irq",
>> +    [TRBE_TRIGGER_IGNORE]    = "Trigger mode - ignore",
>> +};
>> +
>> +static inline enum trbe_trig_mode get_trbe_trig_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return (trblimitr >> TRBLIMITR_TRIG_MODE_SHIFT) & TRBLIMITR_TRIG_MODE_MASK;
>> +}
>> +
>> +static inline void set_trbe_trig_mode(enum trbe_trig_mode mode)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~(TRBLIMITR_TRIG_MODE_MASK << TRBLIMITR_TRIG_MODE_SHIFT);
>> +    trblimitr |= ((mode & TRBLIMITR_TRIG_MODE_MASK) << TRBLIMITR_TRIG_MODE_SHIFT);
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +enum trbe_fill_mode {
>> +    TRBE_FILL_STOP        = 0,
>> +    TRBE_FILL_WRAP        = 1,
>> +    TRBE_FILL_CIRCULAR    = 3,
>> +};
>> +
> 
> Please use #define

These are predefined constrained values which kind of makes them
a set. An enumeration seems to be a better representation.
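
To illustrate the point about a constrained set of values, here is a minimal
userspace sketch of the read-modify-write pattern used by the fill mode
helpers, operating on a plain integer instead of a system register. The ex_
names and the field position (two bits starting at bit 1) are assumptions made
for this example, not values taken from the patch.

```c
#include <stdint.h>

/* The only encodings the field accepts; 2 is deliberately absent. */
enum ex_fill_mode {
	EX_FILL_STOP     = 0,
	EX_FILL_WRAP     = 1,
	EX_FILL_CIRCULAR = 3,
};

/* Assumed field layout, for illustration only. */
#define EX_FILL_MODE_SHIFT 1
#define EX_FILL_MODE_MASK  0x3ULL

/* Clear the field, then install the new mode; other bits are preserved. */
static inline uint64_t ex_set_fill_mode(uint64_t limitr, enum ex_fill_mode mode)
{
	limitr &= ~(EX_FILL_MODE_MASK << EX_FILL_MODE_SHIFT);
	limitr |= ((uint64_t)mode & EX_FILL_MODE_MASK) << EX_FILL_MODE_SHIFT;
	return limitr;
}

/* Extract the field and hand it back as the enumerated type. */
static inline enum ex_fill_mode ex_get_fill_mode(uint64_t limitr)
{
	return (enum ex_fill_mode)((limitr >> EX_FILL_MODE_SHIFT) &
				   EX_FILL_MODE_MASK);
}
```

With an enum parameter the permitted encodings are visible in the function
signature, which is harder to convey with bare #define constants.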

> 
>> +static const char *const trbe_fill_mode_str[] = {
>> +    [TRBE_FILL_STOP]    = "Buffer mode - stop",
>> +    [TRBE_FILL_WRAP]    = "Buffer mode - wrap",
>> +    [TRBE_FILL_CIRCULAR]    = "Buffer mode - circular",
>> +};
>> +
>> +static inline enum trbe_fill_mode get_trbe_fill_mode(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    return (trblimitr >> TRBLIMITR_FILL_MODE_SHIFT) & TRBLIMITR_FILL_MODE_MASK;
>> +}
>> +
>> +static inline void set_trbe_fill_mode(enum trbe_fill_mode mode)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~(TRBLIMITR_FILL_MODE_MASK << TRBLIMITR_FILL_MODE_SHIFT);
>> +    trblimitr |= ((mode & TRBLIMITR_FILL_MODE_MASK) << TRBLIMITR_FILL_MODE_SHIFT);
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +static inline void set_trbe_disabled(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr &= ~TRBLIMITR_ENABLE;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +static inline void set_trbe_enabled(void)
>> +{
>> +    u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> +
>> +    trblimitr |= TRBLIMITR_ENABLE;
>> +    write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> +}
>> +
>> +static inline bool get_trbe_flag_update(void)
>> +{
>> +    u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
>> +
>> +    return trbidr & TRBIDR_FLAG;
>> +}
>> +
>> +static inline bool is_trbe_programmable(void)
>> +{
>> +    u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
>> +
>> +    return !(trbidr & TRBIDR_PROG);
>> +}
>> +#
>> +enum trbe_buffer_align {
>> +    TRBE_BUFFER_BYTE,
>> +    TRBE_BUFFER_HALF_WORD,
>> +    TRBE_BUFFER_WORD,
>> +    TRBE_BUFFER_DOUBLE_WORD,
>> +    TRBE_BUFFER_16_BYTES,
>> +    TRBE_BUFFER_32_BYTES,
>> +    TRBE_BUFFER_64_BYTES,
>> +    TRBE_BUFFER_128_BYTES,
>> +    TRBE_BUFFER_256_BYTES,
>> +    TRBE_BUFFER_512_BYTES,
>> +    TRBE_BUFFER_1K_BYTES,
>> +    TRBE_BUFFER_2K_BYTES,
>> +};
>> +
> 
> Remove ^^

Sure, will do.

> 
>> +static const char *const trbe_buffer_align_str[] = {
>> +    [TRBE_BUFFER_BYTE]        = "Byte",
>> +    [TRBE_BUFFER_HALF_WORD]        = "Half word",
>> +    [TRBE_BUFFER_WORD]        = "Word",
>> +    [TRBE_BUFFER_DOUBLE_WORD]    = "Double word",
>> +    [TRBE_BUFFER_16_BYTES]        = "16 bytes",
>> +    [TRBE_BUFFER_32_BYTES]        = "32 bytes",
>> +    [TRBE_BUFFER_64_BYTES]        = "64 bytes",
>> +    [TRBE_BUFFER_128_BYTES]        = "128 bytes",
>> +    [TRBE_BUFFER_256_BYTES]        = "256 bytes",
>> +    [TRBE_BUFFER_512_BYTES]        = "512 bytes",
>> +    [TRBE_BUFFER_1K_BYTES]        = "1K bytes",
>> +    [TRBE_BUFFER_2K_BYTES]        = "2K bytes",
>> +};
> 
> We don't need any of this. We could simply "<<" and get the
> size.

Dropping all of these. We will just export the hex value in sysfs rather than
a string from here.
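
The shift-based decode suggested above could look roughly like the sketch
below. It is a userspace illustration only: the ex_ names and the assumption
that the alignment field occupies the low four bits of the ID register value
are mine, not taken from the patch.

```c
#include <stdint.h>

/* Assumed layout for illustration: alignment field in the low 4 bits. */
#define EX_TRBIDR_ALIGN_SHIFT 0
#define EX_TRBIDR_ALIGN_MASK  0xfULL

/* Extract the encoded alignment, i.e. log2 of the alignment in bytes. */
static inline unsigned int ex_trbe_align_field(uint64_t trbidr)
{
	return (trbidr >> EX_TRBIDR_ALIGN_SHIFT) & EX_TRBIDR_ALIGN_MASK;
}

/* An encoded value n means 2^n bytes, so no string or lookup table is needed. */
static inline uint64_t ex_trbe_align_bytes(uint64_t trbidr)
{
	return 1ULL << ex_trbe_align_field(trbidr);
}
```

The enumeration quoted above runs from "Byte" (0) to "2K bytes" (11), which is
exactly the range this shift covers.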

> 
> 
>> +
>> +static inline enum trbe_buffer_align get_trbe_address_align(void)
>> +{
>> +    u64 trbidr = read_sysreg_s(SYS_TRBIDR_EL1);
>> +
>> +    return (trbidr >> TRBIDR_ALIGN_SHIFT) & TRBIDR_ALIGN_MASK;
>> +}
>> +
>> +static inline void assert_trbe_address_mode(unsigned long addr)
>> +{
>> +    bool virt_addr = virt_addr_valid(addr) || is_vmalloc_addr((void *)addr);
>> +    bool virt_mode = is_trbe_virtual_mode();
>> +
>> +    WARN_ON(addr && ((virt_addr && !virt_mode) || (!virt_addr && virt_mode)));
>> +}
> 
> I am not sure if this is really helpful. You have to trust the kernel vmalloc().

Okay, dropping both address asserts, i.e. mode and alignment.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-11-27 10:32     ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-11-27 10:32 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, coresight
  Cc: linux-kernel, mathieu.poirier, mike.leach, Al Grant

On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> The perf handle structure needs to be shared with the TRBE IRQ handler for
> capturing trace data and restarting the handle. There is a possibility
> of an undefined reference based crash when an etm event is being stopped
> while a TRBE IRQ is also being processed. This happens due to the release
> of the perf handle via perf_aux_output_end(). This patch stops the sinks via
> the link before releasing the handle, which ensures that a simultaneous
> TRBE IRQ cannot happen.

Or in other words :

We now have :

	update_buffer()

	perf_aux_output_end(handle)

	...
	disable_path()

This is problematic due to various reasons :

1) The semantics of update_buffer() are not clear, i.e., whether it
    should leave the sink "stopped", "disabled" or "active".

2) This breaks the recommended trace collection sequence of
    "flush" and "stop" from the source to the sink.
    i.e., we stop the source now, but we don't flush the components
    from source to sink; instead we stop and flush from the sink.
    We then flush and stop the path only after we have collected the
    trace data at the sink, which is pointless.

3) For a sink with an IRQ handler, if we don't stop the sink in
    update_buffer(), we could have a situation :

    update_buffer()

    perf_aux_output_end(handle) # handle is invalid now

  -----------------> IRQ    -> irq_handler()
                                perf_aux_output_end(handle) # Wrong !


    disable_path()

The sysfs mode is fine, as we defer the trace collection to disable_path().

The proposed patch is still racy, as we could still hit the problem.

So, to avoid all of these situations, I think we should defer the
update_buffer() to sink_ops->disable(), when we have flushed and stopped
all the components upstream, and so avoid any races with the IRQ
handler.

i.e,

	source_ops->stop(csdev);

	disable_path(handle); // similar to the enable_path


sink_ops->disable(csdev, handle)
{
   /* flush & stop */

   /* collect trace */
   perf_aux_output_end(handle, size);
}
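
A minimal single-threaded model of the ordering proposed above, written as
plain userspace C. Everything prefixed with ex_ is a hypothetical stand-in
(for the perf handle, the sink and its IRQ handler); it does not model real
locking or interrupt masking, only the rule that the sink is stopped before
the handle is ended.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the perf output handle. */
struct ex_handle {
	bool   live;       /* false once the handle has been ended */
	size_t committed;  /* bytes reported when the handle was ended */
	int    end_calls;  /* how many times the handle was ended */
};

/* Hypothetical stand-in for a sink that has its own IRQ. */
struct ex_sink {
	bool   running;    /* the IRQ handler only acts while this is true */
	size_t pending;    /* trace bytes not yet reported */
	struct ex_handle *handle;
};

/* Models ending the handle; this must happen exactly once per handle. */
static void ex_output_end(struct ex_handle *h, size_t size)
{
	h->end_calls++;
	h->committed = size;
	h->live = false;
}

/* Models the sink IRQ handler: a stopped sink leaves the handle alone. */
static void ex_sink_irq(struct ex_sink *s)
{
	if (!s->running || !s->handle)
		return;
	ex_output_end(s->handle, s->pending);
	s->pending = 0;
}

/* Models the proposed disable callback: stop first, then collect and end. */
static void ex_sink_disable(struct ex_sink *s)
{
	s->running = false;                           /* flush & stop */
	if (s->handle && s->handle->live)
		ex_output_end(s->handle, s->pending); /* collect trace */
	s->pending = 0;
	s->handle = NULL;
}
```

In this model a late call to ex_sink_irq() after ex_sink_disable() is a no-op,
so the handle is ended once and only once.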


Kind regards
Suzuki



> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This might cause problems with traditional sink devices which can be
> operated in both sysfs and perf mode. This needs to be addressed
> correctly. One option would be to move the update_buffer callback
> into the respective sink devices, e.g. disable().
> 
>   drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 534e205..1a37991 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
>   
>   		size = sink_ops(sink)->update_buffer(sink, handle,
>   					      event_data->snk_config);
> +		coresight_disable_path(path);
>   		perf_aux_output_end(handle, size);
> +		return;
>   	}
>   
>   	/* Disabling the path make its elements available to other sessions */
> 


* Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
  2020-11-27 10:32     ` Suzuki K Poulose
@ 2020-12-11 20:31       ` Mathieu Poirier
  -1 siblings, 0 replies; 72+ messages in thread
From: Mathieu Poirier @ 2020-12-11 20:31 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Anshuman Khandual, linux-arm-kernel, coresight, linux-kernel,
	mike.leach, Al Grant

On Fri, Nov 27, 2020 at 10:32:28AM +0000, Suzuki K Poulose wrote:
> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
> > The perf handle structure needs to be shared with the TRBE IRQ handler for
> > capturing trace data and restarting the handle. There is a possibility
> > of an undefined reference based crash when an etm event is being stopped
> > while a TRBE IRQ is also being processed. This happens due to the release
> > of the perf handle via perf_aux_output_end(). This patch stops the sinks via
> > the link before releasing the handle, which ensures that a simultaneous
> > TRBE IRQ cannot happen.
> 
> Or in other words :
> 
> We now have :
> 
> 	update_buffer()
> 
> 	perf_aux_output_end(handle)
> 
> 	...
> 	disable_path()
> 
> This is problematic due to various reasons :
> 
> 1) The semantics of update_buffer() are not clear, i.e., whether it
>    should leave the sink "stopped", "disabled" or "active".

I'm a little confused by the above as the modes that apply here are
CS_MODE_DISABLED and CS_MODE_PERF, so I'll go with those.  Let me know if you
meant something else.

So far ->update_buffer() doesn't touch drvdata->mode and as such it is still set
to CS_MODE_PERF when the update has completed. 

> 
> 2) This breaks the recommended trace collection sequence of
>    "flush" and "stop" from the source to the sink.
>    i.e., we stop the source now, but we don't flush the components
>    from source to sink; instead we stop and flush from the sink.
>    We then flush and stop the path only after we have collected the
>    trace data at the sink, which is pointless.

The above assessment is correct.  Fixing it, though, has ramifications
that go far beyond the scope of this patch.

> 
> 3) For a sink with an IRQ handler, if we don't stop the sink in
>    update_buffer(), we could have a situation :
> 
>    update_buffer()
> 
>    perf_aux_output_end(handle) # handle is invalid now
> 
>  -----------------> IRQ    -> irq_handler()
>                                perf_aux_output_end(handle) # Wrong !
> 
> 
>    disable_path()

That's the picture of the issue I had in my head when looking at the code -
I'm glad we came to the same conclusion.

> 
> The sysfs mode is fine, as we defer the trace collection to disable_path().
> 
> The proposed patch is still racy, as we could still hit the problem.
> 
> So, to avoid all of these situations, I think we should defer the
> update_buffer() to sink_ops->disable(), when we have flushed and stopped
> all the components upstream, and so avoid any races with the IRQ
> handler.
> 
> i.e,
> 
> 	source_ops->stop(csdev);
> 
> 	disable_path(handle); // similar to the enable_path
> 
> 
> sink_ops->disable(csdev, handle)
> {
>   /* flush & stop */
> 
>   /* collect trace */
>   perf_aux_output_end(handle, size);
> }

That is one solution.  The advantage here is that it takes care of the
flushing problem you described above.  On the flip side it is moving a lot of
code around, something that is better to do in another set.

Another solution is to disable the TRBE IRQ in ->update_buffer().  The ETR does
the same kind of thing with tmc_flush_and_stop().  I don't know how feasible
that is but it would be a simple solution for this set.  Properly flushing the
pipeline could be done later.  I'm fine with either approach.

Thanks,
Mathieu 

> 
> 
> Kind regards
> Suzuki
> 
> 
> 
> > 
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > ---
> > This might cause problems with traditional sink devices which can be
> > operated in both sysfs and perf mode. This needs to be addressed
> > correctly. One option would be to move the update_buffer callback
> > into the respective sink devices, e.g. disable().
> > 
> >   drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > index 534e205..1a37991 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
> >   		size = sink_ops(sink)->update_buffer(sink, handle,
> >   					      event_data->snk_config);
> > +		coresight_disable_path(path);
> >   		perf_aux_output_end(handle, size);
> > +		return;
> >   	}
> >   	/* Disabling the path make its elements available to other sessions */
> > 
> 

* Re: [RFC 10/11] coresgith: etm-perf: Connect TRBE sink with ETE source
  2020-11-10 12:45   ` Anshuman Khandual
@ 2020-12-11 21:31     ` Mathieu Poirier
  -1 siblings, 0 replies; 72+ messages in thread
From: Mathieu Poirier @ 2020-12-11 21:31 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, coresight, linux-kernel, suzuki.poulose, mike.leach

On Tue, Nov 10, 2020 at 06:15:08PM +0530, Anshuman Khandual wrote:
> Unlike traditional sink devices, individual TRBE instances are not detected
> via DT or ACPI nodes. Instead TRBE instances are detected during CPU online
> process. Hence a path connecting ETE and TRBE on a given CPU would not have
> been established until then. This adds two coresight helpers that will help
> modify outward connections from a source device to establish and terminate
> path to a given sink device. But this method might not be optimal and would
> be reworked later.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-etm-perf.c | 30 ++++++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-etm-perf.h |  4 ++++
>  drivers/hwtracing/coresight/coresight-platform.c |  3 ++-
>  drivers/hwtracing/coresight/coresight-trbe.c     |  2 ++
>  include/linux/coresight.h                        |  2 ++
>  5 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 1a37991..b4ab1d4 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -664,3 +664,33 @@ void __exit etm_perf_exit(void)
>  {
>  	perf_pmu_unregister(&etm_pmu);
>  }
> +
> +#ifdef CONFIG_CORESIGHT_TRBE
> +void coresight_trbe_connect_ete(struct coresight_device *csdev_trbe, int cpu)
> +{
> +	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);

As Suzuki pointed out, that won't work if the TRBE gets probed before the
ETMv4-ETE.  I also agree with Suzuki that this situation would be better
handled with a per-CPU csdev_trbe declared in the coresight-core.c file.
That way both sysfs and perf have access to it.

> +
> +	if (!csdev_ete) {
> +		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
> +		return;
> +	}
> +	csdev_ete->def_sink = csdev_trbe;

That should be done in function coresight_find_default_sink().  If
per_cpu(csdev_trbe, cpu) exists then that's what we pick.  If not then move
along with coresight_find_sink().
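
A minimal userspace sketch of that selection order, assuming a per-CPU table
of TRBE sinks. The ex_ types and functions are hypothetical stand-ins and are
not the coresight API; the topology search is reduced to returning a fallback
sink passed in by the caller.

```c
#include <stddef.h>

#define EX_NR_CPUS 4

/* Hypothetical stand-in for a coresight device. */
struct ex_csdev {
	const char *name;
	struct ex_csdev *def_sink;  /* cached default sink, if any */
};

/* Hypothetical per-CPU table of TRBE sinks; NULL means none registered. */
static struct ex_csdev *ex_trbe_sink[EX_NR_CPUS];

/* Stand-in for the existing topology walk; it just returns the fallback. */
static struct ex_csdev *ex_find_sink_by_topology(struct ex_csdev *src,
						 struct ex_csdev *fallback)
{
	(void)src;
	return fallback;
}

/* Prefer the CPU's own TRBE; otherwise fall back to the topology search. */
static struct ex_csdev *ex_find_default_sink(struct ex_csdev *src, int cpu,
					     struct ex_csdev *fallback)
{
	if (src->def_sink)
		return src->def_sink;

	if (cpu >= 0 && cpu < EX_NR_CPUS && ex_trbe_sink[cpu])
		src->def_sink = ex_trbe_sink[cpu];
	else
		src->def_sink = ex_find_sink_by_topology(src, fallback);

	return src->def_sink;
}
```

Because the lookup happens when the default sink is requested, the result in
this model does not depend on whether the TRBE or the ETE was probed first.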


> +	csdev_ete->pdata->nr_outport++;
> +	if (!csdev_ete->pdata->conns)
> +		coresight_alloc_conns(&csdev_ete->dev, csdev_ete->pdata);
> +	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = csdev_trbe;

I don't think we have to go through all that dance since the TRBE is directly
connected to the ETE.  With the above about coresight_find_default_sink() in
mind, all we need to do is fix coresight_build_path() to check if the sink
parameter is the same as csdev->def_sink.  If so then just add the sink to the
path, no need to follow ports as we do for other classic components.
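
The shortcut described above can be sketched in userspace as follows. The
ex_ names are hypothetical, each node has at most one outward connection to
keep the example short, and the real coresight_build_path() is considerably
more involved.

```c
#include <stddef.h>

#define EX_MAX_PATH 8

/* Hypothetical stand-in for a device in the trace topology. */
struct ex_node {
	const char *name;
	struct ex_node *def_sink;  /* directly attached sink, if any */
	struct ex_node *next;      /* single outward connection, for brevity */
};

/* Fills path[] from src to sink; returns the length, or -1 if unreachable. */
static int ex_build_path(struct ex_node *src, struct ex_node *sink,
			 struct ex_node **path)
{
	int n = 0;
	struct ex_node *cur;

	/* Direct case: the requested sink is the source's own default sink. */
	if (src->def_sink && src->def_sink == sink) {
		path[n++] = src;
		path[n++] = sink;
		return n;
	}

	/* Classic case: follow the outward connections until the sink. */
	for (cur = src; cur && n < EX_MAX_PATH; cur = cur->next) {
		path[n++] = cur;
		if (cur == sink)
			return n;
	}
	return -1;
}
```

In the direct case no connection record is consulted at all, so the source
would not need an extra outport entry for the TRBE.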

Thanks,
Mathieu

> +}
> +
> +void coresight_trbe_remove_ete(struct coresight_device *csdev_trbe, int cpu)
> +{
> +	struct coresight_device *csdev_ete = per_cpu(csdev_src, cpu);
> +
> +	if (!csdev_ete) {
> +		pr_err("Corresponding ETE device not present on cpu %d\n", cpu);
> +		return;
> +	}
> +	csdev_ete->pdata->conns[csdev_ete->pdata->nr_outport - 1].child_dev = NULL;
> +	csdev_ete->def_sink = NULL;
> +	csdev_ete->pdata->nr_outport--;
> +}
> +#endif
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.h b/drivers/hwtracing/coresight/coresight-etm-perf.h
> index 3e4f2ad..20386cf 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.h
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.h
> @@ -85,4 +85,8 @@ static inline void *etm_perf_sink_config(struct perf_output_handle *handle)
>  int __init etm_perf_init(void);
>  void __exit etm_perf_exit(void);
>  
> +#ifdef CONFIG_CORESIGHT_TRBE
> +void coresight_trbe_connect_ete(struct coresight_device *csdev, int cpu);
> +void coresight_trbe_remove_ete(struct coresight_device *csdev, int cpu);
> +#endif
>  #endif
> diff --git a/drivers/hwtracing/coresight/coresight-platform.c b/drivers/hwtracing/coresight/coresight-platform.c
> index c594f45..8fa7406 100644
> --- a/drivers/hwtracing/coresight/coresight-platform.c
> +++ b/drivers/hwtracing/coresight/coresight-platform.c
> @@ -23,7 +23,7 @@
>   * coresight_alloc_conns: Allocate connections record for each output
>   * port from the device.
>   */
> -static int coresight_alloc_conns(struct device *dev,
> +int coresight_alloc_conns(struct device *dev,
>  				 struct coresight_platform_data *pdata)
>  {
>  	if (pdata->nr_outport) {
> @@ -35,6 +35,7 @@ static int coresight_alloc_conns(struct device *dev,
>  
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(coresight_alloc_conns);
>  
>  static struct device *
>  coresight_find_device_by_fwnode(struct fwnode_handle *fwnode)
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 48a8ec3..afd1a1c 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -507,6 +507,7 @@ static void arm_trbe_probe_coresight_cpu(void *info)
>  	if (IS_ERR(cpudata->csdev))
>  		goto cpu_clear;
>  
> +	coresight_trbe_connect_ete(cpudata->csdev, cpudata->cpu);
>  	dev_set_drvdata(&cpudata->csdev->dev, cpudata);
>  	cpudata->trbe_dbm = get_trbe_flag_update();
>  	cpudata->trbe_align = 1ULL << get_trbe_address_align();
> @@ -586,6 +587,7 @@ static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
>  
>  	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
>  		cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> +		coresight_trbe_remove_ete(cpudata->csdev, cpu);
>  		if (cpudata->csdev) {
>  			coresight_unregister(cpudata->csdev);
>  			cpudata->drvdata = NULL;
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index c2d0a2a..c657813 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -496,6 +496,8 @@ void coresight_relaxed_write64(struct coresight_device *csdev,
>  			       u64 val, u32 offset);
>  void coresight_write64(struct coresight_device *csdev, u64 val, u32 offset);
>  
> +int coresight_alloc_conns(struct device *dev,
> +			  struct coresight_platform_data *pdata);
>  
>  #else
>  static inline struct coresight_device *
> -- 
> 2.7.4
> 


* Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
  2020-12-11 20:31       ` Mathieu Poirier
@ 2020-12-14 10:00         ` Suzuki K Poulose
  -1 siblings, 0 replies; 72+ messages in thread
From: Suzuki K Poulose @ 2020-12-14 10:00 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Anshuman Khandual, linux-arm-kernel, coresight, linux-kernel,
	mike.leach, Al Grant

On 12/11/20 8:31 PM, Mathieu Poirier wrote:
> On Fri, Nov 27, 2020 at 10:32:28AM +0000, Suzuki K Poulose wrote:
>> On 11/10/20 12:45 PM, Anshuman Khandual wrote:
>>> perf handle structure needs to be shared with the TRBE IRQ handler for
>>> capturing trace data and restarting the handle. There is a probability
>>> of an undefined reference based crash when etm event is being stopped
>>> while a TRBE IRQ is also getting processed. This happens due to the release
>>> of perf handle via perf_aux_output_end(). This stops the sinks via the
>>> link before releasing the handle, which will ensure that a simultaneous
>>> TRBE IRQ could not happen.
>>
>> Or in other words :
>>
>> We now have :
>>
>> 	update_buffer()
>>
>> 	perf_aux_output_end(handle)
>>
>> 	...
>> 	disable_path()
>>
>> This is problematic due to various reasons :
>>
>> 1) The semantics of update_buffer() is not clear. i.e, whether it
>>     should leave the "sink" "stopped" or "disabled" or "active"
> 
> I'm a little confused by the above as the modes that apply here are
> CS_MODE_DISABLED and CS_MODE_PERF, so I'll go with those.  Let me know if you
> meant something else.

Sorry, I think it is a bit confusing.

stopped => Sink is in stopped HW state, but the software mode is not changed (i.e, could be
PERF or SYSFS)

disabled => Sink is in stopped HW state, the software mode is DISABLED

active => Sink is active and flushing trace, with respective mode (PERF vs SYSFS).
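
To make the three terms concrete, a small sketch that derives them from a
hardware flag plus the software mode.  The enums are redefined locally for
the example, and struct sink_model, classify_sink() and the SINK_INVALID
case (hardware running while the mode is DISABLED) are assumptions added
for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies for the model; the names mirror the modes discussed above. */
enum cs_mode { CS_MODE_DISABLED, CS_MODE_SYSFS, CS_MODE_PERF };

enum sink_state {
	SINK_DISABLED,	/* HW stopped, software mode DISABLED */
	SINK_STOPPED,	/* HW stopped, software mode still PERF or SYSFS */
	SINK_ACTIVE,	/* HW running, software mode PERF or SYSFS */
	SINK_INVALID,	/* hypothetical: HW running but mode DISABLED */
};

struct sink_model {
	bool hw_running;
	enum cs_mode mode;
};

static enum sink_state classify_sink(const struct sink_model *s)
{
	if (s->hw_running)
		return s->mode == CS_MODE_DISABLED ? SINK_INVALID : SINK_ACTIVE;

	return s->mode == CS_MODE_DISABLED ? SINK_DISABLED : SINK_STOPPED;
}
```

In these terms the open question for update_buffer() is whether it returns
with the sink in SINK_STOPPED or SINK_ACTIVE.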

> 
> So far ->update_buffer() doesn't touch drvdata->mode and as such it is still set
> to CS_MODE_PERF when the update has completed.
> 
>>
>> 2) This breaks the recommended trace collection sequence of
>>     "flush" and "stop" from source to the sink for trace collection.
>>      i.e, we stop the source now. But don't flush the components
>>      from source to sink, rather we stop and flush from the sink.
>>      And we flush and stop the path after we have collected the
>>      trace data at sink, which is pointless.
> 
> The above assessment is correct.  Fixing it though has far reaching ramifications
> that go far beyond the scope of this patch.
> 
>>
>> 3) For a sink with IRQ handler, if we don't stop the sink with
>>     update_buffer(), we could have a situation :
>>
>>     update_buffer()
>>
>>     perf_aux_output_end(handle) # handle is invalid now
>>
>>   -----------------> IRQ    -> irq_handler()
>>                                 perf_aux_output_end(handle) # Wrong !
>>
>>
>>     disable_path()
> 
> That's the picture of the issue I had in my head when looking at the code -
> I'm glad we came to the same conclusion.
> 
>>
>> The sysfs mode is fine, as we defer the trace collection to disable_path().
>>
>> The proposed patch is still racy, as we could still hit the problem.
>>
>> So, to avoid all of these situations, I think we should defer the
>> update_buffer() to sink_ops->disable(), when we have flushed and stopped
>> all the components upstream and avoid any races with the IRQ
>> handler.
>>
>> i.e,
>>
>> 	source_ops->stop(csdev);
>>
>> 	disable_path(handle); // similar to the enable_path
>>
>>
>> sink_ops->disable(csdev, handle)
>> {
>>    /* flush & stop */
>>
>>    /* collect trace */
>>    perf_aux_output_end(handle, size);
>> }
> 
> That is one solution.  The advantage here is that it takes care of the
> flushing problem you described above.  On the flip side it is moving a lot of
> code around, something that is better to do in another set.
> 
> Another solution is to disable the TRBE IRQ in ->update_buffer().  The ETR does
> the same kind of thing with tmc_flush_and_stop().  I don't know how feasible
> that is but it would be a simple solution for this set.  Properly flushing the
> pipeline could be done later.  I'm fine with either approach.

Agreed. I think this is reasonable for this set, i.e., leave the hardware disabled.
We could do the proper solution above as a separate series, to keep the changes
incremental.
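
As a sanity check on the ordering, a purely sequential user-space model of
the simple approach (stop the hardware and its IRQ in update_buffer() so a
late interrupt cannot touch a released handle).  It is not concurrency-safe
and not driver code: the types, the model_* functions and the irq_enabled
flag standing in for the real hardware state are assumptions for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct perf_handle_model {
	bool active;	/* false once the handle has been released */
	int end_calls;	/* how many times the handle was ended */
};

struct trbe_model {
	bool irq_enabled;			/* stand-in for HW able to raise the IRQ */
	struct perf_handle_model *handle;	/* handle shared with the IRQ handler */
};

static void model_output_end(struct perf_handle_model *h)
{
	assert(h->active);	/* ending a released handle is the bug to avoid */
	h->active = false;
	h->end_calls++;
}

/* Returns true if the handler consumed the interrupt. */
static bool model_irq_handler(struct trbe_model *t)
{
	if (!t->irq_enabled || !t->handle)
		return false;

	model_output_end(t->handle);
	t->handle->active = true;	/* restart for the next buffer */
	return true;
}

/* update_buffer() leaves the hardware stopped and drops the shared handle. */
static void model_update_buffer(struct trbe_model *t)
{
	t->irq_enabled = false;
	t->handle = NULL;
}

static void model_event_stop(struct trbe_model *t, struct perf_handle_model *h)
{
	model_update_buffer(t);
	model_output_end(h);
	/* disable_path() would follow here */
}
```

Once model_event_stop() has run, a late call to model_irq_handler() returns
false without touching the handle, which is the property we want from the
real sequence.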

Kind regards
Suzuki

