* [PATCH v2 00/27] coresight: TMC ETR backend support for perf
@ 2018-05-01  9:10 ` Suzuki K Poulose
  0 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose


This series achieves two goals:

 a) Support for all possible backends for the ETR buffer and
    transparent management of the buffer, irrespective of the
    backend in use.
 b) Support for perf using the ETR as a sink, with the best
    possible backend.

For (a), we add support for the TMC ETR's built-in scatter-gather unit
and for the new dedicated scatter-gather component, the Coresight Address
Translation Unit (CATU) - a new IP, part of the Arm Coresight SoC-600
family, which provides an improved SG mechanism.

With the addition of the CATU, the ETR can be operated with three
possible buffer backends:

 1) Contiguous DMA buffer
 2) TMC-ETR built-in scatter-gather table
 3) CATU-backed scatter-gather table

To avoid the complications of managing the buffer, this series
adds a layer for managing the ETR buffer, which makes the best possible
choice based on what is available. The allocation can be tuned by passing
in flags, existing pages (e.g., the perf ring buffer), etc.
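
To make the idea concrete, a minimal sketch of such a selection is shown
below; the names are hypothetical and only illustrate the kind of decision
the layer makes, they are not the driver's actual API.

    /*
     * Illustrative sketch only: hypothetical names, not the actual driver
     * API. The layer picks the "best" backend from whatever the platform
     * provides; callers can tune the allocation further with flags and,
     * optionally, a set of existing pages (e.g. the perf ring buffer).
     */
    #include <stdbool.h>

    enum etr_buf_backend {
            ETR_BUF_FLAT,           /* contiguous DMA buffer */
            ETR_BUF_ETR_SG,         /* TMC-ETR built-in scatter-gather */
            ETR_BUF_CATU,           /* CATU backed scatter-gather */
    };

    static enum etr_buf_backend etr_buf_pick_backend(bool has_catu, bool has_etr_sg)
    {
            /*
             * One possible preference order: scatter-gather backends avoid
             * large contiguous allocations and can map existing pages.
             */
            if (has_catu)
                    return ETR_BUF_CATU;
            if (has_etr_sg)
                    return ETR_BUF_ETR_SG;
            return ETR_BUF_FLAT;
    }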

Towards supporting the ETR scatter-gather mode and CATU tables, we introduce
a generic TMC scatter-gather table which can be used to manage the data
and table pages. The table can be filled in the respective mode (CATU
vs ETR_SG) by the mode-specific code.
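
For illustration only, such a table boils down to tracking two sets of
pages; the names below are hypothetical, not the driver's actual structures.

    /*
     * Minimal sketch with hypothetical names: the generic table is
     * essentially two sets of pages, the data pages holding the trace and
     * the table pages holding the pointers, which the mode-specific code
     * (CATU or ETR_SG) fills in using its own entry format.
     */
    struct sg_page_set {
            int nr_pages;                   /* number of pages in the set */
            void **vaddrs;                  /* CPU addresses of the pages */
            unsigned long long *daddrs;     /* bus/DMA addresses of the pages */
    };

    struct generic_sg_table {
            struct sg_page_set data_pages;  /* trace data */
            struct sg_page_set table_pages; /* pointer entries */
    };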

During the testing of v1 of the ETR SG driver, we found that a couple
of the boards (HiKey 960 and DB410c; Juno is fine) have an unusable
scatter-gather mode. In SG mode the ETR performs both READs
(for the next table pointer) and WRITEs (of the trace data) simultaneously.
So if the READ transaction doesn't complete (which is what we
have observed), it can hold up the ETR from writing the data,
as the ETR doesn't know the buffer address, and hence stall it. This
has also been confirmed by reading the buffer data via RRD on the
ETR, which implies that there are some issues with the READ transactions
from the ETR. Juno is the only platform on which we have tested the SG mode
successfully. So, in order to avoid causing problems by using SG mode,
we disable the SG mode by default on all platforms unless it is known
to be safe. We add a DT binding for whitelisting an ETR for scatter-gather
mode.

The TMC ETR-SG mechanism doesn't allow starting the trace at a non-zero
offset (required by perf). So we make some tricky changes to the table
at run time to allow starting at any page-aligned offset and then
wrapping around to the beginning of the buffer with very little overhead.
See the patches for a more detailed description.
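
Roughly, with page-granular entries the buffer behaves as a circular list
of pages, so a page-aligned start offset only selects the first entry to
use; the sketch below models the index arithmetic and is illustrative only,
not the actual ETR_SG entry encoding.

    /*
     * Illustrative model only, not the actual ETR_SG entry encoding: a
     * page-aligned offset selects the first table entry to use, and the
     * walk wraps modulo the number of data pages.
     */
    #define SG_PAGE_SIZE    4096UL

    static unsigned int sg_first_entry(unsigned long offset, unsigned int nr_pages)
    {
            return (unsigned int)((offset / SG_PAGE_SIZE) % nr_pages);
    }

    static unsigned int sg_next_entry(unsigned int cur, unsigned int nr_pages)
    {
            /* Wrap around to the beginning of the buffer after the last page */
            return (cur + 1) % nr_pages;
    }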

The series also improves the way the ETR is controlled by the different modes
(sysfs vs. perf) by keeping mode-specific data. This allows access
to the trace data collected in sysfs mode, even when the ETR is
operated in perf mode. Also, with the transparent management of the
buffer and the scatter-gather mechanism, we can allow the user to
request larger trace buffers for sysfs mode. This is supported
by providing a sysfs file, "buffer_size", which accepts a page-aligned
size that will be used by the ETR when allocating a buffer.

Finally, it cleans up the etm perf sink callbacks a little bit and
then adds support for the ETR sink. For the ETR, we try our best to
use the perf ring buffer as the target hardware buffer, provided:
 1) The ETR is DMA coherent (since the pages will be shared with
    the userspace perf tool).
 2) perf is used in snapshot mode (the ETR cannot be stopped
    based on the size of the data written, hence we could easily
    overwrite the buffer; we may be able to fix this in the future).
 3) The ETR supports one of the scatter-gather modes (built-in SG
    or a CATU).

TODO: The conditions above need some discussion. Please see the
last patch for more information.

If we can't use the perf buffers directly, we fall back to software
buffering, where we have to copy the trace data back to the perf
ring buffer.
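
Taken together, the decision can be modelled as in the sketch below, which
uses illustrative names rather than the actual sink callbacks.

    #include <stdbool.h>

    /*
     * Sketch with illustrative names (not the actual sink callbacks): the
     * perf ring buffer pages back the hardware buffer only when all three
     * conditions hold; otherwise the driver allocates its own buffer and
     * copies the trace out when the event is stopped.
     */
    static bool etr_can_use_perf_pages(bool etr_is_dma_coherent,
                                       bool perf_snapshot_mode,
                                       bool has_sg_backend)
    {
            return etr_is_dma_coherent && perf_snapshot_mode && has_sg_backend;
    }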

Tested on Juno and on an FPGA platform with a CATU.

Applies on 4.17-rc3 and is also available at:

	git://linux-arm.org/linux-skp.git etr-perf-catu/v2

Changes since v1:
 [ http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/538151.html ]

 - Address comments on v1
 - Fix a build failure on arm32
 - Add CATU (Coresight Address Translation Unit) support
 - Provide buffer isolation, i.e., hide the buffer consumed by userspace from
   the ETR when using ETR SG or CATU
 - Add device-tree bindings to allow usage of the ETR scatter-gather mode


Suzuki K Poulose (27):
  coresight: ETM: Add support for ARM Cortex-A73
  coresight: Cleanup device subtype struct
  coresight: Add helper device type
  coresight: Introduce support for Coresight Address Translation Unit
  dts: bindings: Document device tree binding for CATU
  coresight: tmc etr: Disallow perf mode temporarily
  coresight: tmc: Hide trace buffer handling for file read
  coresight: tmc-etr: Do not clean trace buffer
  coresight: Add helper for inserting synchronization packets
  dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  dts: juno: Add scatter-gather support for all revisions
  coresight: tmc-etr: Allow commandline option to override SG use
  coresight: Add generic TMC sg table framework
  coresight: Add support for TMC ETR SG unit
  coresight: tmc-etr: Make SG table circular
  coresight: tmc-etr: Add transparent buffer management
  coresight: etr: Add support for save restore buffers
  coresight: catu: Add support for scatter gather tables
  coresight: catu: Plug in CATU as a backend for ETR buffer
  coresight: tmc: Add configuration support for trace buffer size
  coresight: Convert driver messages to dev_dbg
  coresight: tmc-etr: Track if the device is coherent
  coresight: tmc-etr: Handle driver mode specific ETR buffers
  coresight: tmc-etr: Relax collection of trace from sysfs mode
  coresight: etr_buf: Add helper for padding an area of trace data
  coresight: perf: Remove reset_buffer call back for sinks
  coresight: etm-perf: Add support for ETR backend

 .../ABI/testing/sysfs-bus-coresight-devices-tmc    |    8 
 Documentation/admin-guide/kernel-parameters.txt    |    8 +
 .../devicetree/bindings/arm/coresight.txt          |   55 +
 arch/arm64/boot/dts/arm/juno-base.dtsi             |    1 +
 drivers/hwtracing/coresight/Kconfig                |   10 +
 drivers/hwtracing/coresight/Makefile               |    1 +
 drivers/hwtracing/coresight/coresight-catu.c       |  785 +++++++++
 drivers/hwtracing/coresight/coresight-catu.h       |  119 ++
 .../coresight/coresight-dynamic-replicator.c       |    4 +-
 drivers/hwtracing/coresight/coresight-etb10.c      |   74 +-
 drivers/hwtracing/coresight/coresight-etm-perf.c   |    9 +-
 drivers/hwtracing/coresight/coresight-etm3x.c      |    4 +-
 drivers/hwtracing/coresight/coresight-etm4x.c      |   28 +-
 drivers/hwtracing/coresight/coresight-funnel.c     |    4 +-
 drivers/hwtracing/coresight/coresight-priv.h       |   10 +-
 drivers/hwtracing/coresight/coresight-replicator.c |    4 +-
 drivers/hwtracing/coresight/coresight-stm.c        |    4 +-
 drivers/hwtracing/coresight/coresight-tmc-etf.c    |  111 +-
 drivers/hwtracing/coresight/coresight-tmc-etr.c    | 1767 ++++++++++++++++++--
 drivers/hwtracing/coresight/coresight-tmc.c        |   94 +-
 drivers/hwtracing/coresight/coresight-tmc.h        |  160 +-
 drivers/hwtracing/coresight/coresight-tpiu.c       |    4 +-
 drivers/hwtracing/coresight/coresight.c            |   49 +-
 include/linux/coresight.h                          |   51 +-
 24 files changed, 3065 insertions(+), 299 deletions(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-catu.c
 create mode 100644 drivers/hwtracing/coresight/coresight-catu.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 01/27] coresight: ETM: Add support for ARM Cortex-A73
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Add the ARM Cortex-A73 ETM PIDs to the known ETM IPs. While at it,
also add a description of the CPU to which the ETM belongs, to make
it easier to identify the ETM devices.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm4x.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index cf364a5..e84d80b 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -1034,7 +1034,8 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
 	}
 
 	pm_runtime_put(&adev->dev);
-	dev_info(dev, "%s initialized\n", (char *)id->data);
+	dev_info(dev, "CPU%d: %s initialized\n",
+			drvdata->cpu, (char *)id->data);
 
 	if (boot_enable) {
 		coresight_enable(drvdata->csdev);
@@ -1053,20 +1054,25 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
 }
 
 static const struct amba_id etm4_ids[] = {
-	{       /* ETM 4.0 - Cortex-A53  */
+	{
 		.id	= 0x000bb95d,
 		.mask	= 0x000fffff,
-		.data	= "ETM 4.0",
+		.data	= "Cortex-A53 ETM v4.0",
 	},
-	{       /* ETM 4.0 - Cortex-A57 */
+	{
 		.id	= 0x000bb95e,
 		.mask	= 0x000fffff,
-		.data	= "ETM 4.0",
+		.data	= "Cortex-A57 ETM v4.0",
 	},
-	{       /* ETM 4.0 - A72, Maia, HiSilicon */
-		.id = 0x000bb95a,
-		.mask = 0x000fffff,
-		.data = "ETM 4.0",
+	{
+		.id	= 0x000bb95a,
+		.mask	= 0x000fffff,
+		.data	= "Cortex-A72 ETM v4.0",
+	},
+	{
+		.id	= 0x000bb959,
+		.mask	= 0x000fffff,
+		.data	= "Cortex-A73 ETM v4.0",
 	},
 	{ 0, 0},
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 02/27] coresight: Cleanup device subtype struct
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Clean up our struct a little bit by using a union instead of
a struct for tracking the subtype of a device.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 include/linux/coresight.h | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index d950dad..556fe59 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -70,17 +70,20 @@ enum coresight_dev_subtype_source {
 };
 
 /**
- * struct coresight_dev_subtype - further characterisation of a type
+ * union coresight_dev_subtype - further characterisation of a type
  * @sink_subtype:	type of sink this component is, as defined
-			by @coresight_dev_subtype_sink.
+ *			by @coresight_dev_subtype_sink.
  * @link_subtype:	type of link this component is, as defined
-			by @coresight_dev_subtype_link.
+ *			by @coresight_dev_subtype_link.
  * @source_subtype:	type of source this component is, as defined
-			by @coresight_dev_subtype_source.
+ *			by @coresight_dev_subtype_source.
  */
-struct coresight_dev_subtype {
-	enum coresight_dev_subtype_sink sink_subtype;
-	enum coresight_dev_subtype_link link_subtype;
+union coresight_dev_subtype {
+	/* We have some devices which act as LINK and SINK */
+	struct {
+		enum coresight_dev_subtype_sink sink_subtype;
+		enum coresight_dev_subtype_link link_subtype;
+	};
 	enum coresight_dev_subtype_source source_subtype;
 };
 
@@ -120,7 +123,7 @@ struct coresight_platform_data {
  */
 struct coresight_desc {
 	enum coresight_dev_type type;
-	struct coresight_dev_subtype subtype;
+	union coresight_dev_subtype subtype;
 	const struct coresight_ops *ops;
 	struct coresight_platform_data *pdata;
 	struct device *dev;
@@ -164,7 +167,7 @@ struct coresight_device {
 	int nr_inport;
 	int nr_outport;
 	enum coresight_dev_type type;
-	struct coresight_dev_subtype subtype;
+	union coresight_dev_subtype subtype;
 	const struct coresight_ops *ops;
 	struct device dev;
 	atomic_t *refcnt;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 03/27] coresight: Add helper device type
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Add a new coresight device type which does not belong to any
of the existing types, i.e., source, sink, link etc. A helper
device can be connected to a coresight device to augment the
functionality of that device.

This is intended to cover Coresight Address Translation Unit (CATU)
devices, which provide an improved scatter-gather mechanism for the
TMC ETR. The idea is that the helper device is controlled by
the driver of the device it is attached to (in this case the ETR),
transparently to the generic coresight driver (and paths).

The operations include enable() and disable(), both of which accept
a device-specific "data" pointer which the driving device and
the helper device can share. Since helper devices don't appear in the
coresight "path" tracked by software, we have to ensure that
they are powered up/down whenever the master device is turned
on/off.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight.c | 46 ++++++++++++++++++++++++++++++---
 include/linux/coresight.h               | 24 +++++++++++++++++
 2 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
index 389c4ba..fd0251e 100644
--- a/drivers/hwtracing/coresight/coresight.c
+++ b/drivers/hwtracing/coresight/coresight.c
@@ -430,6 +430,43 @@ struct coresight_device *coresight_get_enabled_sink(bool deactivate)
 	return dev ? to_coresight_device(dev) : NULL;
 }
 
+/*
+ * coresight_prepare_device - Prepare this device and any of the helper
+ * devices connected to it for trace operation. Since the helper devices
+ * don't appear on the trace path, they should be handled along with
+ * the master device.
+ */
+static void coresight_prepare_device(struct coresight_device *csdev)
+{
+	int i;
+
+	for (i = 0; i < csdev->nr_outport; i++) {
+		struct coresight_device *child = csdev->conns[i].child_dev;
+
+		if (child && child->type == CORESIGHT_DEV_TYPE_HELPER)
+			pm_runtime_get_sync(child->dev.parent);
+	}
+
+	pm_runtime_get_sync(csdev->dev.parent);
+}
+
+/*
+ * coresight_release_device - Release this device and any of the helper
+ * devices connected to it for trace operation.
+ */
+static void coresight_release_device(struct coresight_device *csdev)
+{
+	int i;
+
+	for (i = 0; i < csdev->nr_outport; i++) {
+		struct coresight_device *child = csdev->conns[i].child_dev;
+
+		if (child && child->type == CORESIGHT_DEV_TYPE_HELPER)
+			pm_runtime_put(child->dev.parent);
+	}
+	pm_runtime_put(csdev->dev.parent);
+}
+
 /**
  * _coresight_build_path - recursively build a path from a @csdev to a sink.
  * @csdev:	The device to start from.
@@ -480,8 +517,7 @@ static int _coresight_build_path(struct coresight_device *csdev,
 
 	node->csdev = csdev;
 	list_add(&node->link, path);
-	pm_runtime_get_sync(csdev->dev.parent);
-
+	coresight_prepare_device(csdev);
 	return 0;
 }
 
@@ -524,7 +560,7 @@ void coresight_release_path(struct list_head *path)
 	list_for_each_entry_safe(nd, next, path, link) {
 		csdev = nd->csdev;
 
-		pm_runtime_put_sync(csdev->dev.parent);
+		coresight_release_device(csdev);
 		list_del(&nd->link);
 		kfree(nd);
 	}
@@ -775,6 +811,10 @@ static struct device_type coresight_dev_type[] = {
 		.name = "source",
 		.groups = coresight_source_groups,
 	},
+	{
+		.name = "helper",
+	},
+
 };
 
 static void coresight_device_release(struct device *dev)
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index 556fe59..5e926f7 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -47,6 +47,7 @@ enum coresight_dev_type {
 	CORESIGHT_DEV_TYPE_LINK,
 	CORESIGHT_DEV_TYPE_LINKSINK,
 	CORESIGHT_DEV_TYPE_SOURCE,
+	CORESIGHT_DEV_TYPE_HELPER,
 };
 
 enum coresight_dev_subtype_sink {
@@ -69,6 +70,10 @@ enum coresight_dev_subtype_source {
 	CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE,
 };
 
+enum coresight_dev_subtype_helper {
+	CORESIGHT_DEV_SUBTYPE_HELPER_NONE,
+};
+
 /**
  * union coresight_dev_subtype - further characterisation of a type
  * @sink_subtype:	type of sink this component is, as defined
@@ -77,6 +82,8 @@ enum coresight_dev_subtype_source {
  *			by @coresight_dev_subtype_link.
  * @source_subtype:	type of source this component is, as defined
  *			by @coresight_dev_subtype_source.
+ * @helper_subtype:	type of helper this component is, as defined
+ *			by @coresight_dev_subtype_helper.
  */
 union coresight_dev_subtype {
 	/* We have some devices which acts as LINK and SINK */
@@ -85,6 +92,7 @@ union coresight_dev_subtype {
 		enum coresight_dev_subtype_link link_subtype;
 	};
 	enum coresight_dev_subtype_source source_subtype;
+	enum coresight_dev_subtype_helper helper_subtype;
 };
 
 /**
@@ -181,6 +189,7 @@ struct coresight_device {
 #define source_ops(csdev)	csdev->ops->source_ops
 #define sink_ops(csdev)		csdev->ops->sink_ops
 #define link_ops(csdev)		csdev->ops->link_ops
+#define helper_ops(csdev)	csdev->ops->helper_ops
 
 /**
  * struct coresight_ops_sink - basic operations for a sink
@@ -240,10 +249,25 @@ struct coresight_ops_source {
 			struct perf_event *event);
 };
 
+/**
+ * struct coresight_ops_helper - Operations for a helper device.
+ *
+ * All operations could pass in a device specific data, which could
+ * help the helper device to determine what to do.
+ *
+ * @enable	: Turn the device ON.
+ * @disable	: Turn the device OFF.
+ */
+struct coresight_ops_helper {
+	int (*enable)(struct coresight_device *csdev, void *data);
+	int (*disable)(struct coresight_device *csdev, void *data);
+};
+
 struct coresight_ops {
 	const struct coresight_ops_sink *sink_ops;
 	const struct coresight_ops_link *link_ops;
 	const struct coresight_ops_source *source_ops;
+	const struct coresight_ops_helper *helper_ops;
 };
 
 #ifdef CONFIG_CORESIGHT
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 04/27] coresight: Introduce support for Coresight Address Translation Unit
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Add the initial support for the Coresight Address Translation Unit, which
augments the TMC in Coresight SoC-600 by providing an improved
scatter-gather mechanism. A CATU is always connected to a single TMC-ETR
and replaces the AXI address from the ETR with a translated address
(looked up from a given SG table with a specific format). The CATU should
be programmed in pass-through mode and enabled even if the ETR doesn't use
the translation by the CATU.

This patch provides the mechanism to enable/disable the CATU, always in
pass-through mode.

We reuse the existing ports mechanism to link the TMC-ETR to the
connected CATU,

i.e., TMC-ETR:output_port0 -> CATU:input_port0

The reference manual for the CATU component is available in version r2p0 of:
"Arm Coresight System-on-Chip SoC-600 Technical Reference Manual",
under Section 4.9.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/Kconfig             |  10 ++
 drivers/hwtracing/coresight/Makefile            |   1 +
 drivers/hwtracing/coresight/coresight-catu.c    | 195 ++++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-catu.h    |  89 +++++++++++
 drivers/hwtracing/coresight/coresight-tmc-etr.c |  26 ++++
 drivers/hwtracing/coresight/coresight-tmc.h     |  27 ++++
 include/linux/coresight.h                       |   1 +
 7 files changed, 349 insertions(+)
 create mode 100644 drivers/hwtracing/coresight/coresight-catu.c
 create mode 100644 drivers/hwtracing/coresight/coresight-catu.h

diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index ef9cb3c..21f638f 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -31,6 +31,16 @@ config CORESIGHT_LINK_AND_SINK_TMC
 	  complies with the generic implementation of the component without
 	  special enhancement or added features.
 
+config CORESIGHT_CATU
+	bool "Coresight Address Translation Unit (CATU) driver"
+	depends on CORESIGHT_LINK_AND_SINK_TMC
+	help
+	   Enable support for the Coresight Address Translation Unit (CATU).
+	   CATU supports a scatter gather table of 4K pages, with forward/backward
+	   lookup. CATU helps TMC ETR to use large physically non-contiguous trace
+	   buffer by translating the addresses used by ETR to the corresponding
+	   physical address by looking up the table.
+
 config CORESIGHT_SINK_TPIU
 	bool "Coresight generic TPIU driver"
 	depends on CORESIGHT_LINKS_AND_SINKS
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index 61db9dd..41870de 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
 obj-$(CONFIG_CORESIGHT_DYNAMIC_REPLICATOR) += coresight-dynamic-replicator.o
 obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
 obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
+obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
new file mode 100644
index 0000000..2cd69a6
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-catu.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2017 ARM Limited. All rights reserved.
+ *
+ * Coresight Address Translation Unit support
+ *
+ * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/device.h>
+#include <linux/amba/bus.h>
+#include <linux/io.h>
+#include <linux/slab.h>
+
+#include "coresight-catu.h"
+#include "coresight-priv.h"
+
+#define csdev_to_catu_drvdata(csdev)	\
+	dev_get_drvdata(csdev->dev.parent)
+
+coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
+coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
+coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
+coresight_simple_reg32(struct catu_drvdata, axictrl, CATU_AXICTRL);
+coresight_simple_reg32(struct catu_drvdata, irqen, CATU_IRQEN);
+coresight_simple_reg64(struct catu_drvdata, sladdr,
+		       CATU_SLADDRLO, CATU_SLADDRHI);
+coresight_simple_reg64(struct catu_drvdata, inaddr,
+		       CATU_INADDRLO, CATU_INADDRHI);
+
+static struct attribute *catu_mgmt_attrs[] = {
+	&dev_attr_control.attr,
+	&dev_attr_status.attr,
+	&dev_attr_mode.attr,
+	&dev_attr_axictrl.attr,
+	&dev_attr_irqen.attr,
+	&dev_attr_sladdr.attr,
+	&dev_attr_inaddr.attr,
+	NULL,
+};
+
+static const struct attribute_group catu_mgmt_group = {
+	.attrs = catu_mgmt_attrs,
+	.name = "mgmt",
+};
+
+static const struct attribute_group *catu_groups[] = {
+	&catu_mgmt_group,
+	NULL,
+};
+
+
+static inline int catu_wait_for_ready(struct catu_drvdata *drvdata)
+{
+	return coresight_timeout(drvdata->base,
+				 CATU_STATUS, CATU_STATUS_READY, 1);
+}
+
+static int catu_enable_hw(struct catu_drvdata *drvdata, void *__unused)
+{
+	u32 control;
+
+	if (catu_wait_for_ready(drvdata))
+		dev_warn(drvdata->dev, "Timeout while waiting for READY\n");
+
+	control = catu_read_control(drvdata);
+	if (control & BIT(CATU_CONTROL_ENABLE)) {
+		dev_warn(drvdata->dev, "CATU is already enabled\n");
+		return -EBUSY;
+	}
+
+	control |= BIT(CATU_CONTROL_ENABLE);
+	catu_write_mode(drvdata, CATU_MODE_PASS_THROUGH);
+	catu_write_control(drvdata, control);
+	dev_dbg(drvdata->dev, "Enabled in Pass through mode\n");
+	return 0;
+}
+
+static int catu_enable(struct coresight_device *csdev, void *data)
+{
+	int rc;
+	struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
+
+	CS_UNLOCK(catu_drvdata->base);
+	rc = catu_enable_hw(catu_drvdata, data);
+	CS_LOCK(catu_drvdata->base);
+	return rc;
+}
+
+static int catu_disable_hw(struct catu_drvdata *drvdata)
+{
+	int rc = 0;
+
+	if (catu_wait_for_ready(drvdata)) {
+		dev_info(drvdata->dev, "Timeout while waiting for READY\n");
+		rc = -EAGAIN;
+	}
+
+	catu_write_control(drvdata, 0);
+	dev_dbg(drvdata->dev, "Disabled\n");
+	return rc;
+}
+
+static int catu_disable(struct coresight_device *csdev, void *__unused)
+{
+	int rc;
+	struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
+
+	CS_UNLOCK(catu_drvdata->base);
+	rc = catu_disable_hw(catu_drvdata);
+	CS_LOCK(catu_drvdata->base);
+
+	return rc;
+}
+
+const struct coresight_ops_helper catu_helper_ops = {
+	.enable = catu_enable,
+	.disable = catu_disable,
+};
+
+const struct coresight_ops catu_ops = {
+	.helper_ops = &catu_helper_ops,
+};
+
+static int catu_probe(struct amba_device *adev, const struct amba_id *id)
+{
+	int ret = 0;
+	struct catu_drvdata *drvdata;
+	struct coresight_desc catu_desc;
+	struct coresight_platform_data *pdata = NULL;
+	struct device *dev = &adev->dev;
+	struct device_node *np = dev->of_node;
+	void __iomem *base;
+
+	if (np) {
+		pdata = of_get_coresight_platform_data(dev, np);
+		if (IS_ERR(pdata)) {
+			ret = PTR_ERR(pdata);
+			goto out;
+		}
+		dev->platform_data = pdata;
+	}
+
+	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
+	if (!drvdata) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	drvdata->dev = dev;
+	dev_set_drvdata(dev, drvdata);
+	base = devm_ioremap_resource(dev, &adev->res);
+	if (IS_ERR(base)) {
+		ret = PTR_ERR(base);
+		goto out;
+	}
+
+	drvdata->base = base;
+	catu_desc.pdata = pdata;
+	catu_desc.dev = dev;
+	catu_desc.groups = catu_groups;
+	catu_desc.type = CORESIGHT_DEV_TYPE_HELPER;
+	catu_desc.subtype.helper_subtype = CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
+	catu_desc.ops = &catu_ops;
+	drvdata->csdev = coresight_register(&catu_desc);
+	if (IS_ERR(drvdata->csdev))
+		ret = PTR_ERR(drvdata->csdev);
+	if (!ret)
+		dev_info(drvdata->dev, "initialized\n");
+out:
+	pm_runtime_put(&adev->dev);
+	return ret;
+}
+
+static struct amba_id catu_ids[] = {
+	{
+		.id	= 0x000bb9ee,
+		.mask	= 0x000fffff,
+	},
+	{},
+};
+
+static struct amba_driver catu_driver = {
+	.drv = {
+		.name			= "coresight-catu",
+		.owner			= THIS_MODULE,
+		.suppress_bind_attrs	= true,
+	},
+	.probe				= catu_probe,
+	.id_table			= catu_ids,
+};
+
+builtin_amba_driver(catu_driver);
diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
new file mode 100644
index 0000000..cd58d6f
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-catu.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (C) 2017 ARM Limited. All rights reserved.
+ *
+ * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+ *
+ */
+
+#ifndef _CORESIGHT_CATU_H
+#define _CORESIGHT_CATU_H
+
+#include "coresight-priv.h"
+
+/* Register offset from base */
+#define CATU_CONTROL		0x000
+#define CATU_MODE		0x004
+#define CATU_AXICTRL		0x008
+#define CATU_IRQEN		0x00c
+#define CATU_SLADDRLO		0x020
+#define CATU_SLADDRHI		0x024
+#define CATU_INADDRLO		0x028
+#define CATU_INADDRHI		0x02c
+#define CATU_STATUS		0x100
+#define CATU_DEVARCH		0xfbc
+
+#define CATU_CONTROL_ENABLE	0
+
+#define CATU_MODE_PASS_THROUGH	0U
+#define CATU_MODE_TRANSLATE	1U
+
+#define CATU_STATUS_READY	8
+#define CATU_STATUS_ADRERR	0
+#define CATU_STATUS_AXIERR	4
+
+
+#define CATU_IRQEN_ON		0x1
+#define CATU_IRQEN_OFF		0x0
+
+
+struct catu_drvdata {
+	struct device *dev;
+	void __iomem *base;
+	struct coresight_device *csdev;
+	int irq;
+};
+
+#define CATU_REG32(name, offset)					\
+static inline u32							\
+catu_read_##name(struct catu_drvdata *drvdata)				\
+{									\
+	return coresight_read_reg_pair(drvdata->base, offset, -1);	\
+}									\
+static inline void							\
+catu_write_##name(struct catu_drvdata *drvdata, u32 val)		\
+{									\
+	coresight_write_reg_pair(drvdata->base, val, offset, -1);	\
+}
+
+#define CATU_REG_PAIR(name, lo_off, hi_off)				\
+static inline u64							\
+catu_read_##name(struct catu_drvdata *drvdata)				\
+{									\
+	return coresight_read_reg_pair(drvdata->base, lo_off, hi_off);	\
+}									\
+static inline void							\
+catu_write_##name(struct catu_drvdata *drvdata, u64 val)		\
+{									\
+	coresight_write_reg_pair(drvdata->base, val, lo_off, hi_off);	\
+}
+
+CATU_REG32(control, CATU_CONTROL);
+CATU_REG32(mode, CATU_MODE);
+CATU_REG_PAIR(sladdr, CATU_SLADDRLO, CATU_SLADDRHI)
+CATU_REG_PAIR(inaddr, CATU_INADDRLO, CATU_INADDRHI)
+
+static inline bool coresight_is_catu_device(struct coresight_device *csdev)
+{
+	enum coresight_dev_subtype_helper subtype;
+
+	/* Make the checkpatch happy */
+	subtype = csdev->subtype.helper_subtype;
+
+	return IS_ENABLED(CONFIG_CORESIGHT_CATU) &&
+	       csdev->type == CORESIGHT_DEV_TYPE_HELPER &&
+	       subtype == CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
+}
+
+#endif
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 68fbc8f..9b0c620 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,9 +17,26 @@
 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include "coresight-catu.h"
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
+{
+	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
+
+	if (catu && helper_ops(catu)->enable)
+		helper_ops(catu)->enable(catu, NULL);
+}
+
+static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
+{
+	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
+
+	if (catu && helper_ops(catu)->disable)
+		helper_ops(catu)->disable(catu, NULL);
+}
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
@@ -27,6 +44,12 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 	/* Zero out the memory to help with debug */
 	memset(drvdata->vaddr, 0, drvdata->size);
 
+	/*
+	 * If this ETR is connected to a CATU, enable it before we turn
+	 * this on
+	 */
+	tmc_etr_enable_catu(drvdata);
+
 	CS_UNLOCK(drvdata->base);
 
 	/* Wait for TMCSReady bit to be set */
@@ -116,6 +139,9 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	tmc_disable_hw(drvdata);
 
 	CS_LOCK(drvdata->base);
+
+	/* Disable CATU device if this ETR is connected to one */
+	tmc_etr_disable_catu(drvdata);
 }
 
 static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 8df7a81..cdff853 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -19,6 +19,7 @@
 #define _CORESIGHT_TMC_H
 
 #include <linux/miscdevice.h>
+#include "coresight-catu.h"
 
 #define TMC_RSZ			0x004
 #define TMC_STS			0x00c
@@ -222,4 +223,30 @@ static inline bool tmc_etr_has_cap(struct tmc_drvdata *drvdata, u32 cap)
 	return !!(drvdata->etr_caps & cap);
 }
 
+/*
+ * TMC ETR could be connected to a CATU device, which can provide address
+ * translation service. This is represented by the Output port of the TMC
+ * (ETR) connected to the input port of the CATU.
+ *
+ * Returns	: coresight_device ptr for the CATU device if a CATU is found.
+ *		: NULL otherwise.
+ */
+static inline struct coresight_device *
+tmc_etr_get_catu_device(struct tmc_drvdata *drvdata)
+{
+	int i;
+	struct coresight_device *tmp, *etr = drvdata->csdev;
+
+	if (!IS_ENABLED(CONFIG_CORESIGHT_CATU))
+		return NULL;
+
+	for (i = 0; i < etr->nr_outport; i++) {
+		tmp = etr->conns[i].child_dev;
+		if (tmp && coresight_is_catu_device(tmp))
+			return tmp;
+	}
+
+	return NULL;
+}
+
 #endif
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index 5e926f7..c0e1568 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -72,6 +72,7 @@ enum coresight_dev_subtype_source {
 
 enum coresight_dev_subtype_helper {
 	CORESIGHT_DEV_SUBTYPE_HELPER_NONE,
+	CORESIGHT_DEV_SUBTYPE_HELPER_CATU,
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

+
+/*
+ * Copyright (C) 2017 ARM Limited. All rights reserved.
+ *
+ * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+ *
+ */
+
+#ifndef _CORESIGHT_CATU_H
+#define _CORESIGHT_CATU_H
+
+#include "coresight-priv.h"
+
+/* Register offset from base */
+#define CATU_CONTROL		0x000
+#define CATU_MODE		0x004
+#define CATU_AXICTRL		0x008
+#define CATU_IRQEN		0x00c
+#define CATU_SLADDRLO		0x020
+#define CATU_SLADDRHI		0x024
+#define CATU_INADDRLO		0x028
+#define CATU_INADDRHI		0x02c
+#define CATU_STATUS		0x100
+#define CATU_DEVARCH		0xfbc
+
+#define CATU_CONTROL_ENABLE	0
+
+#define CATU_MODE_PASS_THROUGH	0U
+#define CATU_MODE_TRANSLATE	1U
+
+#define CATU_STATUS_READY	8
+#define CATU_STATUS_ADRERR	0
+#define CATU_STATUS_AXIERR	4
+
+
+#define CATU_IRQEN_ON		0x1
+#define CATU_IRQEN_OFF		0x0
+
+
+struct catu_drvdata {
+	struct device *dev;
+	void __iomem *base;
+	struct coresight_device *csdev;
+	int irq;
+};
+
+#define CATU_REG32(name, offset)					\
+static inline u32							\
+catu_read_##name(struct catu_drvdata *drvdata)				\
+{									\
+	return coresight_read_reg_pair(drvdata->base, offset, -1);	\
+}									\
+static inline void							\
+catu_write_##name(struct catu_drvdata *drvdata, u32 val)		\
+{									\
+	coresight_write_reg_pair(drvdata->base, val, offset, -1);	\
+}
+
+#define CATU_REG_PAIR(name, lo_off, hi_off)				\
+static inline u64							\
+catu_read_##name(struct catu_drvdata *drvdata)				\
+{									\
+	return coresight_read_reg_pair(drvdata->base, lo_off, hi_off);	\
+}									\
+static inline void							\
+catu_write_##name(struct catu_drvdata *drvdata, u64 val)		\
+{									\
+	coresight_write_reg_pair(drvdata->base, val, lo_off, hi_off);	\
+}
+
+CATU_REG32(control, CATU_CONTROL);
+CATU_REG32(mode, CATU_MODE);
+CATU_REG_PAIR(sladdr, CATU_SLADDRLO, CATU_SLADDRHI)
+CATU_REG_PAIR(inaddr, CATU_INADDRLO, CATU_INADDRHI)
+
+static inline bool coresight_is_catu_device(struct coresight_device *csdev)
+{
+	enum coresight_dev_subtype_helper subtype;
+
+	/* Make the checkpatch happy */
+	subtype = csdev->subtype.helper_subtype;
+
+	return IS_ENABLED(CONFIG_CORESIGHT_CATU) &&
+	       csdev->type == CORESIGHT_DEV_TYPE_HELPER &&
+	       subtype == CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
+}
+
+#endif
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 68fbc8f..9b0c620 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,9 +17,26 @@
 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include "coresight-catu.h"
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
+{
+	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
+
+	if (catu && helper_ops(catu)->enable)
+		helper_ops(catu)->enable(catu, NULL);
+}
+
+static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
+{
+	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
+
+	if (catu && helper_ops(catu)->disable)
+		helper_ops(catu)->disable(catu, NULL);
+}
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
@@ -27,6 +44,12 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 	/* Zero out the memory to help with debug */
 	memset(drvdata->vaddr, 0, drvdata->size);
 
+	/*
+	 * If this ETR is connected to a CATU, enable it before we turn
+	 * this on
+	 */
+	tmc_etr_enable_catu(drvdata);
+
 	CS_UNLOCK(drvdata->base);
 
 	/* Wait for TMCSReady bit to be set */
@@ -116,6 +139,9 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	tmc_disable_hw(drvdata);
 
 	CS_LOCK(drvdata->base);
+
+	/* Disable CATU device if this ETR is connected to one */
+	tmc_etr_disable_catu(drvdata);
 }
 
 static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 8df7a81..cdff853 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -19,6 +19,7 @@
 #define _CORESIGHT_TMC_H
 
 #include <linux/miscdevice.h>
+#include "coresight-catu.h"
 
 #define TMC_RSZ			0x004
 #define TMC_STS			0x00c
@@ -222,4 +223,30 @@ static inline bool tmc_etr_has_cap(struct tmc_drvdata *drvdata, u32 cap)
 	return !!(drvdata->etr_caps & cap);
 }
 
+/*
+ * TMC ETR could be connected to a CATU device, which can provide address
+ * translation service. This is represented by the Output port of the TMC
+ * (ETR) connected to the input port of the CATU.
+ *
+ * Returns	: coresight_device ptr for the CATU device if a CATU is found.
+ *		: NULL otherwise.
+ */
+static inline struct coresight_device *
+tmc_etr_get_catu_device(struct tmc_drvdata *drvdata)
+{
+	int i;
+	struct coresight_device *tmp, *etr = drvdata->csdev;
+
+	if (!IS_ENABLED(CONFIG_CORESIGHT_CATU))
+		return NULL;
+
+	for (i = 0; i < etr->nr_outport; i++) {
+		tmp = etr->conns[i].child_dev;
+		if (tmp && coresight_is_catu_device(tmp))
+			return tmp;
+	}
+
+	return NULL;
+}
+
 #endif
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index 5e926f7..c0e1568 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -72,6 +72,7 @@ enum coresight_dev_subtype_source {
 
 enum coresight_dev_subtype_helper {
 	CORESIGHT_DEV_SUBTYPE_HELPER_NONE,
+	CORESIGHT_DEV_SUBTYPE_HELPER_CATU,
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 05/27] dts: bindings: Document device tree binding for CATU
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose, devicetree,
	Mathieu Poirier

Document CATU device-tree bindings. CATU augments the TMC-ETR
by providing an improved Scatter Gather mechanism for streaming
trace data to non-contiguous system RAM pages.
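
Schematically (a condensed sketch of the example added in the binding
below; node contents are elided), the ETR's output port is wired to the
CATU's input port:

	etr_out_port: endpoint {		/* ETR output port */
		remote-endpoint = <&catu_in_port>;
	};

	catu_in_port: endpoint {		/* CATU input port */
		slave-mode;
		remote-endpoint = <&etr_out_port>;
	};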

Cc: devicetree@vger.kernel.org
Cc: frowand.list@gmail.com
Cc: Rob Herring <robh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 .../devicetree/bindings/arm/coresight.txt          | 52 ++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
index 15ac8e8..cdd84d0 100644
--- a/Documentation/devicetree/bindings/arm/coresight.txt
+++ b/Documentation/devicetree/bindings/arm/coresight.txt
@@ -39,6 +39,8 @@ its hardware characteristcs.
 
 		- System Trace Macrocell:
 			"arm,coresight-stm", "arm,primecell"; [1]
+		- Coresight Address Translation Unit (CATU)
+			"arm,coresight-catu", "arm,primecell";
 
 	* reg: physical base address and length of the register
 	  set(s) of the component.
@@ -86,6 +88,9 @@ its hardware characteristcs.
 	* arm,buffer-size: size of contiguous buffer space for TMC ETR
 	 (embedded trace router)
 
+* Optional property for CATU:
+	* interrupts: Exactly one SPI may be listed for reporting the address
+	  error
 
 Example:
 
@@ -118,6 +123,35 @@ Example:
 		};
 	};
 
+	etr@20070000 {
+		compatible = "arm,coresight-tmc", "arm,primecell";
+		reg = <0 0x20070000 0 0x1000>;
+
+		clocks = <&oscclk6a>;
+		clock-names = "apb_pclk";
+		ports {
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			/* input port */
+			port@0 {
+				reg =  <0>;
+				etr_in_port: endpoint {
+					slave-mode;
+					remote-endpoint = <&replicator2_out_port0>;
+				};
+			};
+
+			/* CATU link represented by output port */
+			port@1 {
+				reg = <0>;
+				etr_out_port: endpoint {
+					remote-endpoint = <&catu_in_port>;
+				};
+			};
+		};
+	};
+
 2. Links
 	replicator {
 		/* non-configurable replicators don't show up on the
@@ -247,5 +281,23 @@ Example:
 		};
 	};
 
+5. CATU
+
+	catu@207e0000 {
+		compatible = "arm,coresight-catu", "arm,primecell";
+		reg = <0 0x207e0000 0 0x1000>;
+
+		clocks = <&oscclk6a>;
+		clock-names = "apb_pclk";
+
+		interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>;
+		port {
+			catu_in_port: endpoint {
+				slave-mode;
+				remote-endpoint = <&etr_out_port>;
+			};
+		};
+	};
+
 [1]. There is currently two version of STM: STM32 and STM500.  Both
 have the same HW interface and as such don't need an explicit binding name.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 06/27] coresight: tmc etr: Disallow perf mode temporarily
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

We don't support ETR in perf mode yet. Temporarily fail the
operation until we add proper support.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 28 ++-----------------------
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 9b0c620..bff46f2 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -218,32 +218,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 
 static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
 {
-	int ret = 0;
-	unsigned long flags;
-	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
-
-	spin_lock_irqsave(&drvdata->spinlock, flags);
-	if (drvdata->reading) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/*
-	 * In Perf mode there can be only one writer per sink.  There
-	 * is also no need to continue if the ETR is already operated
-	 * from sysFS.
-	 */
-	if (drvdata->mode != CS_MODE_DISABLED) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	drvdata->mode = CS_MODE_PERF;
-	tmc_etr_enable_hw(drvdata);
-out:
-	spin_unlock_irqrestore(&drvdata->spinlock, flags);
-
-	return ret;
+	/* We don't support perf mode yet ! */
+	return -EINVAL;
 }
 
 static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 07/27] coresight: tmc: Hide trace buffer handling for file read
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

At the moment we adjust the buffer pointers for reading the trace
data via misc device in the common code for ETF/ETB and ETR. Since
we are going to change how we manage the buffer for ETR, let us
move the buffer manipulation to the respective driver files, hiding
it from the common code. We do so by adding type-specific helpers
for finding the length of the available data and the pointer to the
buffer for a given length at a file position.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 18 +++++++++++
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 34 ++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-tmc.c     | 41 ++++++++++++++-----------
 drivers/hwtracing/coresight/coresight-tmc.h     |  4 +++
 4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index e2513b7..2113e93 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -120,6 +120,24 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
 	CS_LOCK(drvdata->base);
 }
 
+/*
+ * Return the available trace data in the buffer from @pos, with
+ * a maximum limit of @len, updating the @bufpp on where to
+ * find it.
+ */
+ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata,
+				  loff_t pos, size_t len, char **bufpp)
+{
+	ssize_t actual = len;
+
+	/* Adjust the len to available size @pos */
+	if (pos + actual > drvdata->len)
+		actual = drvdata->len - pos;
+	if (actual > 0)
+		*bufpp = drvdata->buf + pos;
+	return actual;
+}
+
 static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
 {
 	int ret = 0;
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index bff46f2..53a17a8 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -92,6 +92,40 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 	CS_LOCK(drvdata->base);
 }
 
+/*
+ * Return the available trace data in the buffer from @pos, with a maximum
+ * limit of @len, also updating the @bufpp on where to find it.
+ */
+ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
+			    loff_t pos, size_t len, char **bufpp)
+{
+	ssize_t actual = len;
+	char *bufp = drvdata->buf + pos;
+	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
+
+	/* Adjust the len to available size @pos */
+	if (pos + actual > drvdata->len)
+		actual = drvdata->len - pos;
+
+	if (actual <= 0)
+		return actual;
+
+	/*
+	 * Since we use a circular buffer, with trace data starting
+	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
+	 * wrap the current @pos to within the buffer.
+	 */
+	if (bufp >= bufend)
+		bufp -= drvdata->size;
+	/*
+	 * For simplicity, avoid copying over a wrapped around buffer.
+	 */
+	if ((bufp + actual) > bufend)
+		actual = bufend - bufp;
+	*bufpp = bufp;
+	return actual;
+}
+
 static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
 {
 	const u32 *barrier;
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 0ea04f5..7a4e84f 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -131,35 +131,40 @@ static int tmc_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static inline ssize_t tmc_get_sysfs_trace(struct tmc_drvdata *drvdata,
+					loff_t pos, size_t len, char **bufpp)
+{
+	switch (drvdata->config_type) {
+	case TMC_CONFIG_TYPE_ETB:
+	case TMC_CONFIG_TYPE_ETF:
+		return tmc_etb_get_sysfs_trace(drvdata, pos, len, bufpp);
+	case TMC_CONFIG_TYPE_ETR:
+		return tmc_etr_get_sysfs_trace(drvdata, pos, len, bufpp);
+	}
+
+	return  -EINVAL;
+}
+
 static ssize_t tmc_read(struct file *file, char __user *data, size_t len,
 			loff_t *ppos)
 {
+	char *bufp;
+	ssize_t actual;
 	struct tmc_drvdata *drvdata = container_of(file->private_data,
 						   struct tmc_drvdata, miscdev);
-	char *bufp = drvdata->buf + *ppos;
+	actual = tmc_get_sysfs_trace(drvdata, *ppos, len, &bufp);
+	if (actual <= 0)
+		return 0;
 
-	if (*ppos + len > drvdata->len)
-		len = drvdata->len - *ppos;
-
-	if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) {
-		if (bufp == (char *)(drvdata->vaddr + drvdata->size))
-			bufp = drvdata->vaddr;
-		else if (bufp > (char *)(drvdata->vaddr + drvdata->size))
-			bufp -= drvdata->size;
-		if ((bufp + len) > (char *)(drvdata->vaddr + drvdata->size))
-			len = (char *)(drvdata->vaddr + drvdata->size) - bufp;
-	}
-
-	if (copy_to_user(data, bufp, len)) {
+	if (copy_to_user(data, bufp, actual)) {
 		dev_dbg(drvdata->dev, "%s: copy_to_user failed\n", __func__);
 		return -EFAULT;
 	}
 
-	*ppos += len;
+	*ppos += actual;
+	dev_dbg(drvdata->dev, "%zu bytes copied\n", actual);
 
-	dev_dbg(drvdata->dev, "%s: %zu bytes copied, %d bytes left\n",
-		__func__, len, (int)(drvdata->len - *ppos));
-	return len;
+	return actual;
 }
 
 static int tmc_release(struct inode *inode, struct file *file)
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index cdff853..9cbc4d5 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -184,10 +184,14 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata);
 extern const struct coresight_ops tmc_etb_cs_ops;
 extern const struct coresight_ops tmc_etf_cs_ops;
 
+ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata,
+				loff_t pos, size_t len, char **bufpp);
 /* ETR functions */
 int tmc_read_prepare_etr(struct tmc_drvdata *drvdata);
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata);
 extern const struct coresight_ops tmc_etr_cs_ops;
+ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
+				loff_t pos, size_t len, char **bufpp);
 
 
 #define TMC_REG_PAIR(name, lo_off, hi_off)				\
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 08/27] coresight: tmc-etr: Do not clean trace buffer
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

We zero out the entire trace buffer used for the ETR before it is
enabled, to help with debugging. Since we could be restoring an already used
buffer in perf mode, this could destroy the data. Get rid of this step;
if someone wants to debug, they can always add it as and when needed.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 53a17a8..fc1ff3f 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -41,9 +41,6 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
 
-	/* Zero out the memory to help with debug */
-	memset(drvdata->vaddr, 0, drvdata->size);
-
 	/*
 	 * If this ETR is connected to a CATU, enable it before we turn
 	 * this on
@@ -354,9 +351,8 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 	if (drvdata->mode == CS_MODE_SYSFS) {
 		/*
 		 * The trace run will continue with the same allocated trace
-		 * buffer. The trace buffer is cleared in tmc_etr_enable_hw(),
-		 * so we don't have to explicitly clear it. Also, since the
-		 * tracer is still enabled drvdata::buf can't be NULL.
+		 * buffer. Since the tracer is still enabled drvdata::buf can't
+		 * be NULL.
 		 */
 		tmc_etr_enable_hw(drvdata);
 	} else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 09/27] coresight: Add helper for inserting synchronization packets
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Right now we open code filling the trace buffer with synchronization
packets when the circular buffer wraps around in different drivers.
Move this to a common place. While at it, clean up the barrier_pkt
array to strip off the trailing '\0'.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-etb10.c   | 12 ++++-------
 drivers/hwtracing/coresight/coresight-priv.h    | 10 ++++++++-
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 27 ++++++++-----------------
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 13 +-----------
 drivers/hwtracing/coresight/coresight.c         |  3 +--
 5 files changed, 23 insertions(+), 42 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index 580cd38..74232e6 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -202,7 +202,6 @@ static void etb_dump_hw(struct etb_drvdata *drvdata)
 	bool lost = false;
 	int i;
 	u8 *buf_ptr;
-	const u32 *barrier;
 	u32 read_data, depth;
 	u32 read_ptr, write_ptr;
 	u32 frame_off, frame_endoff;
@@ -233,19 +232,16 @@ static void etb_dump_hw(struct etb_drvdata *drvdata)
 
 	depth = drvdata->buffer_depth;
 	buf_ptr = drvdata->buf;
-	barrier = barrier_pkt;
 	for (i = 0; i < depth; i++) {
 		read_data = readl_relaxed(drvdata->base +
 					  ETB_RAM_READ_DATA_REG);
-		if (lost && *barrier) {
-			read_data = *barrier;
-			barrier++;
-		}
-
 		*(u32 *)buf_ptr = read_data;
 		buf_ptr += 4;
 	}
 
+	if (lost)
+		coresight_insert_barrier_packet(drvdata->buf);
+
 	if (frame_off) {
 		buf_ptr -= (frame_endoff * 4);
 		for (i = 0; i < frame_endoff; i++) {
@@ -454,7 +450,7 @@ static void etb_update_buffer(struct coresight_device *csdev,
 		buf_ptr = buf->data_pages[cur] + offset;
 		read_data = readl_relaxed(drvdata->base +
 					  ETB_RAM_READ_DATA_REG);
-		if (lost && *barrier) {
+		if (lost && i < CORESIGHT_BARRIER_PKT_SIZE) {
 			read_data = *barrier;
 			barrier++;
 		}
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index f1d0e21d..2bb0a15 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -64,7 +64,8 @@ static DEVICE_ATTR_RO(name)
 #define coresight_simple_reg64(type, name, lo_off, hi_off)		\
 	__coresight_simple_func(type, NULL, name, lo_off, hi_off)
 
-extern const u32 barrier_pkt[5];
+extern const u32 barrier_pkt[4];
+#define CORESIGHT_BARRIER_PKT_SIZE (sizeof(barrier_pkt))
 
 enum etm_addr_type {
 	ETM_ADDR_TYPE_NONE,
@@ -98,6 +99,13 @@ struct cs_buffers {
 	void			**data_pages;
 };
 
+static inline void coresight_insert_barrier_packet(void *buf)
+{
+	if (buf)
+		memcpy(buf, barrier_pkt, CORESIGHT_BARRIER_PKT_SIZE);
+}
+
+
 static inline void CS_LOCK(void __iomem *addr)
 {
 	do {
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 2113e93..1dd44fd 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -43,39 +43,28 @@ static void tmc_etb_enable_hw(struct tmc_drvdata *drvdata)
 
 static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)
 {
-	bool lost = false;
 	char *bufp;
-	const u32 *barrier;
-	u32 read_data, status;
+	u32 read_data, lost;
 	int i;
 
-	/*
-	 * Get a hold of the status register and see if a wrap around
-	 * has occurred.
-	 */
-	status = readl_relaxed(drvdata->base + TMC_STS);
-	if (status & TMC_STS_FULL)
-		lost = true;
-
+	/* Check if the buffer wrapped around. */
+	lost = readl_relaxed(drvdata->base + TMC_STS) & TMC_STS_FULL;
 	bufp = drvdata->buf;
 	drvdata->len = 0;
-	barrier = barrier_pkt;
 	while (1) {
 		for (i = 0; i < drvdata->memwidth; i++) {
 			read_data = readl_relaxed(drvdata->base + TMC_RRD);
 			if (read_data == 0xFFFFFFFF)
-				return;
-
-			if (lost && *barrier) {
-				read_data = *barrier;
-				barrier++;
-			}
-
+				goto done;
 			memcpy(bufp, &read_data, 4);
 			bufp += 4;
 			drvdata->len += 4;
 		}
 	}
+done:
+	if (lost)
+		coresight_insert_barrier_packet(drvdata->buf);
+	return;
 }
 
 static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index fc1ff3f..7af72d7 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -125,9 +125,7 @@ ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
 
 static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
 {
-	const u32 *barrier;
 	u32 val;
-	u32 *temp;
 	u64 rwp;
 
 	rwp = tmc_read_rwp(drvdata);
@@ -140,16 +138,7 @@ static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
 	if (val & TMC_STS_FULL) {
 		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
 		drvdata->len = drvdata->size;
-
-		barrier = barrier_pkt;
-		temp = (u32 *)drvdata->buf;
-
-		while (*barrier) {
-			*temp = *barrier;
-			temp++;
-			barrier++;
-		}
-
+		coresight_insert_barrier_packet(drvdata->buf);
 	} else {
 		drvdata->buf = drvdata->vaddr;
 		drvdata->len = rwp - drvdata->paddr;
diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
index fd0251e..021d8ec 100644
--- a/drivers/hwtracing/coresight/coresight.c
+++ b/drivers/hwtracing/coresight/coresight.c
@@ -58,8 +58,7 @@ static struct list_head *stm_path;
  * beginning of the data collected in a buffer.  That way the decoder knows that
  * it needs to look for another sync sequence.
  */
-const u32 barrier_pkt[5] = {0x7fffffff, 0x7fffffff,
-			    0x7fffffff, 0x7fffffff, 0x0};
+const u32 barrier_pkt[4] = {0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff};
 
 static int coresight_id_match(struct device *dev, void *data)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 10/27] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose,
	Mathieu Poirier, devicetree

We are about to add support for the ETR built-in scatter-gather mode
for dealing with large trace buffers. However, on some platforms,
using the ETR SG mode can lock up the system due to the way the ETR
is connected to the memory subsystem.

In SG mode, the ETR performs READs from the scatter-gather table (to
fetch the next page pointer) as well as regular WRITEs of trace data.
If the READ operation doesn't complete (due to memory subsystem
issues, which we have seen on a couple of platforms), the trace WRITE
cannot proceed, stalling the ETR. So, by default, we do not use the
SG mode unless it is known to be safe on the platform. We define a
DT property for the TMC node to specify whether the SG mode is
properly integrated on the system.
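
As a sketch (reusing the node name, address and clock from the existing
binding example; they are illustrative only), a platform with a
well-integrated ETR would simply add the property to its TMC ETR node:

	etr@20070000 {
		compatible = "arm,coresight-tmc", "arm,primecell";
		reg = <0 0x20070000 0 0x1000>;
		clocks = <&oscclk6a>;
		clock-names = "apb_pclk";
		/* The ETR SG mode is known to be safe on this platform */
		scatter-gather;
	};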

Cc: Mathieu Poirier <matheiu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: John Horley <john.horley@arm.com>
Cc: Robert Walker <robert.walker@arm.com>
Cc: devicetree@vger.kernel.org
Cc: frowand.list@gmail.com
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 Documentation/devicetree/bindings/arm/coresight.txt | 3 +++
 drivers/hwtracing/coresight/coresight-tmc.c         | 8 +++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
index cdd84d0..7c0c8f0 100644
--- a/Documentation/devicetree/bindings/arm/coresight.txt
+++ b/Documentation/devicetree/bindings/arm/coresight.txt
@@ -88,6 +88,9 @@ its hardware characteristcs.
 	* arm,buffer-size: size of contiguous buffer space for TMC ETR
 	 (embedded trace router)
 
+	* scatter-gather: boolean. Indicates that the TMC-ETR can safely
+	  use the SG mode on this system.
+
 * Optional property for CATU :
 	* interrupts : Exactly one SPI may be listed for reporting the address
 	  error
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 7a4e84f..e38379c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -20,6 +20,7 @@
 #include <linux/err.h>
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
+#include <linux/property.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 #include <linux/dma-mapping.h>
@@ -304,6 +305,11 @@ const struct attribute_group *coresight_tmc_groups[] = {
 	NULL,
 };
 
+static inline bool tmc_etr_can_use_sg(struct tmc_drvdata *drvdata)
+{
+	return fwnode_property_present(drvdata->dev->fwnode, "scatter-gather");
+}
+
 /* Detect and initialise the capabilities of a TMC ETR */
 static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
 			     u32 devid, void *dev_caps)
@@ -313,7 +319,7 @@ static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
 	/* Set the unadvertised capabilities */
 	tmc_etr_init_caps(drvdata, (u32)(unsigned long)dev_caps);
 
-	if (!(devid & TMC_DEVID_NOSCAT))
+	if (!(devid & TMC_DEVID_NOSCAT) && tmc_etr_can_use_sg(drvdata))
 		tmc_etr_set_cap(drvdata, TMC_ETR_SG);
 
 	/* Check if the AXI address width is available */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 11/27] dts: juno: Add scatter-gather support for all revisions
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose, Liviu Dudau,
	Lorenzo Pieralisi

Advertise that the scatter-gather is properly integrated on
all revisions of Juno board.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/boot/dts/arm/juno-base.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi
index eb749c5..34ae303 100644
--- a/arch/arm64/boot/dts/arm/juno-base.dtsi
+++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
@@ -198,6 +198,7 @@
 		clocks = <&soc_smc50mhz>;
 		clock-names = "apb_pclk";
 		power-domains = <&scpi_devpd 0>;
+		scatter-gather;
 		port {
 			etr_in_port: endpoint {
 				slave-mode;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 12/27] coresight: tmc-etr: Allow commandline option to override SG use
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

The Coresight TMC-ETR SG mode could be unsafe on a platform where
the ETR is not properly integrated, i.e, where the READ transactions
issued by the ETR are not handled correctly by the interconnect.
We use a DT node property to indicate whether the system is safe.
This patch also provides a command line parameter to "force" the
use of SG mode, overriding the firmware provided information.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Hi

This is more of a debug patch for people who may want to
test their platform without too much hacking. I am not
too keen on getting this patch merged.
---
 Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
 drivers/hwtracing/coresight/coresight-tmc.c     | 7 ++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 11fc28e..03b51c3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -675,6 +675,14 @@
 			Enable/disable the CPU sampling based debugging.
 			0: default value, disable debugging
 			1: enable debugging at boot time
+	coresight_tmc.etr_force_sg
+			[ARM, ARM64]
+			Format: <bool>
+			Force using the TMC ETR builtin scatter-gather mode
+			even when it may be unsafe to use.
+			Default : 0, do not force using the builtin SG mode.
+				  1, Allow using the SG, ignoring the firmware
+				     provided information.
 
 	cpuidle.off=1	[CPU_IDLE]
 			disable the cpuidle sub-system
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index e38379c..c7bc681 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -20,6 +20,7 @@
 #include <linux/err.h>
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
+#include <linux/module.h>
 #include <linux/property.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
@@ -33,6 +34,8 @@
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+static bool etr_force_sg;
+
 void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata)
 {
 	/* Ensure formatter, unformatter and hardware fifo are empty */
@@ -307,7 +310,8 @@ const struct attribute_group *coresight_tmc_groups[] = {
 
 static inline bool tmc_etr_can_use_sg(struct tmc_drvdata *drvdata)
 {
-	return fwnode_property_present(drvdata->dev->fwnode, "scatter-gather");
+	return etr_force_sg ||
+	       fwnode_property_present(drvdata->dev->fwnode, "scatter-gather");
 }
 
 /* Detect and initialise the capabilities of a TMC ETR */
@@ -482,3 +486,4 @@ static struct amba_driver tmc_driver = {
 	.id_table	= tmc_ids,
 };
 builtin_amba_driver(tmc_driver);
+module_param(etr_force_sg, bool, 0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 13/27] coresight: Add generic TMC sg table framework
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose,
	Mathieu Poirier

This patch introduces a generic SG table data structure and
associated operations. An SG table can be used to map a set
of data pages where the trace data could be stored by the TMC
ETR. The information about the data pages could be stored in
different formats, depending on the type of the underlying
SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic
structure provides bookkeeping of the pages used for the data
as well as the table contents. The table should be filled in by
the user of the infrastructure.

A table can be created by specifying the number of data pages
as well as the number of table pages required to hold the
pointers, where the latter could be different for different
types of tables. The pages are mapped in the appropriate dma
data direction mode (i.e, DMA_TO_DEVICE for table pages
and DMA_FROM_DEVICE for data pages).  The framework can optionally
accept a set of allocated data pages (e.g, perf ring buffer) and
map them accordingly. The table and data pages are vmap'ed to allow
easier access by the drivers. The framework also provides helpers to
sync the data written to the pages with appropriate directions.

This will be later used by the TMC ETR SG unit and CATU.
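
The arithmetic in the data access helper can be checked with a small
standalone C model (illustrative only; the 4K page size and the 4-page
buffer are assumed example values, and model_get_data() simply mirrors
the clamping done by tmc_sg_table_get_data() in the diff below):

#include <stdio.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE  4096UL
#define MODEL_NR_PAGES   4UL	/* pretend data buffer of 4 pages */

/*
 * Mirror of the clamping in tmc_sg_table_get_data(): return how many
 * linearly addressable bytes are available at @offset, and where they
 * live (page index + offset within the page).
 */
static long model_get_data(unsigned long offset, size_t len,
			   unsigned long *pg_idx, unsigned long *pg_off)
{
	unsigned long size = MODEL_NR_PAGES * MODEL_PAGE_SIZE;

	if (offset >= size)
		return -1;			/* -EINVAL in the driver */
	*pg_idx = offset / MODEL_PAGE_SIZE;
	*pg_off = offset % MODEL_PAGE_SIZE;
	if (len > size - offset)		/* clamp to end of buffer */
		len = size - offset;
	if (len > MODEL_PAGE_SIZE - *pg_off)	/* clamp to end of page */
		len = MODEL_PAGE_SIZE - *pg_off;
	return (long)len;
}

int main(void)
{
	unsigned long idx, off;
	/* ask for 6000 bytes starting 100 bytes into the third page */
	long got = model_get_data(2 * MODEL_PAGE_SIZE + 100, 6000,
				  &idx, &off);

	printf("page %lu, offset %lu, linear bytes %ld\n", idx, off, got);
	return 0;
}

A caller wanting more than one page worth of data simply advances the
offset by the returned length and calls again.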

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 284 ++++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-tmc.h     |  50 +++++
 2 files changed, 334 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 7af72d7..57a8fe1 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,10 +17,294 @@
 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include <linux/slab.h>
 #include "coresight-catu.h"
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+/*
+ * tmc_pages_get_offset:  Go through all the pages in the tmc_pages
+ * and map the device address @addr to an offset within the virtual
+ * contiguous buffer.
+ */
+static long
+tmc_pages_get_offset(struct tmc_pages *tmc_pages, dma_addr_t addr)
+{
+	int i;
+	dma_addr_t page_start;
+
+	for (i = 0; i < tmc_pages->nr_pages; i++) {
+		page_start = tmc_pages->daddrs[i];
+		if (addr >= page_start && addr < (page_start + PAGE_SIZE))
+			return i * PAGE_SIZE + (addr - page_start);
+	}
+
+	return -EINVAL;
+}
+
+/*
+ * tmc_pages_free : Unmap and free the pages used by tmc_pages.
+ */
+static void tmc_pages_free(struct tmc_pages *tmc_pages,
+			   struct device *dev, enum dma_data_direction dir)
+{
+	int i;
+
+	for (i = 0; i < tmc_pages->nr_pages; i++) {
+		if (tmc_pages->daddrs && tmc_pages->daddrs[i])
+			dma_unmap_page(dev, tmc_pages->daddrs[i],
+					 PAGE_SIZE, dir);
+		if (tmc_pages->pages && tmc_pages->pages[i])
+			__free_page(tmc_pages->pages[i]);
+	}
+
+	kfree(tmc_pages->pages);
+	kfree(tmc_pages->daddrs);
+	tmc_pages->pages = NULL;
+	tmc_pages->daddrs = NULL;
+	tmc_pages->nr_pages = 0;
+}
+
+/*
+ * tmc_pages_alloc : Allocate and map pages for a given @tmc_pages.
+ * If @pages is not NULL, the list of page virtual addresses are
+ * used as the data pages. The pages are then dma_map'ed for @dev
+ * with dma_direction @dir.
+ *
+ * Returns 0 upon success, else the error number.
+ */
+static int tmc_pages_alloc(struct tmc_pages *tmc_pages,
+			   struct device *dev, int node,
+			   enum dma_data_direction dir, void **pages)
+{
+	int i, nr_pages;
+	dma_addr_t paddr;
+	struct page *page;
+
+	nr_pages = tmc_pages->nr_pages;
+	tmc_pages->daddrs = kcalloc(nr_pages, sizeof(*tmc_pages->daddrs),
+					 GFP_KERNEL);
+	if (!tmc_pages->daddrs)
+		return -ENOMEM;
+	tmc_pages->pages = kcalloc(nr_pages, sizeof(*tmc_pages->pages),
+					 GFP_KERNEL);
+	if (!tmc_pages->pages) {
+		kfree(tmc_pages->daddrs);
+		tmc_pages->daddrs = NULL;
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nr_pages; i++) {
+		if (pages && pages[i]) {
+			page = virt_to_page(pages[i]);
+			get_page(page);
+		} else {
+			page = alloc_pages_node(node,
+						GFP_KERNEL | __GFP_ZERO, 0);
+		}
+		paddr = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
+		if (dma_mapping_error(dev, paddr))
+			goto err;
+		tmc_pages->daddrs[i] = paddr;
+		tmc_pages->pages[i] = page;
+	}
+	return 0;
+err:
+	tmc_pages_free(tmc_pages, dev, dir);
+	return -ENOMEM;
+}
+
+static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
+{
+	if (WARN_ON(!sg_table->data_pages.pages[0]))
+		return 0;
+	return sg_table->table_daddr;
+}
+
+static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
+{
+	if (WARN_ON(!sg_table->data_pages.pages[0]))
+		return NULL;
+	return sg_table->table_vaddr;
+}
+
+static inline void *
+tmc_sg_table_data_vaddr(struct tmc_sg_table *sg_table)
+{
+	if (WARN_ON(!sg_table->data_pages.nr_pages))
+		return NULL;
+	return sg_table->data_vaddr;
+}
+
+static inline long
+tmc_sg_get_data_page_offset(struct tmc_sg_table *sg_table, dma_addr_t addr)
+{
+	return tmc_pages_get_offset(&sg_table->data_pages, addr);
+}
+
+static inline void tmc_free_table_pages(struct tmc_sg_table *sg_table)
+{
+	if (sg_table->table_vaddr)
+		vunmap(sg_table->table_vaddr);
+	tmc_pages_free(&sg_table->table_pages, sg_table->dev, DMA_TO_DEVICE);
+}
+
+static void tmc_free_data_pages(struct tmc_sg_table *sg_table)
+{
+	if (sg_table->data_vaddr)
+		vunmap(sg_table->data_vaddr);
+	tmc_pages_free(&sg_table->data_pages, sg_table->dev, DMA_FROM_DEVICE);
+}
+
+void tmc_free_sg_table(struct tmc_sg_table *sg_table)
+{
+	tmc_free_table_pages(sg_table);
+	tmc_free_data_pages(sg_table);
+}
+
+/*
+ * Alloc pages for the table. Since this will be used by the device,
+ * allocate the pages closer to the device (i.e, dev_to_node(dev)
+ * rather than the CPU node).
+ */
+static int tmc_alloc_table_pages(struct tmc_sg_table *sg_table)
+{
+	int rc;
+	struct tmc_pages *table_pages = &sg_table->table_pages;
+
+	rc = tmc_pages_alloc(table_pages, sg_table->dev,
+			     dev_to_node(sg_table->dev),
+			     DMA_TO_DEVICE, NULL);
+	if (rc)
+		return rc;
+	sg_table->table_vaddr = vmap(table_pages->pages,
+				     table_pages->nr_pages,
+				     VM_MAP,
+				     PAGE_KERNEL);
+	if (!sg_table->table_vaddr)
+		rc = -ENOMEM;
+	else
+		sg_table->table_daddr = table_pages->daddrs[0];
+	return rc;
+}
+
+static int tmc_alloc_data_pages(struct tmc_sg_table *sg_table, void **pages)
+{
+	int rc;
+
+	/* Allocate data pages on the node requested by the caller */
+	rc = tmc_pages_alloc(&sg_table->data_pages,
+			     sg_table->dev, sg_table->node,
+			     DMA_FROM_DEVICE, pages);
+	if (!rc) {
+		sg_table->data_vaddr = vmap(sg_table->data_pages.pages,
+					   sg_table->data_pages.nr_pages,
+					   VM_MAP,
+					   PAGE_KERNEL);
+		if (!sg_table->data_vaddr)
+			rc = -ENOMEM;
+	}
+	return rc;
+}
+
+/*
+ * tmc_alloc_sg_table: Allocate and setup dma pages for the TMC SG table
+ * and data buffers. TMC writes to the data buffers and reads from the SG
+ * Table pages.
+ *
+ * @dev		- Device to which page should be DMA mapped.
+ * @node	- Numa node for mem allocations
+ * @nr_tpages	- Number of pages for the table entries.
+ * @nr_dpages	- Number of pages for Data buffer.
+ * @pages	- Optional list of virtual address of pages.
+ */
+struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
+					int node,
+					int nr_tpages,
+					int nr_dpages,
+					void **pages)
+{
+	long rc;
+	struct tmc_sg_table *sg_table;
+
+	sg_table = kzalloc(sizeof(*sg_table), GFP_KERNEL);
+	if (!sg_table)
+		return ERR_PTR(-ENOMEM);
+	sg_table->data_pages.nr_pages = nr_dpages;
+	sg_table->table_pages.nr_pages = nr_tpages;
+	sg_table->node = node;
+	sg_table->dev = dev;
+
+	rc  = tmc_alloc_data_pages(sg_table, pages);
+	if (!rc)
+		rc = tmc_alloc_table_pages(sg_table);
+	if (rc) {
+		tmc_free_sg_table(sg_table);
+		kfree(sg_table);
+		return ERR_PTR(rc);
+	}
+
+	return sg_table;
+}
+
+/*
+ * tmc_sg_table_sync_data_range: Sync the data buffer written
+ * by the device from @offset upto a @size bytes.
+ */
+void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
+				  u64 offset, u64 size)
+{
+	int i, index, start;
+	int npages = DIV_ROUND_UP(size, PAGE_SIZE);
+	struct device *dev = table->dev;
+	struct tmc_pages *data = &table->data_pages;
+
+	start = offset >> PAGE_SHIFT;
+	for (i = start; i < (start + npages); i++) {
+		index = i % data->nr_pages;
+		dma_sync_single_for_cpu(dev, data->daddrs[index],
+					PAGE_SIZE, DMA_FROM_DEVICE);
+	}
+}
+
+/* tmc_sg_sync_table: Sync the page table */
+void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table)
+{
+	int i;
+	struct device *dev = sg_table->dev;
+	struct tmc_pages *table_pages = &sg_table->table_pages;
+
+	for (i = 0; i < table_pages->nr_pages; i++)
+		dma_sync_single_for_device(dev, table_pages->daddrs[i],
+					   PAGE_SIZE, DMA_TO_DEVICE);
+}
+
+/*
+ * tmc_sg_table_get_data: Get the buffer pointer for data @offset
+ * in the SG buffer. The @bufpp is updated to point to the buffer.
+ * Returns :
+ *	the length of linear data available at @offset.
+ *	or
+ *	<= 0 if no data is available.
+ */
+ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
+				u64 offset, size_t len, char **bufpp)
+{
+	size_t size;
+	int pg_idx = offset >> PAGE_SHIFT;
+	int pg_offset = offset & (PAGE_SIZE - 1);
+	struct tmc_pages *data_pages = &sg_table->data_pages;
+
+	size = tmc_sg_table_buf_size(sg_table);
+	if (offset >= size)
+		return -EINVAL;
+	len = (len < (size - offset)) ? len : size - offset;
+	len = (len < (PAGE_SIZE - pg_offset)) ? len : (PAGE_SIZE - pg_offset);
+	if (len > 0)
+		*bufpp = page_address(data_pages->pages[pg_idx]) + pg_offset;
+	return len;
+}
+
 static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
 {
 	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 9cbc4d5..74d8f24 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -19,6 +19,7 @@
 #define _CORESIGHT_TMC_H
 
 #include <linux/miscdevice.h>
+#include <linux/dma-mapping.h>
 #include "coresight-catu.h"
 
 #define TMC_RSZ			0x004
@@ -172,6 +173,38 @@ struct tmc_drvdata {
 	u32			etr_caps;
 };
 
+/**
+ * struct tmc_pages - Collection of pages used for SG.
+ * @nr_pages:		Number of pages in the list.
+ * @daddrs:		Array of DMA'able page address.
+ * @pages:		Array pages for the buffer.
+ */
+struct tmc_pages {
+	int nr_pages;
+	dma_addr_t	*daddrs;
+	struct page	**pages;
+};
+
+/*
+ * struct tmc_sg_table - Generic SG table for TMC
+ * @dev:		Device for DMA allocations
+ * @table_vaddr:	Contiguous Virtual address for PageTable
+ * @data_vaddr:		Contiguous Virtual address for Data Buffer
+ * @table_daddr:	DMA address of the PageTable base
+ * @node:		Node for Page allocations
+ * @table_pages:	List of pages & dma address for Table
+ * @data_pages:		List of pages & dma address for Data
+ */
+struct tmc_sg_table {
+	struct device *dev;
+	void *table_vaddr;
+	void *data_vaddr;
+	dma_addr_t table_daddr;
+	int node;
+	struct tmc_pages table_pages;
+	struct tmc_pages data_pages;
+};
+
 /* Generic functions */
 void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata);
 void tmc_flush_and_stop(struct tmc_drvdata *drvdata);
@@ -253,4 +286,21 @@ tmc_etr_get_catu_device(struct tmc_drvdata *drvdata)
 	return NULL;
 }
 
+struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
+					int node,
+					int nr_tpages,
+					int nr_dpages,
+					void **pages);
+void tmc_free_sg_table(struct tmc_sg_table *sg_table);
+void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table);
+void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
+				  u64 offset, u64 size);
+ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
+			      u64 offset, size_t len, char **bufpp);
+static inline unsigned long
+tmc_sg_table_buf_size(struct tmc_sg_table *sg_table)
+{
+	return sg_table->data_pages.nr_pages << PAGE_SHIFT;
+}
+
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 14/27] coresight: Add support for TMC ETR SG unit
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

This patch adds support for setting up an SG table used by the
TMC ETR inbuilt SG unit. The TMC ETR uses 4K page sized tables
to hold pointers to the 4K data pages, with the last entry of a
table pointing to the next table, chaining the tables together.
The 2 LSBs determine the type of the table entry, which is one
of :

 Normal - Points to a 4KB data page.
 Last   - Points to a 4KB data page, but is the last entry in the
          page table.
 Link   - Points to another 4KB table page with pointers to data.

The code takes care of handling a system page size which could
be different from 4K, in which case a single system page may hold
multiple ETR SG tables, and likewise multiple 4K data buffers.
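
The entry encoding and the entry-count arithmetic can be modelled with
a small standalone C program (illustrative only; it assumes a 4K system
PAGE_SIZE, and the page counts used in main() are arbitrary examples
rather than anything mandated by the hardware):

#include <stdio.h>
#include <stdint.h>

#define ETR_SG_PAGE_SHIFT	12
#define ETR_SG_PAGE_SIZE	(1UL << ETR_SG_PAGE_SHIFT)
#define SYS_PAGE_SIZE		4096UL	/* assumed kernel PAGE_SIZE */
#define ETR_SG_PAGES_PER_SYSPAGE (SYS_PAGE_SIZE / ETR_SG_PAGE_SIZE)
#define ETR_SG_PTRS_PER_PAGE	(ETR_SG_PAGE_SIZE / sizeof(uint32_t))

#define ETR_SG_ET_LAST		0x1
#define ETR_SG_ET_NORMAL	0x2
#define ETR_SG_ET_LINK		0x3

/* Pack a 4K-aligned page address and a 2-bit type into one entry */
static uint32_t sg_entry(uint64_t addr, uint32_t type)
{
	return (uint32_t)(((addr >> ETR_SG_PAGE_SHIFT) << 4) | (type & 0x3));
}

/* Recover the page address from an entry */
static uint64_t sg_addr(uint32_t entry)
{
	return ((uint64_t)entry >> 4) << ETR_SG_PAGE_SHIFT;
}

/* Same arithmetic as tmc_etr_sg_table_entries() in this patch */
static unsigned long sg_table_entries(unsigned long nr_pages)
{
	unsigned long nr_sgpages = nr_pages * ETR_SG_PAGES_PER_SYSPAGE;
	unsigned long nr_sglinks = nr_sgpages / (ETR_SG_PTRS_PER_PAGE - 1);

	if (nr_sglinks && (nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1) < 2))
		nr_sglinks--;
	return nr_sgpages + nr_sglinks;
}

int main(void)
{
	uint32_t e = sg_entry(0x8badf000ULL, ETR_SG_ET_NORMAL);

	printf("entry 0x%08x decodes back to address 0x%llx\n",
	       e, (unsigned long long)sg_addr(e));
	printf("entries needed: 256 pages -> %lu, 1024 -> %lu, 2048 -> %lu\n",
	       sg_table_entries(256), sg_table_entries(1024),
	       sg_table_entries(2048));
	return 0;
}

With a 64K kernel page size the same arithmetic gives 16 ETR SG pages
per system page; only the constants above change.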

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 262 ++++++++++++++++++++++++
 1 file changed, 262 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 57a8fe1..a003cfc 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -23,6 +23,87 @@
 #include "coresight-tmc.h"
 
 /*
+ * The TMC ETR SG has a page size of 4K. The SG table contains pointers
+ * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
+ * 4K (i.e, 16KB or 64KB). This implies that a single OS page could
+ * contain more than one SG buffer and tables.
+ *
+ * A table entry has the following format:
+ *
+ * ---Bit31------------Bit4-------Bit1-----Bit0--
+ * |     Address[39:12]    | SBZ |  Entry Type  |
+ * ----------------------------------------------
+ *
+ * Address: Bits [39:12] of a physical page address. Bits [11:0] are
+ *	    always zero.
+ *
+ * Entry type:
+ *	b00 - Reserved.
+ *	b01 - Last entry in the tables, points to 4K page buffer.
+ *	b10 - Normal entry, points to 4K page buffer.
+ *	b11 - Link. The address points to the base of next table.
+ */
+
+typedef u32 sgte_t;
+
+#define ETR_SG_PAGE_SHIFT		12
+#define ETR_SG_PAGE_SIZE		(1UL << ETR_SG_PAGE_SHIFT)
+#define ETR_SG_PAGES_PER_SYSPAGE	(PAGE_SIZE / ETR_SG_PAGE_SIZE)
+#define ETR_SG_PTRS_PER_PAGE		(ETR_SG_PAGE_SIZE / sizeof(sgte_t))
+#define ETR_SG_PTRS_PER_SYSPAGE		(PAGE_SIZE / sizeof(sgte_t))
+
+#define ETR_SG_ET_MASK			0x3
+#define ETR_SG_ET_LAST			0x1
+#define ETR_SG_ET_NORMAL		0x2
+#define ETR_SG_ET_LINK			0x3
+
+#define ETR_SG_ADDR_SHIFT		4
+
+#define ETR_SG_ENTRY(addr, type) \
+	(sgte_t)((((addr) >> ETR_SG_PAGE_SHIFT) << ETR_SG_ADDR_SHIFT) | \
+		 (type & ETR_SG_ET_MASK))
+
+#define ETR_SG_ADDR(entry) \
+	(((dma_addr_t)(entry) >> ETR_SG_ADDR_SHIFT) << ETR_SG_PAGE_SHIFT)
+#define ETR_SG_ET(entry)		((entry) & ETR_SG_ET_MASK)
+
+/*
+ * struct etr_sg_table : ETR SG Table
+ * @sg_table:		Generic SG Table holding the data/table pages.
+ * @hwaddr:		hwaddress used by the TMC, which is the base
+ *			address of the table.
+ */
+struct etr_sg_table {
+	struct tmc_sg_table	*sg_table;
+	dma_addr_t		hwaddr;
+};
+
+/*
+ * tmc_etr_sg_table_entries: Total number of table entries required to map
+ * @nr_pages system pages.
+ *
+ * We need to map @nr_pages * ETR_SG_PAGES_PER_SYSPAGE data pages.
+ * Each TMC page can map (ETR_SG_PTRS_PER_PAGE - 1) buffer pointers,
+ * with the last entry pointing to another page of table entries.
+ * If we spill over to a new page for mapping 1 entry, we could as
+ * well replace the link entry of the previous page with the last entry.
+ */
+static inline unsigned long __attribute_const__
+tmc_etr_sg_table_entries(int nr_pages)
+{
+	unsigned long nr_sgpages = nr_pages * ETR_SG_PAGES_PER_SYSPAGE;
+	unsigned long nr_sglinks = nr_sgpages / (ETR_SG_PTRS_PER_PAGE - 1);
+	/*
+	 * If we spill over to a new page for 1 entry, we could as well
+	 * make it the LAST entry in the previous page, skipping the Link
+	 * address.
+	 */
+	if (nr_sglinks && (nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1) < 2))
+		nr_sglinks--;
+	return nr_sgpages + nr_sglinks;
+}
+
+/*
  * tmc_pages_get_offset:  Go through all the pages in the tmc_pages
  * and map the device address @addr to an offset within the virtual
  * contiguous buffer.
@@ -305,6 +386,187 @@ ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
 	return len;
 }
 
+#ifdef ETR_SG_DEBUG
+/* Map a dma address to virtual address */
+static unsigned long
+tmc_sg_daddr_to_vaddr(struct tmc_sg_table *sg_table,
+			dma_addr_t addr, bool table)
+{
+	long offset;
+	unsigned long base;
+	struct tmc_pages *tmc_pages;
+
+	if (table) {
+		tmc_pages = &sg_table->table_pages;
+		base = (unsigned long)sg_table->table_vaddr;
+	} else {
+		tmc_pages = &sg_table->data_pages;
+		base = (unsigned long)sg_table->data_vaddr;
+	}
+
+	offset = tmc_pages_get_offset(tmc_pages, addr);
+	if (offset < 0)
+		return 0;
+	return base + offset;
+}
+
+/* Dump the given sg_table */
+static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table)
+{
+	sgte_t *ptr;
+	int i = 0;
+	dma_addr_t addr;
+	struct tmc_sg_table *sg_table = etr_table->sg_table;
+
+	ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
+					      etr_table->hwaddr, true);
+	while (ptr) {
+		addr = ETR_SG_ADDR(*ptr);
+		switch (ETR_SG_ET(*ptr)) {
+		case ETR_SG_ET_NORMAL:
+			dev_dbg(sg_table->dev,
+				"%05d: %p\t:[N] 0x%llx\n", i, ptr, addr);
+			ptr++;
+			break;
+		case ETR_SG_ET_LINK:
+			dev_dbg(sg_table->dev,
+				"%05d: *** %p\t:{L} 0x%llx ***\n",
+				 i, ptr, addr);
+			ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
+							      addr, true);
+			break;
+		case ETR_SG_ET_LAST:
+			dev_dbg(sg_table->dev,
+				"%05d: ### %p\t:[L] 0x%llx ###\n",
+				 i, ptr, addr);
+			return;
+		default:
+			dev_dbg(sg_table->dev,
+				"%05d: xxx %p\t:[INVALID] 0x%llx xxx\n",
+				 i, ptr, addr);
+			return;
+		}
+		i++;
+	}
+	dev_dbg(sg_table->dev, "******* End of Table *****\n");
+}
+#endif
+
+/*
+ * Populate the SG Table page table entries from table/data
+ * pages allocated. Each Data page has ETR_SG_PAGES_PER_SYSPAGE SG pages.
+ * So does a Table page. So we keep track of indices of the tables
+ * in each system page and move the pointers accordingly.
+ */
+#define INC_IDX_ROUND(idx, size) ((idx) = ((idx) + 1) % (size))
+static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
+{
+	dma_addr_t paddr;
+	int i, type, nr_entries;
+	int tpidx = 0; /* index to the current system table_page */
+	int sgtidx = 0;	/* index to the sg_table within the current syspage */
+	int sgtentry = 0; /* the entry within the sg_table */
+	int dpidx = 0; /* index to the current system data_page */
+	int spidx = 0; /* index to the SG page within the current data page */
+	sgte_t *ptr; /* pointer to the table entry to fill */
+	struct tmc_sg_table *sg_table = etr_table->sg_table;
+	dma_addr_t *table_daddrs = sg_table->table_pages.daddrs;
+	dma_addr_t *data_daddrs = sg_table->data_pages.daddrs;
+
+	nr_entries = tmc_etr_sg_table_entries(sg_table->data_pages.nr_pages);
+	/*
+	 * Use the contiguous virtual address of the table to update entries.
+	 */
+	ptr = sg_table->table_vaddr;
+	/*
+	 * Fill all the entries, except the last entry to avoid special
+	 * checks within the loop.
+	 */
+	for (i = 0; i < nr_entries - 1; i++) {
+		if (sgtentry == ETR_SG_PTRS_PER_PAGE - 1) {
+			/*
+			 * Last entry in a sg_table page is a link address to
+			 * the next table page. If this sg_table is the last
+			 * one in the system page, it links to the first
+			 * sg_table in the next system page. Otherwise, it
+			 * links to the next sg_table page within the system
+			 * page.
+			 */
+			if (sgtidx == ETR_SG_PAGES_PER_SYSPAGE - 1) {
+				paddr = table_daddrs[tpidx + 1];
+			} else {
+				paddr = table_daddrs[tpidx] +
+					(ETR_SG_PAGE_SIZE * (sgtidx + 1));
+			}
+			type = ETR_SG_ET_LINK;
+		} else {
+			/*
+			 * Update the indices to the data_pages to point to the
+			 * next sg_page in the data buffer.
+			 */
+			type = ETR_SG_ET_NORMAL;
+			paddr = data_daddrs[dpidx] + spidx * ETR_SG_PAGE_SIZE;
+			if (!INC_IDX_ROUND(spidx, ETR_SG_PAGES_PER_SYSPAGE))
+				dpidx++;
+		}
+		*ptr++ = ETR_SG_ENTRY(paddr, type);
+		/*
+		 * Move to the next table pointer, moving the table page index
+		 * if necessary
+		 */
+		if (!INC_IDX_ROUND(sgtentry, ETR_SG_PTRS_PER_PAGE)) {
+			if (!INC_IDX_ROUND(sgtidx, ETR_SG_PAGES_PER_SYSPAGE))
+				tpidx++;
+		}
+	}
+
+	/* Set up the last entry, which is always a data pointer */
+	paddr = data_daddrs[dpidx] + spidx * ETR_SG_PAGE_SIZE;
+	*ptr++ = ETR_SG_ENTRY(paddr, ETR_SG_ET_LAST);
+}
+
+/*
+ * tmc_init_etr_sg_table: Allocate a TMC ETR SG table, data buffer of @size and
+ * populate the table.
+ *
+ * @dev		- Device pointer for the TMC
+ * @node	- NUMA node where the memory should be allocated
+ * @size	- Total size of the data buffer
+ * @pages	- Optional list of page virtual address
+ */
+static struct etr_sg_table __maybe_unused *
+tmc_init_etr_sg_table(struct device *dev, int node,
+		  unsigned long size, void **pages)
+{
+	int nr_entries, nr_tpages;
+	int nr_dpages = size >> PAGE_SHIFT;
+	struct tmc_sg_table *sg_table;
+	struct etr_sg_table *etr_table;
+
+	etr_table = kzalloc(sizeof(*etr_table), GFP_KERNEL);
+	if (!etr_table)
+		return ERR_PTR(-ENOMEM);
+	nr_entries = tmc_etr_sg_table_entries(nr_dpages);
+	nr_tpages = DIV_ROUND_UP(nr_entries, ETR_SG_PTRS_PER_SYSPAGE);
+
+	sg_table = tmc_alloc_sg_table(dev, node, nr_tpages, nr_dpages, pages);
+	if (IS_ERR(sg_table)) {
+		kfree(etr_table);
+		return ERR_PTR(PTR_ERR(sg_table));
+	}
+
+	etr_table->sg_table = sg_table;
+	/* TMC should use table base address for DBA */
+	etr_table->hwaddr = sg_table->table_daddr;
+	tmc_etr_sg_table_populate(etr_table);
+	/* Sync the table pages for the HW */
+	tmc_sg_table_sync_table(sg_table);
+#ifdef ETR_SG_DEBUG
+	tmc_etr_sg_table_dump(etr_table);
+#endif
+	return etr_table;
+}
+
 static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
 {
 	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 15/27] coresight: tmc-etr: Make SG table circular
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Make the ETR SG table a circular buffer so that the trace can start
at any of the SG pages and still use the entire buffer.
Also support using a partial buffer for tracing (i.e, restricting
the size to a requested value) by adjusting the LAST buffer pointer.

This can be achieved by :

1) While building the SG table, allocate an extra entry, which
is turned into a LINK pointer at the very end of the SG table,
i.e, after the LAST buffer entry, pointing back to the beginning
of the first table. This allows the buffer to be used normally
when the trace starts at offset 0 of the buffer, as the LAST
buffer entry makes the TMC-ETR wrap back automatically to
offset 0.

2) If we want to start at any other ETR SG page aligned offset,
with a size smaller than the full buffer size, we can :

 a) Make the page entry at (offset + new_size) the LAST entry.
 b) Make the original LAST entry a normal entry.
 c) Use the table pointer for the "new" start offset as the
    base address of the table.

This works because the TMC doesn't mandate that the page table
base address be 4K page aligned, so we can program the DBA to point
to the entry which points to the required SG page; a short sketch
of the index arithmetic involved follows below.
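
The index arithmetic can be sketched with a small standalone C model
(illustrative only, not driver code; it assumes a 4K system page size
and the layout described above, where each table page carries
ETR_SG_PTRS_PER_PAGE - 1 data pointers followed by one link, and
dba_index_for_page() is a hypothetical helper introduced just for
the example):

#include <stdio.h>
#include <stdint.h>

#define ETR_SG_PAGE_SIZE	4096UL	/* ETR SG page == assumed PAGE_SIZE */
#define ETR_SG_PTRS_PER_PAGE	(ETR_SG_PAGE_SIZE / sizeof(uint32_t))

/*
 * Entries needed for @nr_sgpages data pointers once the extra circular
 * LINK entry is added, matching the updated tmc_etr_sg_table_entries().
 */
static unsigned long circular_table_entries(unsigned long nr_sgpages)
{
	unsigned long nr_sglinks = nr_sgpages / (ETR_SG_PTRS_PER_PAGE - 1);

	if (nr_sglinks && !(nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1)))
		nr_sglinks--;
	return nr_sgpages + nr_sglinks + 1;
}

/*
 * Table-entry index holding the pointer for ETR SG data page @sgpage,
 * assuming each table page holds (ETR_SG_PTRS_PER_PAGE - 1) data
 * pointers followed by one link entry.
 */
static unsigned long dba_index_for_page(unsigned long sgpage)
{
	unsigned long tbl_page = sgpage / (ETR_SG_PTRS_PER_PAGE - 1);
	unsigned long in_page  = sgpage % (ETR_SG_PTRS_PER_PAGE - 1);

	return tbl_page * ETR_SG_PTRS_PER_PAGE + in_page;
}

int main(void)
{
	unsigned long offset = 5UL * 1024 * 1024;	/* arbitrary 5MB start */
	unsigned long sgpage = offset / ETR_SG_PAGE_SIZE;

	printf("2048 SG pages -> %lu table entries (circular)\n",
	       circular_table_entries(2048));
	printf("start offset %lu -> DBA points at table entry %lu\n",
	       offset, dba_index_for_page(sgpage));
	return 0;
}

With these assumptions, data SG page N sits at table index
N + N / (ETR_SG_PTRS_PER_PAGE - 1), which is what dba_index_for_page()
computes; the driver then marks the entry at (offset + size) LAST and
turns the old LAST entry back into a normal pointer.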

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V1:
 - Add a size parameter, which will be used to limit the
   trace buffer exposed to the ETR. This could prevent the
   ETR from overwriting a shared perf ring buffer area,
   which is being consumed by userspace.
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 173 +++++++++++++++++++++---
 1 file changed, 154 insertions(+), 19 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index a003cfc..d18043d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,6 +17,7 @@
 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include <linux/iommu.h>
 #include <linux/slab.h>
 #include "coresight-catu.h"
 #include "coresight-priv.h"
@@ -72,35 +73,41 @@ typedef u32 sgte_t;
  * @sg_table:		Generic SG Table holding the data/table pages.
  * @hwaddr:		hwaddress used by the TMC, which is the base
  *			address of the table.
+ * @nr_entries:		Total number of pointers in the table.
+ * @first_entry:	Index to the current "start" of the buffer.
+ * @last_entry:		Index to the last entry of the buffer.
  */
 struct etr_sg_table {
 	struct tmc_sg_table	*sg_table;
 	dma_addr_t		hwaddr;
+	u32			nr_entries;
+	u32			first_entry;
+	u32			last_entry;
 };
 
 /*
  * tmc_etr_sg_table_entries: Total number of table entries required to map
  * @nr_pages system pages.
  *
- * We need to map @nr_pages * ETR_SG_PAGES_PER_SYSPAGE data pages.
+ * We need to map @nr_pages * ETR_SG_PAGES_PER_SYSPAGE data pages and
+ * an additional Link pointer for making it a Circular buffer.
  * Each TMC page can map (ETR_SG_PTRS_PER_PAGE - 1) buffer pointers,
  * with the last entry pointing to another page of table entries.
- * If we spill over to a new page for mapping 1 entry, we could as
- * well replace the link entry of the previous page with the last entry.
+ * If we fill the last table page completely with pointers (i.e.,
+ * nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1) == 0), we don't have to
+ * allocate another table page; the unused Link entry in the last
+ * page is what makes the table circular.
  */
 static inline unsigned long __attribute_const__
 tmc_etr_sg_table_entries(int nr_pages)
 {
 	unsigned long nr_sgpages = nr_pages * ETR_SG_PAGES_PER_SYSPAGE;
 	unsigned long nr_sglinks = nr_sgpages / (ETR_SG_PTRS_PER_PAGE - 1);
-	/*
-	 * If we spill over to a new page for 1 entry, we could as well
-	 * make it the LAST entry in the previous page, skipping the Link
-	 * address.
-	 */
-	if (nr_sglinks && (nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1) < 2))
+
+	if (nr_sglinks && !(nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1)))
 		nr_sglinks--;
-	return nr_sgpages + nr_sglinks;
+	/* Add an entry for the circular link */
+	return nr_sgpages + nr_sglinks + 1;
 }
 
 /*
@@ -413,14 +420,22 @@ tmc_sg_daddr_to_vaddr(struct tmc_sg_table *sg_table,
 /* Dump the given sg_table */
 static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table)
 {
-	sgte_t *ptr;
+	sgte_t *ptr, *start;
 	int i = 0;
 	dma_addr_t addr;
 	struct tmc_sg_table *sg_table = etr_table->sg_table;
 
-	ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
+	start = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
 					      etr_table->hwaddr, true);
-	while (ptr) {
+	if (!start) {
+		dev_dbg(sg_table->dev,
+			"ERROR: Failed to translate table base: 0x%llx\n",
+			 etr_table->hwaddr);
+		return;
+	}
+
+	ptr = start;
+	do {
 		addr = ETR_SG_ADDR(*ptr);
 		switch (ETR_SG_ET(*ptr)) {
 		case ETR_SG_ET_NORMAL:
@@ -434,12 +449,16 @@ static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table)
 				 i, ptr, addr);
 			ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
 							      addr, true);
+			if (!ptr)
+				dev_dbg(sg_table->dev,
+					"ERROR: Bad Link 0x%llx\n", addr);
 			break;
 		case ETR_SG_ET_LAST:
 			dev_dbg(sg_table->dev,
 				"%05d: ### %p\t:[L] 0x%llx ###\n",
 				 i, ptr, addr);
-			return;
+			ptr++;
+			break;
 		default:
 			dev_dbg(sg_table->dev,
 				"%05d: xxx %p\t:[INVALID] 0x%llx xxx\n",
@@ -447,7 +466,7 @@ static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table)
 			return;
 		}
 		i++;
-	}
+	} while (ptr && ptr != start);
 	dev_dbg(sg_table->dev, "******* End of Table *****\n");
 }
 #endif
@@ -462,7 +481,7 @@ static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table)
 static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
 {
 	dma_addr_t paddr;
-	int i, type, nr_entries;
+	int i, type;
 	int tpidx = 0; /* index to the current system table_page */
 	int sgtidx = 0;	/* index to the sg_table within the current syspage */
 	int sgtentry = 0; /* the entry within the sg_table */
@@ -473,16 +492,16 @@ static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
 	dma_addr_t *table_daddrs = sg_table->table_pages.daddrs;
 	dma_addr_t *data_daddrs = sg_table->data_pages.daddrs;
 
-	nr_entries = tmc_etr_sg_table_entries(sg_table->data_pages.nr_pages);
 	/*
 	 * Use the contiguous virtual address of the table to update entries.
 	 */
 	ptr = sg_table->table_vaddr;
 	/*
-	 * Fill all the entries, except the last entry to avoid special
+	 * Fill all the entries, except the last two entries (i.e, the last
+	 * buffer and the circular link back to the base) to avoid special
 	 * checks within the loop.
 	 */
-	for (i = 0; i < nr_entries - 1; i++) {
+	for (i = 0; i < etr_table->nr_entries - 2; i++) {
 		if (sgtentry == ETR_SG_PTRS_PER_PAGE - 1) {
 			/*
 			 * Last entry in a sg_table page is a link address to
@@ -523,6 +542,119 @@ static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
 	/* Set up the last entry, which is always a data pointer */
 	paddr = data_daddrs[dpidx] + spidx * ETR_SG_PAGE_SIZE;
 	*ptr++ = ETR_SG_ENTRY(paddr, ETR_SG_ET_LAST);
+	/* followed by a circular link, back to the start of the table */
+	*ptr++ = ETR_SG_ENTRY(sg_table->table_daddr, ETR_SG_ET_LINK);
+}
+
+/*
+ * tmc_etr_sg_offset_to_table_index : Translate a given data @offset
+ * to the index of the page table "entry" pointing to the "page".
+ * For each (ETR_SG_PTRS_PER_PAGE - 1) sg pages, we add a Link pointer.
+ */
+static inline u32
+tmc_etr_sg_offset_to_table_index(u64 offset)
+{
+	u32 sg_page = offset >> ETR_SG_PAGE_SHIFT;
+
+	return sg_page + sg_page / (u32)(ETR_SG_PTRS_PER_PAGE - 1);
+}
+
+/*
+ * tmc_etr_sg_update_type: Update the type of a given entry in the
+ * table to the requested entry. This is only used for data buffers
+ * to toggle the "NORMAL" vs "LAST" buffer entries.
+ */
+static inline void tmc_etr_sg_update_type(sgte_t *entry, u32 type)
+{
+	WARN_ON(!ETR_SG_ET(*entry) || ETR_SG_ET(*entry) == ETR_SG_ET_LINK);
+	*entry &= ~ETR_SG_ET_MASK;
+	*entry |= type;
+}
+
+/*
+ * tmc_etr_sg_table_index_to_daddr: Return the hardware address to the table
+ * entry @index. Use this address to let the table begin @index.
+ */
+static inline dma_addr_t
+tmc_etr_sg_table_index_to_daddr(struct tmc_sg_table *sg_table, u32 index)
+{
+	u32 sys_page_idx = index / ETR_SG_PTRS_PER_SYSPAGE;
+	u32 sys_page_offset = index % ETR_SG_PTRS_PER_SYSPAGE;
+
+	if (sys_page_idx < sg_table->table_pages.nr_pages)
+		return sg_table->table_pages.daddrs[sys_page_idx] +
+			sizeof(sgte_t) * sys_page_offset;
+	return 0;
+}
+
+/*
+ * tmc_etr_sg_table_rotate : Rotate the SG circular buffer, moving
+ * the "base" to a requested offset and reset the size to @size
+ * We do so by :
+ *
+ * 1) Reset the current LAST buffer.
+ * 2) Update the hwaddr to point to the table pointer for the buffer
+ *    which starts @base_offset.
+ * 3) Mark the page at the base_offset + size as LAST.
+ */
+static int __maybe_unused
+tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
+			unsigned long base_offset, unsigned long size)
+{
+	u32 last_entry, first_entry;
+	u64 last_offset;
+	struct tmc_sg_table *sg_table = etr_table->sg_table;
+	sgte_t *table_ptr = sg_table->table_vaddr;
+	ssize_t buf_size = tmc_sg_table_buf_size(sg_table);
+
+	if (size > buf_size || size & (ETR_SG_PAGE_SIZE - 1)) {
+		dev_dbg(sg_table->dev, "unsupported size: %lx\n", size);
+		return -EINVAL;
+	}
+
+	/* Offset should always be SG PAGE_SIZE aligned */
+	if (base_offset & (ETR_SG_PAGE_SIZE - 1)) {
+		dev_dbg(sg_table->dev,
+			"unaligned base offset %lx\n", base_offset);
+		return -EINVAL;
+	}
+	/* Make sure the offsets are within the range */
+	base_offset -= (base_offset > buf_size) ? buf_size : 0;
+	last_offset = base_offset + size;
+	last_offset -= (last_offset > buf_size) ? buf_size : 0;
+
+	first_entry = tmc_etr_sg_offset_to_table_index(base_offset);
+	last_entry = tmc_etr_sg_offset_to_table_index(last_offset);
+
+	/* Reset the current LAST page to NORMAL and set the new LAST page */
+	if (last_entry != etr_table->last_entry) {
+		tmc_etr_sg_update_type(&table_ptr[etr_table->last_entry],
+				       ETR_SG_ET_NORMAL);
+		tmc_etr_sg_update_type(&table_ptr[last_entry], ETR_SG_ET_LAST);
+	}
+
+	etr_table->hwaddr = tmc_etr_sg_table_index_to_daddr(sg_table,
+							    first_entry);
+
+	/*
+	 * We shouldn't be hitting an invalid index, unless something is
+	 * seriously wrong.
+	 */
+	if (WARN_ON(!etr_table->hwaddr))
+		return -EINVAL;
+
+	etr_table->first_entry = first_entry;
+	etr_table->last_entry = last_entry;
+	dev_dbg(sg_table->dev,
+		"table rotated to offset %lx, dba: %lx, size: %ldKB\n",
+		base_offset, (unsigned long)etr_table->hwaddr,
+		(unsigned long)(size >> 10));
+	/* Sync the table for device */
+	tmc_sg_table_sync_table(sg_table);
+#ifdef ETR_SG_DEBUG
+	tmc_etr_sg_table_dump(etr_table);
+#endif
+	return 0;
 }
 
 /*
@@ -556,6 +688,9 @@ tmc_init_etr_sg_table(struct device *dev, int node,
 	}
 
 	etr_table->sg_table = sg_table;
+	etr_table->nr_entries = nr_entries;
+	etr_table->first_entry = 0;
+	etr_table->last_entry = nr_entries - 2;
 	/* TMC should use table base address for DBA */
 	etr_table->hwaddr = sg_table->table_daddr;
 	tmc_etr_sg_table_populate(etr_table);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 16/27] coresight: tmc-etr: Add transparent buffer management
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

At the moment we always use contiguous memory for TMC ETR tracing
when used from sysfs. The size of the buffer is fixed at boot time
and can only be changed by modifying the DT. With the introduction
of SG support we can support much larger buffers in that mode.
This patch abstracts the buffer used for the ETR so that it can
switch between a contiguous buffer and an SG table depending on
the availability of memory.

This also enables sysfs mode to use the ETR in SG mode, depending
on the configured trace buffer size. Also, since the ETR will use
the new infrastructure to manage the buffer, we can get rid of some
of the members in tmc_drvdata and clean up the fields a bit.
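
A minimal user-space sketch of the abstraction (illustrative names
only, not the driver's API): each backend supplies an ops table, and
the allocator picks a flat DMA buffer unless the SG backend is both
available and worthwhile.

	#include <stdio.h>
	#include <stdbool.h>
	#include <stddef.h>

	struct buf;

	/* Per-backend operations, mirroring the ops-table idea */
	struct buf_ops {
		const char *name;
		int (*alloc)(struct buf *buf);
		void (*free)(struct buf *buf);
	};

	struct buf {
		size_t			size;
		const struct buf_ops	*ops;
	};

	static int noop_alloc(struct buf *buf) { (void)buf; return 0; }
	static void noop_free(struct buf *buf) { (void)buf; }

	static const struct buf_ops flat_ops = { "flat", noop_alloc, noop_free };
	static const struct buf_ops sg_ops   = { "sg",   noop_alloc, noop_free };

	/*
	 * Selection policy sketched from the description above: use flat
	 * memory when SG is not available, an IOMMU backs the device, or
	 * the request is small (< 1MB); otherwise fall back to SG.
	 */
	static const struct buf_ops *
	pick_backend(bool has_sg, bool has_iommu, size_t size)
	{
		if (!has_sg || has_iommu || size < (1UL << 20))
			return &flat_ops;
		return &sg_ops;
	}

	int main(void)
	{
		struct buf buf = { .size = 64UL << 20 };

		buf.ops = pick_backend(true, false, buf.size);
		printf("64MB request, no IOMMU -> %s backend\n", buf.ops->name);
		return buf.ops->alloc(&buf);
	}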

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 451 +++++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
 2 files changed, 418 insertions(+), 90 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index d18043d..fde3fa6 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -23,6 +23,13 @@
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+struct etr_flat_buf {
+	struct device	*dev;
+	dma_addr_t	daddr;
+	void		*vaddr;
+	size_t		size;
+};
+
 /*
  * The TMC ETR SG has a page size of 4K. The SG table contains pointers
  * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
@@ -666,7 +673,7 @@ tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
  * @size	- Total size of the data buffer
  * @pages	- Optional list of page virtual address
  */
-static struct etr_sg_table __maybe_unused *
+static struct etr_sg_table *
 tmc_init_etr_sg_table(struct device *dev, int node,
 		  unsigned long size, void **pages)
 {
@@ -702,6 +709,296 @@ tmc_init_etr_sg_table(struct device *dev, int node,
 	return etr_table;
 }
 
+/*
+ * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
+ */
+static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
+				  struct etr_buf *etr_buf, int node,
+				  void **pages)
+{
+	struct etr_flat_buf *flat_buf;
+
+	/* We cannot reuse existing pages for flat buf */
+	if (pages)
+		return -EINVAL;
+
+	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
+	if (!flat_buf)
+		return -ENOMEM;
+
+	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
+					   &flat_buf->daddr, GFP_KERNEL);
+	if (!flat_buf->vaddr) {
+		kfree(flat_buf);
+		return -ENOMEM;
+	}
+
+	flat_buf->size = etr_buf->size;
+	flat_buf->dev = drvdata->dev;
+	etr_buf->hwaddr = flat_buf->daddr;
+	etr_buf->mode = ETR_MODE_FLAT;
+	etr_buf->private = flat_buf;
+	return 0;
+}
+
+static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	if (flat_buf && flat_buf->daddr)
+		dma_free_coherent(flat_buf->dev, flat_buf->size,
+				  flat_buf->vaddr, flat_buf->daddr);
+	kfree(flat_buf);
+}
+
+static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	/*
+	 * Adjust the buffer to point to the beginning of the trace data
+	 * and update the available trace data.
+	 */
+	etr_buf->offset = rrp - etr_buf->hwaddr;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = rwp - rrp;
+}
+
+static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
+					 u64 offset, size_t len, char **bufpp)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	*bufpp = (char *)flat_buf->vaddr + offset;
+	/*
+	 * tmc_etr_buf_get_data already adjusts the length to handle
+	 * buffer wrapping around.
+	 */
+	return len;
+}
+
+static const struct etr_buf_operations etr_flat_buf_ops = {
+	.alloc = tmc_etr_alloc_flat_buf,
+	.free = tmc_etr_free_flat_buf,
+	.sync = tmc_etr_sync_flat_buf,
+	.get_data = tmc_etr_get_data_flat_buf,
+};
+
+/*
+ * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
+ * appropriately.
+ */
+static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
+				struct etr_buf *etr_buf, int node,
+				void **pages)
+{
+	struct etr_sg_table *etr_table;
+
+	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
+					  etr_buf->size, pages);
+	if (IS_ERR(etr_table))
+		return -ENOMEM;
+	etr_buf->hwaddr = etr_table->hwaddr;
+	etr_buf->mode = ETR_MODE_ETR_SG;
+	etr_buf->private = etr_table;
+	return 0;
+}
+
+static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	if (etr_table) {
+		tmc_free_sg_table(etr_table->sg_table);
+		kfree(etr_table);
+	}
+}
+
+static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
+				       size_t len, char **bufpp)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
+}
+
+static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	long r_offset, w_offset;
+	struct etr_sg_table *etr_table = etr_buf->private;
+	struct tmc_sg_table *table = etr_table->sg_table;
+
+	/* Convert hw address to offset in the buffer */
+	r_offset = tmc_sg_get_data_page_offset(table, rrp);
+	if (r_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RRP %llx to offset\n", rrp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	w_offset = tmc_sg_get_data_page_offset(table, rwp);
+	if (w_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RWP %llx to offset\n", rwp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	etr_buf->offset = r_offset;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
+				w_offset - r_offset;
+	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
+}
+
+static const struct etr_buf_operations etr_sg_buf_ops = {
+	.alloc = tmc_etr_alloc_sg_buf,
+	.free = tmc_etr_free_sg_buf,
+	.sync = tmc_etr_sync_sg_buf,
+	.get_data = tmc_etr_get_data_sg_buf,
+};
+
+static const struct etr_buf_operations *etr_buf_ops[] = {
+	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
+	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
+};
+
+static inline int tmc_etr_mode_alloc_buf(int mode,
+					 struct tmc_drvdata *drvdata,
+					 struct etr_buf *etr_buf, int node,
+					 void **pages)
+{
+	int rc;
+
+	switch (mode) {
+	case ETR_MODE_FLAT:
+	case ETR_MODE_ETR_SG:
+		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
+		if (!rc)
+			etr_buf->ops = etr_buf_ops[mode];
+		return rc;
+	default:
+		return -EINVAL;
+	}
+}
+
+/*
+ * tmc_alloc_etr_buf: Allocate a buffer used by the ETR.
+ * @drvdata	: ETR device details.
+ * @size	: size of the requested buffer.
+ * @flags	: Required properties for the buffer.
+ * @node	: Node for memory allocations.
+ * @pages	: An optional list of pages.
+ */
+static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
+					 ssize_t size, int flags,
+					 int node, void **pages)
+{
+	int rc = -ENOMEM;
+	bool has_etr_sg, has_iommu;
+	struct etr_buf *etr_buf;
+
+	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
+	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
+
+	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
+	if (!etr_buf)
+		return ERR_PTR(-ENOMEM);
+
+	etr_buf->size = size;
+
+	/*
+	 * If we have to use an existing list of pages, we cannot reliably
+	 * use a contiguous DMA memory (even if we have an IOMMU). Otherwise,
+	 * we use the contiguous DMA memory if at least one of the following
+	 * conditions is true:
+	 *  a) The ETR cannot use Scatter-Gather.
+	 *  b) we have a backing IOMMU
+	 *  c) The requested memory size is smaller (< 1M).
+	 *
+	 * Fallback to available mechanisms.
+	 *
+	 */
+	if (!pages &&
+	    (!has_etr_sg || has_iommu || size < SZ_1M))
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
+					    etr_buf, node, pages);
+	if (rc && has_etr_sg)
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
+					    etr_buf, node, pages);
+	if (rc) {
+		kfree(etr_buf);
+		return ERR_PTR(rc);
+	}
+
+	return etr_buf;
+}
+
+static void tmc_free_etr_buf(struct etr_buf *etr_buf)
+{
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
+	etr_buf->ops->free(etr_buf);
+	kfree(etr_buf);
+}
+
+/*
+ * tmc_etr_buf_get_data: Get the pointer to the trace data at @offset
+ * with a maximum of @len bytes.
+ * Returns: The size of the linear data available @pos, with *bufpp
+ * updated to point to the buffer.
+ */
+static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
+				    u64 offset, size_t len, char **bufpp)
+{
+	/* Adjust the length to limit this transaction to end of buffer */
+	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
+
+	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
+}
+
+static inline s64
+tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
+{
+	ssize_t len;
+	char *bufp;
+
+	len = tmc_etr_buf_get_data(etr_buf, offset,
+				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
+	if (WARN_ON(len <= CORESIGHT_BARRIER_PKT_SIZE))
+		return -EINVAL;
+	coresight_insert_barrier_packet(bufp);
+	return offset + CORESIGHT_BARRIER_PKT_SIZE;
+}
+
+/*
+ * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
+ * Makes sure the trace data is synced to the memory for consumption.
+ * @etr_buf->offset will hold the offset to the beginning of the trace data
+ * within the buffer, with @etr_buf->len bytes to consume.
+ */
+static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
+{
+	struct etr_buf *etr_buf = drvdata->etr_buf;
+	u64 rrp, rwp;
+	u32 status;
+
+	rrp = tmc_read_rrp(drvdata);
+	rwp = tmc_read_rwp(drvdata);
+	status = readl_relaxed(drvdata->base + TMC_STS);
+	etr_buf->full = status & TMC_STS_FULL;
+
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
+
+	etr_buf->ops->sync(etr_buf, rrp, rwp);
+
+	/* Insert barrier packets at the beginning, if there was an overflow */
+	if (etr_buf->full)
+		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
+}
+
 static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
 {
 	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
@@ -721,6 +1018,7 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
 	/*
 	 * If this ETR is connected to a CATU, enable it before we turn
@@ -733,7 +1031,7 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 	/* Wait for TMCSReady bit to be set */
 	tmc_wait_for_tmcready(drvdata);
 
-	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
+	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
 	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
 
 	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
@@ -746,16 +1044,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 		axictl |= TMC_AXICTL_ARCACHE_OS;
 	}
 
+	if (etr_buf->mode == ETR_MODE_ETR_SG) {
+		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
+			return;
+		axictl |= TMC_AXICTL_SCT_GAT_MODE;
+	}
+
 	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
-	tmc_write_dba(drvdata, drvdata->paddr);
+	tmc_write_dba(drvdata, etr_buf->hwaddr);
 	/*
 	 * If the TMC pointers must be programmed before the session,
 	 * we have to set it properly (i.e, RRP/RWP to base address and
 	 * STS to "not full").
 	 */
 	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
-		tmc_write_rrp(drvdata, drvdata->paddr);
-		tmc_write_rwp(drvdata, drvdata->paddr);
+		tmc_write_rrp(drvdata, etr_buf->hwaddr);
+		tmc_write_rwp(drvdata, etr_buf->hwaddr);
 		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
 		writel_relaxed(sts, drvdata->base + TMC_STS);
 	}
@@ -771,63 +1075,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 }
 
 /*
- * Return the available trace data in the buffer @pos, with a maximum
- * limit of @len, also updating the @bufpp on where to find it.
+ * Return the available trace data in the buffer (starts at etr_buf->offset,
+ * limited by etr_buf->len) from @pos, with a maximum limit of @len,
+ * also updating the @bufpp on where to find it. Since the trace data
+ * can start anywhere in the buffer, depending on the RRP, we adjust the
+ * @len returned to handle buffer wrapping around.
  */
 ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
-			    loff_t pos, size_t len, char **bufpp)
+				loff_t pos, size_t len, char **bufpp)
 {
+	s64 offset;
 	ssize_t actual = len;
-	char *bufp = drvdata->buf + pos;
-	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
-
-	/* Adjust the len to available size @pos */
-	if (pos + actual > drvdata->len)
-		actual = drvdata->len - pos;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
+	if (pos + actual > etr_buf->len)
+		actual = etr_buf->len - pos;
 	if (actual <= 0)
 		return actual;
 
-	/*
-	 * Since we use a circular buffer, with trace data starting
-	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
-	 * wrap the current @pos to within the buffer.
-	 */
-	if (bufp >= bufend)
-		bufp -= drvdata->size;
-	/*
-	 * For simplicity, avoid copying over a wrapped around buffer.
-	 */
-	if ((bufp + actual) > bufend)
-		actual = bufend - bufp;
-	*bufpp = bufp;
-	return actual;
+	/* Compute the offset from which we read the data */
+	offset = etr_buf->offset + pos;
+	if (offset >= etr_buf->size)
+		offset -= etr_buf->size;
+	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
 }
 
-static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
+static struct etr_buf *
+tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
 {
-	u32 val;
-	u64 rwp;
+	return tmc_alloc_etr_buf(drvdata, drvdata->size,
+				 0, cpu_to_node(0), NULL);
+}
 
-	rwp = tmc_read_rwp(drvdata);
-	val = readl_relaxed(drvdata->base + TMC_STS);
+static void
+tmc_etr_free_sysfs_buf(struct etr_buf *buf)
+{
+	if (buf)
+		tmc_free_etr_buf(buf);
+}
 
-	/*
-	 * Adjust the buffer to point to the beginning of the trace data
-	 * and update the available trace data.
-	 */
-	if (val & TMC_STS_FULL) {
-		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
-		drvdata->len = drvdata->size;
-		coresight_insert_barrier_packet(drvdata->buf);
-	} else {
-		drvdata->buf = drvdata->vaddr;
-		drvdata->len = rwp - drvdata->paddr;
-	}
+static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
+{
+	tmc_sync_etr_buf(drvdata);
 }
 
 static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 {
+
 	CS_UNLOCK(drvdata->base);
 
 	tmc_flush_and_stop(drvdata);
@@ -836,7 +1130,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	 * read before the TMC is disabled.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
-		tmc_etr_dump_hw(drvdata);
+		tmc_etr_sync_sysfs_buf(drvdata);
+
 	tmc_disable_hw(drvdata);
 
 	CS_LOCK(drvdata->base);
@@ -850,34 +1145,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	int ret = 0;
 	bool used = false;
 	unsigned long flags;
-	void __iomem *vaddr = NULL;
-	dma_addr_t paddr;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct etr_buf *new_buf = NULL, *free_buf = NULL;
 
 
 	/*
-	 * If we don't have a buffer release the lock and allocate memory.
-	 * Otherwise keep the lock and move along.
+	 * If we are enabling the ETR from disabled state, we need to make
+	 * sure we have a buffer with the right size. The etr_buf is not reset
+	 * immediately after we stop the tracing in SYSFS mode as we wait for
+	 * the user to collect the data. We may be able to reuse the existing
+	 * buffer, provided the size matches. Any allocation has to be done
+	 * with the lock released.
 	 */
 	spin_lock_irqsave(&drvdata->spinlock, flags);
-	if (!drvdata->vaddr) {
+	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
 		spin_unlock_irqrestore(&drvdata->spinlock, flags);
-
-		/*
-		 * Contiguous  memory can't be allocated while a spinlock is
-		 * held.  As such allocate memory here and free it if a buffer
-		 * has already been allocated (from a previous session).
-		 */
-		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
-					   &paddr, GFP_KERNEL);
-		if (!vaddr)
-			return -ENOMEM;
+		/* Allocate memory with the spinlock released */
+		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
+		if (IS_ERR(new_buf))
+			return PTR_ERR(new_buf);
 
 		/* Let's try again */
 		spin_lock_irqsave(&drvdata->spinlock, flags);
 	}
 
-	if (drvdata->reading) {
+	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -885,21 +1177,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	/*
 	 * In sysFS mode we can have multiple writers per sink.  Since this
 	 * sink is already enabled no memory is needed and the HW need not be
-	 * touched.
+	 * touched, even if the buffer size has changed.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
 		goto out;
 
 	/*
-	 * If drvdata::buf == NULL, use the memory allocated above.
-	 * Otherwise a buffer still exists from a previous session, so
-	 * simply use that.
+	 * If we don't have a buffer or it doesn't match the requested size,
+	 * use the memory allocated above. Otherwise reuse it.
 	 */
-	if (drvdata->buf == NULL) {
+	if (!drvdata->etr_buf ||
+	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
 		used = true;
-		drvdata->vaddr = vaddr;
-		drvdata->paddr = paddr;
-		drvdata->buf = drvdata->vaddr;
+		free_buf = drvdata->etr_buf;
+		drvdata->etr_buf = new_buf;
 	}
 
 	drvdata->mode = CS_MODE_SYSFS;
@@ -908,8 +1199,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free memory outside the spinlock if need be */
-	if (!used && vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (free_buf)
+		tmc_etr_free_sysfs_buf(free_buf);
 
 	if (!ret)
 		dev_info(drvdata->dev, "TMC-ETR enabled\n");
@@ -988,8 +1279,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	/* If drvdata::buf is NULL the trace data has been read already */
-	if (drvdata->buf == NULL) {
+	/* If drvdata::etr_buf is NULL the trace data has been read already */
+	if (drvdata->etr_buf == NULL) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -1008,8 +1299,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 {
 	unsigned long flags;
-	dma_addr_t paddr;
-	void __iomem *vaddr = NULL;
+	struct etr_buf *etr_buf = NULL;
 
 	/* config types are set a boot time and never change */
 	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
@@ -1030,17 +1320,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 		 * The ETR is not tracing and the buffer was just read.
 		 * As such prepare to free the trace buffer.
 		 */
-		vaddr = drvdata->vaddr;
-		paddr = drvdata->paddr;
-		drvdata->buf = drvdata->vaddr = NULL;
+		etr_buf =  drvdata->etr_buf;
+		drvdata->etr_buf = NULL;
 	}
 
 	drvdata->reading = false;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free allocated memory out side of the spinlock */
-	if (vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (etr_buf)
+		tmc_free_etr_buf(etr_buf);
 
 	return 0;
 }
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 74d8f24..6f7bec7 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -56,6 +56,7 @@
 #define TMC_STS_TMCREADY_BIT	2
 #define TMC_STS_FULL		BIT(0)
 #define TMC_STS_TRIGGERED	BIT(1)
+
 /*
  * TMC_AXICTL - 0x110
  *
@@ -135,6 +136,35 @@ enum tmc_mem_intf_width {
 #define CORESIGHT_SOC_600_ETR_CAPS	\
 	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
 
+enum etr_mode {
+	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
+	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
+};
+
+struct etr_buf_operations;
+
+/**
+ * struct etr_buf - Details of the buffer used by ETR
+ * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
+ * @full	: Trace data overflow
+ * @size	: Size of the buffer.
+ * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
+ * @offset	: Offset of the trace data in the buffer for consumption.
+ * @len		: Available trace data @buf (may round up to the beginning).
+ * @ops		: ETR buffer operations for the mode.
+ * @private	: Backend specific information for the buf
+ */
+struct etr_buf {
+	enum etr_mode			mode;
+	bool				full;
+	ssize_t				size;
+	dma_addr_t			hwaddr;
+	unsigned long			offset;
+	s64				len;
+	const struct etr_buf_operations	*ops;
+	void				*private;
+};
+
 /**
  * struct tmc_drvdata - specifics associated to an TMC component
  * @base:	memory mapped base address for this component.
@@ -142,11 +172,10 @@ enum tmc_mem_intf_width {
  * @csdev:	component vitals needed by the framework.
  * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
  * @spinlock:	only one at a time pls.
- * @buf:	area of memory where trace data get sent.
- * @paddr:	DMA start location in RAM.
- * @vaddr:	virtual representation of @paddr.
- * @size:	trace buffer size.
- * @len:	size of the available trace.
+ * @buf:	Snapshot of the trace data for ETF/ETB.
+ * @etr_buf:	details of buffer used in TMC-ETR
+ * @len:	size of the available trace for ETF/ETB.
+ * @size:	trace buffer size for this TMC (common for all modes).
  * @mode:	how this TMC is being used.
  * @config_type: TMC variant, must be of type @tmc_config_type.
  * @memwidth:	width of the memory interface databus, in bytes.
@@ -161,11 +190,12 @@ struct tmc_drvdata {
 	struct miscdevice	miscdev;
 	spinlock_t		spinlock;
 	bool			reading;
-	char			*buf;
-	dma_addr_t		paddr;
-	void __iomem		*vaddr;
-	u32			size;
+	union {
+		char		*buf;		/* TMC ETB */
+		struct etr_buf	*etr_buf;	/* TMC ETR */
+	};
 	u32			len;
+	u32			size;
 	u32			mode;
 	enum tmc_config_type	config_type;
 	enum tmc_mem_intf_width	memwidth;
@@ -173,6 +203,15 @@ struct tmc_drvdata {
 	u32			etr_caps;
 };
 
+struct etr_buf_operations {
+	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
+			int node, void **pages);
+	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
+	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
+				char **bufpp);
+	void (*free)(struct etr_buf *etr_buf);
+};
+
 /**
  * struct tmc_pages - Collection of pages used for SG.
  * @nr_pages:		Number of pages in the list.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 16/27] coresight: tmc-etr: Add transparent buffer management
@ 2018-05-01  9:10   ` Suzuki K Poulose
  0 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

At the moment we always use contiguous memory for TMC ETR tracing
when used from sysfs. The size of the buffer is fixed at boot time
and can only be changed by modifying the DT. With the introduction
of SG support we can support much larger buffers in that mode.
This patch abstracts the buffer used for the ETR so that it can
switch between a contiguous buffer and an SG table depending on
the availability of memory.

This also enables sysfs mode to use the ETR in SG mode, depending
on the configured trace buffer size. Also, since the ETR will use
the new infrastructure to manage the buffer, we can get rid of some
of the members in tmc_drvdata and clean up the fields a bit.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 451 +++++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
 2 files changed, 418 insertions(+), 90 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index d18043d..fde3fa6 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -23,6 +23,13 @@
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+struct etr_flat_buf {
+	struct device	*dev;
+	dma_addr_t	daddr;
+	void		*vaddr;
+	size_t		size;
+};
+
 /*
  * The TMC ETR SG has a page size of 4K. The SG table contains pointers
  * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
@@ -666,7 +673,7 @@ tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
  * @size	- Total size of the data buffer
  * @pages	- Optional list of page virtual address
  */
-static struct etr_sg_table __maybe_unused *
+static struct etr_sg_table *
 tmc_init_etr_sg_table(struct device *dev, int node,
 		  unsigned long size, void **pages)
 {
@@ -702,6 +709,296 @@ tmc_init_etr_sg_table(struct device *dev, int node,
 	return etr_table;
 }
 
+/*
+ * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
+ */
+static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
+				  struct etr_buf *etr_buf, int node,
+				  void **pages)
+{
+	struct etr_flat_buf *flat_buf;
+
+	/* We cannot reuse existing pages for flat buf */
+	if (pages)
+		return -EINVAL;
+
+	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
+	if (!flat_buf)
+		return -ENOMEM;
+
+	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
+					   &flat_buf->daddr, GFP_KERNEL);
+	if (!flat_buf->vaddr) {
+		kfree(flat_buf);
+		return -ENOMEM;
+	}
+
+	flat_buf->size = etr_buf->size;
+	flat_buf->dev = drvdata->dev;
+	etr_buf->hwaddr = flat_buf->daddr;
+	etr_buf->mode = ETR_MODE_FLAT;
+	etr_buf->private = flat_buf;
+	return 0;
+}
+
+static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	if (flat_buf && flat_buf->daddr)
+		dma_free_coherent(flat_buf->dev, flat_buf->size,
+				  flat_buf->vaddr, flat_buf->daddr);
+	kfree(flat_buf);
+}
+
+static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	/*
+	 * Adjust the buffer to point to the beginning of the trace data
+	 * and update the available trace data.
+	 */
+	etr_buf->offset = rrp - etr_buf->hwaddr;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = rwp - rrp;
+}
+
+static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
+					 u64 offset, size_t len, char **bufpp)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	*bufpp = (char *)flat_buf->vaddr + offset;
+	/*
+	 * tmc_etr_buf_get_data already adjusts the length to handle
+	 * buffer wrapping around.
+	 */
+	return len;
+}
+
+static const struct etr_buf_operations etr_flat_buf_ops = {
+	.alloc = tmc_etr_alloc_flat_buf,
+	.free = tmc_etr_free_flat_buf,
+	.sync = tmc_etr_sync_flat_buf,
+	.get_data = tmc_etr_get_data_flat_buf,
+};
+
+/*
+ * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
+ * appropriately.
+ */
+static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
+				struct etr_buf *etr_buf, int node,
+				void **pages)
+{
+	struct etr_sg_table *etr_table;
+
+	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
+					  etr_buf->size, pages);
+	if (IS_ERR(etr_table))
+		return -ENOMEM;
+	etr_buf->hwaddr = etr_table->hwaddr;
+	etr_buf->mode = ETR_MODE_ETR_SG;
+	etr_buf->private = etr_table;
+	return 0;
+}
+
+static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	if (etr_table) {
+		tmc_free_sg_table(etr_table->sg_table);
+		kfree(etr_table);
+	}
+}
+
+static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
+				       size_t len, char **bufpp)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
+}
+
+static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	long r_offset, w_offset;
+	struct etr_sg_table *etr_table = etr_buf->private;
+	struct tmc_sg_table *table = etr_table->sg_table;
+
+	/* Convert hw address to offset in the buffer */
+	r_offset = tmc_sg_get_data_page_offset(table, rrp);
+	if (r_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RRP %llx to offset\n", rrp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	w_offset = tmc_sg_get_data_page_offset(table, rwp);
+	if (w_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RWP %llx to offset\n", rwp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	etr_buf->offset = r_offset;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
+				w_offset - r_offset;
+	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
+}
+
+static const struct etr_buf_operations etr_sg_buf_ops = {
+	.alloc = tmc_etr_alloc_sg_buf,
+	.free = tmc_etr_free_sg_buf,
+	.sync = tmc_etr_sync_sg_buf,
+	.get_data = tmc_etr_get_data_sg_buf,
+};
+
+static const struct etr_buf_operations *etr_buf_ops[] = {
+	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
+	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
+};
+
+static inline int tmc_etr_mode_alloc_buf(int mode,
+					 struct tmc_drvdata *drvdata,
+					 struct etr_buf *etr_buf, int node,
+					 void **pages)
+{
+	int rc;
+
+	switch (mode) {
+	case ETR_MODE_FLAT:
+	case ETR_MODE_ETR_SG:
+		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
+		if (!rc)
+			etr_buf->ops = etr_buf_ops[mode];
+		return rc;
+	default:
+		return -EINVAL;
+	}
+}
+
+/*
+ * tmc_alloc_etr_buf: Allocate a buffer used by the ETR.
+ * @drvdata	: ETR device details.
+ * @size	: size of the requested buffer.
+ * @flags	: Required properties for the buffer.
+ * @node	: Node for memory allocations.
+ * @pages	: An optional list of pages.
+ */
+static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
+					 ssize_t size, int flags,
+					 int node, void **pages)
+{
+	int rc = -ENOMEM;
+	bool has_etr_sg, has_iommu;
+	struct etr_buf *etr_buf;
+
+	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
+	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
+
+	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
+	if (!etr_buf)
+		return ERR_PTR(-ENOMEM);
+
+	etr_buf->size = size;
+
+	/*
+	 * If we have to use an existing list of pages, we cannot reliably
+	 * use a contiguous DMA memory (even if we have an IOMMU). Otherwise,
+	 * we use the contiguous DMA memory if at least one of the following
+	 * conditions is true:
+	 *  a) The ETR cannot use Scatter-Gather.
+	 *  b) we have a backing IOMMU
+	 *  c) The requested memory size is smaller (< 1M).
+	 *
+	 * Fallback to available mechanisms.
+	 *
+	 */
+	if (!pages &&
+	    (!has_etr_sg || has_iommu || size < SZ_1M))
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
+					    etr_buf, node, pages);
+	if (rc && has_etr_sg)
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
+					    etr_buf, node, pages);
+	if (rc) {
+		kfree(etr_buf);
+		return ERR_PTR(rc);
+	}
+
+	return etr_buf;
+}
+
+static void tmc_free_etr_buf(struct etr_buf *etr_buf)
+{
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
+	etr_buf->ops->free(etr_buf);
+	kfree(etr_buf);
+}
+
+/*
+ * tmc_etr_buf_get_data: Get the pointer to the trace data at @offset
+ * with a maximum of @len bytes.
+ * Returns: The size of the linear data available @pos, with *bufpp
+ * updated to point to the buffer.
+ */
+static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
+				    u64 offset, size_t len, char **bufpp)
+{
+	/* Adjust the length to limit this transaction to end of buffer */
+	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
+
+	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
+}
+
+static inline s64
+tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
+{
+	ssize_t len;
+	char *bufp;
+
+	len = tmc_etr_buf_get_data(etr_buf, offset,
+				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
+	if (WARN_ON(len <= CORESIGHT_BARRIER_PKT_SIZE))
+		return -EINVAL;
+	coresight_insert_barrier_packet(bufp);
+	return offset + CORESIGHT_BARRIER_PKT_SIZE;
+}
+
+/*
+ * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
+ * Makes sure the trace data is synced to the memory for consumption.
+ * @etr_buf->offset will hold the offset to the beginning of the trace data
+ * within the buffer, with @etr_buf->len bytes to consume.
+ */
+static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
+{
+	struct etr_buf *etr_buf = drvdata->etr_buf;
+	u64 rrp, rwp;
+	u32 status;
+
+	rrp = tmc_read_rrp(drvdata);
+	rwp = tmc_read_rwp(drvdata);
+	status = readl_relaxed(drvdata->base + TMC_STS);
+	etr_buf->full = status & TMC_STS_FULL;
+
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
+
+	etr_buf->ops->sync(etr_buf, rrp, rwp);
+
+	/* Insert barrier packets at the beginning, if there was an overflow */
+	if (etr_buf->full)
+		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
+}
+
 static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
 {
 	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
@@ -721,6 +1018,7 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
 	/*
 	 * If this ETR is connected to a CATU, enable it before we turn
@@ -733,7 +1031,7 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 	/* Wait for TMCSReady bit to be set */
 	tmc_wait_for_tmcready(drvdata);
 
-	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
+	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
 	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
 
 	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
@@ -746,16 +1044,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 		axictl |= TMC_AXICTL_ARCACHE_OS;
 	}
 
+	if (etr_buf->mode == ETR_MODE_ETR_SG) {
+		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
+			return;
+		axictl |= TMC_AXICTL_SCT_GAT_MODE;
+	}
+
 	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
-	tmc_write_dba(drvdata, drvdata->paddr);
+	tmc_write_dba(drvdata, etr_buf->hwaddr);
 	/*
 	 * If the TMC pointers must be programmed before the session,
 	 * we have to set it properly (i.e, RRP/RWP to base address and
 	 * STS to "not full").
 	 */
 	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
-		tmc_write_rrp(drvdata, drvdata->paddr);
-		tmc_write_rwp(drvdata, drvdata->paddr);
+		tmc_write_rrp(drvdata, etr_buf->hwaddr);
+		tmc_write_rwp(drvdata, etr_buf->hwaddr);
 		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
 		writel_relaxed(sts, drvdata->base + TMC_STS);
 	}
@@ -771,63 +1075,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 }
 
 /*
- * Return the available trace data in the buffer @pos, with a maximum
- * limit of @len, also updating the @bufpp on where to find it.
+ * Return the available trace data in the buffer (starts at etr_buf->offset,
+ * limited by etr_buf->len) from @pos, with a maximum limit of @len,
+ * also updating the @bufpp on where to find it. Since the trace data
+ * can start anywhere in the buffer, depending on the RRP, we adjust the
+ * @len returned to handle buffer wrapping around.
  */
 ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
-			    loff_t pos, size_t len, char **bufpp)
+				loff_t pos, size_t len, char **bufpp)
 {
+	s64 offset;
 	ssize_t actual = len;
-	char *bufp = drvdata->buf + pos;
-	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
-
-	/* Adjust the len to available size @pos */
-	if (pos + actual > drvdata->len)
-		actual = drvdata->len - pos;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
+	if (pos + actual > etr_buf->len)
+		actual = etr_buf->len - pos;
 	if (actual <= 0)
 		return actual;
 
-	/*
-	 * Since we use a circular buffer, with trace data starting
-	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
-	 * wrap the current @pos to within the buffer.
-	 */
-	if (bufp >= bufend)
-		bufp -= drvdata->size;
-	/*
-	 * For simplicity, avoid copying over a wrapped around buffer.
-	 */
-	if ((bufp + actual) > bufend)
-		actual = bufend - bufp;
-	*bufpp = bufp;
-	return actual;
+	/* Compute the offset from which we read the data */
+	offset = etr_buf->offset + pos;
+	if (offset >= etr_buf->size)
+		offset -= etr_buf->size;
+	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
 }
 
-static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
+static struct etr_buf *
+tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
 {
-	u32 val;
-	u64 rwp;
+	return tmc_alloc_etr_buf(drvdata, drvdata->size,
+				 0, cpu_to_node(0), NULL);
+}
 
-	rwp = tmc_read_rwp(drvdata);
-	val = readl_relaxed(drvdata->base + TMC_STS);
+static void
+tmc_etr_free_sysfs_buf(struct etr_buf *buf)
+{
+	if (buf)
+		tmc_free_etr_buf(buf);
+}
 
-	/*
-	 * Adjust the buffer to point to the beginning of the trace data
-	 * and update the available trace data.
-	 */
-	if (val & TMC_STS_FULL) {
-		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
-		drvdata->len = drvdata->size;
-		coresight_insert_barrier_packet(drvdata->buf);
-	} else {
-		drvdata->buf = drvdata->vaddr;
-		drvdata->len = rwp - drvdata->paddr;
-	}
+static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
+{
+	tmc_sync_etr_buf(drvdata);
 }
 
 static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 {
+
 	CS_UNLOCK(drvdata->base);
 
 	tmc_flush_and_stop(drvdata);
@@ -836,7 +1130,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	 * read before the TMC is disabled.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
-		tmc_etr_dump_hw(drvdata);
+		tmc_etr_sync_sysfs_buf(drvdata);
+
 	tmc_disable_hw(drvdata);
 
 	CS_LOCK(drvdata->base);
@@ -850,34 +1145,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	int ret = 0;
 	bool used = false;
 	unsigned long flags;
-	void __iomem *vaddr = NULL;
-	dma_addr_t paddr;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct etr_buf *new_buf = NULL, *free_buf = NULL;
 
 
 	/*
-	 * If we don't have a buffer release the lock and allocate memory.
-	 * Otherwise keep the lock and move along.
+	 * If we are enabling the ETR from disabled state, we need to make
+	 * sure we have a buffer with the right size. The etr_buf is not reset
+	 * immediately after we stop the tracing in SYSFS mode as we wait for
+	 * the user to collect the data. We may be able to reuse the existing
+	 * buffer, provided the size matches. Any allocation has to be done
+	 * with the lock released.
 	 */
 	spin_lock_irqsave(&drvdata->spinlock, flags);
-	if (!drvdata->vaddr) {
+	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
 		spin_unlock_irqrestore(&drvdata->spinlock, flags);
-
-		/*
-		 * Contiguous  memory can't be allocated while a spinlock is
-		 * held.  As such allocate memory here and free it if a buffer
-		 * has already been allocated (from a previous session).
-		 */
-		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
-					   &paddr, GFP_KERNEL);
-		if (!vaddr)
-			return -ENOMEM;
+		/* Allocate memory with the spinlock released */
+		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
+		if (IS_ERR(new_buf))
+			return PTR_ERR(new_buf);
 
 		/* Let's try again */
 		spin_lock_irqsave(&drvdata->spinlock, flags);
 	}
 
-	if (drvdata->reading) {
+	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -885,21 +1177,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	/*
 	 * In sysFS mode we can have multiple writers per sink.  Since this
 	 * sink is already enabled no memory is needed and the HW need not be
-	 * touched.
+	 * touched, even if the buffer size has changed.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
 		goto out;
 
 	/*
-	 * If drvdata::buf == NULL, use the memory allocated above.
-	 * Otherwise a buffer still exists from a previous session, so
-	 * simply use that.
+	 * If we don't have a buffer or it doesn't match the requested size,
+	 * use the memory allocated above. Otherwise reuse it.
 	 */
-	if (drvdata->buf == NULL) {
+	if (!drvdata->etr_buf ||
+	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
 		used = true;
-		drvdata->vaddr = vaddr;
-		drvdata->paddr = paddr;
-		drvdata->buf = drvdata->vaddr;
+		free_buf = drvdata->etr_buf;
+		drvdata->etr_buf = new_buf;
 	}
 
 	drvdata->mode = CS_MODE_SYSFS;
@@ -908,8 +1199,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free memory outside the spinlock if need be */
-	if (!used && vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (free_buf)
+		tmc_etr_free_sysfs_buf(free_buf);
 
 	if (!ret)
 		dev_info(drvdata->dev, "TMC-ETR enabled\n");
@@ -988,8 +1279,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	/* If drvdata::buf is NULL the trace data has been read already */
-	if (drvdata->buf == NULL) {
+	/* If drvdata::etr_buf is NULL the trace data has been read already */
+	if (drvdata->etr_buf == NULL) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -1008,8 +1299,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 {
 	unsigned long flags;
-	dma_addr_t paddr;
-	void __iomem *vaddr = NULL;
+	struct etr_buf *etr_buf = NULL;
 
 	/* config types are set at boot time and never change */
 	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
@@ -1030,17 +1320,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 		 * The ETR is not tracing and the buffer was just read.
 		 * As such prepare to free the trace buffer.
 		 */
-		vaddr = drvdata->vaddr;
-		paddr = drvdata->paddr;
-		drvdata->buf = drvdata->vaddr = NULL;
+		etr_buf =  drvdata->etr_buf;
+		drvdata->etr_buf = NULL;
 	}
 
 	drvdata->reading = false;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free allocated memory outside of the spinlock */
-	if (vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (etr_buf)
+		tmc_free_etr_buf(etr_buf);
 
 	return 0;
 }
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 74d8f24..6f7bec7 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -56,6 +56,7 @@
 #define TMC_STS_TMCREADY_BIT	2
 #define TMC_STS_FULL		BIT(0)
 #define TMC_STS_TRIGGERED	BIT(1)
+
 /*
  * TMC_AXICTL - 0x110
  *
@@ -135,6 +136,35 @@ enum tmc_mem_intf_width {
 #define CORESIGHT_SOC_600_ETR_CAPS	\
 	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
 
+enum etr_mode {
+	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
+	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
+};
+
+struct etr_buf_operations;
+
+/**
+ * struct etr_buf - Details of the buffer used by ETR
+ * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
+ * @full	: Trace data overflow
+ * @size	: Size of the buffer.
+ * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
+ * @offset	: Offset of the trace data in the buffer for consumption.
+ * @len		: Available trace data @buf (may wrap around to the beginning).
+ * @ops		: ETR buffer operations for the mode.
+ * @private	: Backend specific information for the buf
+ */
+struct etr_buf {
+	enum etr_mode			mode;
+	bool				full;
+	ssize_t				size;
+	dma_addr_t			hwaddr;
+	unsigned long			offset;
+	s64				len;
+	const struct etr_buf_operations	*ops;
+	void				*private;
+};
+
 /**
  * struct tmc_drvdata - specifics associated to an TMC component
  * @base:	memory mapped base address for this component.
@@ -142,11 +172,10 @@ enum tmc_mem_intf_width {
  * @csdev:	component vitals needed by the framework.
  * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
  * @spinlock:	only one at a time pls.
- * @buf:	area of memory where trace data get sent.
- * @paddr:	DMA start location in RAM.
- * @vaddr:	virtual representation of @paddr.
- * @size:	trace buffer size.
- * @len:	size of the available trace.
+ * @buf:	Snapshot of the trace data for ETF/ETB.
+ * @etr_buf:	details of buffer used in TMC-ETR
+ * @len:	size of the available trace for ETF/ETB.
+ * @size:	trace buffer size for this TMC (common for all modes).
  * @mode:	how this TMC is being used.
  * @config_type: TMC variant, must be of type @tmc_config_type.
  * @memwidth:	width of the memory interface databus, in bytes.
@@ -161,11 +190,12 @@ struct tmc_drvdata {
 	struct miscdevice	miscdev;
 	spinlock_t		spinlock;
 	bool			reading;
-	char			*buf;
-	dma_addr_t		paddr;
-	void __iomem		*vaddr;
-	u32			size;
+	union {
+		char		*buf;		/* TMC ETB */
+		struct etr_buf	*etr_buf;	/* TMC ETR */
+	};
 	u32			len;
+	u32			size;
 	u32			mode;
 	enum tmc_config_type	config_type;
 	enum tmc_mem_intf_width	memwidth;
@@ -173,6 +203,15 @@ struct tmc_drvdata {
 	u32			etr_caps;
 };
 
+struct etr_buf_operations {
+	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
+			int node, void **pages);
+	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
+	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
+				char **bufpp);
+	void (*free)(struct etr_buf *etr_buf);
+};
+
 /**
  * struct tmc_pages - Collection of pages used for SG.
  * @nr_pages:		Number of pages in the list.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 17/27] coresight: etr: Add support for save restore buffers
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Add support for creating buffers which can be used in save-restore
mode (e.g., for use by perf). If the TMC-ETR supports the save-restore
feature, we could support the mode in all buffer backends. However,
if it doesn't, we fall back to using the in-built SG mechanism, where
we can rotate the SG table by making some adjustments in the page
table.
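
As a rough illustration only (not part of this patch; the helper name
below is a made-up example), a caller that needs a restartable buffer
could ask for pointer save-restore and let the allocator pick the
backend:

	/*
	 * Hypothetical helper: ask only for pointer save-restore, so the
	 * allocator can fall back to the rotatable ETR SG backend when
	 * the ETR lacks the SAVE_RESTORE capability.
	 */
	static struct etr_buf *
	example_alloc_restorable_buf(struct tmc_drvdata *drvdata, ssize_t size,
				     int node, void **pages)
	{
		return tmc_alloc_etr_buf(drvdata, size,
					 ETR_BUF_F_RESTORE_MINIMAL,
					 node, pages);
	}

	/*
	 * Later, before re-enabling the ETR, restart the buffer at a
	 * page-aligned offset (RRP == RWP for the rotated SG case):
	 *
	 *	rc = tmc_restore_etr_buf(drvdata, etr_buf,
	 *				 offset, offset, size, 0);
	 */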

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 142 +++++++++++++++++++++++-
 drivers/hwtracing/coresight/coresight-tmc.h     |  16 +++
 2 files changed, 153 insertions(+), 5 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index fde3fa6..25e7feb 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -604,7 +604,7 @@ tmc_etr_sg_table_index_to_daddr(struct tmc_sg_table *sg_table, u32 index)
  *    which starts @base_offset.
  * 2) Mark the page at the base_offset + size as LAST.
  */
-static int __maybe_unused
+static int
 tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
 			unsigned long base_offset, unsigned long size)
 {
@@ -736,6 +736,9 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
 	flat_buf->size = etr_buf->size;
 	flat_buf->dev = drvdata->dev;
 	etr_buf->hwaddr = flat_buf->daddr;
+	etr_buf->rrp = flat_buf->daddr;
+	etr_buf->rwp = flat_buf->daddr;
+	etr_buf->status = 0;
 	etr_buf->mode = ETR_MODE_FLAT;
 	etr_buf->private = flat_buf;
 	return 0;
@@ -777,11 +780,36 @@ static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
 	return len;
 }
 
+/*
+ * tmc_etr_restore_flat_buf: Restore the flat buffer pointers.
+ * This is only possible with in-built ETR capability to save-restore
+ * the pointers. The DBA will still point to the original start of the
+ * buffer.
+ */
+static int tmc_etr_restore_flat_buf(struct etr_buf *etr_buf,
+				    unsigned long r_offset,
+				    unsigned long w_offset,
+				    unsigned long size,
+				    u32 status,
+				    bool has_save_restore)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	if (!has_save_restore || !flat_buf || size > flat_buf->size)
+		return -EINVAL;
+	etr_buf->rrp = flat_buf->daddr + (r_offset % flat_buf->size);
+	etr_buf->rwp = flat_buf->daddr + (w_offset % flat_buf->size);
+	etr_buf->size = size;
+	etr_buf->status = status;
+	return 0;
+}
+
 static const struct etr_buf_operations etr_flat_buf_ops = {
 	.alloc = tmc_etr_alloc_flat_buf,
 	.free = tmc_etr_free_flat_buf,
 	.sync = tmc_etr_sync_flat_buf,
 	.get_data = tmc_etr_get_data_flat_buf,
+	.restore = tmc_etr_restore_flat_buf,
 };
 
 /*
@@ -799,6 +827,7 @@ static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
 	if (IS_ERR(etr_table))
 		return -ENOMEM;
 	etr_buf->hwaddr = etr_table->hwaddr;
+	etr_buf->status = 0;
 	etr_buf->mode = ETR_MODE_ETR_SG;
 	etr_buf->private = etr_table;
 	return 0;
@@ -825,9 +854,11 @@ static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
 static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
 {
 	long r_offset, w_offset;
+	unsigned long buf_size;
 	struct etr_sg_table *etr_table = etr_buf->private;
 	struct tmc_sg_table *table = etr_table->sg_table;
 
+	buf_size = tmc_sg_table_buf_size(table);
 	/* Convert hw address to offset in the buffer */
 	r_offset = tmc_sg_get_data_page_offset(table, rrp);
 	if (r_offset < 0) {
@@ -849,16 +880,62 @@ static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
 	if (etr_buf->full)
 		etr_buf->len = etr_buf->size;
 	else
-		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
+		etr_buf->len = ((w_offset < r_offset) ? buf_size : 0) +
 				w_offset - r_offset;
 	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
 }
 
+static int tmc_etr_restore_sg_buf(struct etr_buf *etr_buf,
+				  unsigned long r_offset,
+				  unsigned long w_offset,
+				  unsigned long size,
+				  u32 __always_unused status,
+				  bool has_save_restore)
+{
+	int rc;
+	struct etr_sg_table *etr_table = etr_buf->private;
+	struct device *dev = etr_table->sg_table->dev;
+
+	/*
+	 * It is highly unlikely that we have an ETR with in-built SG and
+	 * Save-Restore capability and we are not sure if the PTRs will
+	 * be updated.
+	 */
+	if (has_save_restore) {
+		dev_warn_once(dev,
+		"Unexpected feature combination of SG and save-restore\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Since we cannot program RRP/RWP different from DBAL, the offsets
+	 * should match.
+	 */
+	if (r_offset != w_offset) {
+		dev_dbg(dev, "Mismatched RRP/RWP offsets\n");
+		return -EINVAL;
+	}
+
+	/* Make sure the size is aligned */
+	size &= ~(ETR_SG_PAGE_SIZE - 1);
+
+	rc = tmc_etr_sg_table_rotate(etr_table, w_offset, size);
+	if (!rc) {
+		etr_buf->hwaddr = etr_table->hwaddr;
+		etr_buf->rrp = etr_table->hwaddr;
+		etr_buf->rwp = etr_table->hwaddr;
+		etr_buf->size = size;
+	}
+
+	return rc;
+}
+
 static const struct etr_buf_operations etr_sg_buf_ops = {
 	.alloc = tmc_etr_alloc_sg_buf,
 	.free = tmc_etr_free_sg_buf,
 	.sync = tmc_etr_sync_sg_buf,
 	.get_data = tmc_etr_get_data_sg_buf,
+	.restore = tmc_etr_restore_sg_buf,
 };
 
 static const struct etr_buf_operations *etr_buf_ops[] = {
@@ -899,10 +976,42 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 {
 	int rc = -ENOMEM;
 	bool has_etr_sg, has_iommu;
+	bool has_flat, has_save_restore;
 	struct etr_buf *etr_buf;
 
 	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
 	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
+	has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
+
+	/*
+	 * We can normally use a flat DMA buffer, provided the buffer is
+	 * not used in save-restore fashion without hardware support.
+	 */
+	has_flat = !(flags & ETR_BUF_F_RESTORE_PTRS) || has_save_restore;
+
+	/*
+	 * To support save-restore on a given ETR we have the following
+	 * conditions:
+	 *  1) If the buffer requires save-restore of the pointers as well
+	 *     as the Status bit, we require ETR support for it and we could
+	 *     support all the backends.
+	 *  2) If the buffer requires only save-restore of pointers, then
+	 *     we could exploit a circular ETR SG list. None of the other
+	 *     backends can support it without the ETR feature.
+	 *
+	 * If the buffer will be used in a save-restore mode without
+	 * the ETR support for SAVE_RESTORE, we can only support TMC
+	 * ETR in-built SG tables which can be rotated to make it work.
+	 */
+	if ((flags & ETR_BUF_F_RESTORE_STATUS) && !has_save_restore)
+		return ERR_PTR(-EINVAL);
+
+	if (!has_flat && !has_etr_sg) {
+		dev_dbg(drvdata->dev,
+			"No available backends for ETR buffer with flags %x\n",
+			flags);
+		return ERR_PTR(-EINVAL);
+	}
 
 	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
 	if (!etr_buf)
@@ -922,7 +1031,7 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 	 * Fallback to available mechanisms.
 	 *
 	 */
-	if (!pages &&
+	if (!pages && has_flat &&
 	    (!has_etr_sg || has_iommu || size < SZ_1M))
 		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
 					    etr_buf, node, pages);
@@ -999,6 +1108,29 @@ static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
 		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
 }
 
+static int __maybe_unused
+tmc_restore_etr_buf(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
+		    unsigned long r_offset, unsigned long w_offset,
+		    unsigned long size, u32 status)
+{
+	bool has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
+
+	if (WARN_ON_ONCE(!has_save_restore && etr_buf->mode != ETR_MODE_ETR_SG))
+		return -EINVAL;
+	/*
+	 * If we use a circular SG list without ETR support, we can't
+	 * support restoring "Full" bit.
+	 */
+	if (WARN_ON_ONCE(!has_save_restore && status))
+		return -EINVAL;
+	if (status & ~TMC_STS_FULL)
+		return -EINVAL;
+	if (etr_buf->ops->restore)
+		return etr_buf->ops->restore(etr_buf, r_offset, w_offset, size,
+					      status, has_save_restore);
+	return -EINVAL;
+}
+
 static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
 {
 	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
@@ -1058,8 +1190,8 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 	 * STS to "not full").
 	 */
 	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
-		tmc_write_rrp(drvdata, etr_buf->hwaddr);
-		tmc_write_rwp(drvdata, etr_buf->hwaddr);
+		tmc_write_rrp(drvdata, etr_buf->rrp);
+		tmc_write_rwp(drvdata, etr_buf->rwp);
 		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
 		writel_relaxed(sts, drvdata->base + TMC_STS);
 	}
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 6f7bec7..1bdfb38 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -141,12 +141,22 @@ enum etr_mode {
 	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
 };
 
+/* ETR buffer should support save-restore */
+#define ETR_BUF_F_RESTORE_PTRS		0x1
+#define ETR_BUF_F_RESTORE_STATUS	0x2
+
+#define ETR_BUF_F_RESTORE_MINIMAL	ETR_BUF_F_RESTORE_PTRS
+#define ETR_BUF_F_RESTORE_FULL		(ETR_BUF_F_RESTORE_PTRS |\
+					 ETR_BUF_F_RESTORE_STATUS)
 struct etr_buf_operations;
 
 /**
  * struct etr_buf - Details of the buffer used by ETR
  * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
  * @full	: Trace data overflow
+ * @status	: Value for STATUS if the ETR supports save-restore.
+ * @rrp		: Value for RRP{LO:HI} if the ETR supports save-restore
+ * @rwp		: Value for RWP{LO:HI} if the ETR supports save-restore
  * @size	: Size of the buffer.
  * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
  * @offset	: Offset of the trace data in the buffer for consumption.
@@ -157,6 +167,9 @@ struct etr_buf_operations;
 struct etr_buf {
 	enum etr_mode			mode;
 	bool				full;
+	u32				status;
+	dma_addr_t			rrp;
+	dma_addr_t			rwp;
 	ssize_t				size;
 	dma_addr_t			hwaddr;
 	unsigned long			offset;
@@ -207,6 +220,9 @@ struct etr_buf_operations {
 	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
 			int node, void **pages);
 	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
+	int (*restore)(struct etr_buf *etr_buf, unsigned long r_offset,
+		       unsigned long w_offset, unsigned long size,
+		       u32 status, bool has_save_restore);
 	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
 				char **bufpp);
 	void (*free)(struct etr_buf *etr_buf);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 18/27] coresight: catu: Add support for scatter gather tables
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

This patch adds support for setting up an SG table for use by
the CATU. We reuse the tmc_sg_table to represent the table/data
pages, even though the table format is different.

Similar to ETR SG table, CATU uses a 4KB page size for data buffers
as well as page tables. All table entries are 64bit wide and have
the following format:

        63                      12      1  0
        x-----------------------------------x
        |        Address [63-12] | SBZ  | V |
        x-----------------------------------x

	Where [V] ->	 0 - Pointer is invalid
			 1 - Pointer is Valid

CATU uses only the first half of the page for data page pointers,
i.e., a single table page will only have 256 page pointers, addressing
up to 1MB of data. The second half of a table page contains only two
pointers at the end of the page (i.e., the pointers at index 510 and
511), which are used as links to the "Previous" and "Next" page tables
respectively.
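
For illustration, here is a small stand-alone model of the entry
encoding and of the offset-to-entry-index mapping described above.
It is only a sketch that compiles on its own; the macros added by the
patch below are the real definitions.

  #include <stdint.h>
  #include <stdio.h>

  typedef uint64_t cate_t;

  #define CATU_ADDR_SHIFT		12
  #define CATU_ENTRY_VALID	((cate_t)0x1)
  #define CATU_ADDR_MASK		(~(((cate_t)1 << CATU_ADDR_SHIFT) - 1))
  #define CATU_VALID_ENTRY(addr)	\
  	((((cate_t)(addr)) & CATU_ADDR_MASK) | CATU_ENTRY_VALID)
  #define CATU_ENTRY_ADDR(entry)	((cate_t)(entry) & ~CATU_ENTRY_VALID)

  int main(void)
  {
  	cate_t entry = CATU_VALID_ENTRY(0x80042000ULL);
  	unsigned long offset = 0x5000;	/* 20KB into a 1MB window */

  	printf("entry 0x%llx -> addr 0x%llx, valid %d\n",
  	       (unsigned long long)entry,
  	       (unsigned long long)CATU_ENTRY_ADDR(entry),
  	       (int)(entry & CATU_ENTRY_VALID));
  	/* bits 19:12 of the offset select the entry within a table */
  	printf("offset 0x%lx -> entry index %lu\n",
  	       offset, (offset & ((1UL << 20) - 1)) >> CATU_ADDR_SHIFT);
  	return 0;
  }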

The first table page has an "Invalid" previous pointer and the
next pointer entry points to the second page table if there is one.
Similarly the last table page has an "Invalid" next pointer to
indicate the end of the table chain.

We create a circular buffer (i.e., first_table[prev] => last_table
and last_table[next] => first_table) by default and provide
helpers to make the buffer linear from a given offset. When we
make the buffer linear, we also mark the "pointers" outside the
given "range" as invalid. We have to do this only for the starting
and ending tables, as we disconnect the other tables by invalidating
the links. This will allow the ETR buf to be restored from a given
offset with any size.
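
A minimal usage sketch of the helpers added below, assuming a
page-aligned window [offset, offset + len) within the buffer; names
other than the catu_* helpers are assumptions, and how this gets wired
into the ETR path is left to later patches:

	static dma_addr_t example_catu_window(struct device *catu_dev, int node,
					      ssize_t size, void **pages,
					      u64 offset, u64 len)
	{
		struct tmc_sg_table *catu_table;

		/* Build the circular table once for the whole buffer */
		catu_table = catu_init_sg_table(catu_dev, node, size, pages);
		if (IS_ERR(catu_table))
			return 0;

		/*
		 * Make the buffer linear for this window. The returned
		 * address goes into SLADDR{LO,HI}; the ETR base address
		 * (INADDR{LO,HI}) then maps to the start of the window.
		 */
		return catu_set_table(catu_table, offset, len);
	}

catu_reset_table() undoes the invalidation, so a different window can
be selected later.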

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-catu.c | 409 +++++++++++++++++++++++++++
 1 file changed, 409 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
index 2cd69a6..4cc2928 100644
--- a/drivers/hwtracing/coresight/coresight-catu.c
+++ b/drivers/hwtracing/coresight/coresight-catu.c
@@ -16,10 +16,419 @@
 
 #include "coresight-catu.h"
 #include "coresight-priv.h"
+#include "coresight-tmc.h"
 
 #define csdev_to_catu_drvdata(csdev)	\
 	dev_get_drvdata(csdev->dev.parent)
 
+/*
+ * CATU uses a page size of 4KB for page tables as well as data pages.
+ * Each 64bit entry in the table has the following format.
+ *
+ *	63			12	1  0
+ *	------------------------------------
+ *	|	 Address [63-12] | SBZ	| V|
+ *	------------------------------------
+ *
+ * Where bit[0] V indicates if the address is valid or not.
+ * Each 4K table page has up to 256 data page pointers, taking up to 2K
+ * of space. There are two Link pointers, pointing to the previous and next
+ * table pages respectively at the end of the 4K page. (i.e., entry 510
+ * and 511).
+ *  E.g, a table of two pages could look like :
+ *
+ *                 Table Page 0               Table Page 1
+ * SLADDR ===> x------------------x  x--> x-----------------x
+ * INADDR    ->|  Page 0      | V |  |    | Page 256    | V | <- INADDR+1M
+ *             |------------------|  |    |-----------------|
+ * INADDR+4K ->|  Page 1      | V |  |    |                 |
+ *             |------------------|  |    |-----------------|
+ *             |  Page 2      | V |  |    |                 |
+ *             |------------------|  |    |-----------------|
+ *             |   ...        | V |  |    |    ...          |
+ *             |------------------|  |    |-----------------|
+ * INADDR+1020K|  Page 255    | V |  |    |   Page 511  | V |
+ * SLADDR+2K==>|------------------|  |    |-----------------|
+ *             |  UNUSED      |   |  |    |                 |
+ *             |------------------|  |    |                 |
+ *             |  UNUSED      |   |  |    |                 |
+ *             |------------------|  |    |                 |
+ *             |    ...       |   |  |    |                 |
+ *             |------------------|  |    |-----------------|
+ *             |   IGNORED    | 0 |  |    | Table Page 0| 1 |
+ *             |------------------|  |    |-----------------|
+ *             |  Table Page 1| 1 |--x    | IGNORED     | 0 |
+ *             x------------------x       x-----------------x
+ * SLADDR+4K==>
+ *
+ * The base input address (used by the ETR, programmed in INADDR_{LO,HI})
+ * must be aligned to 1MB (the size addressable by a single page table).
+ * The CATU maps INADDR{LO:HI} to the first page in the table pointed
+ * to by SLADDR{LO:HI} and so on.
+ *
+ */
+typedef u64 cate_t;
+
+#define CATU_PAGE_SHIFT		12
+#define CATU_PAGE_SIZE		(1UL << CATU_PAGE_SHIFT)
+#define CATU_PAGES_PER_SYSPAGE	(PAGE_SIZE / CATU_PAGE_SIZE)
+
+/* Page pointers are only allocated in the first 2K half */
+#define CATU_PTRS_PER_PAGE	((CATU_PAGE_SIZE >> 1) / sizeof(cate_t))
+#define CATU_PTRS_PER_SYSPAGE	(CATU_PAGES_PER_SYSPAGE * CATU_PTRS_PER_PAGE)
+#define CATU_LINK_PREV		((CATU_PAGE_SIZE / sizeof(cate_t)) - 2)
+#define CATU_LINK_NEXT		((CATU_PAGE_SIZE / sizeof(cate_t)) - 1)
+
+#define CATU_ADDR_SHIFT		12
+#define CATU_ADDR_MASK		~(((cate_t)1 << CATU_ADDR_SHIFT) - 1)
+#define CATU_ENTRY_VALID	((cate_t)0x1)
+#define CATU_ENTRY_INVALID	((cate_t)0)
+#define CATU_VALID_ENTRY(addr) \
+	(((cate_t)(addr) & CATU_ADDR_MASK) | CATU_ENTRY_VALID)
+#define CATU_ENTRY_ADDR(entry)	((cate_t)(entry) & ~((cate_t)CATU_ENTRY_VALID))
+
+/*
+ * Index of the CATU table entry that points to the data page for a
+ * given buffer offset. Each table entry can point to a 4KB page, with
+ * a total of 256 entries in the table, adding up to 1MB per table.
+ *
+ * So, bits 19:12 give you the index of the entry in
+ * the table.
+ */
+static inline unsigned long catu_offset_to_entry_idx(unsigned long offset)
+{
+	return (offset & (SZ_1M - 1)) >> 12;
+}
+
+static inline void catu_update_state(cate_t *catep, int valid)
+{
+	*catep &= ~CATU_ENTRY_VALID;
+	*catep |= valid ? CATU_ENTRY_VALID : CATU_ENTRY_INVALID;
+}
+
+/*
+ * Update the valid bit for a given range of indices [start, end)
+ * in the given table @table.
+ */
+static inline void catu_update_state_range(cate_t *table, int start,
+						 int end, int valid)
+{
+	int i;
+	cate_t *pentry = &table[start];
+	cate_t state = valid ? CATU_ENTRY_VALID : CATU_ENTRY_INVALID;
+
+	/* Limit the "end" to maximum range */
+	if (end > CATU_PTRS_PER_PAGE)
+		end = CATU_PTRS_PER_PAGE;
+
+	for (i = start; i < end; i++, pentry++) {
+		*pentry &= ~(cate_t)CATU_ENTRY_VALID;
+		*pentry |= state;
+	}
+}
+
+/*
+ * Update valid bit for all entries in the range [start, end)
+ */
+static inline void
+catu_table_update_offset_range(cate_t *table,
+			       unsigned long start,
+			       unsigned long end,
+			       int valid)
+{
+	catu_update_state_range(table,
+				catu_offset_to_entry_idx(start),
+				catu_offset_to_entry_idx(end),
+				valid);
+}
+
+static inline void catu_table_update_prev(cate_t *table, int valid)
+{
+	catu_update_state(&table[CATU_LINK_PREV], valid);
+}
+
+static inline void catu_table_update_next(cate_t *table, int valid)
+{
+	catu_update_state(&table[CATU_LINK_NEXT], valid);
+}
+
+/*
+ * catu_get_table : Retrieve the table pointers for the given @offset
+ * within the buffer. The buffer is wrapped around to a valid offset.
+ *
+ * Returns : The CPU virtual address for the beginning of the table
+ * containing the data page pointer for @offset. If @daddrp is not NULL,
+ * @daddrp is set to the DMA address of the beginning of the table.
+ */
+static inline cate_t *catu_get_table(struct tmc_sg_table *catu_table,
+				     unsigned long offset,
+				     dma_addr_t *daddrp)
+{
+	unsigned long buf_size = tmc_sg_table_buf_size(catu_table);
+	unsigned int table_nr, pg_idx, pg_offset;
+	struct tmc_pages *table_pages = &catu_table->table_pages;
+	void *ptr;
+
+	/* Make sure offset is within the range */
+	offset %= buf_size;
+
+	/*
+	 * Each table can address 1MB and a single kernel page can
+	 * contain "CATU_PAGES_PER_SYSPAGE" CATU tables.
+	 */
+	table_nr = offset >> 20;
+	/* Find the table page where the table_nr lies in */
+	pg_idx = table_nr / CATU_PAGES_PER_SYSPAGE;
+	pg_offset = (table_nr % CATU_PAGES_PER_SYSPAGE) * CATU_PAGE_SIZE;
+	if (daddrp)
+		*daddrp = table_pages->daddrs[pg_idx] + pg_offset;
+	ptr = page_address(table_pages->pages[pg_idx]);
+	return (cate_t *)((unsigned long)ptr + pg_offset);
+}
+
+#ifdef CATU_DEBUG
+static void catu_dump_table(struct tmc_sg_table *catu_table)
+{
+	int i;
+	cate_t *table;
+	unsigned long table_end, buf_size, offset = 0;
+
+	buf_size = tmc_sg_table_buf_size(catu_table);
+	dev_dbg(catu_table->dev,
+		"Dump table %p, tdaddr: %llx\n",
+		catu_table, catu_table->table_daddr);
+
+	while (offset < buf_size) {
+		table_end = offset + SZ_1M < buf_size ?
+			    offset + SZ_1M : buf_size;
+		table = catu_get_table(catu_table, offset, NULL);
+		for (i = 0; offset < table_end; i++, offset += CATU_PAGE_SIZE)
+			dev_dbg(catu_table->dev, "%d: %llx\n", i, table[i]);
+		dev_dbg(catu_table->dev, "Prev : %llx, Next: %llx\n",
+			table[CATU_LINK_PREV], table[CATU_LINK_NEXT]);
+		dev_dbg(catu_table->dev, "== End of sub-table ===");
+	}
+	dev_dbg(catu_table->dev, "== End of Table ===");
+}
+
+#else
+static inline void catu_dump_table(struct tmc_sg_table *catu_table)
+{
+}
+#endif
+
+/*
+ * catu_update_table: Update the start and end tables for the
+ * region [base, base + size), validating/invalidating the pointers
+ * outside the area.
+ *
+ * CATU expects the table base address (SLADDR) aligned to 4K.
+ * If the @base is not aligned to 1MB, we should mark all the
+ * pointers in the start table before @base "INVALID".
+ * Similarly all pointers in the last table beyond (@base + @size)
+ * should be marked INVALID.
+ * The table page containing the "base" is marked first (by
+ * marking the previous link INVALID) and the table page
+ * containing "base + size" is marked last (by marking next
+ * link INVALID).
+ * By default we have to update the state of pointers
+ * for offsets in the range :
+ *    Start table: [0, ALIGN_DOWN(base))
+ *    End table  : [ALIGN(end + 1), SZ_1M)
+ * But, if the buffer wraps around and ends in the same table
+ * as the "base", then this should instead be :
+ *         [ALIGN(end + 1), base)
+ *
+ * Returns the dma_address for the start_table, which can be set as
+ * SLADDR.
+ */
+static dma_addr_t catu_update_table(struct tmc_sg_table *catu_table,
+				    u64 base, u64 size, int valid)
+{
+	cate_t *start_table, *end_table;
+	dma_addr_t taddr;
+	u64 buf_size, end = base + size - 1;
+	unsigned int start_off = 0;	/* Offset to begin in start_table */
+	unsigned int end_off = SZ_1M;	/* Offset to end in the end_table */
+
+	buf_size = tmc_sg_table_buf_size(catu_table);
+	if (end > buf_size)
+		end -= buf_size;
+
+	/* Get both the virtual and the DMA address of the first table */
+	start_table = catu_get_table(catu_table, base, &taddr);
+	end_table = catu_get_table(catu_table, end, NULL);
+
+	/* Update the "PREV" link for the starting table */
+	catu_table_update_prev(start_table, valid);
+
+	/* Update the "NEXT" link only if this is not the start_table */
+	if (end_table != start_table) {
+		catu_table_update_next(end_table, valid);
+	} else if (end < base) {
+		/*
+		 * If the buffer has wrapped around and we have got the
+		 * "end" before "base" in the same table, we need to be
+		 * extra careful. We only need to invalidate the ptrs
+		 * in between the "end" and "base".
+		 */
+		start_off = ALIGN(end, CATU_PAGE_SIZE);
+		end_off = 0;
+	}
+
+	/* Update the pointers in the starting table before the "base" */
+	catu_table_update_offset_range(start_table,
+				       start_off,
+				       base,
+				       valid);
+	if (end_off)
+		catu_table_update_offset_range(end_table,
+					       end,
+					       end_off,
+					       valid);
+
+	catu_dump_table(catu_table);
+	return taddr;
+}
+
+/*
+ * catu_set_table : Set the buffer to act as a linear buffer
+ * from @base of @size.
+ *
+ * Returns : The DMA address for the table containing base.
+ * This can then be programmed into SLADDR.
+ */
+static dma_addr_t
+catu_set_table(struct tmc_sg_table *catu_table, u64 base, u64 size)
+{
+	/* Make all the entries outside this range invalid */
+	dma_addr_t sladdr =  catu_update_table(catu_table, base, size, 0);
+	/* Sync the changes to memory for CATU */
+	tmc_sg_table_sync_table(catu_table);
+	return sladdr;
+}
+
+static void __maybe_unused
+catu_reset_table(struct tmc_sg_table *catu_table, u64 base, u64 size)
+{
+	/* Make all the entries outside this range valid */
+	(void)catu_update_table(catu_table, base, size, 1);
+}
+
+/*
+ * catu_populate_table : Populate the given CATU table.
+ * The table is always populated as a circular table.
+ * i.e, the "prev" link of the "first" table points to the "last"
+ * table and the "next" link of the "last" table points to the
+ * "first" table. The buffer should be made linear by calling
+ * catu_set_table().
+ */
+static void
+catu_populate_table(struct tmc_sg_table *catu_table)
+{
+	int i, dpidx, s_dpidx;
+	unsigned long offset, buf_size, last_offset;
+	dma_addr_t data_daddr;
+	dma_addr_t prev_taddr, next_taddr, cur_taddr;
+	cate_t *table_ptr, *next_table;
+
+	buf_size = tmc_sg_table_buf_size(catu_table);
+	dpidx = s_dpidx = 0;
+	offset = 0;
+
+	table_ptr = catu_get_table(catu_table, 0, &cur_taddr);
+	/*
+	 * Use the address of the "last" table as the "prev" link
+	 * for the first table.
+	 */
+	(void)catu_get_table(catu_table, buf_size - 1, &prev_taddr);
+
+	while (offset < buf_size) {
+		/*
+		 * The @offset is always 1M aligned here and we have an
+		 * empty table @table_ptr to fill. Each table can address
+		 * up to 1MB of the data buffer. The last table may have fewer
+		 * entries if the buffer size is not aligned.
+		 */
+		last_offset = (offset + SZ_1M) < buf_size ?
+			      (offset + SZ_1M) : buf_size;
+		for (i = 0; offset < last_offset; i++) {
+
+			data_daddr = catu_table->data_pages.daddrs[dpidx] +
+				     s_dpidx * CATU_PAGE_SIZE;
+#ifdef CATU_DEBUG
+			dev_dbg(catu_table->dev,
+				"[table %5d:%03d] 0x%llx\n",
+				(offset >> 20), i, data_daddr);
+#endif
+			table_ptr[i] = CATU_VALID_ENTRY(data_daddr);
+			offset += CATU_PAGE_SIZE;
+			/* Move the pointers for data pages */
+			s_dpidx = (s_dpidx + 1) % CATU_PAGES_PER_SYSPAGE;
+			if (s_dpidx == 0)
+				dpidx++;
+		}
+
+		/*
+		 * If we have finished all the valid entries, fill the rest of
+		 * the table (i.e, last table page) with invalid entries,
+		 * to fail the lookups.
+		 */
+		if (offset == buf_size)
+			catu_table_update_offset_range(table_ptr,
+						       offset - 1, SZ_1M, 0);
+
+		/*
+		 * Find the next table by looking up the table that contains
+		 * @offset. For the last table, this will return the very
+		 * first table (as the offset == buf_size, and thus returns
+		 * the table for offset = 0.)
+		 */
+		next_table = catu_get_table(catu_table, offset, &next_taddr);
+		table_ptr[CATU_LINK_PREV] = CATU_VALID_ENTRY(prev_taddr);
+		table_ptr[CATU_LINK_NEXT] = CATU_VALID_ENTRY(next_taddr);
+
+#ifdef CATU_DEBUG
+		dev_dbg(catu_table->dev,
+			"[table%5d]: Cur: 0x%llx Prev: 0x%llx, Next: 0x%llx\n",
+			(offset >> 20) - 1,  cur_taddr, prev_taddr, next_taddr);
+#endif
+
+		/* Update the prev/next addresses */
+		prev_taddr = cur_taddr;
+		cur_taddr = next_taddr;
+		table_ptr = next_table;
+	}
+}
+
+static struct tmc_sg_table __maybe_unused *
+catu_init_sg_table(struct device *catu_dev, int node,
+		   ssize_t size, void **pages)
+{
+	int nr_tpages;
+	struct tmc_sg_table *catu_table;
+
+	/*
+	 * Each table can address up to 1MB and we can have
+	 * CATU_PAGES_PER_SYSPAGE tables in a system page.
+	 */
+	nr_tpages = DIV_ROUND_UP(size, SZ_1M) / CATU_PAGES_PER_SYSPAGE;
+	catu_table = tmc_alloc_sg_table(catu_dev, node, nr_tpages,
+					size >> PAGE_SHIFT, pages);
+	if (IS_ERR(catu_table))
+		return catu_table;
+
+	catu_populate_table(catu_table);
+	/* Make the buf linear from offset 0 */
+	(void)catu_set_table(catu_table, 0, size);
+
+	dev_dbg(catu_dev,
+		"Setup table %p, size %ldKB, %d table pages\n",
+		catu_table, (unsigned long)size >> 10,  nr_tpages);
+	catu_dump_table(catu_table);
+	return catu_table;
+}
+
 coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
 coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
 coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 18/27] coresight: catu: Add support for scatter gather tables
@ 2018-05-01  9:10   ` Suzuki K Poulose
  0 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for setting up an SG table for use by
the CATU. We reuse the tmc_sg_table to represent the table/data
pages, even though the table format is different.

Similar to ETR SG table, CATU uses a 4KB page size for data buffers
as well as page tables. All table entries are 64bit wide and have
the following format:

        63                      12      1  0
        x-----------------------------------x
        |        Address [63-12] | SBZ  | V |
        x-----------------------------------x

	Where [V] ->	 0 - Pointer is invalid
			 1 - Pointer is Valid

CATU uses only the first half of the page for data page pointers,
i.e., a single table page will only have 256 page pointers, addressing
up to 1MB of data. The second half of a table page contains only two
pointers at the end of the page (i.e., the pointers at index 510 and
511), which are used as links to the "Previous" and "Next" page tables
respectively.

The first table page has an "Invalid" previous pointer and the
next pointer entry points to the second page table if there is one.
Similarly the last table page has an "Invalid" next pointer to
indicate the end of the table chain.

We create a circular buffer (i.e., first_table[prev] => last_table
and last_table[next] => first_table) by default and provide
helpers to make the buffer linear from a given offset. When we
make the buffer linear, we also mark the "pointers" outside the
given "range" as invalid. We have to do this only for the starting
and ending tables, as we disconnect the other tables by invalidating
the links. This will allow the ETR buf to be restored from a given
offset with any size.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-catu.c | 409 +++++++++++++++++++++++++++
 1 file changed, 409 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
index 2cd69a6..4cc2928 100644
--- a/drivers/hwtracing/coresight/coresight-catu.c
+++ b/drivers/hwtracing/coresight/coresight-catu.c
@@ -16,10 +16,419 @@
 
 #include "coresight-catu.h"
 #include "coresight-priv.h"
+#include "coresight-tmc.h"
 
 #define csdev_to_catu_drvdata(csdev)	\
 	dev_get_drvdata(csdev->dev.parent)
 
+/*
+ * CATU uses a page size of 4KB for page tables as well as data pages.
+ * Each 64bit entry in the table has the following format.
+ *
+ *	63			12	1  0
+ *	------------------------------------
+ *	|	 Address [63-12] | SBZ	| V|
+ *	------------------------------------
+ *
+ * Where bit[0] V indicates if the address is valid or not.
+ * Each 4K table page has up to 256 data page pointers, taking up to 2K
+ * of space. There are two Link pointers, pointing to the previous and next
+ * table pages respectively at the end of the 4K page. (i.e., entry 510
+ * and 511).
+ *  E.g, a table of two pages could look like :
+ *
+ *                 Table Page 0               Table Page 1
+ * SLADDR ===> x------------------x  x--> x-----------------x
+ * INADDR    ->|  Page 0      | V |  |    | Page 256    | V | <- INADDR+1M
+ *             |------------------|  |    |-----------------|
+ * INADDR+4K ->|  Page 1      | V |  |    |                 |
+ *             |------------------|  |    |-----------------|
+ *             |  Page 2      | V |  |    |                 |
+ *             |------------------|  |    |-----------------|
+ *             |   ...        | V |  |    |    ...          |
+ *             |------------------|  |    |-----------------|
+ * INADDR+1020K|  Page 255    | V |  |    |   Page 511  | V |
+ * SLADDR+2K==>|------------------|  |    |-----------------|
+ *             |  UNUSED      |   |  |    |                 |
+ *             |------------------|  |    |                 |
+ *             |  UNUSED      |   |  |    |                 |
+ *             |------------------|  |    |                 |
+ *             |    ...       |   |  |    |                 |
+ *             |------------------|  |    |-----------------|
+ *             |   IGNORED    | 0 |  |    | Table Page 0| 1 |
+ *             |------------------|  |    |-----------------|
+ *             |  Table Page 1| 1 |--x    | IGNORED     | 0 |
+ *             x------------------x       x-----------------x
+ * SLADDR+4K==>
+ *
+ * The base input address (used by the ETR, programmed in INADDR_{LO,HI})
+ * must be aligned to 1MB (the size addressable by a single page table).
+ * The CATU maps INADDR{LO:HI} to the first page in the table pointed
+ * to by SLADDR{LO:HI} and so on.
+ *
+ */
+typedef u64 cate_t;
+
+#define CATU_PAGE_SHIFT		12
+#define CATU_PAGE_SIZE		(1UL << CATU_PAGE_SHIFT)
+#define CATU_PAGES_PER_SYSPAGE	(PAGE_SIZE / CATU_PAGE_SIZE)
+
+/* Page pointers are only allocated in the first 2K half */
+#define CATU_PTRS_PER_PAGE	((CATU_PAGE_SIZE >> 1) / sizeof(cate_t))
+#define CATU_PTRS_PER_SYSPAGE	(CATU_PAGES_PER_SYSPAGE * CATU_PTRS_PER_PAGE)
+#define CATU_LINK_PREV		((CATU_PAGE_SIZE / sizeof(cate_t)) - 2)
+#define CATU_LINK_NEXT		((CATU_PAGE_SIZE / sizeof(cate_t)) - 1)
+
+#define CATU_ADDR_SHIFT		12
+#define CATU_ADDR_MASK		~(((cate_t)1 << CATU_ADDR_SHIFT) - 1)
+#define CATU_ENTRY_VALID	((cate_t)0x1)
+#define CATU_ENTRY_INVALID	((cate_t)0)
+#define CATU_VALID_ENTRY(addr) \
+	(((cate_t)(addr) & CATU_ADDR_MASK) | CATU_ENTRY_VALID)
+#define CATU_ENTRY_ADDR(entry)	((cate_t)(entry) & ~((cate_t)CATU_ENTRY_VALID))
+
+/*
+ * Index of the CATU table entry that points to the data page for a
+ * given buffer offset. Each table entry can point to a 4KB page, with
+ * a total of 256 entries in the table, adding up to 1MB per table.
+ *
+ * So, bits 19:12 give you the index of the entry in
+ * the table.
+ */
+static inline unsigned long catu_offset_to_entry_idx(unsigned long offset)
+{
+	return (offset & (SZ_1M - 1)) >> 12;
+}
+
+static inline void catu_update_state(cate_t *catep, int valid)
+{
+	*catep &= ~CATU_ENTRY_VALID;
+	*catep |= valid ? CATU_ENTRY_VALID : CATU_ENTRY_INVALID;
+}
+
+/*
+ * Update the valid bit for a given range of indices [start, end)
+ * in the given table @table.
+ */
+static inline void catu_update_state_range(cate_t *table, int start,
+						 int end, int valid)
+{
+	int i;
+	cate_t *pentry = &table[start];
+	cate_t state = valid ? CATU_ENTRY_VALID : CATU_ENTRY_INVALID;
+
+	/* Limit the "end" to maximum range */
+	if (end > CATU_PTRS_PER_PAGE)
+		end = CATU_PTRS_PER_PAGE;
+
+	for (i = start; i < end; i++, pentry++) {
+		*pentry &= ~(cate_t)CATU_ENTRY_VALID;
+		*pentry |= state;
+	}
+}
+
+/*
+ * Update valid bit for all entries in the range [start, end)
+ */
+static inline void
+catu_table_update_offset_range(cate_t *table,
+			       unsigned long start,
+			       unsigned long end,
+			       int valid)
+{
+	catu_update_state_range(table,
+				catu_offset_to_entry_idx(start),
+				catu_offset_to_entry_idx(end),
+				valid);
+}
+
+static inline void catu_table_update_prev(cate_t *table, int valid)
+{
+	catu_update_state(&table[CATU_LINK_PREV], valid);
+}
+
+static inline void catu_table_update_next(cate_t *table, int valid)
+{
+	catu_update_state(&table[CATU_LINK_NEXT], valid);
+}
+
+/*
+ * catu_get_table : Retrieve the table pointers for the given @offset
+ * within the buffer. The buffer is wrapped around to a valid offset.
+ *
+ * Returns : The CPU virtual address for the beginning of the table
+ * containing the data page pointer for @offset. If @daddrp is not NULL,
+ * @daddrp is set to the DMA address of the beginning of the table.
+ */
+static inline cate_t *catu_get_table(struct tmc_sg_table *catu_table,
+				     unsigned long offset,
+				     dma_addr_t *daddrp)
+{
+	unsigned long buf_size = tmc_sg_table_buf_size(catu_table);
+	unsigned int table_nr, pg_idx, pg_offset;
+	struct tmc_pages *table_pages = &catu_table->table_pages;
+	void *ptr;
+
+	/* Make sure offset is within the range */
+	offset %= buf_size;
+
+	/*
+	 * Each table can address 1MB and a single kernel page can
+	 * contain "CATU_PAGES_PER_SYSPAGE" CATU tables.
+	 */
+	table_nr = offset >> 20;
+	/* Find the table page where the table_nr lies in */
+	pg_idx = table_nr / CATU_PAGES_PER_SYSPAGE;
+	pg_offset = (table_nr % CATU_PAGES_PER_SYSPAGE) * CATU_PAGE_SIZE;
+	if (daddrp)
+		*daddrp = table_pages->daddrs[pg_idx] + pg_offset;
+	ptr = page_address(table_pages->pages[pg_idx]);
+	return (cate_t *)((unsigned long)ptr + pg_offset);
+}
+
+#ifdef CATU_DEBUG
+static void catu_dump_table(struct tmc_sg_table *catu_table)
+{
+	int i;
+	cate_t *table;
+	unsigned long table_end, buf_size, offset = 0;
+
+	buf_size = tmc_sg_table_buf_size(catu_table);
+	dev_dbg(catu_table->dev,
+		"Dump table %p, tdaddr: %llx\n",
+		catu_table, catu_table->table_daddr);
+
+	while (offset < buf_size) {
+		table_end = offset + SZ_1M < buf_size ?
+			    offset + SZ_1M : buf_size;
+		table = catu_get_table(catu_table, offset, NULL);
+		for (i = 0; offset < table_end; i++, offset += CATU_PAGE_SIZE)
+			dev_dbg(catu_table->dev, "%d: %llx\n", i, table[i]);
+		dev_dbg(catu_table->dev, "Prev : %llx, Next: %llx\n",
+			table[CATU_LINK_PREV], table[CATU_LINK_NEXT]);
+		dev_dbg(catu_table->dev, "== End of sub-table ===");
+	}
+	dev_dbg(catu_table->dev, "== End of Table ===");
+}
+
+#else
+static inline void catu_dump_table(struct tmc_sg_table *catu_table)
+{
+}
+#endif
+
+/*
+ * catu_update_table: Update the start and end tables for the
+ * region [base, base + size), to validate/invalidate the pointers
+ * outside the area.
+ *
+ * CATU expects the table base address (SLADDR) aligned to 4K.
+ * If the @base is not aligned to 1MB, we should mark all the
+ * pointers in the start table before @base "INVALID".
+ * Similarly all pointers in the last table beyond (@base + @size)
+ * should be marked INVALID.
+ * The table page containing the "base" is marked first (by
+ * marking the previous link INVALID) and the table page
+ * containing "base + size" is marked last (by marking next
+ * link INVALID).
+ * By default we have to update the state of pointers
+ * for offsets in the range :
+ *    Start table: [0, ALIGN_DOWN(base))
+ *    End table  : [ALIGN(end + 1), SZ_1M)
+ * But, if the buffer wraps around and ends in the same table
+ * as the "base", the end table range instead becomes:
+ *         [ALIGN(end + 1), base)
+ *
+ * Returns the dma_address for the start_table, which can be set as
+ * SLADDR.
+ */
+static dma_addr_t catu_update_table(struct tmc_sg_table *catu_table,
+				    u64 base, u64 size, int valid)
+{
+	cate_t *start_table, *end_table;
+	dma_addr_t taddr;
+	u64 buf_size, end = base + size - 1;
+	unsigned int start_off = 0;	/* Offset to begin in start_table */
+	unsigned int end_off = SZ_1M;	/* Offset to end in the end_table */
+
+	buf_size = tmc_sg_table_buf_size(catu_table);
+	if (end > buf_size)
+		end -= buf_size;
+
+	/* Get both the virtual and the DMA address of the first table */
+	start_table = catu_get_table(catu_table, base, &taddr);
+	end_table = catu_get_table(catu_table, end, NULL);
+
+	/* Update the "PREV" link for the starting table */
+	catu_table_update_prev(start_table, valid);
+
+	/* Update the "NEXT" link only if this is not the start_table */
+	if (end_table != start_table) {
+		catu_table_update_next(end_table, valid);
+	} else if (end < base) {
+		/*
+		 * If the buffer has wrapped around and we have got the
+		 * "end" before "base" in the same table, we need to be
+		 * extra careful. We only need to invalidate the ptrs
+		 * in between the "end" and "base".
+		 */
+		start_off = ALIGN(end, CATU_PAGE_SIZE);
+		end_off = 0;
+	}
+
+	/* Update the pointers in the starting table before the "base" */
+	catu_table_update_offset_range(start_table,
+				       start_off,
+				       base,
+				       valid);
+	if (end_off)
+		catu_table_update_offset_range(end_table,
+					       end,
+					       end_off,
+					       valid);
+
+	catu_dump_table(catu_table);
+	return taddr;
+}
+
+/*
+ * catu_set_table : Set the buffer to act as a linear buffer
+ * of @size bytes starting at @base.
+ *
+ * Returns : The DMA address for the table containing base.
+ * This can then be programmed into SLADDR.
+ */
+static dma_addr_t
+catu_set_table(struct tmc_sg_table *catu_table, u64 base, u64 size)
+{
+	/* Make all the entries outside this range invalid */
+	dma_addr_t sladdr =  catu_update_table(catu_table, base, size, 0);
+	/* Sync the changes to memory for CATU */
+	tmc_sg_table_sync_table(catu_table);
+	return sladdr;
+}
+
+static void __maybe_unused
+catu_reset_table(struct tmc_sg_table *catu_table, u64 base, u64 size)
+{
+	/* Make all the entries outside this range valid */
+	(void)catu_update_table(catu_table, base, size, 1);
+}
+
+/*
+ * catu_populate_table : Populate the given CATU table.
+ * The table is always populated as a circular table.
+ * i.e, the "prev" link of the "first" table points to the "last"
+ * table and the "next" link of the "last" table points to the
+ * "first" table. The buffer should be made linear by calling
+ * catu_set_table().
+ */
+static void
+catu_populate_table(struct tmc_sg_table *catu_table)
+{
+	int i, dpidx, s_dpidx;
+	unsigned long offset, buf_size, last_offset;
+	dma_addr_t data_daddr;
+	dma_addr_t prev_taddr, next_taddr, cur_taddr;
+	cate_t *table_ptr, *next_table;
+
+	buf_size = tmc_sg_table_buf_size(catu_table);
+	dpidx = s_dpidx = 0;
+	offset = 0;
+
+	table_ptr = catu_get_table(catu_table, 0, &cur_taddr);
+	/*
+	 * Use the address of the "last" table as the "prev" link
+	 * for the first table.
+	 */
+	(void)catu_get_table(catu_table, buf_size - 1, &prev_taddr);
+
+	while (offset < buf_size) {
+		/*
+		 * The @offset is always 1M aligned here and we have an
+		 * empty table @table_ptr to fill. Each table can address
+		 * up to 1MB of the data buffer. The last table may have fewer
+		 * entries if the buffer size is not aligned.
+		 */
+		last_offset = (offset + SZ_1M) < buf_size ?
+			      (offset + SZ_1M) : buf_size;
+		for (i = 0; offset < last_offset; i++) {
+
+			data_daddr = catu_table->data_pages.daddrs[dpidx] +
+				     s_dpidx * CATU_PAGE_SIZE;
+#ifdef CATU_DEBUG
+			dev_dbg(catu_table->dev,
+				"[table %5d:%03d] 0x%llx\n",
+				(offset >> 20), i, data_daddr);
+#endif
+			table_ptr[i] = CATU_VALID_ENTRY(data_daddr);
+			offset += CATU_PAGE_SIZE;
+			/* Move the pointers for data pages */
+			s_dpidx = (s_dpidx + 1) % CATU_PAGES_PER_SYSPAGE;
+			if (s_dpidx == 0)
+				dpidx++;
+		}
+
+		/*
+		 * If we have finished all the valid entries, fill the rest of
+		 * the table (i.e., the last table page) with invalid entries
+		 * so that lookups fail.
+		 */
+		if (offset == buf_size)
+			catu_table_update_offset_range(table_ptr,
+						       offset - 1, SZ_1M, 0);
+
+		/*
+		 * Find the next table by looking up the table that contains
+		 * @offset. For the last table, this will return the very
+		 * first table (as the offset == buf_size, and thus returns
+		 * the table for offset = 0.)
+		 */
+		next_table = catu_get_table(catu_table, offset, &next_taddr);
+		table_ptr[CATU_LINK_PREV] = CATU_VALID_ENTRY(prev_taddr);
+		table_ptr[CATU_LINK_NEXT] = CATU_VALID_ENTRY(next_taddr);
+
+#ifdef CATU_DEBUG
+		dev_dbg(catu_table->dev,
+			"[table%5d]: Cur: 0x%llx Prev: 0x%llx, Next: 0x%llx\n",
+			(offset >> 20) - 1,  cur_taddr, prev_taddr, next_taddr);
+#endif
+
+		/* Update the prev/next addresses */
+		prev_taddr = cur_taddr;
+		cur_taddr = next_taddr;
+		table_ptr = next_table;
+	}
+}
+
+static struct tmc_sg_table __maybe_unused *
+catu_init_sg_table(struct device *catu_dev, int node,
+		   ssize_t size, void **pages)
+{
+	int nr_tpages;
+	struct tmc_sg_table *catu_table;
+
+	/*
+	 * Each table can address up to 1MB and we can have
+	 * CATU_PAGES_PER_SYSPAGE tables in a system page.
+	 */
+	nr_tpages = DIV_ROUND_UP(size, SZ_1M) / CATU_PAGES_PER_SYSPAGE;
+	catu_table = tmc_alloc_sg_table(catu_dev, node, nr_tpages,
+					size >> PAGE_SHIFT, pages);
+	if (IS_ERR(catu_table))
+		return catu_table;
+
+	catu_populate_table(catu_table);
+	/* Make the buf linear from offset 0 */
+	(void)catu_set_table(catu_table, 0, size);
+
+	dev_dbg(catu_dev,
+		"Setup table %p, size %ldKB, %d table pages\n",
+		catu_table, (unsigned long)size >> 10,  nr_tpages);
+	catu_dump_table(catu_table);
+	return catu_table;
+}
+
 coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
 coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
 coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread
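
As an aside on the table layout described in the comments above (one 4KB table per 1MB of trace buffer, with bits 19:12 of the offset selecting the entry), the arithmetic in catu_offset_to_entry_idx() and catu_get_table() can be exercised on its own. The sketch below is illustrative only: the EX_* constants, the assumption of a 4KB system page and the helper names are stand-ins, not part of the driver, which derives these values from CATU_PAGES_PER_SYSPAGE and the tmc_sg_table helpers.

#include <stdio.h>

#define EX_CATU_PAGE_SIZE	4096UL			/* CATU table/data page size */
#define EX_SYS_PAGE_SIZE	4096UL			/* assumed kernel page size */
#define EX_PAGES_PER_SYSPAGE	(EX_SYS_PAGE_SIZE / EX_CATU_PAGE_SIZE)
#define EX_SZ_1M		(1UL << 20)

/* Bits 19:12 of the buffer offset select the entry within a 1MB table */
static unsigned long ex_entry_idx(unsigned long offset)
{
	return (offset & (EX_SZ_1M - 1)) >> 12;
}

int main(void)
{
	unsigned long offset = 0x253000UL;		/* arbitrary buffer offset */
	unsigned long table_nr = offset >> 20;		/* one table per 1MB */
	unsigned long pg_idx = table_nr / EX_PAGES_PER_SYSPAGE;
	unsigned long pg_off = (table_nr % EX_PAGES_PER_SYSPAGE) * EX_CATU_PAGE_SIZE;

	printf("offset 0x%lx -> table %lu (table page %lu, page offset 0x%lx), entry %lu\n",
	       offset, table_nr, pg_idx, pg_off, ex_entry_idx(offset));
	return 0;
}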

* [PATCH v2 19/27] coresight: catu: Plug in CATU as a backend for ETR buffer
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Now that we can use a CATU with a scatter gather table, add support
for the TMC ETR to make use of the connected CATU in translate mode.
This is done by adding CATU as a new buffer mode. CATU's SLADDR must
always be 4K aligned. Thus the INADDR (base VA) is always 1M aligned
and we adjust the DBA for the ETR to align to the "offset" within
the 1MB page.
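
A condensed sketch of that adjustment, for illustration only (EX_INADDR and ex_etr_dba() are stand-in names; the in-tree computation is catu_set_etr_buf() in the diff below):

#include <stdio.h>
#include <stdint.h>

#define EX_INADDR	(1ULL << 20)	/* assumed 1MB-aligned CATU input address */
#define EX_SZ_1M	(1ULL << 20)

/* ETR base (DBA) = CATU input address + offset of "base" within its 1MB window */
static uint64_t ex_etr_dba(uint64_t base)
{
	return EX_INADDR + (base & (EX_SZ_1M - 1));
}

int main(void)
{
	/* A buffer offset of 0x2123000 maps to DBA 0x223000 */
	printf("DBA = 0x%llx\n", (unsigned long long)ex_etr_dba(0x2123000ULL));
	return 0;
}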

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-catu.c    | 189 +++++++++++++++++++++++-
 drivers/hwtracing/coresight/coresight-catu.h    |  30 ++++
 drivers/hwtracing/coresight/coresight-tmc-etr.c |  23 ++-
 drivers/hwtracing/coresight/coresight-tmc.h     |   1 +
 4 files changed, 235 insertions(+), 8 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
index 4cc2928..a4792fa 100644
--- a/drivers/hwtracing/coresight/coresight-catu.c
+++ b/drivers/hwtracing/coresight/coresight-catu.c
@@ -22,6 +22,21 @@
 	dev_get_drvdata(csdev->dev.parent)
 
 /*
+ * catu_etr_buf		- CATU buffer descriptor
+ * @catu_table		- SG table for the CATU
+ * @sladdr		- Table base address for CATU
+ * @start_offset	- Current offset where the ETR starts writing
+ *			  within the buffer.
+ * @cur_size		- Current size used by the ETR.
+ */
+struct catu_etr_buf {
+	struct tmc_sg_table	*catu_table;
+	u64			sladdr;
+	u64			start_offset;
+	u64			cur_size;
+};
+
+/*
  * CATU uses a page size of 4KB for page tables as well as data pages.
  * Each 64bit entry in the table has the following format.
  *
@@ -87,6 +102,9 @@ typedef u64 cate_t;
 	(((cate_t)(addr) & CATU_ADDR_MASK) | CATU_ENTRY_VALID)
 #define CATU_ENTRY_ADDR(entry)	((cate_t)(entry) & ~((cate_t)CATU_ENTRY_VALID))
 
+/* CATU expects the INADDR to be aligned to 1M. */
+#define CATU_DEFAULT_INADDR	(1ULL << 20)
+
 /*
  * Index into the CATU entry pointing to the page within
  * the table. Each table entry can point to a 4KB page, with
@@ -401,7 +419,7 @@ catu_populate_table(struct tmc_sg_table *catu_table)
 	}
 }
 
-static struct tmc_sg_table __maybe_unused *
+static struct tmc_sg_table *
 catu_init_sg_table(struct device *catu_dev, int node,
 		   ssize_t size, void **pages)
 {
@@ -429,6 +447,149 @@ catu_init_sg_table(struct device *catu_dev, int node,
 	return catu_table;
 }
 
+static void catu_free_etr_buf(struct etr_buf *etr_buf)
+{
+	struct catu_etr_buf *catu_buf;
+
+	if (!etr_buf || etr_buf->mode != ETR_MODE_CATU || !etr_buf->private)
+		return;
+	catu_buf = etr_buf->private;
+	tmc_free_sg_table(catu_buf->catu_table);
+	kfree(catu_buf);
+}
+
+static ssize_t catu_get_data_etr_buf(struct etr_buf *etr_buf, u64 offset,
+				     size_t len, char **bufpp)
+{
+	struct catu_etr_buf *catu_buf = etr_buf->private;
+
+	return tmc_sg_table_get_data(catu_buf->catu_table, offset, len, bufpp);
+}
+
+static void catu_sync_etr_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	struct catu_etr_buf *catu_buf = etr_buf->private;
+	s64 r_offset, w_offset;
+	unsigned long buf_size = tmc_sg_table_buf_size(catu_buf->catu_table);
+
+	/*
+	 * ETR started off at etr_buf->hwaddr which corresponds to
+	 * start_offset within the trace buffer. Convert the RRP/RWP
+	 * values to offsets within the trace buffer.
+	 */
+	r_offset = (s64)rrp - etr_buf->hwaddr + catu_buf->start_offset;
+	r_offset -= (r_offset > buf_size) ? buf_size : 0;
+
+	w_offset = (s64)rwp - etr_buf->hwaddr + catu_buf->start_offset;
+	w_offset -= (w_offset > buf_size) ? buf_size : 0;
+
+	if (!etr_buf->full) {
+		etr_buf->len = w_offset - r_offset;
+		if (w_offset < r_offset)
+			etr_buf->len += buf_size;
+	} else {
+		etr_buf->len = etr_buf->size;
+	}
+
+	etr_buf->offset = r_offset;
+	tmc_sg_table_sync_data_range(catu_buf->catu_table,
+				     r_offset, etr_buf->len);
+}
+
+static inline void catu_set_etr_buf(struct etr_buf *etr_buf, u64 base, u64 size)
+{
+	struct catu_etr_buf *catu_buf = etr_buf->private;
+
+	catu_buf->start_offset = base;
+	catu_buf->cur_size = size;
+
+	/*
+	 * CATU always maps 1MB aligned addresses. ETR should start at
+	 * the offset within the first table.
+	 */
+	etr_buf->hwaddr = CATU_DEFAULT_INADDR + (base & (SZ_1M - 1));
+	etr_buf->size = size;
+	etr_buf->rwp = etr_buf->rrp = etr_buf->hwaddr;
+}
+
+static int catu_restore_etr_buf(struct etr_buf *etr_buf,
+				unsigned long r_offset,
+				unsigned long w_offset,
+				unsigned long size,
+				u32 status,
+				bool has_save_restore)
+{
+	struct catu_etr_buf *catu_buf = etr_buf->private;
+	u64 end = w_offset + size;
+	u64 cur_end = catu_buf->start_offset + catu_buf->cur_size;
+
+	/*
+	 * We cannot support rotation without a full table
+	 * at the end, i.e., the buffer size should be aligned
+	 * to 1MB.
+	 */
+	if (tmc_sg_table_buf_size(catu_buf->catu_table) & (SZ_1M - 1))
+		return -EINVAL;
+
+	/*
+	 * We don't have to make any changes to the table if the
+	 * current (start, end) and the new (start, end) are in the
+	 * same pages respectively.
+	 */
+	if ((w_offset ^ catu_buf->start_offset) & ~(CATU_PAGE_SIZE - 1) ||
+	    (end ^ cur_end) & ~(CATU_PAGE_SIZE - 1)) {
+		catu_reset_table(catu_buf->catu_table, catu_buf->start_offset,
+				 catu_buf->cur_size);
+		catu_buf->sladdr = catu_set_table(catu_buf->catu_table,
+						  w_offset, size);
+	}
+
+	catu_set_etr_buf(etr_buf, w_offset, size);
+
+	return 0;
+}
+
+static int catu_alloc_etr_buf(struct tmc_drvdata *tmc_drvdata,
+			      struct etr_buf *etr_buf, int node, void **pages)
+{
+	struct coresight_device *csdev;
+	struct device *catu_dev;
+	struct tmc_sg_table *catu_table;
+	struct catu_etr_buf *catu_buf;
+
+	csdev = tmc_etr_get_catu_device(tmc_drvdata);
+	if (!csdev)
+		return -ENODEV;
+	catu_dev = csdev->dev.parent;
+	catu_buf = kzalloc(sizeof(*catu_buf), GFP_KERNEL);
+	if (!catu_buf)
+		return -ENOMEM;
+
+	catu_table = catu_init_sg_table(catu_dev, node, etr_buf->size, pages);
+	if (IS_ERR(catu_table)) {
+		kfree(catu_buf);
+		return PTR_ERR(catu_table);
+	}
+
+	etr_buf->mode = ETR_MODE_CATU;
+	etr_buf->private = catu_buf;
+	catu_buf->catu_table = catu_table;
+
+	/* By default make the buffer linear from 0 with full size */
+	catu_set_etr_buf(etr_buf, 0, etr_buf->size);
+	catu_dump_table(catu_table);
+
+	return 0;
+}
+
+const struct etr_buf_operations etr_catu_buf_ops = {
+	.alloc = catu_alloc_etr_buf,
+	.free = catu_free_etr_buf,
+	.sync = catu_sync_etr_buf,
+	.get_data = catu_get_data_etr_buf,
+	.restore = catu_restore_etr_buf,
+};
+
 coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
 coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
 coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
@@ -467,9 +628,11 @@ static inline int catu_wait_for_ready(struct catu_drvdata *drvdata)
 				 CATU_STATUS, CATU_STATUS_READY, 1);
 }
 
-static int catu_enable_hw(struct catu_drvdata *drvdata, void *__unused)
+static int catu_enable_hw(struct catu_drvdata *drvdata, void *data)
 {
 	u32 control;
+	u32 mode;
+	struct etr_buf *etr_buf = data;
 
 	if (catu_wait_for_ready(drvdata))
 		dev_warn(drvdata->dev, "Timeout while waiting for READY\n");
@@ -481,9 +644,27 @@ static int catu_enable_hw(struct catu_drvdata *drvdata, void *__unused)
 	}
 
 	control |= BIT(CATU_CONTROL_ENABLE);
-	catu_write_mode(drvdata, CATU_MODE_PASS_THROUGH);
+
+	if (etr_buf && etr_buf->mode == ETR_MODE_CATU) {
+		struct catu_etr_buf *catu_buf = etr_buf->private;
+
+		mode = CATU_MODE_TRANSLATE;
+		catu_write_axictrl(drvdata, CATU_OS_AXICTRL);
+		catu_write_sladdr(drvdata, catu_buf->sladdr);
+		catu_write_inaddr(drvdata, CATU_DEFAULT_INADDR);
+	} else {
+		mode = CATU_MODE_PASS_THROUGH;
+		catu_write_sladdr(drvdata, 0);
+		catu_write_inaddr(drvdata, 0);
+	}
+
+	catu_write_irqen(drvdata, 0);
+	catu_write_mode(drvdata, mode);
 	catu_write_control(drvdata, control);
-	dev_dbg(drvdata->dev, "Enabled in Pass through mode\n");
+	dev_dbg(drvdata->dev, "Enabled in %s mode\n",
+		(mode == CATU_MODE_PASS_THROUGH) ?
+		"Pass through" :
+		"Translate");
 	return 0;
 }
 
diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
index cd58d6f..b673a73 100644
--- a/drivers/hwtracing/coresight/coresight-catu.h
+++ b/drivers/hwtracing/coresight/coresight-catu.h
@@ -29,6 +29,32 @@
 #define CATU_MODE_PASS_THROUGH	0U
 #define CATU_MODE_TRANSLATE	1U
 
+#define CATU_AXICTRL_ARCACHE_SHIFT	4
+#define CATU_AXICTRL_ARCACHE_MASK	0xf
+#define CATU_AXICTRL_ARPROT_MASK	0x3
+#define CATU_AXICTRL_ARCACHE(arcache)		\
+	(((arcache) & CATU_AXICTRL_ARCACHE_MASK) << CATU_AXICTRL_ARCACHE_SHIFT)
+
+#define CATU_AXICTRL_VAL(arcache, arprot)	\
+	(CATU_AXICTRL_ARCACHE(arcache) | ((arprot) & CATU_AXICTRL_ARPROT_MASK))
+
+#define AXI3_AxCACHE_WB_READ_ALLOC	0x7
+/*
+ * AXI - ARPROT bits:
+ * See AMBA AXI & ACE Protocol specification (ARM IHI 0022E)
+ * section A4.7, Access Permissions.
+ *
+ * Bit 0: 0 - Unprivileged access, 1 - Privileged access
+ * Bit 1: 0 - Secure access, 1 - Non-secure access.
+ * Bit 2: 0 - Data access, 1 - instruction access.
+ *
+ * CATU AXICTRL:ARPROT[2] is res0 as we always access data.
+ */
+#define CATU_OS_ARPROT			0x2
+
+#define CATU_OS_AXICTRL		\
+	CATU_AXICTRL_VAL(AXI3_AxCACHE_WB_READ_ALLOC, CATU_OS_ARPROT)
+
 #define CATU_STATUS_READY	8
 #define CATU_STATUS_ADRERR	0
 #define CATU_STATUS_AXIERR	4
@@ -71,6 +97,8 @@ catu_write_##name(struct catu_drvdata *drvdata, u64 val)		\
 
 CATU_REG32(control, CATU_CONTROL);
 CATU_REG32(mode, CATU_MODE);
+CATU_REG32(irqen, CATU_IRQEN);
+CATU_REG32(axictrl, CATU_AXICTRL);
 CATU_REG_PAIR(sladdr, CATU_SLADDRLO, CATU_SLADDRHI)
 CATU_REG_PAIR(inaddr, CATU_INADDRLO, CATU_INADDRHI)
 
@@ -86,4 +114,6 @@ static inline bool coresight_is_catu_device(struct coresight_device *csdev)
 	       subtype == CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
 }
 
+extern const struct etr_buf_operations etr_catu_buf_ops;
+
 #endif
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 25e7feb..41dde0a 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -941,6 +941,9 @@ static const struct etr_buf_operations etr_sg_buf_ops = {
 static const struct etr_buf_operations *etr_buf_ops[] = {
 	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
 	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
+#ifdef CONFIG_CORESIGHT_CATU
+	[ETR_MODE_CATU] = &etr_catu_buf_ops,
+#endif
 };
 
 static inline int tmc_etr_mode_alloc_buf(int mode,
@@ -953,6 +956,9 @@ static inline int tmc_etr_mode_alloc_buf(int mode,
 	switch (mode) {
 	case ETR_MODE_FLAT:
 	case ETR_MODE_ETR_SG:
+#ifdef CONFIG_CORESIGHT_CATU
+	case ETR_MODE_CATU:
+#endif
 		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
 		if (!rc)
 			etr_buf->ops = etr_buf_ops[mode];
@@ -977,11 +983,15 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 	int rc = -ENOMEM;
 	bool has_etr_sg, has_iommu;
 	bool has_flat, has_save_restore;
+	bool has_sg, has_catu;
 	struct etr_buf *etr_buf;
 
 	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
 	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
 	has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
+	has_catu = !!tmc_etr_get_catu_device(drvdata);
+
+	has_sg = has_catu || has_etr_sg;
 
 	/*
 	 * We can normally use flat DMA buffer provided that the buffer
@@ -1006,7 +1016,7 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 	if ((flags & ETR_BUF_F_RESTORE_STATUS) && !has_save_restore)
 		return ERR_PTR(-EINVAL);
 
-	if (!has_flat && !has_etr_sg) {
+	if (!has_flat && !has_sg) {
 		dev_dbg(drvdata->dev,
 			"No available backends for ETR buffer with flags %x\n",
 			flags);
@@ -1032,17 +1042,22 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 	 *
 	 */
 	if (!pages && has_flat &&
-	    (!has_etr_sg || has_iommu || size < SZ_1M))
+	    (!has_sg || has_iommu || size < SZ_1M))
 		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
 					    etr_buf, node, pages);
 	if (rc && has_etr_sg)
 		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
 					    etr_buf, node, pages);
+	if (rc && has_catu)
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_CATU, drvdata,
+					    etr_buf, node, pages);
 	if (rc) {
 		kfree(etr_buf);
 		return ERR_PTR(rc);
 	}
 
+	dev_dbg(drvdata->dev, "allocated buffer of size %ldKB in mode %d\n",
+		(unsigned long)size >> 10, etr_buf->mode);
 	return etr_buf;
 }
 
@@ -1136,7 +1151,7 @@ static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
 	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
 
 	if (catu && helper_ops(catu)->enable)
-		helper_ops(catu)->enable(catu, NULL);
+		helper_ops(catu)->enable(catu, drvdata->etr_buf);
 }
 
 static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
@@ -1144,7 +1159,7 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
 	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
 
 	if (catu && helper_ops(catu)->disable)
-		helper_ops(catu)->disable(catu, NULL);
+		helper_ops(catu)->disable(catu, drvdata->etr_buf);
 }
 
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 1bdfb38..1f6aa49 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -139,6 +139,7 @@ enum tmc_mem_intf_width {
 enum etr_mode {
 	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
 	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
+	ETR_MODE_CATU,		/* Use SG mechanism in CATU */
 };
 
 /* ETR buffer should support save-restore */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread
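
One detail of the patch above that is easy to misread is the RRP/RWP handling in catu_sync_etr_buf(): the hardware pointers are relative to the 1MB-aligned input address, so they are converted back into offsets within the trace buffer and wrapped if they run past the end. The sketch below is illustrative only; the names and values are assumptions, not driver API.

#include <stdio.h>
#include <stdint.h>

/* Convert an ETR hardware pointer (RRP/RWP) into a trace-buffer offset,
 * mirroring the wrap-around arithmetic in catu_sync_etr_buf() above. */
static int64_t ex_ptr_to_offset(uint64_t hwptr, uint64_t hwaddr,
				uint64_t start_offset, uint64_t buf_size)
{
	int64_t off = (int64_t)(hwptr - hwaddr) + (int64_t)start_offset;

	if (off > (int64_t)buf_size)
		off -= (int64_t)buf_size;
	return off;
}

int main(void)
{
	/* ETR started 3.5MB into a 4MB buffer; RWP is 0xA0000 past hwaddr */
	uint64_t hwaddr = 0x100000, buf_size = 0x400000, start = 0x380000;

	printf("w_offset = 0x%llx\n",
	       (unsigned long long)ex_ptr_to_offset(hwaddr + 0xA0000,
						     hwaddr, start, buf_size));
	return 0;
}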

* [PATCH v2 20/27] coresight: tmc: Add configuration support for trace buffer size
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Now that we can dynamically switch between contiguous memory and
an SG table depending on the trace buffer size, provide support
for selecting an appropriate buffer size.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 .../ABI/testing/sysfs-bus-coresight-devices-tmc    |  8 ++++++
 drivers/hwtracing/coresight/coresight-tmc.c        | 33 ++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
index 4fe677e..ea78714 100644
--- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
@@ -83,3 +83,11 @@ KernelVersion:	4.7
 Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
 Description:	(R) Indicates the capabilities of the Coresight TMC.
 		The value is read directly from the DEVID register, 0xFC8,
+
+What:		/sys/bus/coresight/devices/<memory_map>.tmc/buffer_size
+Date:		August 2018
+KernelVersion:	4.18
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Size of the trace buffer for TMC-ETR when used in SYSFS
+		mode. Writable only for TMC-ETR configurations. The value
+		should be aligned to the kernel page size.
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index c7bc681..4d41b4b 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -288,8 +288,41 @@ static ssize_t trigger_cntr_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(trigger_cntr);
 
+static ssize_t buffer_size_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	return sprintf(buf, "%#x\n", drvdata->size);
+}
+
+static ssize_t buffer_size_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t size)
+{
+	int ret;
+	unsigned long val;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	/* Only permitted for TMC-ETRs */
+	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
+		return -EPERM;
+
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;
+	/* The buffer size should be page aligned */
+	if (val & (PAGE_SIZE - 1))
+		return -EINVAL;
+	drvdata->size = val;
+	return size;
+}
+
+static DEVICE_ATTR_RW(buffer_size);
+
 static struct attribute *coresight_tmc_attrs[] = {
 	&dev_attr_trigger_cntr.attr,
+	&dev_attr_buffer_size.attr,
 	NULL,
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread
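
For completeness, the validation performed by the new buffer_size attribute above amounts to an ETR-only check plus a page-alignment check on the requested size. A minimal sketch of the alignment rule, with EX_PAGE_SIZE (4KB) assumed for illustration in place of the kernel's PAGE_SIZE:

#include <stdio.h>
#include <stdbool.h>

#define EX_PAGE_SIZE	4096UL	/* assumed page size; the driver uses PAGE_SIZE */

/* Mirrors the check in buffer_size_store(): reject sizes that are not
 * a multiple of the page size. */
static bool ex_buffer_size_ok(unsigned long bytes)
{
	return !(bytes & (EX_PAGE_SIZE - 1));
}

int main(void)
{
	printf("0x1000000 -> %s\n", ex_buffer_size_ok(0x1000000UL) ? "accepted" : "rejected");
	printf("0x1200    -> %s\n", ex_buffer_size_ok(0x1200UL) ? "accepted" : "rejected");
	return 0;
}

So a write such as echo 0x1000000 > /sys/bus/coresight/devices/<memory_map>.tmc/buffer_size is accepted (and only on a TMC-ETR), while an unaligned value like 0x1200 is rejected with -EINVAL.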

* [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Convert component enable/disable messages from dev_info to dev_dbg.
This is required to prevent LOCKDEP splats when operating in perf
mode where we could be called with locks held to enable a coresight
path. If someone really wants to see the messages, they can always
enable them at runtime via dynamic_debug.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-dynamic-replicator.c | 4 ++--
 drivers/hwtracing/coresight/coresight-etb10.c              | 6 +++---
 drivers/hwtracing/coresight/coresight-etm3x.c              | 4 ++--
 drivers/hwtracing/coresight/coresight-etm4x.c              | 4 ++--
 drivers/hwtracing/coresight/coresight-funnel.c             | 4 ++--
 drivers/hwtracing/coresight/coresight-replicator.c         | 4 ++--
 drivers/hwtracing/coresight/coresight-stm.c                | 4 ++--
 drivers/hwtracing/coresight/coresight-tmc-etf.c            | 8 ++++----
 drivers/hwtracing/coresight/coresight-tmc-etr.c            | 4 ++--
 drivers/hwtracing/coresight/coresight-tmc.c                | 4 ++--
 drivers/hwtracing/coresight/coresight-tpiu.c               | 4 ++--
 11 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-dynamic-replicator.c b/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
index 043da86..c41d95c 100644
--- a/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
+++ b/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
@@ -64,7 +64,7 @@ static int replicator_enable(struct coresight_device *csdev, int inport,
 
 	CS_LOCK(drvdata->base);
 
-	dev_info(drvdata->dev, "REPLICATOR enabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
 	return 0;
 }
 
@@ -83,7 +83,7 @@ static void replicator_disable(struct coresight_device *csdev, int inport,
 
 	CS_LOCK(drvdata->base);
 
-	dev_info(drvdata->dev, "REPLICATOR disabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
 }
 
 static const struct coresight_ops_link replicator_link_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index 74232e6..d9c2f87 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -163,7 +163,7 @@ static int etb_enable(struct coresight_device *csdev, u32 mode)
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 out:
-	dev_info(drvdata->dev, "ETB enabled\n");
+	dev_dbg(drvdata->dev, "ETB enabled\n");
 	return 0;
 }
 
@@ -269,7 +269,7 @@ static void etb_disable(struct coresight_device *csdev)
 
 	local_set(&drvdata->mode, CS_MODE_DISABLED);
 
-	dev_info(drvdata->dev, "ETB disabled\n");
+	dev_dbg(drvdata->dev, "ETB disabled\n");
 }
 
 static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu,
@@ -512,7 +512,7 @@ static void etb_dump(struct etb_drvdata *drvdata)
 	}
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "ETB dumped\n");
+	dev_dbg(drvdata->dev, "ETB dumped\n");
 }
 
 static int etb_open(struct inode *inode, struct file *file)
diff --git a/drivers/hwtracing/coresight/coresight-etm3x.c b/drivers/hwtracing/coresight/coresight-etm3x.c
index 39f42fd..9d4a663 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x.c
@@ -510,7 +510,7 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
 	drvdata->sticky_enable = true;
 	spin_unlock(&drvdata->spinlock);
 
-	dev_info(drvdata->dev, "ETM tracing enabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing enabled\n");
 	return 0;
 
 err:
@@ -613,7 +613,7 @@ static void etm_disable_sysfs(struct coresight_device *csdev)
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
 
-	dev_info(drvdata->dev, "ETM tracing disabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing disabled\n");
 }
 
 static void etm_disable(struct coresight_device *csdev,
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index e84d80b..c9c73c2 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -274,7 +274,7 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
 	drvdata->sticky_enable = true;
 	spin_unlock(&drvdata->spinlock);
 
-	dev_info(drvdata->dev, "ETM tracing enabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing enabled\n");
 	return 0;
 
 err:
@@ -387,7 +387,7 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
 
-	dev_info(drvdata->dev, "ETM tracing disabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing disabled\n");
 }
 
 static void etm4_disable(struct coresight_device *csdev,
diff --git a/drivers/hwtracing/coresight/coresight-funnel.c b/drivers/hwtracing/coresight/coresight-funnel.c
index 9f8ac0be..18b5361 100644
--- a/drivers/hwtracing/coresight/coresight-funnel.c
+++ b/drivers/hwtracing/coresight/coresight-funnel.c
@@ -72,7 +72,7 @@ static int funnel_enable(struct coresight_device *csdev, int inport,
 
 	funnel_enable_hw(drvdata, inport);
 
-	dev_info(drvdata->dev, "FUNNEL inport %d enabled\n", inport);
+	dev_dbg(drvdata->dev, "FUNNEL inport %d enabled\n", inport);
 	return 0;
 }
 
@@ -96,7 +96,7 @@ static void funnel_disable(struct coresight_device *csdev, int inport,
 
 	funnel_disable_hw(drvdata, inport);
 
-	dev_info(drvdata->dev, "FUNNEL inport %d disabled\n", inport);
+	dev_dbg(drvdata->dev, "FUNNEL inport %d disabled\n", inport);
 }
 
 static const struct coresight_ops_link funnel_link_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
index 3756e71..4f77812 100644
--- a/drivers/hwtracing/coresight/coresight-replicator.c
+++ b/drivers/hwtracing/coresight/coresight-replicator.c
@@ -42,7 +42,7 @@ static int replicator_enable(struct coresight_device *csdev, int inport,
 {
 	struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	dev_info(drvdata->dev, "REPLICATOR enabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
 	return 0;
 }
 
@@ -51,7 +51,7 @@ static void replicator_disable(struct coresight_device *csdev, int inport,
 {
 	struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	dev_info(drvdata->dev, "REPLICATOR disabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
 }
 
 static const struct coresight_ops_link replicator_link_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c
index 15e7ef38..4c88d99 100644
--- a/drivers/hwtracing/coresight/coresight-stm.c
+++ b/drivers/hwtracing/coresight/coresight-stm.c
@@ -218,7 +218,7 @@ static int stm_enable(struct coresight_device *csdev,
 	stm_enable_hw(drvdata);
 	spin_unlock(&drvdata->spinlock);
 
-	dev_info(drvdata->dev, "STM tracing enabled\n");
+	dev_dbg(drvdata->dev, "STM tracing enabled\n");
 	return 0;
 }
 
@@ -281,7 +281,7 @@ static void stm_disable(struct coresight_device *csdev,
 		pm_runtime_put(drvdata->dev);
 
 		local_set(&drvdata->mode, CS_MODE_DISABLED);
-		dev_info(drvdata->dev, "STM tracing disabled\n");
+		dev_dbg(drvdata->dev, "STM tracing disabled\n");
 	}
 }
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 1dd44fd..0a32734 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -244,7 +244,7 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode)
 	if (ret)
 		return ret;
 
-	dev_info(drvdata->dev, "TMC-ETB/ETF enabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETB/ETF enabled\n");
 	return 0;
 }
 
@@ -267,7 +267,7 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev)
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETB/ETF disabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETB/ETF disabled\n");
 }
 
 static int tmc_enable_etf_link(struct coresight_device *csdev,
@@ -286,7 +286,7 @@ static int tmc_enable_etf_link(struct coresight_device *csdev,
 	drvdata->mode = CS_MODE_SYSFS;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETF enabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETF enabled\n");
 	return 0;
 }
 
@@ -306,7 +306,7 @@ static void tmc_disable_etf_link(struct coresight_device *csdev,
 	drvdata->mode = CS_MODE_DISABLED;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETF disabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETF disabled\n");
 }
 
 static void *tmc_alloc_etf_buffer(struct coresight_device *csdev, int cpu,
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 41dde0a..1ef0f62 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1350,7 +1350,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 		tmc_etr_free_sysfs_buf(free_buf);
 
 	if (!ret)
-		dev_info(drvdata->dev, "TMC-ETR enabled\n");
+		dev_dbg(drvdata->dev, "TMC-ETR enabled\n");
 
 	return ret;
 }
@@ -1393,7 +1393,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETR disabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETR disabled\n");
 }
 
 static const struct coresight_ops_sink tmc_etr_sink_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 4d41b4b..7adcde3 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -92,7 +92,7 @@ static int tmc_read_prepare(struct tmc_drvdata *drvdata)
 	}
 
 	if (!ret)
-		dev_info(drvdata->dev, "TMC read start\n");
+		dev_dbg(drvdata->dev, "TMC read start\n");
 
 	return ret;
 }
@@ -114,7 +114,7 @@ static int tmc_read_unprepare(struct tmc_drvdata *drvdata)
 	}
 
 	if (!ret)
-		dev_info(drvdata->dev, "TMC read end\n");
+		dev_dbg(drvdata->dev, "TMC read end\n");
 
 	return ret;
 }
diff --git a/drivers/hwtracing/coresight/coresight-tpiu.c b/drivers/hwtracing/coresight/coresight-tpiu.c
index 805f7c2..c7f0827 100644
--- a/drivers/hwtracing/coresight/coresight-tpiu.c
+++ b/drivers/hwtracing/coresight/coresight-tpiu.c
@@ -80,7 +80,7 @@ static int tpiu_enable(struct coresight_device *csdev, u32 mode)
 
 	tpiu_enable_hw(drvdata);
 
-	dev_info(drvdata->dev, "TPIU enabled\n");
+	dev_dbg(drvdata->dev, "TPIU enabled\n");
 	return 0;
 }
 
@@ -106,7 +106,7 @@ static void tpiu_disable(struct coresight_device *csdev)
 
 	tpiu_disable_hw(drvdata);
 
-	dev_info(drvdata->dev, "TPIU disabled\n");
+	dev_dbg(drvdata->dev, "TPIU disabled\n");
 }
 
 static const struct coresight_ops_sink tpiu_sink_ops = {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg
@ 2018-05-01  9:10   ` Suzuki K Poulose
  0 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

Convert component enable/disable messages from dev_info to dev_dbg.
This is required to prevent LOCKDEP splats when operating in perf
mode where we could be called with locks held to enable a coresight
path. If someone really wants to see the messages, they can always
enable them at runtime via dynamic_debug.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-dynamic-replicator.c | 4 ++--
 drivers/hwtracing/coresight/coresight-etb10.c              | 6 +++---
 drivers/hwtracing/coresight/coresight-etm3x.c              | 4 ++--
 drivers/hwtracing/coresight/coresight-etm4x.c              | 4 ++--
 drivers/hwtracing/coresight/coresight-funnel.c             | 4 ++--
 drivers/hwtracing/coresight/coresight-replicator.c         | 4 ++--
 drivers/hwtracing/coresight/coresight-stm.c                | 4 ++--
 drivers/hwtracing/coresight/coresight-tmc-etf.c            | 8 ++++----
 drivers/hwtracing/coresight/coresight-tmc-etr.c            | 4 ++--
 drivers/hwtracing/coresight/coresight-tmc.c                | 4 ++--
 drivers/hwtracing/coresight/coresight-tpiu.c               | 4 ++--
 11 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-dynamic-replicator.c b/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
index 043da86..c41d95c 100644
--- a/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
+++ b/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
@@ -64,7 +64,7 @@ static int replicator_enable(struct coresight_device *csdev, int inport,
 
 	CS_LOCK(drvdata->base);
 
-	dev_info(drvdata->dev, "REPLICATOR enabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
 	return 0;
 }
 
@@ -83,7 +83,7 @@ static void replicator_disable(struct coresight_device *csdev, int inport,
 
 	CS_LOCK(drvdata->base);
 
-	dev_info(drvdata->dev, "REPLICATOR disabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
 }
 
 static const struct coresight_ops_link replicator_link_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index 74232e6..d9c2f87 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -163,7 +163,7 @@ static int etb_enable(struct coresight_device *csdev, u32 mode)
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 out:
-	dev_info(drvdata->dev, "ETB enabled\n");
+	dev_dbg(drvdata->dev, "ETB enabled\n");
 	return 0;
 }
 
@@ -269,7 +269,7 @@ static void etb_disable(struct coresight_device *csdev)
 
 	local_set(&drvdata->mode, CS_MODE_DISABLED);
 
-	dev_info(drvdata->dev, "ETB disabled\n");
+	dev_dbg(drvdata->dev, "ETB disabled\n");
 }
 
 static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu,
@@ -512,7 +512,7 @@ static void etb_dump(struct etb_drvdata *drvdata)
 	}
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "ETB dumped\n");
+	dev_dbg(drvdata->dev, "ETB dumped\n");
 }
 
 static int etb_open(struct inode *inode, struct file *file)
diff --git a/drivers/hwtracing/coresight/coresight-etm3x.c b/drivers/hwtracing/coresight/coresight-etm3x.c
index 39f42fd..9d4a663 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x.c
@@ -510,7 +510,7 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
 	drvdata->sticky_enable = true;
 	spin_unlock(&drvdata->spinlock);
 
-	dev_info(drvdata->dev, "ETM tracing enabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing enabled\n");
 	return 0;
 
 err:
@@ -613,7 +613,7 @@ static void etm_disable_sysfs(struct coresight_device *csdev)
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
 
-	dev_info(drvdata->dev, "ETM tracing disabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing disabled\n");
 }
 
 static void etm_disable(struct coresight_device *csdev,
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index e84d80b..c9c73c2 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -274,7 +274,7 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
 	drvdata->sticky_enable = true;
 	spin_unlock(&drvdata->spinlock);
 
-	dev_info(drvdata->dev, "ETM tracing enabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing enabled\n");
 	return 0;
 
 err:
@@ -387,7 +387,7 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
 	spin_unlock(&drvdata->spinlock);
 	cpus_read_unlock();
 
-	dev_info(drvdata->dev, "ETM tracing disabled\n");
+	dev_dbg(drvdata->dev, "ETM tracing disabled\n");
 }
 
 static void etm4_disable(struct coresight_device *csdev,
diff --git a/drivers/hwtracing/coresight/coresight-funnel.c b/drivers/hwtracing/coresight/coresight-funnel.c
index 9f8ac0be..18b5361 100644
--- a/drivers/hwtracing/coresight/coresight-funnel.c
+++ b/drivers/hwtracing/coresight/coresight-funnel.c
@@ -72,7 +72,7 @@ static int funnel_enable(struct coresight_device *csdev, int inport,
 
 	funnel_enable_hw(drvdata, inport);
 
-	dev_info(drvdata->dev, "FUNNEL inport %d enabled\n", inport);
+	dev_dbg(drvdata->dev, "FUNNEL inport %d enabled\n", inport);
 	return 0;
 }
 
@@ -96,7 +96,7 @@ static void funnel_disable(struct coresight_device *csdev, int inport,
 
 	funnel_disable_hw(drvdata, inport);
 
-	dev_info(drvdata->dev, "FUNNEL inport %d disabled\n", inport);
+	dev_dbg(drvdata->dev, "FUNNEL inport %d disabled\n", inport);
 }
 
 static const struct coresight_ops_link funnel_link_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
index 3756e71..4f77812 100644
--- a/drivers/hwtracing/coresight/coresight-replicator.c
+++ b/drivers/hwtracing/coresight/coresight-replicator.c
@@ -42,7 +42,7 @@ static int replicator_enable(struct coresight_device *csdev, int inport,
 {
 	struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	dev_info(drvdata->dev, "REPLICATOR enabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
 	return 0;
 }
 
@@ -51,7 +51,7 @@ static void replicator_disable(struct coresight_device *csdev, int inport,
 {
 	struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	dev_info(drvdata->dev, "REPLICATOR disabled\n");
+	dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
 }
 
 static const struct coresight_ops_link replicator_link_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c
index 15e7ef38..4c88d99 100644
--- a/drivers/hwtracing/coresight/coresight-stm.c
+++ b/drivers/hwtracing/coresight/coresight-stm.c
@@ -218,7 +218,7 @@ static int stm_enable(struct coresight_device *csdev,
 	stm_enable_hw(drvdata);
 	spin_unlock(&drvdata->spinlock);
 
-	dev_info(drvdata->dev, "STM tracing enabled\n");
+	dev_dbg(drvdata->dev, "STM tracing enabled\n");
 	return 0;
 }
 
@@ -281,7 +281,7 @@ static void stm_disable(struct coresight_device *csdev,
 		pm_runtime_put(drvdata->dev);
 
 		local_set(&drvdata->mode, CS_MODE_DISABLED);
-		dev_info(drvdata->dev, "STM tracing disabled\n");
+		dev_dbg(drvdata->dev, "STM tracing disabled\n");
 	}
 }
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 1dd44fd..0a32734 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -244,7 +244,7 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode)
 	if (ret)
 		return ret;
 
-	dev_info(drvdata->dev, "TMC-ETB/ETF enabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETB/ETF enabled\n");
 	return 0;
 }
 
@@ -267,7 +267,7 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev)
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETB/ETF disabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETB/ETF disabled\n");
 }
 
 static int tmc_enable_etf_link(struct coresight_device *csdev,
@@ -286,7 +286,7 @@ static int tmc_enable_etf_link(struct coresight_device *csdev,
 	drvdata->mode = CS_MODE_SYSFS;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETF enabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETF enabled\n");
 	return 0;
 }
 
@@ -306,7 +306,7 @@ static void tmc_disable_etf_link(struct coresight_device *csdev,
 	drvdata->mode = CS_MODE_DISABLED;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETF disabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETF disabled\n");
 }
 
 static void *tmc_alloc_etf_buffer(struct coresight_device *csdev, int cpu,
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 41dde0a..1ef0f62 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1350,7 +1350,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 		tmc_etr_free_sysfs_buf(free_buf);
 
 	if (!ret)
-		dev_info(drvdata->dev, "TMC-ETR enabled\n");
+		dev_dbg(drvdata->dev, "TMC-ETR enabled\n");
 
 	return ret;
 }
@@ -1393,7 +1393,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
-	dev_info(drvdata->dev, "TMC-ETR disabled\n");
+	dev_dbg(drvdata->dev, "TMC-ETR disabled\n");
 }
 
 static const struct coresight_ops_sink tmc_etr_sink_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 4d41b4b..7adcde3 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -92,7 +92,7 @@ static int tmc_read_prepare(struct tmc_drvdata *drvdata)
 	}
 
 	if (!ret)
-		dev_info(drvdata->dev, "TMC read start\n");
+		dev_dbg(drvdata->dev, "TMC read start\n");
 
 	return ret;
 }
@@ -114,7 +114,7 @@ static int tmc_read_unprepare(struct tmc_drvdata *drvdata)
 	}
 
 	if (!ret)
-		dev_info(drvdata->dev, "TMC read end\n");
+		dev_dbg(drvdata->dev, "TMC read end\n");
 
 	return ret;
 }
diff --git a/drivers/hwtracing/coresight/coresight-tpiu.c b/drivers/hwtracing/coresight/coresight-tpiu.c
index 805f7c2..c7f0827 100644
--- a/drivers/hwtracing/coresight/coresight-tpiu.c
+++ b/drivers/hwtracing/coresight/coresight-tpiu.c
@@ -80,7 +80,7 @@ static int tpiu_enable(struct coresight_device *csdev, u32 mode)
 
 	tpiu_enable_hw(drvdata);
 
-	dev_info(drvdata->dev, "TPIU enabled\n");
+	dev_dbg(drvdata->dev, "TPIU enabled\n");
 	return 0;
 }
 
@@ -106,7 +106,7 @@ static void tpiu_disable(struct coresight_device *csdev)
 
 	tpiu_disable_hw(drvdata);
 
-	dev_info(drvdata->dev, "TPIU disabled\n");
+	dev_dbg(drvdata->dev, "TPIU disabled\n");
 }
 
 static const struct coresight_ops_sink tpiu_sink_ops = {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 22/27] coresight: tmc-etr: Track if the device is coherent
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Track whether the ETR is dma-coherent. This will be useful in
deciding whether we should use software buffering for perf.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc.c | 3 +++
 drivers/hwtracing/coresight/coresight-tmc.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 7adcde3..91a8f7b 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -359,6 +359,9 @@ static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
 	if (!(devid & TMC_DEVID_NOSCAT) && tmc_etr_can_use_sg(drvdata))
 		tmc_etr_set_cap(drvdata, TMC_ETR_SG);
 
+	if (device_get_dma_attr(drvdata->dev) == DEV_DMA_COHERENT)
+		tmc_etr_set_cap(drvdata, TMC_ETR_COHERENT);
+
 	/* Check if the AXI address width is available */
 	if (devid & TMC_DEVID_AXIAW_VALID)
 		dma_mask = ((devid >> TMC_DEVID_AXIAW_SHIFT) &
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 1f6aa49..76a89a6 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -131,6 +131,7 @@ enum tmc_mem_intf_width {
  * so we have to rely on PID of the IP to detect the functionality.
  */
 #define TMC_ETR_SAVE_RESTORE		(0x1U << 2)
+#define TMC_ETR_COHERENT		(0x1U << 3)
 
 /* Coresight SoC-600 TMC-ETR unadvertised capabilities */
 #define CORESIGHT_SOC_600_ETR_CAPS	\
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 23/27] coresight: tmc-etr: Handle driver mode specific ETR buffers
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Since the ETR can be driven either from sysfs or from perf, handling
the buffers used by each of these modes becomes complicated. The ETR
driver cannot simply free the currently attached buffer without
knowing its provider (i.e., sysfs vs. perf).

To solve this issue, we:
1) retain the driver-mode specific etr buffer in the drvdata, and
2) require the etr_buf for a session to be passed in when enabling
   the hardware; it is stored in drvdata->etr_buf and is dropped
   (not freed) as soon as the hardware is disabled, after the
   necessary sync operations.

The advantages of this are:

1) The common code path doesn't need to worry about how to dispose of
   an existing buffer when it is about to start a new session with a
   different buffer, possibly in a different mode.
2) Each driver mode controls its own buffers and can access the saved
   session even when the hardware is operating in a different mode
   (e.g., we can still read a trace buffer captured via sysfs even if
   the ETR is now used in perf mode, without disrupting the current
   session).

Towards this, we introduce a sysfs-specific buffer pointer which holds
the etr_buf used for the sysfs mode of operation, controlled solely by
the sysfs mode handling code.
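
To summarise the resulting contract, each mode hands its own buffer to
the hardware-enable path, and the disable path merely drops the
hardware's reference. A condensed sketch (the perf-side call is only
anticipated here; an etr_perf_buffer carrying its own etr_buf arrives
later in the series):

	/* sysfs mode (this patch) */
	tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);

	/* perf mode (a later patch) passes its own buffer instead */
	tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);

	/* tmc_etr_disable_hw() only forgets the buffer, it never frees it */
	drvdata->etr_buf = NULL;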

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 59 ++++++++++++++++---------
 drivers/hwtracing/coresight/coresight-tmc.h     |  2 +
 2 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 1ef0f62..a35a12f 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1162,10 +1162,15 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
 		helper_ops(catu)->disable(catu, drvdata->etr_buf);
 }
 
-static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
+static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata,
+			      struct etr_buf *etr_buf)
 {
 	u32 axictl, sts;
-	struct etr_buf *etr_buf = drvdata->etr_buf;
+
+	/* Callers should provide an appropriate buffer for use */
+	if (WARN_ON(!etr_buf || drvdata->etr_buf))
+		return;
+	drvdata->etr_buf = etr_buf;
 
 	/*
 	 * If this ETR is connected to a CATU, enable it before we turn
@@ -1227,13 +1232,16 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
  * also updating the @bufpp on where to find it. Since the trace data
  * starts at anywhere in the buffer, depending on the RRP, we adjust the
  * @len returned to handle buffer wrapping around.
+ *
+ * We are protected here by drvdata->reading != 0, which ensures the
+ * sysfs_buf stays alive.
  */
 ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
 				loff_t pos, size_t len, char **bufpp)
 {
 	s64 offset;
 	ssize_t actual = len;
-	struct etr_buf *etr_buf = drvdata->etr_buf;
+	struct etr_buf *etr_buf = drvdata->sysfs_buf;
 
 	if (pos + actual > etr_buf->len)
 		actual = etr_buf->len - pos;
@@ -1263,7 +1271,14 @@ tmc_etr_free_sysfs_buf(struct etr_buf *buf)
 
 static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
 {
-	tmc_sync_etr_buf(drvdata);
+	struct etr_buf *etr_buf = drvdata->etr_buf;
+
+	if (WARN_ON(drvdata->sysfs_buf != etr_buf)) {
+		tmc_etr_free_sysfs_buf(drvdata->sysfs_buf);
+		drvdata->sysfs_buf = NULL;
+	} else {
+		tmc_sync_etr_buf(drvdata);
+	}
 }
 
 static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
@@ -1285,6 +1300,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 
 	/* Disable CATU device if this ETR is connected to one */
 	tmc_etr_disable_catu(drvdata);
+	/* Reset the ETR buf used by hardware */
+	drvdata->etr_buf = NULL;
 }
 
 static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
@@ -1293,7 +1310,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	bool used = false;
 	unsigned long flags;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
-	struct etr_buf *new_buf = NULL, *free_buf = NULL;
+	struct etr_buf *sysfs_buf = NULL, *new_buf = NULL, *free_buf = NULL;
 
 
 	/*
@@ -1305,7 +1322,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	 * with the lock released.
 	 */
 	spin_lock_irqsave(&drvdata->spinlock, flags);
-	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
+	sysfs_buf = READ_ONCE(drvdata->sysfs_buf);
+	if (!sysfs_buf || (sysfs_buf->size != drvdata->size)) {
 		spin_unlock_irqrestore(&drvdata->spinlock, flags);
 		/* Allocate memory with the spinlock released */
 		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
@@ -1333,15 +1351,16 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	 * If we don't have a buffer or it doesn't match the requested size,
 	 * use the memory allocated above. Otherwise reuse it.
 	 */
-	if (!drvdata->etr_buf ||
-	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
+	sysfs_buf = READ_ONCE(drvdata->sysfs_buf);
+	if (!sysfs_buf ||
+	    (new_buf && sysfs_buf->size != new_buf->size)) {
 		used = true;
-		free_buf = drvdata->etr_buf;
-		drvdata->etr_buf = new_buf;
+		free_buf = sysfs_buf;
+		drvdata->sysfs_buf = new_buf;
 	}
 
 	drvdata->mode = CS_MODE_SYSFS;
-	tmc_etr_enable_hw(drvdata);
+	tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
 out:
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
@@ -1426,13 +1445,13 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	/* If drvdata::etr_buf is NULL the trace data has been read already */
-	if (drvdata->etr_buf == NULL) {
+	/* If sysfs_buf is NULL the trace data has been read already */
+	if (!drvdata->sysfs_buf) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	/* Disable the TMC if need be */
+	/* Disable the TMC if we are trying to read from a running session */
 	if (drvdata->mode == CS_MODE_SYSFS)
 		tmc_etr_disable_hw(drvdata);
 
@@ -1446,7 +1465,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 {
 	unsigned long flags;
-	struct etr_buf *etr_buf = NULL;
+	struct etr_buf *sysfs_buf = NULL;
 
 	/* config types are set a boot time and never change */
 	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
@@ -1461,22 +1480,22 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 		 * buffer. Since the tracer is still enabled drvdata::buf can't
 		 * be NULL.
 		 */
-		tmc_etr_enable_hw(drvdata);
+		tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
 	} else {
 		/*
 		 * The ETR is not tracing and the buffer was just read.
 		 * As such prepare to free the trace buffer.
 		 */
-		etr_buf =  drvdata->etr_buf;
-		drvdata->etr_buf = NULL;
+		sysfs_buf = drvdata->sysfs_buf;
+		drvdata->sysfs_buf = NULL;
 	}
 
 	drvdata->reading = false;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free allocated memory out side of the spinlock */
-	if (etr_buf)
-		tmc_free_etr_buf(etr_buf);
+	if (sysfs_buf)
+		tmc_etr_free_sysfs_buf(sysfs_buf);
 
 	return 0;
 }
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 76a89a6..185dc12 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -197,6 +197,7 @@ struct etr_buf {
  * @trigger_cntr: amount of words to store after a trigger.
  * @etr_caps:	Bitmask of capabilities of the TMC ETR, inferred from the
  *		device configuration register (DEVID)
+ * @sysfs_buf:	SYSFS buffer for ETR.
  */
 struct tmc_drvdata {
 	void __iomem		*base;
@@ -216,6 +217,7 @@ struct tmc_drvdata {
 	enum tmc_mem_intf_width	memwidth;
 	u32			trigger_cntr;
 	u32			etr_caps;
+	struct etr_buf		*sysfs_buf;
 };
 
 struct etr_buf_operations {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 24/27] coresight: tmc-etr: Relax collection of trace from sysfs mode
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Since the ETR now uses mode-specific buffers, we can reliably
provide the trace data captured in sysfs mode, even when the ETR
is operating in perf mode.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index a35a12f..7551272 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1439,19 +1439,17 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	/* Don't interfere if operated from Perf */
-	if (drvdata->mode == CS_MODE_PERF) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* If sysfs_buf is NULL the trace data has been read already */
+	/*
+	 * We can safely allow reads even if the ETR is operating in PERF mode,
+	 * since the sysfs session is captured in mode-specific data.
+	 * If drvdata::sysfs_buf is NULL the trace data has been read already.
+	 */
 	if (!drvdata->sysfs_buf) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	/* Disable the TMC if we are trying to read from a running session */
+	/* Disable the TMC if we are trying to read from a running session. */
 	if (drvdata->mode == CS_MODE_SYSFS)
 		tmc_etr_disable_hw(drvdata);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 25/27] coresight: etr_buf: Add helper for padding an area of trace data
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

This patch adds a helper to insert barrier packets covering a given
size (aligned to the barrier packet size) at a given offset in an
etr_buf. This will be used later for perf mode, when we try to start
in the middle of an SG buffer.
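
For a feel of the intended use, here is a sketch only; the call site,
the variables and the 4K alignment policy below are illustrative and
are not the perf code added later in the series:

	/*
	 * Sketch: if a session must start at the page-aligned position
	 * below the requested head, pad the skipped region with barrier
	 * packets so a decoder resynchronises cleanly. The helper wraps
	 * around the end of the buffer and returns the new offset, or
	 * an error.
	 */
	u64 start = ALIGN_DOWN(head, SZ_4K);
	s64 offset = start;

	if (head != start)
		offset = tmc_etr_buf_insert_barrier_packets(etr_buf, start,
							    head - start);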

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 53 ++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 7551272..8159e84 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1083,18 +1083,59 @@ static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
 	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
 }
 
+/*
+ * tmc_etr_buf_insert_barrier_packets : Insert barrier packets covering @size
+ * bytes, starting at @offset in the given buffer. @size should be aligned to
+ * the barrier packet size.
+ *
+ * Returns the new @offset past the inserted barriers on success, otherwise
+ * returns an error.
+ */
 static inline s64
-tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
+tmc_etr_buf_insert_barrier_packets(struct etr_buf *etr_buf,
+				   u64 offset, u64 size)
 {
 	ssize_t len;
 	char *bufp;
 
-	len = tmc_etr_buf_get_data(etr_buf, offset,
-				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
-	if (WARN_ON(len <= CORESIGHT_BARRIER_PKT_SIZE))
+	if (size < CORESIGHT_BARRIER_PKT_SIZE)
 		return -EINVAL;
-	coresight_insert_barrier_packet(bufp);
-	return offset + CORESIGHT_BARRIER_PKT_SIZE;
+	/*
+	 * Normally the size should be aligned to the frame size
+	 * of the ETR. Even if it isn't, the decoder looks for a
+	 * barrier packet at a frame size aligned offset. So align
+	 * the buffer to frame size first and then fill barrier
+	 * packets.
+	 */
+	do {
+		len = tmc_etr_buf_get_data(etr_buf, offset, size, &bufp);
+		if (WARN_ON(len <= 0))
+			return -EINVAL;
+		/*
+		 * We are guaranteed that @bufp will point to a linear range
+		 * of @len bytes, where @len <= @size.
+		 */
+		size -= len;
+		offset += len;
+		while (len >= CORESIGHT_BARRIER_PKT_SIZE) {
+			coresight_insert_barrier_packet(bufp);
+			bufp += CORESIGHT_BARRIER_PKT_SIZE;
+			len -= CORESIGHT_BARRIER_PKT_SIZE;
+		}
+
+		/* If we reached the end of the buffer, wrap around */
+		if (offset == etr_buf->size)
+			offset -= etr_buf->size;
+	} while (size);
+
+	return offset;
+}
+
+static inline s64
+tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
+{
+	return tmc_etr_buf_insert_barrier_packets(etr_buf, offset,
+					  CORESIGHT_BARRIER_PKT_SIZE);
 }
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 26/27] coresight: perf: Remove reset_buffer call back for sinks
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Right now we issue the update_buffer() and reset_buffer() callbacks
in succession when we stop tracing an event. update_buffer() is
supposed to check the status of the buffer and make sure the ring
buffer is updated with the trace data, and it stores the size of the
data collected only to be consumed by the reset_buffer() callback,
which always follows it. This split was originally designed for
handling future IPs which could trigger a buffer-overflow interrupt.
This patch gets rid of the reset_buffer callback altogether and
performs its actions in update_buffer(), making it return the size
collected. We can always add support for handling the overflow
interrupt case later.

This removes a not-so-pretty hack (storing the new head in the
size field for snapshot mode) and cleans things up a little.
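
The net effect on the perf layer is that a single callback now both
syncs the ring buffer and reports the amount of trace collected. The
resulting sequence in etm_event_stop() looks like this (a plain-code
rendering of the etm-perf hunk below):

	size = sink_ops(sink)->update_buffer(sink, handle,
					     event_data->snk_config);
	perf_aux_output_end(handle, size);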

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-etb10.c    | 56 +++++------------------
 drivers/hwtracing/coresight/coresight-etm-perf.c |  9 +---
 drivers/hwtracing/coresight/coresight-tmc-etf.c  | 58 +++++-------------------
 include/linux/coresight.h                        |  5 +-
 4 files changed, 26 insertions(+), 102 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index d9c2f87..b13712a 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -322,37 +322,7 @@ static int etb_set_buffer(struct coresight_device *csdev,
 	return ret;
 }
 
-static unsigned long etb_reset_buffer(struct coresight_device *csdev,
-				      struct perf_output_handle *handle,
-				      void *sink_config)
-{
-	unsigned long size = 0;
-	struct cs_buffers *buf = sink_config;
-
-	if (buf) {
-		/*
-		 * In snapshot mode ->data_size holds the new address of the
-		 * ring buffer's head.  The size itself is the whole address
-		 * range since we want the latest information.
-		 */
-		if (buf->snapshot)
-			handle->head = local_xchg(&buf->data_size,
-						  buf->nr_pages << PAGE_SHIFT);
-
-		/*
-		 * Tell the tracer PMU how much we got in this run and if
-		 * something went wrong along the way.  Nobody else can use
-		 * this cs_buffers instance until we are done.  As such
-		 * resetting parameters here and squaring off with the ring
-		 * buffer API in the tracer PMU is fine.
-		 */
-		size = local_xchg(&buf->data_size, 0);
-	}
-
-	return size;
-}
-
-static void etb_update_buffer(struct coresight_device *csdev,
+static unsigned long etb_update_buffer(struct coresight_device *csdev,
 			      struct perf_output_handle *handle,
 			      void *sink_config)
 {
@@ -361,13 +331,13 @@ static void etb_update_buffer(struct coresight_device *csdev,
 	u8 *buf_ptr;
 	const u32 *barrier;
 	u32 read_ptr, write_ptr, capacity;
-	u32 status, read_data, to_read;
-	unsigned long offset;
+	u32 status, read_data;
+	unsigned long offset, to_read;
 	struct cs_buffers *buf = sink_config;
 	struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
 	if (!buf)
-		return;
+		return 0;
 
 	capacity = drvdata->buffer_depth * ETB_FRAME_SIZE_WORDS;
 
@@ -472,18 +442,17 @@ static void etb_update_buffer(struct coresight_device *csdev,
 	writel_relaxed(0x0, drvdata->base + ETB_RAM_WRITE_POINTER);
 
 	/*
-	 * In snapshot mode all we have to do is communicate to
-	 * perf_aux_output_end() the address of the current head.  In full
-	 * trace mode the same function expects a size to move rb->aux_head
-	 * forward.
+	 * In snapshot mode we have to update the handle->head to point
+	 * to the new location.
 	 */
-	if (buf->snapshot)
-		local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
-	else
-		local_add(to_read, &buf->data_size);
-
+	if (buf->snapshot) {
+		handle->head = (cur * PAGE_SIZE) + offset;
+		to_read = buf->nr_pages << PAGE_SHIFT;
+	}
 	etb_enable_hw(drvdata);
 	CS_LOCK(drvdata->base);
+
+	return to_read;
 }
 
 static const struct coresight_ops_sink etb_sink_ops = {
@@ -492,7 +461,6 @@ static const struct coresight_ops_sink etb_sink_ops = {
 	.alloc_buffer	= etb_alloc_buffer,
 	.free_buffer	= etb_free_buffer,
 	.set_buffer	= etb_set_buffer,
-	.reset_buffer	= etb_reset_buffer,
 	.update_buffer	= etb_update_buffer,
 };
 
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 4e5ed65..5096def 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -342,15 +342,8 @@ static void etm_event_stop(struct perf_event *event, int mode)
 		if (!sink_ops(sink)->update_buffer)
 			return;
 
-		sink_ops(sink)->update_buffer(sink, handle,
+		size = sink_ops(sink)->update_buffer(sink, handle,
 					      event_data->snk_config);
-
-		if (!sink_ops(sink)->reset_buffer)
-			return;
-
-		size = sink_ops(sink)->reset_buffer(sink, handle,
-						    event_data->snk_config);
-
 		perf_aux_output_end(handle, size);
 	}
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 0a32734..75ef5c4 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -360,36 +360,7 @@ static int tmc_set_etf_buffer(struct coresight_device *csdev,
 	return ret;
 }
 
-static unsigned long tmc_reset_etf_buffer(struct coresight_device *csdev,
-					  struct perf_output_handle *handle,
-					  void *sink_config)
-{
-	long size = 0;
-	struct cs_buffers *buf = sink_config;
-
-	if (buf) {
-		/*
-		 * In snapshot mode ->data_size holds the new address of the
-		 * ring buffer's head.  The size itself is the whole address
-		 * range since we want the latest information.
-		 */
-		if (buf->snapshot)
-			handle->head = local_xchg(&buf->data_size,
-						  buf->nr_pages << PAGE_SHIFT);
-		/*
-		 * Tell the tracer PMU how much we got in this run and if
-		 * something went wrong along the way.  Nobody else can use
-		 * this cs_buffers instance until we are done.  As such
-		 * resetting parameters here and squaring off with the ring
-		 * buffer API in the tracer PMU is fine.
-		 */
-		size = local_xchg(&buf->data_size, 0);
-	}
-
-	return size;
-}
-
-static void tmc_update_etf_buffer(struct coresight_device *csdev,
+static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
 				  struct perf_output_handle *handle,
 				  void *sink_config)
 {
@@ -398,17 +369,17 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
 	const u32 *barrier;
 	u32 *buf_ptr;
 	u64 read_ptr, write_ptr;
-	u32 status, to_read;
-	unsigned long offset;
+	u32 status;
+	unsigned long offset, to_read;
 	struct cs_buffers *buf = sink_config;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
 	if (!buf)
-		return;
+		return 0;
 
 	/* This shouldn't happen */
 	if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
-		return;
+		return 0;
 
 	CS_UNLOCK(drvdata->base);
 
@@ -497,18 +468,14 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
 		}
 	}
 
-	/*
-	 * In snapshot mode all we have to do is communicate to
-	 * perf_aux_output_end() the address of the current head.  In full
-	 * trace mode the same function expects a size to move rb->aux_head
-	 * forward.
-	 */
-	if (buf->snapshot)
-		local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
-	else
-		local_add(to_read, &buf->data_size);
-
+	/* In snapshot mode we have to update the head */
+	if (buf->snapshot) {
+		handle->head = (cur * PAGE_SIZE) + offset;
+		to_read = buf->nr_pages << PAGE_SHIFT;
+	}
 	CS_LOCK(drvdata->base);
+
+	return to_read;
 }
 
 static const struct coresight_ops_sink tmc_etf_sink_ops = {
@@ -517,7 +484,6 @@ static const struct coresight_ops_sink tmc_etf_sink_ops = {
 	.alloc_buffer	= tmc_alloc_etf_buffer,
 	.free_buffer	= tmc_free_etf_buffer,
 	.set_buffer	= tmc_set_etf_buffer,
-	.reset_buffer	= tmc_reset_etf_buffer,
 	.update_buffer	= tmc_update_etf_buffer,
 };
 
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index c0e1568..41b3729 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -212,10 +212,7 @@ struct coresight_ops_sink {
 	int (*set_buffer)(struct coresight_device *csdev,
 			  struct perf_output_handle *handle,
 			  void *sink_config);
-	unsigned long (*reset_buffer)(struct coresight_device *csdev,
-				      struct perf_output_handle *handle,
-				      void *sink_config);
-	void (*update_buffer)(struct coresight_device *csdev,
+	unsigned long (*update_buffer)(struct coresight_device *csdev,
 			      struct perf_output_handle *handle,
 			      void *sink_config);
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 27/27] coresight: etm-perf: Add support for ETR backend
  2018-05-01  9:10 ` Suzuki K Poulose
@ 2018-05-01  9:10   ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-01  9:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Suzuki K Poulose

Add necessary support for using ETR as a sink in ETM perf tracing.
We try make the best use of the available modes of buffers to
try and avoid software double buffering.

We can use the perf ring buffer for ETR directly if all of the
conditions below are met :
 1) ETR is DMA coherent
 2) perf is used in snapshot mode. In full tracing mode, we cannot
    guarantee that the ETR will stop before it overwrites the data
    at the beginning of the trace buffer leading to loss of trace
    data. (The buffer which is being consumed by perf is still
    hidden from the ETR.)
 3) ETR supports save-restore with a scatter-gather mechanism
    which can use a given set of pages. If we have an in-built
    TMC ETR Scatter Gather unit, we make use of a circular SG list
    to restart from a given head. However, we need to align the
    starting offset to 4K in this case. With CATU and the ETR
    save-restore feature, we don't necessarily have to align the
    head of the buffer.

If the ETR doesn't support either of these, we fall back to software
double buffering.
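
This selection order boils down to roughly the sketch below. This is
illustrative only, not the exact code added by this patch; the real
tmc_etr_setup_perf_buf() also retries the fallback allocation with
progressively smaller sizes, down to a 1MB lower limit, and the
function name here is made up for the sketch:

static struct etr_buf *
etr_perf_pick_buf(struct tmc_drvdata *drvdata, unsigned long size,
		  int node, void **pages, bool snapshot)
{
	struct etr_buf *buf;

	/* Use the perf ring buffer pages directly only when it is safe */
	if (tmc_etr_has_cap(drvdata, TMC_ETR_COHERENT) && snapshot) {
		/* Prefer full save-restore (no alignment restriction)... */
		buf = tmc_alloc_etr_buf(drvdata, size,
					ETR_BUF_F_RESTORE_FULL, node, pages);
		if (!IS_ERR(buf))
			return buf;
		/* ...then the 4K-aligned restore via the built-in SG unit */
		buf = tmc_alloc_etr_buf(drvdata, size,
					ETR_BUF_F_RESTORE_MINIMAL, node, pages);
		if (!IS_ERR(buf))
			return buf;
	}
	/* Otherwise use a separate hardware buffer: software double buffering */
	return tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
}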

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---

Note: The conditions above need a rethink.

For (1): we always sync the buffer for the CPU before we update the
pointers. So we should be safe here and should be able to remove
this condition.

(2) is a bit more of a problem, as the ETR (without SFIFO_2 mode)
doesn't stop writing out the trace buffer, even though we exclude
the part of the ring buffer currently consumed by perf, leading
to loss of data. Also, since we don't have an interrupt (without
SFIFO_2), we can't wake up userspace reliably to consume
the data.

One possible option is to use an hrtimer to wake up userspace
early enough, using a low wakeup mark. But that doesn't necessarily
guarantee that the ETR will not wrap around, overwriting the data,
as we can't modify the ETR pointers unless we disable it, which
could again potentially cause data loss in Circular Buffer mode.
We may still be able to detect whether there was data loss by
checking how far userspace has consumed the data.
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 387 +++++++++++++++++++++++-
 drivers/hwtracing/coresight/coresight-tmc.h     |   2 +
 2 files changed, 386 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 8159e84..3e9ba02 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -31,6 +31,32 @@ struct etr_flat_buf {
 };
 
 /*
+ * etr_perf_buffer - Perf buffer used for ETR
+ * @etr_buf		- Actual buffer used by the ETR
+ * @snapshot		- Perf session mode
+ * @head		- handle->head at the beginning of the session.
+ * @nr_pages		- Number of pages in the ring buffer.
+ * @pages		- Pages in the ring buffer.
+ * @flags		- Capabilities of the hardware buffer used in the
+ *			  session. If flags == 0, we use software double
+ *			  buffering.
+ */
+struct etr_perf_buffer {
+	struct etr_buf		*etr_buf;
+	bool			snapshot;
+	unsigned long		head;
+	int			nr_pages;
+	void			**pages;
+	u32			flags;
+};
+
+/* Convert the perf index to an offset within the ETR buffer */
+#define PERF_IDX2OFF(idx, buf)	((idx) % ((buf)->nr_pages << PAGE_SHIFT))
+
+/* Lower limit for ETR hardware buffer in double buffering mode */
+#define TMC_ETR_PERF_MIN_BUF_SIZE	SZ_1M
+
+/*
  * The TMC ETR SG has a page size of 4K. The SG table contains pointers
  * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
  * 4K (i.e, 16KB or 64KB). This implies that a single OS page could
@@ -1164,7 +1190,7 @@ static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
 		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
 }
 
-static int __maybe_unused
+static int
 tmc_restore_etr_buf(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
 		    unsigned long r_offset, unsigned long w_offset,
 		    unsigned long size, u32 status)
@@ -1415,10 +1441,361 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	return ret;
 }
 
+/*
+ * tmc_etr_setup_perf_buf: Allocate ETR buffer for use by perf. We try to
+ * use perf ring buffer pages for the ETR when we can. In the worst case
+ * we fall back to software double buffering. The size of the hardware buffer
+ * in this case is dependent on the size configured via sysfs, if we can't
+ * match the perf ring buffer size. We scale down the size by half until
+ * it reaches a limit of 1M, beyond which we give up.
+ */
+static struct etr_perf_buffer *
+tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages,
+		       void **pages, bool snapshot)
+{
+	int i;
+	struct etr_buf *etr_buf;
+	struct etr_perf_buffer *etr_perf;
+	unsigned long size;
+	unsigned long buf_flags[] = {
+					ETR_BUF_F_RESTORE_FULL,
+					ETR_BUF_F_RESTORE_MINIMAL,
+					0,
+				    };
+
+	etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node);
+	if (!etr_perf)
+		return ERR_PTR(-ENOMEM);
+
+	size = nr_pages << PAGE_SHIFT;
+	/*
+	 * TODO: We need to refine the following rule.
+	 *
+	 * We can use the perf ring buffer for ETR only if it is coherent
+	 * and used in snapshot mode.
+	 *
+	 * The ETR (without SFIFO_2 mode) cannot stop writing when a
+	 * certain limit is reached, nor can it interrupt the driver.
+	 * We can protect the data which is being consumed by userspace
+	 * by hiding it from the ETR's tables. So, we could potentially
+	 * lose the trace data only for the current session if the ETR
+	 * wraps around.
+	 */
+	if (tmc_etr_has_cap(drvdata, TMC_ETR_COHERENT) && snapshot) {
+		for (i = 0; buf_flags[i]; i++) {
+			etr_buf = tmc_alloc_etr_buf(drvdata, size,
+						 buf_flags[i], node, pages);
+			if (!IS_ERR(etr_buf)) {
+				etr_perf->flags = buf_flags[i];
+				goto done;
+			}
+		}
+	}
+
+	/*
+	 * We now have to fall back to software double buffering.
+	 * The tricky decision is choosing a size for the hardware buffer.
+	 * We could start with drvdata->size (configurable via sysfs) and
+	 * scale it down until we can allocate the data.
+	 */
+	etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
+	if (!IS_ERR(etr_buf))
+		goto done;
+	size = drvdata->size;
+	do {
+		etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
+		if (!IS_ERR(etr_buf))
+			goto done;
+		size /= 2;
+	} while (size >= TMC_ETR_PERF_MIN_BUF_SIZE);
+
+	kfree(etr_perf);
+	return ERR_PTR(-ENOMEM);
+
+done:
+	etr_perf->etr_buf = etr_buf;
+	return etr_perf;
+}
+
+
+static void *tmc_etr_alloc_perf_buffer(struct coresight_device *csdev,
+					int cpu, void **pages, int nr_pages,
+					bool snapshot)
+{
+	struct etr_perf_buffer *etr_perf;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	if (cpu == -1)
+		cpu = smp_processor_id();
+
+	etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu),
+					     nr_pages, pages, snapshot);
+	if (IS_ERR(etr_perf)) {
+		dev_dbg(drvdata->dev, "Unable to allocate ETR buffer\n");
+		return NULL;
+	}
+
+	etr_perf->snapshot = snapshot;
+	etr_perf->nr_pages = nr_pages;
+	etr_perf->pages = pages;
+
+	return etr_perf;
+}
+
+static void tmc_etr_free_perf_buffer(void *config)
+{
+	struct etr_perf_buffer *etr_perf = config;
+
+	if (etr_perf->etr_buf)
+		tmc_free_etr_buf(etr_perf->etr_buf);
+	kfree(etr_perf);
+}
+
+/*
+ * Pad the etr buffer with barrier packets to align the head to a 4K-aligned
+ * offset. This is required for ETR SG backed buffers, so that we can rotate
+ * the buffer easily and avoid software double buffering.
+ */
+static long tmc_etr_pad_perf_buffer(struct etr_perf_buffer *etr_perf, long head)
+{
+	long new_head;
+	struct etr_buf *etr_buf = etr_perf->etr_buf;
+
+	head = PERF_IDX2OFF(head, etr_perf);
+	new_head = ALIGN(head, SZ_4K);
+	if (head == new_head)
+		return head;
+	/*
+	 * If the padding is not aligned to barrier packet size
+	 * we can't do much.
+	 */
+	if ((new_head - head) % CORESIGHT_BARRIER_PKT_SIZE)
+		return -EINVAL;
+	return tmc_etr_buf_insert_barrier_packets(etr_buf, head,
+						  new_head - head);
+}
+
+static int tmc_etr_set_perf_buffer(struct coresight_device *csdev,
+				   struct perf_output_handle *handle,
+				   void *config)
+{
+	int rc;
+	unsigned long flags;
+	long head, new_head;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct etr_perf_buffer *etr_perf = config;
+	struct etr_buf *etr_buf = etr_perf->etr_buf;
+
+	etr_perf->head = handle->head;
+	head = PERF_IDX2OFF(etr_perf->head, etr_perf);
+	switch (etr_perf->flags) {
+	case ETR_BUF_F_RESTORE_MINIMAL:
+		new_head = tmc_etr_pad_perf_buffer(etr_perf, head);
+		if (new_head < 0)
+			return new_head;
+		if (head != new_head) {
+			rc = perf_aux_output_skip(handle, new_head - head);
+			if (rc)
+				return rc;
+			etr_perf->head = handle->head;
+			head = new_head;
+		}
+		/* Fall through */
+	case ETR_BUF_F_RESTORE_FULL:
+		rc = tmc_restore_etr_buf(drvdata, etr_buf,
+					 head, head, handle->size, 0);
+		break;
+	case 0:
+		/* Nothing to do here. */
+		rc = 0;
+		break;
+	default:
+		dev_warn(drvdata->dev, "Unexpected flags in etr_perf buffer\n");
+		WARN_ON(1);
+		rc = -EINVAL;
+	}
+
+	/*
+	 * This sink is going to be used in perf mode. No other session can
+	 * grab it from us. So set the perf mode specific data here. This will
+	 * be released just before we disable the sink, from the update_buffer
+	 * callback.
+	 */
+	if (!rc) {
+		spin_lock_irqsave(&drvdata->spinlock, flags);
+		if (WARN_ON(drvdata->perf_data))
+			rc = -EBUSY;
+		else
+			drvdata->perf_data = etr_perf;
+		spin_unlock_irqrestore(&drvdata->spinlock, flags);
+	}
+	return rc;
+}
+
+/*
+ * tmc_etr_sync_perf_buffer: Copy the actual trace data from the hardware
+ * buffer to the perf ring buffer.
+ */
+static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf)
+{
+	struct etr_buf *etr_buf = etr_perf->etr_buf;
+	long bytes, to_copy;
+	unsigned long head = etr_perf->head;
+	unsigned long pg_idx, pg_offset, src_offset;
+	char **dst_pages, *src_buf;
+
+	head = PERF_IDX2OFF(etr_perf->head, etr_perf);
+	pg_idx = head >> PAGE_SHIFT;
+	pg_offset = head & (PAGE_SIZE - 1);
+	dst_pages = (char **)etr_perf->pages;
+	src_offset = etr_buf->offset;
+	to_copy = etr_buf->len;
+
+	while (to_copy > 0) {
+		/*
+		 * In one iteration, we can copy the minimum of:
+		 *  1) what is available in the source buffer,
+		 *  2) what is available in the source buffer, before it
+		 *     wraps around,
+		 *  3) what is available in the destination page.
+		 */
+		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
+					     &src_buf);
+		if (WARN_ON_ONCE(bytes <= 0))
+			break;
+		if (PAGE_SIZE - pg_offset < bytes)
+			bytes = PAGE_SIZE - pg_offset;
+
+		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
+		to_copy -= bytes;
+		/* Move destination pointers */
+		pg_offset += bytes;
+		if (pg_offset == PAGE_SIZE) {
+			pg_offset = 0;
+			if (++pg_idx == etr_perf->nr_pages)
+				pg_idx = 0;
+		}
+
+		/* Move source pointers */
+		src_offset += bytes;
+		if (src_offset >= etr_buf->size)
+			src_offset -= etr_buf->size;
+	}
+}
+
+/*
+ * XXX: What is the expected behavior here in the following cases ?
+ *  1) Full trace mode, without double buffering : What should be the size
+ *     reported back when the buffer is full and has wrapped around. Ideally,
+ *     we should account for the lost trace to make sure the "head" in the
+ *     ring buffer comes back to the position as in the trace buffer, rather
+ *     than returning the "total size" of the buffer.
+ *  2) In snapshot mode, should we always return the "full buffer size" ?
+ */
+static unsigned long
+tmc_etr_update_perf_buffer(struct coresight_device *csdev,
+			   struct perf_output_handle *handle,
+			   void *config)
+{
+	bool double_buffer, lost = false;
+	unsigned long flags, offset, size = 0;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct etr_perf_buffer *etr_perf = config;
+	struct etr_buf *etr_buf = etr_perf->etr_buf;
+
+	double_buffer = (etr_perf->flags == 0);
+
+	spin_lock_irqsave(&drvdata->spinlock, flags);
+	if (WARN_ON(drvdata->perf_data != etr_perf)) {
+		lost = true;
+		spin_unlock_irqrestore(&drvdata->spinlock, flags);
+		goto out;
+	}
+
+	CS_UNLOCK(drvdata->base);
+
+	tmc_flush_and_stop(drvdata);
+
+	tmc_sync_etr_buf(drvdata);
+	CS_LOCK(drvdata->base);
+	/* Reset perf specific data */
+	drvdata->perf_data = NULL;
+	spin_unlock_irqrestore(&drvdata->spinlock, flags);
+
+	offset = etr_buf->offset + etr_buf->len;
+	if (offset > etr_buf->size)
+		offset -= etr_buf->size;
+
+	if (double_buffer) {
+		/*
+		 * If we use software double buffering, update the ring buffer.
+		 * And the size is what we have in the hardware buffer.
+		 */
+		size = etr_buf->len;
+		tmc_etr_sync_perf_buffer(etr_perf);
+	} else {
+		/*
+		 * If the hardware uses the perf ring buffer directly, the
+		 * size of the data we have is from the old head to the
+		 * current head of the buffer. In non-snapshot mode, this also
+		 * means we lost a full buffer of data if the buffer wrapped.
+		 */
+		unsigned long old_head;
+
+		old_head = PERF_IDX2OFF(etr_perf->head, etr_perf);
+		size = (offset - old_head + etr_buf->size) % etr_buf->size;
+	}
+
+	/*
+	 * Update handle->head in snapshot mode. Also update the size to the
+	 * hardware buffer size if there was an overflow.
+	 */
+	if (etr_perf->snapshot) {
+		if (double_buffer)
+			handle->head += size;
+		else
+			handle->head = offset;
+		if (etr_buf->full)
+			size = etr_buf->size;
+	}
+
+	lost |= etr_buf->full;
+out:
+	if (lost)
+		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
+	return size;
+}
+
 static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
 {
-	/* We don't support perf mode yet ! */
-	return -EINVAL;
+	int rc = 0;
+	unsigned long flags;
+	struct etr_perf_buffer *etr_perf;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	spin_lock_irqsave(&drvdata->spinlock, flags);
+	/*
+	 * There can be only one writer per sink in perf mode. If the sink
+	 * is already open in SYSFS mode, we can't use it.
+	 */
+	if (drvdata->mode != CS_MODE_DISABLED) {
+		rc = -EBUSY;
+		goto unlock_out;
+	}
+
+	etr_perf = drvdata->perf_data;
+	if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) {
+		rc = -EINVAL;
+		goto unlock_out;
+	}
+
+	drvdata->mode = CS_MODE_PERF;
+	tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);
+
+unlock_out:
+	spin_unlock_irqrestore(&drvdata->spinlock, flags);
+	return rc;
 }
 
 static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
@@ -1459,6 +1836,10 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
 static const struct coresight_ops_sink tmc_etr_sink_ops = {
 	.enable		= tmc_enable_etr_sink,
 	.disable	= tmc_disable_etr_sink,
+	.alloc_buffer	= tmc_etr_alloc_perf_buffer,
+	.update_buffer	= tmc_etr_update_perf_buffer,
+	.set_buffer	= tmc_etr_set_perf_buffer,
+	.free_buffer	= tmc_etr_free_perf_buffer,
 };
 
 const struct coresight_ops tmc_etr_cs_ops = {
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 185dc12..aa42f5d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -197,6 +197,7 @@ struct etr_buf {
  * @trigger_cntr: amount of words to store after a trigger.
  * @etr_caps:	Bitmask of capabilities of the TMC ETR, inferred from the
  *		device configuration register (DEVID)
+ * @perf_data:	PERF buffer for ETR.
  * @sysfs_data:	SYSFS buffer for ETR.
  */
 struct tmc_drvdata {
@@ -218,6 +219,7 @@ struct tmc_drvdata {
 	u32			trigger_cntr;
 	u32			etr_caps;
 	struct etr_buf		*sysfs_buf;
+	void			*perf_data;
 };
 
 struct etr_buf_operations {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 05/27] dts: bindings: Document device tree binding for CATU
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-01 13:10     ` Rob Herring
  -1 siblings, 0 replies; 134+ messages in thread
From: Rob Herring @ 2018-05-01 13:10 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mathieu.poirier, mike.leach,
	robert.walker, mark.rutland, will.deacon, robin.murphy,
	sudeep.holla, frowand.list, john.horley, devicetree,
	Mathieu Poirier

On Tue, May 01, 2018 at 10:10:35AM +0100, Suzuki K Poulose wrote:
> Document CATU device-tree bindings. CATU augments the TMC-ETR
> by providing an improved Scatter Gather mechanism for streaming
> trace data to non-contiguous system RAM pages.
> 
> Cc: devicetree@vger.kernel.org
> Cc: frowand.list@gmail.com
> Cc: Rob Herring <robh@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  .../devicetree/bindings/arm/coresight.txt          | 52 ++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
> index 15ac8e8..cdd84d0 100644
> --- a/Documentation/devicetree/bindings/arm/coresight.txt
> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
> @@ -39,6 +39,8 @@ its hardware characteristcs.
>  
>  		- System Trace Macrocell:
>  			"arm,coresight-stm", "arm,primecell"; [1]
> +		- Coresight Address Translation Unit (CATU)
> +			"arm, coresight-catu", "arm,primecell";

spurious space               ^

>  
>  	* reg: physical base address and length of the register
>  	  set(s) of the component.
> @@ -86,6 +88,9 @@ its hardware characteristcs.
>  	* arm,buffer-size: size of contiguous buffer space for TMC ETR
>  	 (embedded trace router)
>  
> +* Optional property for CATU :
> +	* interrupts : Exactly one SPI may be listed for reporting the address
> +	  error

Somewhere you need to define the ports for the CATU.

>  
>  Example:
>  
> @@ -118,6 +123,35 @@ Example:
>  		};
>  	};
>  
> +	etr@20070000 {
> +		compatible = "arm,coresight-tmc", "arm,primecell";
> +		reg = <0 0x20070000 0 0x1000>;
> +
> +		clocks = <&oscclk6a>;
> +		clock-names = "apb_pclk";
> +		ports {
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +
> +			/* input port */
> +			port@0 {
> +				reg =  <0>;
> +				etr_in_port: endpoint {
> +					slave-mode;
> +					remote-endpoint = <&replicator2_out_port0>;
> +				};
> +			};
> +
> +			/* CATU link represented by output port */
> +			port@1 {
> +				reg = <0>;

While common in the Coresight bindings, having unit-address and reg not 
match is an error. Mathieu and I discussed this a bit as dtc now warns 
on these.

Either reg should be 1 here, or 'ports' needs to be split into input and 
output ports. My preference would be the former, but Mathieu objected to 
this not reflecting the h/w numbering.

Rob

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 10/27] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-01 13:13     ` Rob Herring
  -1 siblings, 0 replies; 134+ messages in thread
From: Rob Herring @ 2018-05-01 13:13 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mathieu.poirier, mike.leach,
	robert.walker, mark.rutland, will.deacon, robin.murphy,
	sudeep.holla, frowand.list, john.horley, Mathieu Poirier,
	devicetree

On Tue, May 01, 2018 at 10:10:40AM +0100, Suzuki K Poulose wrote:
> We are about to add the support for ETR builtin scatter-gather mode
> for dealing with large amount of trace buffers. However, on some of
> the platforms, using the ETR SG mode can lock up the system due to
> the way the ETR is connected to the memory subsystem.
> 
> In SG mode, the ETR performs READ from the scatter-gather table to
> fetch the next page and regular WRITE of trace data. If the READ
> operation doesn't complete(due to the memory subsystem issues,
> which we have seen on a couple of platforms) the trace WRITE
> cannot proceed leading to issues. So, we by default do not
> use the SG mode, unless it is known to be safe on the platform.
> We define a DT property for the TMC node to specify whether we
> have a proper SG mode.
> 
> Cc: Mathieu Poirier <matheiu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: John Horley <john.horley@arm.com>
> Cc: Robert Walker <robert.walker@arm.com>
> Cc: devicetree@vger.kernel.org
> Cc: frowand.list@gmail.com
> Cc: Rob Herring <robh@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  Documentation/devicetree/bindings/arm/coresight.txt | 3 +++
>  drivers/hwtracing/coresight/coresight-tmc.c         | 8 +++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
> index cdd84d0..7c0c8f0 100644
> --- a/Documentation/devicetree/bindings/arm/coresight.txt
> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
> @@ -88,6 +88,9 @@ its hardware characteristcs.
>  	* arm,buffer-size: size of contiguous buffer space for TMC ETR
>  	 (embedded trace router)
>  
> +	* scatter-gather: boolean. Indicates that the TMC-ETR can safely
> +	  use the SG mode on this system.
> +

Needs a vendor prefix.

>  * Optional property for CATU :
>  	* interrupts : Exactly one SPI may be listed for reporting the address
>  	  error

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-02  3:55     ` Kim Phillips
  -1 siblings, 0 replies; 134+ messages in thread
From: Kim Phillips @ 2018-05-02  3:55 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mathieu.poirier, mike.leach,
	robert.walker, mark.rutland, will.deacon, robin.murphy,
	sudeep.holla, frowand.list, robh, john.horley

On Tue, 1 May 2018 10:10:51 +0100
Suzuki K Poulose <suzuki.poulose@arm.com> wrote:

> Convert component enable/disable messages from dev_info to dev_dbg.
> This is required to prevent LOCKDEP splats when operating in perf
> mode where we could be called with locks held to enable a coresight

Can we see the splats?  Doesn't lockdep turn itself off if it starts
triggering too many splats?

> path. If someone wants to really see the messages, they can always
> enable it at runtime via dynamic_debug.

Won't the splats still occur when the messages are enabled with
dynamic_debug?

So in effect this patch only tries to mitigate the splats, all the
while making things harder for regular users that now have to recompile
their kernels, in exchange for a very small convenience for kernel
developers that happen to see a splat or two with DEBUG_LOCKDEP set?

Not the greatest choice...How about moving the dev_infos outside of the
locks instead?
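
i.e. keep the message but emit it only after the lock is dropped,
roughly this shape (illustrative fragment only; the enable helper and
its return convention are schematic, and error handling is elided):

	int ret;
	unsigned long flags;

	spin_lock_irqsave(&drvdata->spinlock, flags);
	ret = tmc_etb_enable_hw(drvdata);	/* whatever the lock protects */
	spin_unlock_irqrestore(&drvdata->spinlock, flags);

	if (!ret)
		dev_info(drvdata->dev, "TMC-ETB/ETF enabled\n");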

Thanks,

Kim

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg
  2018-05-02  3:55     ` Kim Phillips
@ 2018-05-02  8:25       ` Robert Walker
  -1 siblings, 0 replies; 134+ messages in thread
From: Robert Walker @ 2018-05-02  8:25 UTC (permalink / raw)
  To: Kim Phillips, Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mathieu.poirier, mike.leach,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On 02/05/18 04:55, Kim Phillips wrote:
> On Tue, 1 May 2018 10:10:51 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
>> Convert component enable/disable messages from dev_info to dev_dbg.
>> This is required to prevent LOCKDEP splats when operating in perf
>> mode where we could be called with locks held to enable a coresight
> Can we see the splats?  Doesn't lockdep turn itself off if it starts
> triggering too many splats?
>
>> path. If someone wants to really see the messages, they can always
>> enable it at runtime via dynamic_debug.
> Won't the splats still occur when the messages are enabled with
> dynamic_debug?
>
> So in effect this patch only tries to mitigate the splats, all the
> while making things harder for regular users that now have to recompile
> their kernels, in exchange for a very small convenience for kernel
> developers that happen to see a splat or two with DEBUG_LOCKDEP set?
>
> Not the greatest choice...How about moving the dev_infos outside of the
> locks instead?
>
> Thanks,
>
> Kim
The other reason for making these dev_dbg is performance - a message is 
output each time a source / link / sink is enabled or disabled, so we 
can get 20+ messages on each process switch when tracing with perf.  
This has a significant effect on the runtime of the application being 
traced.

Regards

Rob

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg
  2018-05-02  3:55     ` Kim Phillips
@ 2018-05-02 13:52       ` Robin Murphy
  -1 siblings, 0 replies; 134+ messages in thread
From: Robin Murphy @ 2018-05-02 13:52 UTC (permalink / raw)
  To: Kim Phillips, Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mathieu.poirier, mike.leach,
	robert.walker, mark.rutland, will.deacon, sudeep.holla,
	frowand.list, robh, john.horley

On 02/05/18 04:55, Kim Phillips wrote:
> On Tue, 1 May 2018 10:10:51 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> 
>> Convert component enable/disable messages from dev_info to dev_dbg.
>> This is required to prevent LOCKDEP splats when operating in perf
>> mode where we could be called with locks held to enable a coresight
> 
> Can we see the splats?  Doesn't lockdep turn itself off if it starts
> triggering too many splats?

Without some very careful and robust reasoning for why the condition 
being reported by lockdep could not actually occur in practice, 
"avoiding the splats" is far, far less important than "avoiding the 
potential deadlock that they are reporting".

>> path. If someone wants to really see the messages, they can always
>> enable it at runtime via dynamic_debug.
> 
> Won't the splats still occur when the messages are enabled with
> dynamic_debug?
> 
> So in effect this patch only tries to mitigate the splats, all the
> while making things harder for regular users that now have to recompile
> their kernels, in exchange for a very small convenience for kernel
> developers that happen to see a splat or two with DEBUG_LOCKDEP set?

FWIW, if "regular users" means people running distro kernels, then 
chances are that they probably have DYNAMIC_DEBUG set already (100% of 
my local sample of 2 - Ubuntu x86_64 and Arch aarch64 - certainly do). 
Either way, though, this particular log spam really does only look 
vaguely useful to people debugging the coresight stack itself, so anyone 
going out of their way to turn it on has surely already gone beyond 
regular use (even if they're just reproducing an issue with additional 
logging at the request of kernel developers, rather than debugging it 
themselves).

Reducing the scope for possible deadlock from the general case to just 
debugging scenarios is certainly not a bad thing, but as you say I think 
we need a closer look at the underlying issue to know whether even 
dev_dbg() is wise.

Robin.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 03/27] coresight: Add helper device type
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-03 17:00     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-03 17:00 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:33AM +0100, Suzuki K Poulose wrote:
> Add a new coresight device type, which do not belong to any
> of the existing types, i.e, source, sink, link etc. A helper
> device could be connected to a coresight device, which could
> augment the functionality of the coresight device.
> 
> This is intended to cover Coresight Address Translation Unit (CATU)
> devices, which provide improved Scatter Gather mechanism for TMC
> ETR. The idea is that the helper device could be controlled by
> the driver of the device it is attached to (in this case ETR),
> transparent to the generic coresight driver (and paths).
> 
> The operations include enable(), disable(), both of which could
> accept a device specific "data" which the driving device and
> the helper device could share. Since they don't appear in the
> coresight "path" tracked by software, we have to ensure that
> they are powered up/down whenever the master device is turned
> on.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight.c | 46 ++++++++++++++++++++++++++++++---
>  include/linux/coresight.h               | 24 +++++++++++++++++
>  2 files changed, 67 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
> index 389c4ba..fd0251e 100644
> --- a/drivers/hwtracing/coresight/coresight.c
> +++ b/drivers/hwtracing/coresight/coresight.c
> @@ -430,6 +430,43 @@ struct coresight_device *coresight_get_enabled_sink(bool deactivate)
>  	return dev ? to_coresight_device(dev) : NULL;
>  }
>  
> +/*
> + * coresight_prepare_device - Prepare this device and any of the helper
> + * devices connected to it for trace operation. Since the helper devices
> + * don't appear on the trace path, they should be handled along with the
> + * the master device.
> + */
> +static void coresight_prepare_device(struct coresight_device *csdev)
> +{
> +	int i;
> +
> +	for (i = 0; i < csdev->nr_outport; i++) {
> +		struct coresight_device *child = csdev->conns[i].child_dev;
> +
> +		if (child && child->type == CORESIGHT_DEV_TYPE_HELPER)
> +			pm_runtime_get_sync(child->dev.parent);
> +	}
> +
> +	pm_runtime_get_sync(csdev->dev.parent);
> +}
> +
> +/*
> + * coresight_release_device - Release this device and any of the helper
> + * devices connected to it for trace operation.
> + */
> +static void coresight_release_device(struct coresight_device *csdev)
> +{
> +	int i;
> +
> +	for (i = 0; i < csdev->nr_outport; i++) {
> +		struct coresight_device *child = csdev->conns[i].child_dev;
> +
> +		if (child && child->type == CORESIGHT_DEV_TYPE_HELPER)
> +			pm_runtime_put(child->dev.parent);
> +	}

There is a newline here in coresight_prepare_device().  Either add one (or not)
in both functions, but please be consistent.

> +	pm_runtime_put(csdev->dev.parent);
> +}
> +
>  /**
>   * _coresight_build_path - recursively build a path from a @csdev to a sink.
>   * @csdev:	The device to start from.
> @@ -480,8 +517,7 @@ static int _coresight_build_path(struct coresight_device *csdev,
>  
>  	node->csdev = csdev;
>  	list_add(&node->link, path);
> -	pm_runtime_get_sync(csdev->dev.parent);
> -
> +	coresight_prepare_device(csdev);

There was a newline between pm_runtime_get_sync() and the return statement in
the original code.

>  	return 0;
>  }
>  
> @@ -524,7 +560,7 @@ void coresight_release_path(struct list_head *path)
>  	list_for_each_entry_safe(nd, next, path, link) {
>  		csdev = nd->csdev;
>  
> -		pm_runtime_put_sync(csdev->dev.parent);
> +		coresight_release_device(csdev);
>  		list_del(&nd->link);
>  		kfree(nd);
>  	}
> @@ -775,6 +811,10 @@ static struct device_type coresight_dev_type[] = {
>  		.name = "source",
>  		.groups = coresight_source_groups,
>  	},
> +	{
> +		.name = "helper",
> +	},
> +

Extra newline.

>  };
>  
>  static void coresight_device_release(struct device *dev)
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index 556fe59..5e926f7 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -47,6 +47,7 @@ enum coresight_dev_type {
>  	CORESIGHT_DEV_TYPE_LINK,
>  	CORESIGHT_DEV_TYPE_LINKSINK,
>  	CORESIGHT_DEV_TYPE_SOURCE,
> +	CORESIGHT_DEV_TYPE_HELPER,
>  };
>  
>  enum coresight_dev_subtype_sink {
> @@ -69,6 +70,10 @@ enum coresight_dev_subtype_source {
>  	CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE,
>  };
>  
> +enum coresight_dev_subtype_helper {
> +	CORESIGHT_DEV_SUBTYPE_HELPER_NONE,
> +};
> +
>  /**
>   * union coresight_dev_subtype - further characterisation of a type
>   * @sink_subtype:	type of sink this component is, as defined
> @@ -77,6 +82,8 @@ enum coresight_dev_subtype_source {
>   *			by @coresight_dev_subtype_link.
>   * @source_subtype:	type of source this component is, as defined
>   *			by @coresight_dev_subtype_source.
> + * @helper_subtype:	type of helper this component is, as defined
> + *			by @coresight_dev_subtype_helper.
>   */
>  union coresight_dev_subtype {
>  	/* We have some devices which acts as LINK and SINK */
> @@ -85,6 +92,7 @@ union coresight_dev_subtype {
>  		enum coresight_dev_subtype_link link_subtype;
>  	};
>  	enum coresight_dev_subtype_source source_subtype;
> +	enum coresight_dev_subtype_helper helper_subtype;
>  };
>  
>  /**
> @@ -181,6 +189,7 @@ struct coresight_device {
>  #define source_ops(csdev)	csdev->ops->source_ops
>  #define sink_ops(csdev)		csdev->ops->sink_ops
>  #define link_ops(csdev)		csdev->ops->link_ops
> +#define helper_ops(csdev)	csdev->ops->helper_ops
>  
>  /**
>   * struct coresight_ops_sink - basic operations for a sink
> @@ -240,10 +249,25 @@ struct coresight_ops_source {
>  			struct perf_event *event);
>  };
>  
> +/**
> + * struct coresight_ops_helper - Operations for a helper device.
> + *
> + * All operations could pass in a device specific data, which could
> + * help the helper device to determine what to do.
> + *
> + * @enable	: Turn the device ON.
> + * @disable	: Turn the device OFF.

There is a discrepancy between the comment and the operations, i.e. enabling a
device is not synonymous with turning it on.  Looking at patch 04/27 the ops are
called in tmc_etr_enable/disable_catu(), so the comment probably needs to be
changed.
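
For illustration, the kernel-doc could be reworded along these lines (a
sketch of possible wording only, not the actual fix in the series):

	/**
	 * struct coresight_ops_helper - Operations for a helper device.
	 *
	 * All operations could pass in a device specific data, which could
	 * help the helper device to determine what to do.
	 *
	 * @enable	: Enable the device for the trace session.
	 * @disable	: Disable the device, ending the trace session.
	 */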

> + */
> +struct coresight_ops_helper {
> +	int (*enable)(struct coresight_device *csdev, void *data);
> +	int (*disable)(struct coresight_device *csdev, void *data);
> +};
> +
>  struct coresight_ops {
>  	const struct coresight_ops_sink *sink_ops;
>  	const struct coresight_ops_link *link_ops;
>  	const struct coresight_ops_source *source_ops;
> +	const struct coresight_ops_helper *helper_ops;
>  };
>  
>  #ifdef CONFIG_CORESIGHT
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread
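
For context, a minimal sketch of how a master device's driver is expected to
drive such a helper through these operations, based on the description above
(the lookup of the helper and the names used here are illustrative only and
rely on the declarations added by this patch):

	/* enable a helper attached to one of our output ports, if any */
	static int my_drv_enable_helper(struct coresight_device *csdev, void *data)
	{
		int i;

		for (i = 0; i < csdev->nr_outport; i++) {
			struct coresight_device *child = csdev->conns[i].child_dev;

			/* only helper devices are driven this way */
			if (child && child->type == CORESIGHT_DEV_TYPE_HELPER &&
			    helper_ops(child)->enable)
				return helper_ops(child)->enable(child, data);
		}

		return 0;
	}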

* Re: [PATCH v2 04/27] coresight: Introduce support for Coresight Addrss Translation Unit
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-03 17:31     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-03 17:31 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:34AM +0100, Suzuki K Poulose wrote:
> Add the initial support for Coresight Address Translation Unit, which
> augments the TMC in Coresight SoC-600 by providing an improved Scatter
> Gather mechanism. CATU is always connected to a single TMC-ETR and
> converts the AXI address with a translated address (from a given SG
> table with specific format). The CATU should be programmed in pass
> through mode and enabled if the ETR doesn't translation by CATU.
> 
> This patch provides mechanism to enable/disable the CATU always in the
> pass through mode.
> 
> We reuse the existing ports mechanism to link the TMC-ETR to the
> connected CATU.
> 
> i.e, TMC-ETR:output_port0 -> CATU:input_port0
> 
> Reference manual for  CATU component is avilable in version r2p0 of :
> "Arm Coresight System-on-Chip SoC-600 Technical Reference Manual",
> under Section 4.9.

Please remove the part about the TRM as it is bound to change.  

> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/Kconfig             |  10 ++
>  drivers/hwtracing/coresight/Makefile            |   1 +
>  drivers/hwtracing/coresight/coresight-catu.c    | 195 ++++++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-catu.h    |  89 +++++++++++
>  drivers/hwtracing/coresight/coresight-tmc-etr.c |  26 ++++
>  drivers/hwtracing/coresight/coresight-tmc.h     |  27 ++++
>  include/linux/coresight.h                       |   1 +
>  7 files changed, 349 insertions(+)
>  create mode 100644 drivers/hwtracing/coresight/coresight-catu.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-catu.h
> 
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index ef9cb3c..21f638f 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -31,6 +31,16 @@ config CORESIGHT_LINK_AND_SINK_TMC
>  	  complies with the generic implementation of the component without
>  	  special enhancement or added features.
>  
> +config CORESIGHT_CATU
> +	bool "Coresight Address Translation Unit (CATU) driver"
> +	depends on CORESIGHT_LINK_AND_SINK_TMC
> +	help
> +	   Enable support for the Coresight Address Translation Unit (CATU).
> +	   CATU supports a scatter gather table of 4K pages, with forward/backward
> +	   lookup. CATU helps TMC ETR to use large physically non-contiguous trace
> +	   buffer by translating the addersses used by ETR to the corresponding
> +	   physical adderss by looking up the table.

There are a couple of typos in the last sentence.

> +
>  config CORESIGHT_SINK_TPIU
>  	bool "Coresight generic TPIU driver"
>  	depends on CORESIGHT_LINKS_AND_SINKS
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index 61db9dd..41870de 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -18,3 +18,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
>  obj-$(CONFIG_CORESIGHT_DYNAMIC_REPLICATOR) += coresight-dynamic-replicator.o
>  obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>  obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
> +obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
> new file mode 100644
> index 0000000..2cd69a6
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-catu.c
> @@ -0,0 +1,195 @@
> +// SPDX-License-Identifier: GPL-2.0
> +

Extra line

> +/*
> + * Copyright (C) 2017 ARM Limited. All rights reserved.

You sure you don't want to bump this to 2018?

> + *
> + * Coresight Address Translation Unit support
> + *
> + * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/device.h>
> +#include <linux/amba/bus.h>
> +#include <linux/io.h>
> +#include <linux/slab.h>

List in alphabetical order if possible.
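
For clarity, that would be the same headers, just sorted:

	#include <linux/amba/bus.h>
	#include <linux/device.h>
	#include <linux/io.h>
	#include <linux/kernel.h>
	#include <linux/slab.h>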

> +
> +#include "coresight-catu.h"
> +#include "coresight-priv.h"
> +
> +#define csdev_to_catu_drvdata(csdev)	\
> +	dev_get_drvdata(csdev->dev.parent)
> +
> +coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
> +coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
> +coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
> +coresight_simple_reg32(struct catu_drvdata, axictrl, CATU_AXICTRL);
> +coresight_simple_reg32(struct catu_drvdata, irqen, CATU_IRQEN);
> +coresight_simple_reg64(struct catu_drvdata, sladdr,
> +		       CATU_SLADDRLO, CATU_SLADDRHI);
> +coresight_simple_reg64(struct catu_drvdata, inaddr,
> +		       CATU_INADDRLO, CATU_INADDRHI);
> +
> +static struct attribute *catu_mgmt_attrs[] = {
> +	&dev_attr_control.attr,
> +	&dev_attr_status.attr,
> +	&dev_attr_mode.attr,
> +	&dev_attr_axictrl.attr,
> +	&dev_attr_irqen.attr,
> +	&dev_attr_sladdr.attr,
> +	&dev_attr_inaddr.attr,
> +	NULL,
> +};
> +
> +static const struct attribute_group catu_mgmt_group = {
> +	.attrs = catu_mgmt_attrs,
> +	.name = "mgmt",
> +};
> +
> +static const struct attribute_group *catu_groups[] = {
> +	&catu_mgmt_group,
> +	NULL,
> +};
> +
> +
> +static inline int catu_wait_for_ready(struct catu_drvdata *drvdata)
> +{
> +	return coresight_timeout(drvdata->base,
> +				 CATU_STATUS, CATU_STATUS_READY, 1);
> +}
> +
> +static int catu_enable_hw(struct catu_drvdata *drvdata, void *__unused)
> +{
> +	u32 control;
> +
> +	if (catu_wait_for_ready(drvdata))
> +		dev_warn(drvdata->dev, "Timeout while waiting for READY\n");
> +
> +	control = catu_read_control(drvdata);
> +	if (control & BIT(CATU_CONTROL_ENABLE)) {
> +		dev_warn(drvdata->dev, "CATU is already enabled\n");
> +		return -EBUSY;
> +	}
> +
> +	control |= BIT(CATU_CONTROL_ENABLE);
> +	catu_write_mode(drvdata, CATU_MODE_PASS_THROUGH);
> +	catu_write_control(drvdata, control);
> +	dev_dbg(drvdata->dev, "Enabled in Pass through mode\n");
> +	return 0;
> +}
> +
> +static int catu_enable(struct coresight_device *csdev, void *data)
> +{
> +	int rc;
> +	struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
> +
> +	CS_UNLOCK(catu_drvdata->base);
> +	rc = catu_enable_hw(catu_drvdata, data);
> +	CS_LOCK(catu_drvdata->base);
> +	return rc;
> +}
> +
> +static int catu_disable_hw(struct catu_drvdata *drvdata)
> +{
> +	int rc = 0;
> +
> +	if (catu_wait_for_ready(drvdata)) {
> +		dev_info(drvdata->dev, "Timeout while waiting for READY\n");
> +		rc = -EAGAIN;
> +	}
> +
> +	catu_write_control(drvdata, 0);
> +	dev_dbg(drvdata->dev, "Disabled\n");
> +	return rc;
> +}
> +
> +static int catu_disable(struct coresight_device *csdev, void *__unused)
> +{
> +	int rc;
> +	struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
> +
> +	CS_UNLOCK(catu_drvdata->base);
> +	rc = catu_disable_hw(catu_drvdata);
> +	CS_LOCK(catu_drvdata->base);
> +

I suppose you can remove the extra line as catu_enable() doesn't have one.

> +	return rc;
> +}
> +
> +const struct coresight_ops_helper catu_helper_ops = {
> +	.enable = catu_enable,
> +	.disable = catu_disable,
> +};
> +
> +const struct coresight_ops catu_ops = {
> +	.helper_ops = &catu_helper_ops,
> +};
> +
> +static int catu_probe(struct amba_device *adev, const struct amba_id *id)
> +{
> +	int ret = 0;
> +	struct catu_drvdata *drvdata;
> +	struct coresight_desc catu_desc;
> +	struct coresight_platform_data *pdata = NULL;
> +	struct device *dev = &adev->dev;
> +	struct device_node *np = dev->of_node;
> +	void __iomem *base;
> +
> +	if (np) {
> +		pdata = of_get_coresight_platform_data(dev, np);
> +		if (IS_ERR(pdata)) {
> +			ret = PTR_ERR(pdata);
> +			goto out;
> +		}
> +		dev->platform_data = pdata;
> +	}
> +
> +	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> +	if (!drvdata) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	drvdata->dev = dev;
> +	dev_set_drvdata(dev, drvdata);
> +	base = devm_ioremap_resource(dev, &adev->res);
> +	if (IS_ERR(base)) {
> +		ret = PTR_ERR(base);
> +		goto out;
> +	}
> +
> +	drvdata->base = base;
> +	catu_desc.pdata = pdata;
> +	catu_desc.dev = dev;
> +	catu_desc.groups = catu_groups;
> +	catu_desc.type = CORESIGHT_DEV_TYPE_HELPER;
> +	catu_desc.subtype.helper_subtype = CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
> +	catu_desc.ops = &catu_ops;
> +	drvdata->csdev = coresight_register(&catu_desc);
> +	if (IS_ERR(drvdata->csdev))
> +		ret = PTR_ERR(drvdata->csdev);
> +	if (!ret)
> +		dev_info(drvdata->dev, "initialized\n");

Please remove as it 1) doesn't convey HW-related information and 2) the TMC
doesn't output anything.

> +out:
> +	pm_runtime_put(&adev->dev);
> +	return ret;
> +}
> +
> +static struct amba_id catu_ids[] = {
> +	{
> +		.id	= 0x000bb9ee,
> +		.mask	= 0x000fffff,
> +	},
> +	{},
> +};
> +
> +static struct amba_driver catu_driver = {
> +	.drv = {
> +		.name			= "coresight-catu",
> +		.owner			= THIS_MODULE,
> +		.suppress_bind_attrs	= true,
> +	},
> +	.probe				= catu_probe,
> +	.id_table			= catu_ids,
> +};
> +
> +builtin_amba_driver(catu_driver);
> diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
> new file mode 100644
> index 0000000..cd58d6f
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-catu.h
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +

Extra line

> +/*
> + * Copyright (C) 2017 ARM Limited. All rights reserved.
> + *
> + * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
> + *

Extra line. In coresight-catu.c there isn't one.

> + */
> +
> +#ifndef _CORESIGHT_CATU_H
> +#define _CORESIGHT_CATU_H
> +
> +#include "coresight-priv.h"
> +
> +/* Register offset from base */
> +#define CATU_CONTROL		0x000
> +#define CATU_MODE		0x004
> +#define CATU_AXICTRL		0x008
> +#define CATU_IRQEN		0x00c
> +#define CATU_SLADDRLO		0x020
> +#define CATU_SLADDRHI		0x024
> +#define CATU_INADDRLO		0x028
> +#define CATU_INADDRHI		0x02c
> +#define CATU_STATUS		0x100
> +#define CATU_DEVARCH		0xfbc
> +
> +#define CATU_CONTROL_ENABLE	0
> +
> +#define CATU_MODE_PASS_THROUGH	0U
> +#define CATU_MODE_TRANSLATE	1U
> +
> +#define CATU_STATUS_READY	8
> +#define CATU_STATUS_ADRERR	0
> +#define CATU_STATUS_AXIERR	4
> +
> +

Extra line.

> +#define CATU_IRQEN_ON		0x1
> +#define CATU_IRQEN_OFF		0x0
> +
> +

Extra line.

> +struct catu_drvdata {
> +	struct device *dev;
> +	void __iomem *base;
> +	struct coresight_device *csdev;
> +	int irq;
> +};
> +
> +#define CATU_REG32(name, offset)					\
> +static inline u32							\
> +catu_read_##name(struct catu_drvdata *drvdata)				\
> +{									\
> +	return coresight_read_reg_pair(drvdata->base, offset, -1);	\
> +}									\
> +static inline void							\
> +catu_write_##name(struct catu_drvdata *drvdata, u32 val)		\
> +{									\
> +	coresight_write_reg_pair(drvdata->base, val, offset, -1);	\
> +}
> +
> +#define CATU_REG_PAIR(name, lo_off, hi_off)				\
> +static inline u64							\
> +catu_read_##name(struct catu_drvdata *drvdata)				\
> +{									\
> +	return coresight_read_reg_pair(drvdata->base, lo_off, hi_off);	\
> +}									\
> +static inline void							\
> +catu_write_##name(struct catu_drvdata *drvdata, u64 val)		\
> +{									\
> +	coresight_write_reg_pair(drvdata->base, val, lo_off, hi_off);	\
> +}
> +
> +CATU_REG32(control, CATU_CONTROL);
> +CATU_REG32(mode, CATU_MODE);
> +CATU_REG_PAIR(sladdr, CATU_SLADDRLO, CATU_SLADDRHI)
> +CATU_REG_PAIR(inaddr, CATU_INADDRLO, CATU_INADDRHI)
> +
> +static inline bool coresight_is_catu_device(struct coresight_device *csdev)
> +{
> +	enum coresight_dev_subtype_helper subtype;
> +
> +	/* Make the checkpatch happy */
> +	subtype = csdev->subtype.helper_subtype;
> +
> +	return IS_ENABLED(CONFIG_CORESIGHT_CATU) &&
> +	       csdev->type == CORESIGHT_DEV_TYPE_HELPER &&
> +	       subtype == CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
> +}
> +
> +#endif
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 68fbc8f..9b0c620 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -17,9 +17,26 @@
>  
>  #include <linux/coresight.h>
>  #include <linux/dma-mapping.h>
> +#include "coresight-catu.h"
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
> +{
> +	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
> +
> +	if (catu && helper_ops(catu)->enable)
> +		helper_ops(catu)->enable(catu, NULL);
> +}
> +
> +static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
> +{
> +	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
> +
> +	if (catu && helper_ops(catu)->disable)
> +		helper_ops(catu)->disable(catu, NULL);
> +}
> +
>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  {
>  	u32 axictl, sts;
> @@ -27,6 +44,12 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  	/* Zero out the memory to help with debug */
>  	memset(drvdata->vaddr, 0, drvdata->size);
>  
> +	/*
> +	 * If this ETR is connected to a CATU, enable it before we turn
> +	 * this on
> +	 */
> +	tmc_etr_enable_catu(drvdata);
> +
>  	CS_UNLOCK(drvdata->base);
>  
>  	/* Wait for TMCSReady bit to be set */
> @@ -116,6 +139,9 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  	tmc_disable_hw(drvdata);
>  
>  	CS_LOCK(drvdata->base);
> +
> +	/* Disable CATU device if this ETR is connected to one */
> +	tmc_etr_disable_catu(drvdata);
>  }
>  
>  static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 8df7a81..cdff853 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -19,6 +19,7 @@
>  #define _CORESIGHT_TMC_H
>  
>  #include <linux/miscdevice.h>
> +#include "coresight-catu.h"
>  
>  #define TMC_RSZ			0x004
>  #define TMC_STS			0x00c
> @@ -222,4 +223,30 @@ static inline bool tmc_etr_has_cap(struct tmc_drvdata *drvdata, u32 cap)
>  	return !!(drvdata->etr_caps & cap);
>  }
>  
> +/*
> + * TMC ETR could be connected to a CATU device, which can provide address
> + * translation service. This is represented by the Output port of the TMC
> + * (ETR) connected to the input port of the CATU.
> + *
> + * Returns	: coresight_device ptr for the CATU device if a CATU is found.
> + *		: NULL otherwise.
> + */
> +static inline struct coresight_device *
> +tmc_etr_get_catu_device(struct tmc_drvdata *drvdata)
> +{
> +	int i;
> +	struct coresight_device *tmp, *etr = drvdata->csdev;
> +
> +	if (!IS_ENABLED(CONFIG_CORESIGHT_CATU))
> +		return NULL;
> +
> +	for (i = 0; i < etr->nr_outport; i++) {
> +		tmp = etr->conns[0].child_dev;
> +		if (tmp && coresight_is_catu_device(tmp))
> +			return tmp;
> +	}
> +
> +	return NULL;
> +}
> +
>  #endif
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index 5e926f7..c0e1568 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -72,6 +72,7 @@ enum coresight_dev_subtype_source {
>  
>  enum coresight_dev_subtype_helper {
>  	CORESIGHT_DEV_SUBTYPE_HELPER_NONE,
> +	CORESIGHT_DEV_SUBTYPE_HELPER_CATU,
>  };
>  
>  /**
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 05/27] dts: bindings: Document device tree binding for CATU
  2018-05-01 13:10     ` Rob Herring
@ 2018-05-03 17:42       ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-03 17:42 UTC (permalink / raw)
  To: Rob Herring
  Cc: Suzuki K Poulose, linux-arm-kernel, linux-kernel, Mike Leach,
	Robert Walker, Mark Rutland, Will Deacon, Robin Murphy,
	Sudeep Holla, Frank Rowand, John Horley, devicetree,
	Mathieu Poirier

On 1 May 2018 at 07:10, Rob Herring <robh@kernel.org> wrote:
> On Tue, May 01, 2018 at 10:10:35AM +0100, Suzuki K Poulose wrote:
>> Document CATU device-tree bindings. CATU augments the TMC-ETR
>> by providing an improved Scatter Gather mechanism for streaming
>> trace data to non-contiguous system RAM pages.
>>
>> Cc: devicetree@vger.kernel.org
>> Cc: frowand.list@gmail.com
>> Cc: Rob Herring <robh@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  .../devicetree/bindings/arm/coresight.txt          | 52 ++++++++++++++++++++++
>>  1 file changed, 52 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
>> index 15ac8e8..cdd84d0 100644
>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>> @@ -39,6 +39,8 @@ its hardware characteristcs.
>>
>>               - System Trace Macrocell:
>>                       "arm,coresight-stm", "arm,primecell"; [1]
>> +             - Coresight Address Translation Unit (CATU)
>> +                     "arm, coresight-catu", "arm,primecell";
>
> spurious space               ^
>
>>
>>       * reg: physical base address and length of the register
>>         set(s) of the component.
>> @@ -86,6 +88,9 @@ its hardware characteristcs.
>>       * arm,buffer-size: size of contiguous buffer space for TMC ETR
>>        (embedded trace router)
>>
>> +* Optional property for CATU :
>> +     * interrupts : Exactly one SPI may be listed for reporting the address
>> +       error
>
> Somewhere you need to define the ports for the CATU.
>
>>
>>  Example:
>>
>> @@ -118,6 +123,35 @@ Example:
>>               };
>>       };
>>
>> +     etr@20070000 {
>> +             compatible = "arm,coresight-tmc", "arm,primecell";
>> +             reg = <0 0x20070000 0 0x1000>;
>> +
>> +             clocks = <&oscclk6a>;
>> +             clock-names = "apb_pclk";
>> +             ports {
>> +                     #address-cells = <1>;
>> +                     #size-cells = <0>;
>> +
>> +                     /* input port */
>> +                     port@0 {
>> +                             reg =  <0>;
>> +                             etr_in_port: endpoint {
>> +                                     slave-mode;
>> +                                     remote-endpoint = <&replicator2_out_port0>;
>> +                             };
>> +                     };
>> +
>> +                     /* CATU link represented by output port */
>> +                     port@1 {
>> +                             reg = <0>;
>
> While common in the Coresight bindings, having unit-address and reg not
> match is an error. Mathieu and I discussed this a bit as dtc now warns
> on these.
>
> Either reg should be 1 here, or 'ports' needs to be split into input and
> output ports. My preference would be the former, but Mathieu objected to
> this not reflecting the h/w numbering.

Suzuki, as we discussed, this is related to your work on revamping CS
bindings for ACPI.  Until that gets done, and to move forward with this
set, I suggest you abide by Rob's request.

>
> Rob

^ permalink raw reply	[flat|nested] 134+ messages in thread
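
For illustration, a minimal sketch of what the binding fragment might look
like with Rob's suggestions applied: the output port's unit-address matching
its reg, and the CATU node defining its own input port.  The node addresses,
labels and endpoint names here are placeholders (not part of the patch); the
compatible string is the one from the binding text above with the spurious
space dropped:

	etr@20070000 {
		/* ... input port@0 as in the example above ... */

		/* CATU link represented by output port */
		port@1 {
			reg = <1>;
			etr_out_port: endpoint {
				remote-endpoint = <&catu_in_port>;
			};
		};
	};

	catu@207e0000 {
		compatible = "arm,coresight-catu", "arm,primecell";
		reg = <0 0x207e0000 0 0x1000>;

		clocks = <&oscclk6a>;
		clock-names = "apb_pclk";

		port {
			catu_in_port: endpoint {
				slave-mode;
				remote-endpoint = <&etr_out_port>;
			};
		};
	};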

* Re: [PATCH v2 07/27] coresight: tmc: Hide trace buffer handling for file read
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-03 19:50     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-03 19:50 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:37AM +0100, Suzuki K Poulose wrote:
> At the moment we adjust the buffer pointers for reading the trace
> data via misc device in the common code for ETF/ETB and ETR. Since
> we are going to change how we manage the buffer for ETR, let us
> move the buffer manipulation to the respective driver files, hiding
> it from the common code. We do so by adding type specific helpers
> for finding the length of data and the pointer to the buffer,
> for a given length at a file position.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etf.c | 18 +++++++++++
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 34 ++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-tmc.c     | 41 ++++++++++++++-----------
>  drivers/hwtracing/coresight/coresight-tmc.h     |  4 +++
>  4 files changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> index e2513b7..2113e93 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> @@ -120,6 +120,24 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
>  	CS_LOCK(drvdata->base);
>  }
>  
> +/*
> + * Return the available trace data in the buffer from @pos, with
> + * a maximum limit of @len, updating the @bufpp on where to
> + * find it.
> + */
> +ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata,
> +				  loff_t pos, size_t len, char **bufpp)
> +{
> +	ssize_t actual = len;
> +
> +	/* Adjust the len to available size @pos */
> +	if (pos + actual > drvdata->len)
> +		actual = drvdata->len - pos;
> +	if (actual > 0)
> +		*bufpp = drvdata->buf + pos;
> +	return actual;
> +}
> +
>  static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
>  {
>  	int ret = 0;
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index bff46f2..53a17a8 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -92,6 +92,40 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  	CS_LOCK(drvdata->base);
>  }
>  
> +/*
> + * Return the available trace data in the buffer @pos, with a maximum
> + * limit of @len, also updating the @bufpp on where to find it.
> + */
> +ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
> +			    loff_t pos, size_t len, char **bufpp)
> +{
> +	ssize_t actual = len;
> +	char *bufp = drvdata->buf + pos;
> +	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
> +
> +	/* Adjust the len to available size @pos */
> +	if (pos + actual > drvdata->len)
> +		actual = drvdata->len - pos;
> +
> +	if (actual <= 0)
> +		return actual;
> +
> +	/*
> +	 * Since we use a circular buffer, with trace data starting
> +	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
> +	 * wrap the current @pos to within the buffer.
> +	 */
> +	if (bufp >= bufend)
> +		bufp -= drvdata->size;
> +	/*
> +	 * For simplicity, avoid copying over a wrapped around buffer.
> +	 */
> +	if ((bufp + actual) > bufend)
> +		actual = bufend - bufp;
> +	*bufpp = bufp;
> +	return actual;
> +}
> +
>  static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
>  {
>  	const u32 *barrier;
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
> index 0ea04f5..7a4e84f 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc.c
> @@ -131,35 +131,40 @@ static int tmc_open(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static inline ssize_t tmc_get_sysfs_trace(struct tmc_drvdata *drvdata,
> +					loff_t pos, size_t len, char **bufpp)
> +{
> +	switch (drvdata->config_type) {
> +	case TMC_CONFIG_TYPE_ETB:
> +	case TMC_CONFIG_TYPE_ETF:
> +		return tmc_etb_get_sysfs_trace(drvdata, pos, len, bufpp);
> +	case TMC_CONFIG_TYPE_ETR:
> +		return tmc_etr_get_sysfs_trace(drvdata, pos, len, bufpp);
> +	}
> +
> +	return  -EINVAL;

Extra space between return and -EINVAL.

> +}
> +
>  static ssize_t tmc_read(struct file *file, char __user *data, size_t len,
>  			loff_t *ppos)
>  {
> +	char *bufp;
> +	ssize_t actual;
>  	struct tmc_drvdata *drvdata = container_of(file->private_data,
>  						   struct tmc_drvdata, miscdev);
> -	char *bufp = drvdata->buf + *ppos;
> +	actual = tmc_get_sysfs_trace(drvdata, *ppos, len, &bufp);
> +	if (actual <= 0)
> +		return 0;
>  
> -	if (*ppos + len > drvdata->len)
> -		len = drvdata->len - *ppos;
> -
> -	if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) {
> -		if (bufp == (char *)(drvdata->vaddr + drvdata->size))
> -			bufp = drvdata->vaddr;
> -		else if (bufp > (char *)(drvdata->vaddr + drvdata->size))
> -			bufp -= drvdata->size;
> -		if ((bufp + len) > (char *)(drvdata->vaddr + drvdata->size))
> -			len = (char *)(drvdata->vaddr + drvdata->size) - bufp;
> -	}
> -
> -	if (copy_to_user(data, bufp, len)) {
> +	if (copy_to_user(data, bufp, actual)) {
>  		dev_dbg(drvdata->dev, "%s: copy_to_user failed\n", __func__);
>  		return -EFAULT;
>  	}
>  
> -	*ppos += len;
> +	*ppos += actual;
> +	dev_dbg(drvdata->dev, "%zu bytes copied\n", actual);
>  
> -	dev_dbg(drvdata->dev, "%s: %zu bytes copied, %d bytes left\n",
> -		__func__, len, (int)(drvdata->len - *ppos));
> -	return len;
> +	return actual;
>  }
>  
>  static int tmc_release(struct inode *inode, struct file *file)
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index cdff853..9cbc4d5 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -184,10 +184,14 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata);
>  extern const struct coresight_ops tmc_etb_cs_ops;
>  extern const struct coresight_ops tmc_etf_cs_ops;
>  
> +ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata,
> +				loff_t pos, size_t len, char **bufpp);
>  /* ETR functions */
>  int tmc_read_prepare_etr(struct tmc_drvdata *drvdata);
>  int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata);
>  extern const struct coresight_ops tmc_etr_cs_ops;
> +ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
> +				loff_t pos, size_t len, char **bufpp);
>  
>  
>  #define TMC_REG_PAIR(name, lo_off, hi_off)				\
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread
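
For context, the helpers above feed the TMC misc-device read path, so trace
data collected in sysfs mode ends up being consumed from user space with a
plain read loop along these lines (a sketch only; the device node name is an
assumption and depends on the platform's TMC ETR device name):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096];
		ssize_t n;
		/* hypothetical node name for the TMC ETR misc device */
		int fd = open("/dev/20070000.etr", O_RDONLY);

		if (fd < 0)
			return 1;

		/* tmc_read() returns 0 once no more trace data is available */
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			fwrite(buf, 1, n, stdout);

		close(fd);
		return 0;
	}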

* Re: [PATCH v2 04/27] coresight: Introduce support for Coresight Addrss Translation Unit
  2018-05-03 17:31     ` Mathieu Poirier
@ 2018-05-03 20:25       ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-03 20:25 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, Mike Leach, Robert Walker,
	Mark Rutland, Will Deacon, Robin Murphy, Sudeep Holla,
	Frank Rowand, Rob Herring, John Horley

On 3 May 2018 at 11:31, Mathieu Poirier <mathieu.poirier@linaro.org> wrote:
> On Tue, May 01, 2018 at 10:10:34AM +0100, Suzuki K Poulose wrote:
>> Add the initial support for Coresight Address Translation Unit, which
>> augments the TMC in Coresight SoC-600 by providing an improved Scatter
>> Gather mechanism. CATU is always connected to a single TMC-ETR and
>> converts the AXI address with a translated address (from a given SG
>> table with specific format). The CATU should be programmed in pass
>> through mode and enabled even if the ETR doesn't use translation by the CATU.
>>
>> This patch provides mechanism to enable/disable the CATU always in the
>> pass through mode.
>>
>> We reuse the existing ports mechanism to link the TMC-ETR to the
>> connected CATU.
>>
>> i.e, TMC-ETR:output_port0 -> CATU:input_port0
>>
>> Reference manual for  CATU component is avilable in version r2p0 of :
>> "Arm Coresight System-on-Chip SoC-600 Technical Reference Manual",
>> under Section 4.9.
>
> Please remove the part about the TRM as it is bound to change.
>
>>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  drivers/hwtracing/coresight/Kconfig             |  10 ++
>>  drivers/hwtracing/coresight/Makefile            |   1 +
>>  drivers/hwtracing/coresight/coresight-catu.c    | 195 ++++++++++++++++++++++++
>>  drivers/hwtracing/coresight/coresight-catu.h    |  89 +++++++++++
>>  drivers/hwtracing/coresight/coresight-tmc-etr.c |  26 ++++
>>  drivers/hwtracing/coresight/coresight-tmc.h     |  27 ++++
>>  include/linux/coresight.h                       |   1 +
>>  7 files changed, 349 insertions(+)
>>  create mode 100644 drivers/hwtracing/coresight/coresight-catu.c
>>  create mode 100644 drivers/hwtracing/coresight/coresight-catu.h
>>
>> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
>> index ef9cb3c..21f638f 100644
>> --- a/drivers/hwtracing/coresight/Kconfig
>> +++ b/drivers/hwtracing/coresight/Kconfig
>> @@ -31,6 +31,16 @@ config CORESIGHT_LINK_AND_SINK_TMC
>>         complies with the generic implementation of the component without
>>         special enhancement or added features.
>>
>> +config CORESIGHT_CATU
>> +     bool "Coresight Address Translation Unit (CATU) driver"
>> +     depends on CORESIGHT_LINK_AND_SINK_TMC
>> +     help
>> +        Enable support for the Coresight Address Translation Unit (CATU).
>> +        CATU supports a scatter gather table of 4K pages, with forward/backward
>> +        lookup. CATU helps TMC ETR to use large physically non-contiguous trace
>> +        buffer by translating the addersses used by ETR to the corresponding
>> +        physical adderss by looking up the table.
>
> There are a couple of typos in the last sentence.

There's also a typo in the patch title.

>
>> +
>>  config CORESIGHT_SINK_TPIU
>>       bool "Coresight generic TPIU driver"
>>       depends on CORESIGHT_LINKS_AND_SINKS
>> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
>> index 61db9dd..41870de 100644
>> --- a/drivers/hwtracing/coresight/Makefile
>> +++ b/drivers/hwtracing/coresight/Makefile
>> @@ -18,3 +18,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
>>  obj-$(CONFIG_CORESIGHT_DYNAMIC_REPLICATOR) += coresight-dynamic-replicator.o
>>  obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>>  obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
>> +obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o
>> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
>> new file mode 100644
>> index 0000000..2cd69a6
>> --- /dev/null
>> +++ b/drivers/hwtracing/coresight/coresight-catu.c
>> @@ -0,0 +1,195 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>
> Extra line
>
>> +/*
>> + * Copyright (C) 2017 ARM Limited. All rights reserved.
>
> You sure you don't want to bump this to 2018?
>
>> + *
>> + * Coresight Address Translation Unit support
>> + *
>> + * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/device.h>
>> +#include <linux/amba/bus.h>
>> +#include <linux/io.h>
>> +#include <linux/slab.h>
>
> List in alphabetical order if possible.
>
>> +
>> +#include "coresight-catu.h"
>> +#include "coresight-priv.h"
>> +
>> +#define csdev_to_catu_drvdata(csdev) \
>> +     dev_get_drvdata(csdev->dev.parent)
>> +
>> +coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
>> +coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
>> +coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
>> +coresight_simple_reg32(struct catu_drvdata, axictrl, CATU_AXICTRL);
>> +coresight_simple_reg32(struct catu_drvdata, irqen, CATU_IRQEN);
>> +coresight_simple_reg64(struct catu_drvdata, sladdr,
>> +                    CATU_SLADDRLO, CATU_SLADDRHI);
>> +coresight_simple_reg64(struct catu_drvdata, inaddr,
>> +                    CATU_INADDRLO, CATU_INADDRHI);
>> +
>> +static struct attribute *catu_mgmt_attrs[] = {
>> +     &dev_attr_control.attr,
>> +     &dev_attr_status.attr,
>> +     &dev_attr_mode.attr,
>> +     &dev_attr_axictrl.attr,
>> +     &dev_attr_irqen.attr,
>> +     &dev_attr_sladdr.attr,
>> +     &dev_attr_inaddr.attr,
>> +     NULL,
>> +};
>> +
>> +static const struct attribute_group catu_mgmt_group = {
>> +     .attrs = catu_mgmt_attrs,
>> +     .name = "mgmt",
>> +};
>> +
>> +static const struct attribute_group *catu_groups[] = {
>> +     &catu_mgmt_group,
>> +     NULL,
>> +};
>> +
>> +
>> +static inline int catu_wait_for_ready(struct catu_drvdata *drvdata)
>> +{
>> +     return coresight_timeout(drvdata->base,
>> +                              CATU_STATUS, CATU_STATUS_READY, 1);
>> +}
>> +
>> +static int catu_enable_hw(struct catu_drvdata *drvdata, void *__unused)
>> +{
>> +     u32 control;
>> +
>> +     if (catu_wait_for_ready(drvdata))
>> +             dev_warn(drvdata->dev, "Timeout while waiting for READY\n");
>> +
>> +     control = catu_read_control(drvdata);
>> +     if (control & BIT(CATU_CONTROL_ENABLE)) {
>> +             dev_warn(drvdata->dev, "CATU is already enabled\n");
>> +             return -EBUSY;
>> +     }
>> +
>> +     control |= BIT(CATU_CONTROL_ENABLE);
>> +     catu_write_mode(drvdata, CATU_MODE_PASS_THROUGH);
>> +     catu_write_control(drvdata, control);
>> +     dev_dbg(drvdata->dev, "Enabled in Pass through mode\n");
>> +     return 0;
>> +}
>> +
>> +static int catu_enable(struct coresight_device *csdev, void *data)
>> +{
>> +     int rc;
>> +     struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
>> +
>> +     CS_UNLOCK(catu_drvdata->base);
>> +     rc = catu_enable_hw(catu_drvdata, data);
>> +     CS_LOCK(catu_drvdata->base);
>> +     return rc;
>> +}
>> +
>> +static int catu_disable_hw(struct catu_drvdata *drvdata)
>> +{
>> +     int rc = 0;
>> +
>> +     if (catu_wait_for_ready(drvdata)) {
>> +             dev_info(drvdata->dev, "Timeout while waiting for READY\n");
>> +             rc = -EAGAIN;
>> +     }
>> +
>> +     catu_write_control(drvdata, 0);
>> +     dev_dbg(drvdata->dev, "Disabled\n");
>> +     return rc;
>> +}
>> +
>> +static int catu_disable(struct coresight_device *csdev, void *__unused)
>> +{
>> +     int rc;
>> +     struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
>> +
>> +     CS_UNLOCK(catu_drvdata->base);
>> +     rc = catu_disable_hw(catu_drvdata);
>> +     CS_LOCK(catu_drvdata->base);
>> +
>
> I suppose you can remove the extra line as catu_enable() doesn't have one.
>
>> +     return rc;
>> +}
>> +
>> +const struct coresight_ops_helper catu_helper_ops = {
>> +     .enable = catu_enable,
>> +     .disable = catu_disable,
>> +};
>> +
>> +const struct coresight_ops catu_ops = {
>> +     .helper_ops = &catu_helper_ops,
>> +};
>> +
>> +static int catu_probe(struct amba_device *adev, const struct amba_id *id)
>> +{
>> +     int ret = 0;
>> +     struct catu_drvdata *drvdata;
>> +     struct coresight_desc catu_desc;
>> +     struct coresight_platform_data *pdata = NULL;
>> +     struct device *dev = &adev->dev;
>> +     struct device_node *np = dev->of_node;
>> +     void __iomem *base;
>> +
>> +     if (np) {
>> +             pdata = of_get_coresight_platform_data(dev, np);
>> +             if (IS_ERR(pdata)) {
>> +                     ret = PTR_ERR(pdata);
>> +                     goto out;
>> +             }
>> +             dev->platform_data = pdata;
>> +     }
>> +
>> +     drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
>> +     if (!drvdata) {
>> +             ret = -ENOMEM;
>> +             goto out;
>> +     }
>> +
>> +     drvdata->dev = dev;
>> +     dev_set_drvdata(dev, drvdata);
>> +     base = devm_ioremap_resource(dev, &adev->res);
>> +     if (IS_ERR(base)) {
>> +             ret = PTR_ERR(base);
>> +             goto out;
>> +     }
>> +
>> +     drvdata->base = base;
>> +     catu_desc.pdata = pdata;
>> +     catu_desc.dev = dev;
>> +     catu_desc.groups = catu_groups;
>> +     catu_desc.type = CORESIGHT_DEV_TYPE_HELPER;
>> +     catu_desc.subtype.helper_subtype = CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
>> +     catu_desc.ops = &catu_ops;
>> +     drvdata->csdev = coresight_register(&catu_desc);
>> +     if (IS_ERR(drvdata->csdev))
>> +             ret = PTR_ERR(drvdata->csdev);
>> +     if (!ret)
>> +             dev_info(drvdata->dev, "initialized\n");
>
> Please remove as it 1) doesn't convey HW related information and 2) the TMC
> doesn't output anything.
>
>> +out:
>> +     pm_runtime_put(&adev->dev);
>> +     return ret;
>> +}
>> +
>> +static struct amba_id catu_ids[] = {
>> +     {
>> +             .id     = 0x000bb9ee,
>> +             .mask   = 0x000fffff,
>> +     },
>> +     {},
>> +};
>> +
>> +static struct amba_driver catu_driver = {
>> +     .drv = {
>> +             .name                   = "coresight-catu",
>> +             .owner                  = THIS_MODULE,
>> +             .suppress_bind_attrs    = true,
>> +     },
>> +     .probe                          = catu_probe,
>> +     .id_table                       = catu_ids,
>> +};
>> +
>> +builtin_amba_driver(catu_driver);
>> diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
>> new file mode 100644
>> index 0000000..cd58d6f
>> --- /dev/null
>> +++ b/drivers/hwtracing/coresight/coresight-catu.h
>> @@ -0,0 +1,89 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>
> Extra line
>
>> +/*
>> + * Copyright (C) 2017 ARM Limited. All rights reserved.
>> + *
>> + * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
>> + *
>
> Extra line. In coresight-catu.c there isn't one.
>
>> + */
>> +
>> +#ifndef _CORESIGHT_CATU_H
>> +#define _CORESIGHT_CATU_H
>> +
>> +#include "coresight-priv.h"
>> +
>> +/* Register offset from base */
>> +#define CATU_CONTROL         0x000
>> +#define CATU_MODE            0x004
>> +#define CATU_AXICTRL         0x008
>> +#define CATU_IRQEN           0x00c
>> +#define CATU_SLADDRLO                0x020
>> +#define CATU_SLADDRHI                0x024
>> +#define CATU_INADDRLO                0x028
>> +#define CATU_INADDRHI                0x02c
>> +#define CATU_STATUS          0x100
>> +#define CATU_DEVARCH         0xfbc
>> +
>> +#define CATU_CONTROL_ENABLE  0
>> +
>> +#define CATU_MODE_PASS_THROUGH       0U
>> +#define CATU_MODE_TRANSLATE  1U
>> +
>> +#define CATU_STATUS_READY    8
>> +#define CATU_STATUS_ADRERR   0
>> +#define CATU_STATUS_AXIERR   4
>> +
>> +
>
> Extra line.
>
>> +#define CATU_IRQEN_ON                0x1
>> +#define CATU_IRQEN_OFF               0x0
>> +
>> +
>
> Extra line.
>
>> +struct catu_drvdata {
>> +     struct device *dev;
>> +     void __iomem *base;
>> +     struct coresight_device *csdev;
>> +     int irq;
>> +};
>> +
>> +#define CATU_REG32(name, offset)                                     \
>> +static inline u32                                                    \
>> +catu_read_##name(struct catu_drvdata *drvdata)                               \
>> +{                                                                    \
>> +     return coresight_read_reg_pair(drvdata->base, offset, -1);      \
>> +}                                                                    \
>> +static inline void                                                   \
>> +catu_write_##name(struct catu_drvdata *drvdata, u32 val)             \
>> +{                                                                    \
>> +     coresight_write_reg_pair(drvdata->base, val, offset, -1);       \
>> +}
>> +
>> +#define CATU_REG_PAIR(name, lo_off, hi_off)                          \
>> +static inline u64                                                    \
>> +catu_read_##name(struct catu_drvdata *drvdata)                               \
>> +{                                                                    \
>> +     return coresight_read_reg_pair(drvdata->base, lo_off, hi_off);  \
>> +}                                                                    \
>> +static inline void                                                   \
>> +catu_write_##name(struct catu_drvdata *drvdata, u64 val)             \
>> +{                                                                    \
>> +     coresight_write_reg_pair(drvdata->base, val, lo_off, hi_off);   \
>> +}
>> +
>> +CATU_REG32(control, CATU_CONTROL);
>> +CATU_REG32(mode, CATU_MODE);
>> +CATU_REG_PAIR(sladdr, CATU_SLADDRLO, CATU_SLADDRHI)
>> +CATU_REG_PAIR(inaddr, CATU_INADDRLO, CATU_INADDRHI)
>> +
>> +static inline bool coresight_is_catu_device(struct coresight_device *csdev)
>> +{
>> +     enum coresight_dev_subtype_helper subtype;
>> +
>> +     /* Make the checkpatch happy */
>> +     subtype = csdev->subtype.helper_subtype;
>> +
>> +     return IS_ENABLED(CONFIG_CORESIGHT_CATU) &&
>> +            csdev->type == CORESIGHT_DEV_TYPE_HELPER &&
>> +            subtype == CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
>> +}
>> +
>> +#endif
>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> index 68fbc8f..9b0c620 100644
>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> @@ -17,9 +17,26 @@
>>
>>  #include <linux/coresight.h>
>>  #include <linux/dma-mapping.h>
>> +#include "coresight-catu.h"
>>  #include "coresight-priv.h"
>>  #include "coresight-tmc.h"
>>
>> +static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
>> +{
>> +     struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
>> +
>> +     if (catu && helper_ops(catu)->enable)
>> +             helper_ops(catu)->enable(catu, NULL);
>> +}
>> +
>> +static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
>> +{
>> +     struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
>> +
>> +     if (catu && helper_ops(catu)->disable)
>> +             helper_ops(catu)->disable(catu, NULL);
>> +}
>> +
>>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>>  {
>>       u32 axictl, sts;
>> @@ -27,6 +44,12 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>>       /* Zero out the memory to help with debug */
>>       memset(drvdata->vaddr, 0, drvdata->size);
>>
>> +     /*
>> +      * If this ETR is connected to a CATU, enable it before we turn
>> +      * this on
>> +      */
>> +     tmc_etr_enable_catu(drvdata);
>> +
>>       CS_UNLOCK(drvdata->base);
>>
>>       /* Wait for TMCSReady bit to be set */
>> @@ -116,6 +139,9 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>>       tmc_disable_hw(drvdata);
>>
>>       CS_LOCK(drvdata->base);
>> +
>> +     /* Disable CATU device if this ETR is connected to one */
>> +     tmc_etr_disable_catu(drvdata);
>>  }
>>
>>  static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
>> index 8df7a81..cdff853 100644
>> --- a/drivers/hwtracing/coresight/coresight-tmc.h
>> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
>> @@ -19,6 +19,7 @@
>>  #define _CORESIGHT_TMC_H
>>
>>  #include <linux/miscdevice.h>
>> +#include "coresight-catu.h"
>>
>>  #define TMC_RSZ                      0x004
>>  #define TMC_STS                      0x00c
>> @@ -222,4 +223,30 @@ static inline bool tmc_etr_has_cap(struct tmc_drvdata *drvdata, u32 cap)
>>       return !!(drvdata->etr_caps & cap);
>>  }
>>
>> +/*
>> + * TMC ETR could be connected to a CATU device, which can provide address
>> + * translation service. This is represented by the Output port of the TMC
>> + * (ETR) connected to the input port of the CATU.
>> + *
>> + * Returns   : coresight_device ptr for the CATU device if a CATU is found.
>> + *           : NULL otherwise.
>> + */
>> +static inline struct coresight_device *
>> +tmc_etr_get_catu_device(struct tmc_drvdata *drvdata)
>> +{
>> +     int i;
>> +     struct coresight_device *tmp, *etr = drvdata->csdev;
>> +
>> +     if (!IS_ENABLED(CONFIG_CORESIGHT_CATU))
>> +             return NULL;
>> +
>> +     for (i = 0; i < etr->nr_outport; i++) {
>> +             tmp = etr->conns[i].child_dev;
>> +             if (tmp && coresight_is_catu_device(tmp))
>> +                     return tmp;
>> +     }
>> +
>> +     return NULL;
>> +}
>> +
>>  #endif
>> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
>> index 5e926f7..c0e1568 100644
>> --- a/include/linux/coresight.h
>> +++ b/include/linux/coresight.h
>> @@ -72,6 +72,7 @@ enum coresight_dev_subtype_source {
>>
>>  enum coresight_dev_subtype_helper {
>>       CORESIGHT_DEV_SUBTYPE_HELPER_NONE,
>> +     CORESIGHT_DEV_SUBTYPE_HELPER_CATU,
>>  };
>>
>>  /**
>> --
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 10/27] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-01 13:13     ` Rob Herring
@ 2018-05-03 20:32       ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-03 20:32 UTC (permalink / raw)
  To: Rob Herring
  Cc: Suzuki K Poulose, linux-arm-kernel, linux-kernel, Mike Leach,
	Robert Walker, Mark Rutland, Will Deacon, Robin Murphy,
	Sudeep Holla, Frank Rowand, John Horley, Mathieu Poirier,
	devicetree

On 1 May 2018 at 07:13, Rob Herring <robh@kernel.org> wrote:
> On Tue, May 01, 2018 at 10:10:40AM +0100, Suzuki K Poulose wrote:
>> We are about to add the support for ETR builtin scatter-gather mode
>> for dealing with large amount of trace buffers. However, on some of
>> the platforms, using the ETR SG mode can lock up the system due to
>> the way the ETR is connected to the memory subsystem.
>>
>> In SG mode, the ETR performs READ from the scatter-gather table to
>> fetch the next page and regular WRITE of trace data. If the READ
>> operation doesn't complete(due to the memory subsystem issues,
>> which we have seen on a couple of platforms) the trace WRITE
>> cannot proceed leading to issues. So, we by default do not
>> use the SG mode, unless it is known to be safe on the platform.
>> We define a DT property for the TMC node to specify whether we
>> have a proper SG mode.
>>
>> Cc: Mathieu Poirier <matheiu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: John Horley <john.horley@arm.com>
>> Cc: Robert Walker <robert.walker@arm.com>
>> Cc: devicetree@vger.kernel.org
>> Cc: frowand.list@gmail.com
>> Cc: Rob Herring <robh@kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  Documentation/devicetree/bindings/arm/coresight.txt | 3 +++
>>  drivers/hwtracing/coresight/coresight-tmc.c         | 8 +++++++-
>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
>> index cdd84d0..7c0c8f0 100644
>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>> @@ -88,6 +88,9 @@ its hardware characteristcs.
>>       * arm,buffer-size: size of contiguous buffer space for TMC ETR
>>        (embedded trace router)
>>
>> +     * scatter-gather: boolean. Indicates that the TMC-ETR can safely
>> +       use the SG mode on this system.
>> +
>
> Needs a vendor prefix.
>

Thinking further on this, do we need to make it device specific as
well - something like "arm,etr-scatter-gather"?  That way we don't
have to redefine "scatter-gather" for other ARM devices if they happen
to need the same property but for different reasons.
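
For illustration, with a device-specific, vendor-prefixed name the driver side
stays a one-line check - this is only a sketch of what patch 12's helper would
become, and the property name itself is still to be agreed:

    static inline bool tmc_etr_can_use_sg(struct tmc_drvdata *drvdata)
    {
            /* "arm,etr-scatter-gather" is just the name suggested above */
            return fwnode_property_present(drvdata->dev->fwnode,
                                           "arm,etr-scatter-gather");
    }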

>>  * Optional property for CATU :
>>       * interrupts : Exactly one SPI may be listed for reporting the address
>>         error

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 12/27] coresight: tmc-etr: Allow commandline option to override SG use
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-03 20:40     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-03 20:40 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:42AM +0100, Suzuki K Poulose wrote:
> The Coresight TMC-ETR SG mode could be unsafe on a platform where
> the ETR is not properly connected to account for READ operations.
> We use a DT node property to indicate if the system is safe.
> This patch also provides a command line parameter to "force"
> the use of SG mode to override the firmware information.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Hi
> 
> This is more of a debug patch for people who may want to
> test their platform without too much of hacking. I am not
> too keen on pushing this patch in.

I am not either, nor do I personally need it to test this feature.  We can leave
it in for now (and in subsequent versions) if you need it, but we agree that I won't
queue it to my tree when the time comes.

> ---
>  Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
>  drivers/hwtracing/coresight/coresight-tmc.c     | 7 ++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 11fc28e..03b51c3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -675,6 +675,14 @@
>  			Enable/disable the CPU sampling based debugging.
>  			0: default value, disable debugging
>  			1: enable debugging at boot time
> +	coresight_tmc.etr_force_sg
> +			[ARM, ARM64]
> +			Format: <bool>
> +			Force using the TMC ETR builtin scatter-gather mode
> +			even when it may be unsafe to use.
> +			Default : 0, do not force using the builtin SG mode.
> +				  1, Allow using the SG, ignoring the firmware
> +				     provided information.
>  
>  	cpuidle.off=1	[CPU_IDLE]
>  			disable the cpuidle sub-system
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
> index e38379c..c7bc681 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc.c
> @@ -20,6 +20,7 @@
>  #include <linux/err.h>
>  #include <linux/fs.h>
>  #include <linux/miscdevice.h>
> +#include <linux/module.h>
>  #include <linux/property.h>
>  #include <linux/uaccess.h>
>  #include <linux/slab.h>
> @@ -33,6 +34,8 @@
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +static bool etr_force_sg;
> +
>  void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata)
>  {
>  	/* Ensure formatter, unformatter and hardware fifo are empty */
> @@ -307,7 +310,8 @@ const struct attribute_group *coresight_tmc_groups[] = {
>  
>  static inline bool tmc_etr_can_use_sg(struct tmc_drvdata *drvdata)
>  {
> -	return fwnode_property_present(drvdata->dev->fwnode, "scatter-gather");
> +	return etr_force_sg ||
> +	       fwnode_property_present(drvdata->dev->fwnode, "scatter-gather");
>  }
>  
>  /* Detect and initialise the capabilities of a TMC ETR */
> @@ -482,3 +486,4 @@ static struct amba_driver tmc_driver = {
>  	.id_table	= tmc_ids,
>  };
>  builtin_amba_driver(tmc_driver);
> +module_param(etr_force_sg, bool, 0);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 13/27] coresight: Add generic TMC sg table framework
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-04 17:35     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-04 17:35 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley, Mathieu Poirier

On Tue, May 01, 2018 at 10:10:43AM +0100, Suzuki K Poulose wrote:
> This patch introduces a generic sg table data structure and
> associated operations. An SG table can be used to map a set
> of Data pages where the trace data could be stored by the TMC
> ETR. The information about the data pages could be stored in
> different formats, depending on the type of the underlying
> SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic
> structure provides bookkeeping of the pages used for the data
> as well as the table contents. The table should be filled by
> the user of the infrastructure.
> 
> A table can be created by specifying the number of data pages
> as well as the number of table pages required to hold the
> pointers, where the latter could be different for different
> types of tables. The pages are mapped in the appropriate dma
> data direction mode (i.e, DMA_TO_DEVICE for table pages
> and DMA_FROM_DEVICE for data pages).  The framework can optionally
> accept a set of allocated data pages (e.g, perf ring buffer) and
> map them accordingly. The table and data pages are vmap'ed to allow
> easier access by the drivers. The framework also provides helpers to
> sync the data written to the pages with appropriate directions.
> 
> This will be later used by the TMC ETR SG unit and CATU.
> 
> Cc: Mathieu Poirier <matheiu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 284 ++++++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-tmc.h     |  50 +++++
>  2 files changed, 334 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 7af72d7..57a8fe1 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -17,10 +17,294 @@
>  
>  #include <linux/coresight.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/slab.h>
>  #include "coresight-catu.h"
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +/*
> + * tmc_pages_get_offset:  Go through all the pages in the tmc_pages
> + * and map the device address @addr to an offset within the virtual
> + * contiguous buffer.
> + */
> +static long
> +tmc_pages_get_offset(struct tmc_pages *tmc_pages, dma_addr_t addr)
> +{
> +	int i;
> +	dma_addr_t page_start;
> +
> +	for (i = 0; i < tmc_pages->nr_pages; i++) {
> +		page_start = tmc_pages->daddrs[i];
> +		if (addr >= page_start && addr < (page_start + PAGE_SIZE))
> +			return i * PAGE_SIZE + (addr - page_start);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +/*
> + * tmc_pages_free : Unmap and free the pages used by tmc_pages.
> + */
> +static void tmc_pages_free(struct tmc_pages *tmc_pages,
> +			   struct device *dev, enum dma_data_direction dir)
> +{
> +	int i;
> +
> +	for (i = 0; i < tmc_pages->nr_pages; i++) {
> +		if (tmc_pages->daddrs && tmc_pages->daddrs[i])
> +			dma_unmap_page(dev, tmc_pages->daddrs[i],
> +					 PAGE_SIZE, dir);
> +		if (tmc_pages->pages && tmc_pages->pages[i])
> +			__free_page(tmc_pages->pages[i]);

I think it's worth adding a comment saying that because of the page count, pages
given to the infrastructure (rather than allocated) won't be freed by
__free_page().

> +	}
> +
> +	kfree(tmc_pages->pages);
> +	kfree(tmc_pages->daddrs);
> +	tmc_pages->pages = NULL;
> +	tmc_pages->daddrs = NULL;
> +	tmc_pages->nr_pages = 0;
> +}
> +
> +/*
> + * tmc_pages_alloc : Allocate and map pages for a given @tmc_pages.
> + * If @pages is not NULL, the list of page virtual addresses are
> + * used as the data pages. The pages are then dma_map'ed for @dev
> + * with dma_direction @dir.
> + *
> + * Returns 0 upon success, else the error number.
> + */
> +static int tmc_pages_alloc(struct tmc_pages *tmc_pages,
> +			   struct device *dev, int node,
> +			   enum dma_data_direction dir, void **pages)
> +{
> +	int i, nr_pages;
> +	dma_addr_t paddr;
> +	struct page *page;
> +
> +	nr_pages = tmc_pages->nr_pages;
> +	tmc_pages->daddrs = kcalloc(nr_pages, sizeof(*tmc_pages->daddrs),
> +					 GFP_KERNEL);
> +	if (!tmc_pages->daddrs)
> +		return -ENOMEM;
> +	tmc_pages->pages = kcalloc(nr_pages, sizeof(*tmc_pages->pages),
> +					 GFP_KERNEL);
> +	if (!tmc_pages->pages) {
> +		kfree(tmc_pages->daddrs);
> +		tmc_pages->daddrs = NULL;
> +		return -ENOMEM;
> +	}
> +
> +	for (i = 0; i < nr_pages; i++) {
> +		if (pages && pages[i]) {
> +			page = virt_to_page(pages[i]);
> +			get_page(page);
> +		} else {
> +			page = alloc_pages_node(node,
> +						GFP_KERNEL | __GFP_ZERO, 0);
> +		}
> +		paddr = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
> +		if (dma_mapping_error(dev, paddr))
> +			goto err;
> +		tmc_pages->daddrs[i] = paddr;
> +		tmc_pages->pages[i] = page;
> +	}
> +	return 0;
> +err:
> +	tmc_pages_free(tmc_pages, dev, dir);
> +	return -ENOMEM;
> +}
> +
> +static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
> +{
> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
> +		return 0;
> +	return sg_table->table_daddr;
> +}
> +
> +static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
> +{
> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
> +		return NULL;
> +	return sg_table->table_vaddr;
> +}
> +
> +static inline void *
> +tmc_sg_table_data_vaddr(struct tmc_sg_table *sg_table)
> +{
> +	if (WARN_ON(!sg_table->data_pages.nr_pages))
> +		return 0;
> +	return sg_table->data_vaddr;
> +}
> +
> +static inline long
> +tmc_sg_get_data_page_offset(struct tmc_sg_table *sg_table, dma_addr_t addr)
> +{
> +	return tmc_pages_get_offset(&sg_table->data_pages, addr);
> +}
> +
> +static inline void tmc_free_table_pages(struct tmc_sg_table *sg_table)
> +{
> +	if (sg_table->table_vaddr)
> +		vunmap(sg_table->table_vaddr);
> +	tmc_pages_free(&sg_table->table_pages, sg_table->dev, DMA_TO_DEVICE);
> +}
> +
> +static void tmc_free_data_pages(struct tmc_sg_table *sg_table)
> +{
> +	if (sg_table->data_vaddr)
> +		vunmap(sg_table->data_vaddr);
> +	tmc_pages_free(&sg_table->data_pages, sg_table->dev, DMA_FROM_DEVICE);
> +}
> +
> +void tmc_free_sg_table(struct tmc_sg_table *sg_table)
> +{
> +	tmc_free_table_pages(sg_table);
> +	tmc_free_data_pages(sg_table);
> +}
> +
> +/*
> + * Alloc pages for the table. Since this will be used by the device,
> + * allocate the pages closer to the device (i.e, dev_to_node(dev)
> + * rather than the CPU node).
> + */
> +static int tmc_alloc_table_pages(struct tmc_sg_table *sg_table)
> +{
> +	int rc;
> +	struct tmc_pages *table_pages = &sg_table->table_pages;
> +
> +	rc = tmc_pages_alloc(table_pages, sg_table->dev,
> +			     dev_to_node(sg_table->dev),
> +			     DMA_TO_DEVICE, NULL);
> +	if (rc)
> +		return rc;
> +	sg_table->table_vaddr = vmap(table_pages->pages,
> +				     table_pages->nr_pages,
> +				     VM_MAP,
> +				     PAGE_KERNEL);
> +	if (!sg_table->table_vaddr)
> +		rc = -ENOMEM;
> +	else
> +		sg_table->table_daddr = table_pages->daddrs[0];
> +	return rc;
> +}
> +
> +static int tmc_alloc_data_pages(struct tmc_sg_table *sg_table, void **pages)
> +{
> +	int rc;
> +
> +	/* Allocate data pages on the node requested by the caller */
> +	rc = tmc_pages_alloc(&sg_table->data_pages,
> +			     sg_table->dev, sg_table->node,
> +			     DMA_FROM_DEVICE, pages);
> +	if (!rc) {
> +		sg_table->data_vaddr = vmap(sg_table->data_pages.pages,
> +					   sg_table->data_pages.nr_pages,
> +					   VM_MAP,
> +					   PAGE_KERNEL);

Indentation.

> +		if (!sg_table->data_vaddr)
> +			rc = -ENOMEM;
> +	}
> +	return rc;
> +}
> +
> +/*
> + * tmc_alloc_sg_table: Allocate and setup dma pages for the TMC SG table
> + * and data buffers. TMC writes to the data buffers and reads from the SG
> + * Table pages.
> + *
> + * @dev		- Device to which page should be DMA mapped.
> + * @node	- Numa node for mem allocations
> + * @nr_tpages	- Number of pages for the table entries.
> + * @nr_dpages	- Number of pages for Data buffer.
> + * @pages	- Optional list of virtual address of pages.
> + */
> +struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
> +					int node,
> +					int nr_tpages,
> +					int nr_dpages,
> +					void **pages)
> +{
> +	long rc;
> +	struct tmc_sg_table *sg_table;
> +
> +	sg_table = kzalloc(sizeof(*sg_table), GFP_KERNEL);
> +	if (!sg_table)
> +		return ERR_PTR(-ENOMEM);
> +	sg_table->data_pages.nr_pages = nr_dpages;
> +	sg_table->table_pages.nr_pages = nr_tpages;
> +	sg_table->node = node;
> +	sg_table->dev = dev;
> +
> +	rc  = tmc_alloc_data_pages(sg_table, pages);
> +	if (!rc)
> +		rc = tmc_alloc_table_pages(sg_table);
> +	if (rc) {
> +		tmc_free_sg_table(sg_table);
> +		kfree(sg_table);
> +		return ERR_PTR(rc);
> +	}
> +
> +	return sg_table;
> +}
> +
> +/*
> + * tmc_sg_table_sync_data_range: Sync the data buffer written
> + * by the device from @offset upto a @size bytes.
> + */
> +void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
> +				  u64 offset, u64 size)
> +{
> +	int i, index, start;
> +	int npages = DIV_ROUND_UP(size, PAGE_SIZE);
> +	struct device *dev = table->dev;
> +	struct tmc_pages *data = &table->data_pages;
> +
> +	start = offset >> PAGE_SHIFT;
> +	for (i = start; i < (start + npages); i++) {
> +		index = i % data->nr_pages;
> +		dma_sync_single_for_cpu(dev, data->daddrs[index],
> +					PAGE_SIZE, DMA_FROM_DEVICE);
> +	}
> +}
> +
> +/* tmc_sg_sync_table: Sync the page table */
> +void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table)
> +{
> +	int i;
> +	struct device *dev = sg_table->dev;
> +	struct tmc_pages *table_pages = &sg_table->table_pages;
> +
> +	for (i = 0; i < table_pages->nr_pages; i++)
> +		dma_sync_single_for_device(dev, table_pages->daddrs[i],
> +					   PAGE_SIZE, DMA_TO_DEVICE);
> +}
> +
> +/*
> + * tmc_sg_table_get_data: Get the buffer pointer for data @offset
> + * in the SG buffer. The @bufpp is updated to point to the buffer.
> + * Returns :
> + *	the length of linear data available at @offset.
> + *	or
> + *	<= 0 if no data is available.
> + */
> +ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
> +				u64 offset, size_t len, char **bufpp)

Indentation

> +{
> +	size_t size;
> +	int pg_idx = offset >> PAGE_SHIFT;
> +	int pg_offset = offset & (PAGE_SIZE - 1);
> +	struct tmc_pages *data_pages = &sg_table->data_pages;
> +
> +	size = tmc_sg_table_buf_size(sg_table);
> +	if (offset >= size)
> +		return -EINVAL;

        /* Make sure we don't go beyond the page array */

> +	len = (len < (size - offset)) ? len : size - offset;

        /* Respect page boundaries */

> +	len = (len < (PAGE_SIZE - pg_offset)) ? len : (PAGE_SIZE - pg_offset);
> +	if (len > 0)
> +		*bufpp = page_address(data_pages->pages[pg_idx]) + pg_offset;
> +	return len;
> +}
> +
>  static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
>  {
>  	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 9cbc4d5..74d8f24 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -19,6 +19,7 @@
>  #define _CORESIGHT_TMC_H
>  
>  #include <linux/miscdevice.h>
> +#include <linux/dma-mapping.h>

Alphabetical order.

>  #include "coresight-catu.h"
>  
>  #define TMC_RSZ			0x004
> @@ -172,6 +173,38 @@ struct tmc_drvdata {
>  	u32			etr_caps;
>  };
>  
> +/**
> + * struct tmc_pages - Collection of pages used for SG.
> + * @nr_pages:		Number of pages in the list.
> + * @daddrs:		Array of DMA'able page address.
> + * @pages:		Array pages for the buffer.
> + */
> +struct tmc_pages {
> +	int nr_pages;
> +	dma_addr_t	*daddrs;
> +	struct page	**pages;
> +};
> +
> +/*
> + * struct tmc_sg_table - Generic SG table for TMC
> + * @dev:		Device for DMA allocations
> + * @table_vaddr:	Contiguous Virtual address for PageTable
> + * @data_vaddr:		Contiguous Virtual address for Data Buffer
> + * @table_daddr:	DMA address of the PageTable base
> + * @node:		Node for Page allocations
> + * @table_pages:	List of pages & dma address for Table
> + * @data_pages:		List of pages & dma address for Data
> + */
> +struct tmc_sg_table {
> +	struct device *dev;
> +	void *table_vaddr;
> +	void *data_vaddr;
> +	dma_addr_t table_daddr;
> +	int node;
> +	struct tmc_pages table_pages;
> +	struct tmc_pages data_pages;
> +};
> +
>  /* Generic functions */
>  void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata);
>  void tmc_flush_and_stop(struct tmc_drvdata *drvdata);
> @@ -253,4 +286,21 @@ tmc_etr_get_catu_device(struct tmc_drvdata *drvdata)
>  	return NULL;
>  }
>  
> +struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
> +					int node,
> +					int nr_tpages,
> +					int nr_dpages,
> +					void **pages);
> +void tmc_free_sg_table(struct tmc_sg_table *sg_table);
> +void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table);
> +void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
> +				  u64 offset, u64 size);
> +ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
> +			      u64 offset, size_t len, char **bufpp);
> +static inline unsigned long
> +tmc_sg_table_buf_size(struct tmc_sg_table *sg_table)
> +{
> +	return sg_table->data_pages.nr_pages << PAGE_SHIFT;
> +}
> +
>  #endif
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread
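
For readers stringing the pieces above together: the sketch below shows how a
backend could exercise the generic table API once the mode-specific code has
filled in the table entries. It is illustrative only and not part of the
series; the function name is made up, error handling is trimmed, and it
assumes the coresight-tmc.h declarations quoted in this patch.

/*
 * Sketch: allocate a 1MB data buffer backed by one table page, let the
 * ETR-SG/CATU specific code fill the table, then read the trace data
 * back one page at a time.
 */
static int example_sg_table_roundtrip(struct device *dev)
{
        struct tmc_sg_table *sg_table;
        char *buf;
        ssize_t len;
        u64 offset = 0;

        /* 1 table page, 256 data pages (1MB with 4K pages), no pre-allocated pages */
        sg_table = tmc_alloc_sg_table(dev, dev_to_node(dev), 1, 256, NULL);
        if (IS_ERR(sg_table))
                return PTR_ERR(sg_table);

        /* ... mode-specific code fills sg_table->table_vaddr here ... */

        /* Push the table entries out to memory before the device reads them */
        tmc_sg_table_sync_table(sg_table);

        /* After a trace run: sync the data pages, then walk the buffer */
        tmc_sg_table_sync_data_range(sg_table, 0, tmc_sg_table_buf_size(sg_table));
        while ((len = tmc_sg_table_get_data(sg_table, offset,
                                            PAGE_SIZE, &buf)) > 0) {
                /* consume 'len' bytes at 'buf' */
                offset += len;
        }

        tmc_free_sg_table(sg_table);
        /* the handle itself is not freed by tmc_free_sg_table() */
        kfree(sg_table);
        return 0;
}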

* Re: [PATCH v2 10/27] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-03 20:32       ` Mathieu Poirier
@ 2018-05-04 22:56         ` Rob Herring
  -1 siblings, 0 replies; 134+ messages in thread
From: Rob Herring @ 2018-05-04 22:56 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Suzuki K Poulose, linux-arm-kernel, linux-kernel, Mike Leach,
	Robert Walker, Mark Rutland, Will Deacon, Robin Murphy,
	Sudeep Holla, Frank Rowand, John Horley, Mathieu Poirier,
	devicetree

On Thu, May 3, 2018 at 3:32 PM, Mathieu Poirier
<mathieu.poirier@linaro.org> wrote:
> On 1 May 2018 at 07:13, Rob Herring <robh@kernel.org> wrote:
>> On Tue, May 01, 2018 at 10:10:40AM +0100, Suzuki K Poulose wrote:
>>> We are about to add the support for ETR builtin scatter-gather mode
>>> for dealing with large amount of trace buffers. However, on some of
>>> the platforms, using the ETR SG mode can lock up the system due to
>>> the way the ETR is connected to the memory subsystem.
>>>
>>> In SG mode, the ETR performs READ from the scatter-gather table to
>>> fetch the next page and regular WRITE of trace data. If the READ
>>> operation doesn't complete (due to the memory subsystem issues,
>>> which we have seen on a couple of platforms) the trace WRITE
>>> cannot proceed, leading to issues. So, by default we do not
>>> use the SG mode, unless it is known to be safe on the platform.
>>> We define a DT property for the TMC node to specify whether we
>>> have a proper SG mode.
>>>
>>> Cc: Mathieu Poirier <matheiu.poirier@linaro.org>
>>> Cc: Mike Leach <mike.leach@linaro.org>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: John Horley <john.horley@arm.com>
>>> Cc: Robert Walker <robert.walker@arm.com>
>>> Cc: devicetree@vger.kernel.org
>>> Cc: frowand.list@gmail.com
>>> Cc: Rob Herring <robh@kernel.org>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>  Documentation/devicetree/bindings/arm/coresight.txt | 3 +++
>>>  drivers/hwtracing/coresight/coresight-tmc.c         | 8 +++++++-
>>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
>>> index cdd84d0..7c0c8f0 100644
>>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>>> @@ -88,6 +88,9 @@ its hardware characteristcs.
>>>       * arm,buffer-size: size of contiguous buffer space for TMC ETR
>>>        (embedded trace router)
>>>
>>> +     * scatter-gather: boolean. Indicates that the TMC-ETR can safely
>>> +       use the SG mode on this system.
>>> +
>>
>> Needs a vendor prefix.
>>
>
> Thinking further on this, do we need to make it device specific as
> well - something like "arm,etr-scatter-gather"?  That way we don't
> have to redefine "scatter-gather" for other ARM devices if they happen
> to need the same property but for different reasons.

No. If we had a bunch of cases, then we'd probably want to have just
'scatter-gather'.

BTW, if SG had already been supported, then I'd say this is a quirk
and we should invert this property. Otherwise, you'd be disabling once
enabled SG and require working platforms to update their dtb. Of
course, I shouldn't really let the state of an OS driver influence the
DT binding.

Rob

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 03/27] coresight: Add helper device type
  2018-05-03 17:00     ` Mathieu Poirier
@ 2018-05-05  9:56       ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-05  9:56 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On 05/03/2018 06:00 PM, Mathieu Poirier wrote:

...

>> +/*
>> + * coresight_release_device - Release this device and any of the helper
>> + * devices connected to it for trace operation.
>> + */
>> +static void coresight_release_device(struct coresight_device *csdev)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < csdev->nr_outport; i++) {
>> +		struct coresight_device *child = csdev->conns[i].child_dev;
>> +
>> +		if (child && child->type == CORESIGHT_DEV_TYPE_HELPER)
>> +			pm_runtime_put(child->dev.parent);
>> +	}
> 
> There is a newline here in coresight_prepare_device().  Either add one (or not)
> in both functions, but please be consistent.
> 

>> @@ -480,8 +517,7 @@ static int _coresight_build_path(struct coresight_device *csdev,
>>   
>>   	node->csdev = csdev;
>>   	list_add(&node->link, path);
>> -	pm_runtime_get_sync(csdev->dev.parent);
>> -
>> +	coresight_prepare_device(csdev);
> 
> There was a newline between pm_runtime_get_sync() and the return statement in
> the original code.
> 


>> @@ -775,6 +811,10 @@ static struct device_type coresight_dev_type[] = {
>>   		.name = "source",
>>   		.groups = coresight_source_groups,
>>   	},
>> +	{
>> +		.name = "helper",
>> +	},
>> +
> 
> Extra newline.
> 

>>   };   
>> +/**
>> + * struct coresight_ops_helper - Operations for a helper device.
>> + *
>> + * All operations could pass in a device specific data, which could
>> + * help the helper device to determine what to do.
>> + *
>> + * @enable	: Turn the device ON.
>> + * @disable	: Turn the device OFF.
> 
> There is a discrepancy between the comment and the operations, i.e. enabling a
> device is not synonymous with turning it on.  Looking at patch 04/27 the ops are
> called in tmc_etr_enable/disable_catu() so the comment probably needs to be
> changed.

Sure, will fix all of them.

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 04/27] coresight: Introduce support for Coresight Addrss Translation Unit
  2018-05-03 20:25       ` Mathieu Poirier
@ 2018-05-05 10:03         ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-05 10:03 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, Mike Leach, Robert Walker,
	Mark Rutland, Will Deacon, Robin Murphy, Sudeep Holla,
	Frank Rowand, Rob Herring, John Horley

On 05/03/2018 09:25 PM, Mathieu Poirier wrote:
> On 3 May 2018 at 11:31, Mathieu Poirier <mathieu.poirier@linaro.org> wrote:
>> On Tue, May 01, 2018 at 10:10:34AM +0100, Suzuki K Poulose wrote:
>>> Add the initial support for Coresight Address Translation Unit, which
>>> augments the TMC in Coresight SoC-600 by providing an improved Scatter
>>> Gather mechanism. CATU is always connected to a single TMC-ETR and
>>> converts the AXI address to a translated address (from a given SG
>>> table with a specific format). The CATU should be programmed in pass
>>> through mode and enabled even if the ETR doesn't use translation by CATU.
>>>
>>> This patch provides a mechanism to enable/disable the CATU, always in
>>> pass through mode.
>>>
>>> We reuse the existing ports mechanism to link the TMC-ETR to the
>>> connected CATU.
>>>
>>> i.e, TMC-ETR:output_port0 -> CATU:input_port0
>>>
>>> Reference manual for the CATU component is available in version r2p0 of:
>>> "Arm Coresight System-on-Chip SoC-600 Technical Reference Manual",
>>> under Section 4.9.
>>
>> Please remove the part about the TRM as it is bound to change.

Ok, I will. Generally the TRM for a particular release (rXpY) doesn't
change, unless there is a change in either X or Y or both.

>>>
>>> +config CORESIGHT_CATU
>>> +     bool "Coresight Address Translation Unit (CATU) driver"
>>> +     depends on CORESIGHT_LINK_AND_SINK_TMC
>>> +     help
>>> +        Enable support for the Coresight Address Translation Unit (CATU).
>>> +        CATU supports a scatter gather table of 4K pages, with forward/backward
>>> +        lookup. CATU helps TMC ETR to use large physically non-contiguous trace
>>> +        buffer by translating the addersses used by ETR to the corresponding
>>> +        physical adderss by looking up the table.
>>
>> There are a couple of typos in the last sentence.
> 
> There's also a typo in the patch title.
> 

>>> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
>>> new file mode 100644
>>> index 0000000..2cd69a6
>>> --- /dev/null
>>> +++ b/drivers/hwtracing/coresight/coresight-catu.c
>>> @@ -0,0 +1,195 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>
>> Extra line
>>
>>> +/*
>>> + * Copyright (C) 2017 ARM Limited. All rights reserved.
>>
>> You sure you don't want to bump this to 2018?
>>


>>> + *
>>> + * Coresight Address Translation Unit support
>>> + *
>>> + * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> + */
>>> +
>>> +#include <linux/kernel.h>
>>> +#include <linux/device.h>
>>> +#include <linux/amba/bus.h>
>>> +#include <linux/io.h>
>>> +#include <linux/slab.h>
>>
>> List in alphabetical order if possible.
>>

>>> +static int catu_disable(struct coresight_device *csdev, void *__unused)
>>> +{
>>> +     int rc;
>>> +     struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
>>> +
>>> +     CS_UNLOCK(catu_drvdata->base);
>>> +     rc = catu_disable_hw(catu_drvdata);
>>> +     CS_LOCK(catu_drvdata->base);
>>> +
>>
>> I suppose you can remove the extra line as catu_enable() doesn't have one.
>>
>>> +     return rc;
>>> +}

>>> +     drvdata->base = base;
>>> +     catu_desc.pdata = pdata;
>>> +     catu_desc.dev = dev;
>>> +     catu_desc.groups = catu_groups;
>>> +     catu_desc.type = CORESIGHT_DEV_TYPE_HELPER;
>>> +     catu_desc.subtype.helper_subtype = CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
>>> +     catu_desc.ops = &catu_ops;
>>> +     drvdata->csdev = coresight_register(&catu_desc);
>>> +     if (IS_ERR(drvdata->csdev))
>>> +             ret = PTR_ERR(drvdata->csdev);
>>> +     if (!ret)
>>> +             dev_info(drvdata->dev, "initialized\n");
>>
>> Please remove as it 1) doesn't convey HW related information and 2) the TMC
>> doesn't output anything.
>>

>>> diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
>>> new file mode 100644
>>> index 0000000..cd58d6f
>>> --- /dev/null
>>> +++ b/drivers/hwtracing/coresight/coresight-catu.h
>>> @@ -0,0 +1,89 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +
>>
>> Extra line
>>
>>> +/*
>>> + * Copyright (C) 2017 ARM Limited. All rights reserved.
>>> + *
>>> + * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> + *
>>
>> Extra line. In coresight-catu.c there isn't one.
>>

>>> +#define CATU_STATUS_READY    8
>>> +#define CATU_STATUS_ADRERR   0
>>> +#define CATU_STATUS_AXIERR   4
>>> +
>>> +
>>
>> Extra line.
>>
>>> +#define CATU_IRQEN_ON                0x1
>>> +#define CATU_IRQEN_OFF               0x0
>>> +
>>> +
>>
>> Extra line.
>>

Will address all the above.

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread
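
Tying the helper device type from patch 03/27 and the CATU registration above
together, here is a rough sketch (not from the series; the function name is
made up) of how the ETR side can locate a connected CATU by walking its output
connections, i.e. the same walk used by coresight_release_device() quoted
earlier in this thread:

/*
 * Sketch: find the CATU helper (if any) sitting on this ETR's output
 * port, relying on the CORESIGHT_DEV_TYPE_HELPER type and the
 * CORESIGHT_DEV_SUBTYPE_HELPER_CATU subtype introduced by these patches.
 */
static struct coresight_device *
example_etr_find_catu(struct tmc_drvdata *drvdata)
{
        int i;
        struct coresight_device *tmp, *etr = drvdata->csdev;

        for (i = 0; i < etr->nr_outport; i++) {
                tmp = etr->conns[i].child_dev;
                if (tmp && tmp->type == CORESIGHT_DEV_TYPE_HELPER &&
                    tmp->subtype.helper_subtype ==
                                CORESIGHT_DEV_SUBTYPE_HELPER_CATU)
                        return tmp;
        }

        return NULL;
}

A CATU found this way is then enabled/disabled through its helper ops around
the ETR enable/disable path, which is what the tmc_etr_enable_catu() /
tmc_etr_disable_catu() helpers mentioned in this thread do.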

* Re: [PATCH v2 16/27] coresight: tmc-etr: Add transparent buffer management
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-07 17:20     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 17:20 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:46AM +0100, Suzuki K Poulose wrote:
> At the moment we always use contiguous memory for TMC ETR tracing
> when used from sysfs. The size of the buffer is fixed at boot time
> and can only be changed by modifying the DT. With the introduction
> of SG support we could support really large buffers in that mode.
> This patch abstracts the buffer used for ETR to switch between a
> contiguous buffer and an SG table depending on the availability of
> the memory.
> 
> This also enables the sysfs mode to use the ETR in SG mode depending
> on the configured trace buffer size. Also, since ETR will use the
> new infrastructure to manage the buffer, we can get rid of some
> of the members in the tmc_drvdata and clean up the fields a bit.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 451 +++++++++++++++++++-----
>  drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
>  2 files changed, 418 insertions(+), 90 deletions(-)

Good work on cleanly dealing with the different modes of operation from sysfs.
It is that kind of kitchen work I was too lazy to do when I first worked on this
driver.

On a side note, this patch doesn't apply on my CS next branch due to some
refactoring in tmc_enable_etr_sink_sysfs().  No need to worry about it for this
iteration though.

> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index d18043d..fde3fa6 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -23,6 +23,13 @@
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +struct etr_flat_buf {
> +	struct device	*dev;
> +	dma_addr_t	daddr;
> +	void		*vaddr;
> +	size_t		size;
> +};
> +
>  /*
>   * The TMC ETR SG has a page size of 4K. The SG table contains pointers
>   * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
> @@ -666,7 +673,7 @@ tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
>   * @size	- Total size of the data buffer
>   * @pages	- Optional list of page virtual address
>   */
> -static struct etr_sg_table __maybe_unused *
> +static struct etr_sg_table *
>  tmc_init_etr_sg_table(struct device *dev, int node,
>  		  unsigned long size, void **pages)
>  {
> @@ -702,6 +709,296 @@ tmc_init_etr_sg_table(struct device *dev, int node,
>  	return etr_table;
>  }
>  
> +/*
> + * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
> + */
> +static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
> +				  struct etr_buf *etr_buf, int node,
> +				  void **pages)
> +{
> +	struct etr_flat_buf *flat_buf;
> +
> +	/* We cannot reuse existing pages for flat buf */
> +	if (pages)
> +		return -EINVAL;
> +
> +	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
> +	if (!flat_buf)
> +		return -ENOMEM;
> +
> +	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
> +					   &flat_buf->daddr, GFP_KERNEL);
> +	if (!flat_buf->vaddr) {
> +		kfree(flat_buf);
> +		return -ENOMEM;
> +	}
> +
> +	flat_buf->size = etr_buf->size;
> +	flat_buf->dev = drvdata->dev;
> +	etr_buf->hwaddr = flat_buf->daddr;
> +	etr_buf->mode = ETR_MODE_FLAT;
> +	etr_buf->private = flat_buf;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	if (flat_buf && flat_buf->daddr)
> +		dma_free_coherent(flat_buf->dev, flat_buf->size,
> +				  flat_buf->vaddr, flat_buf->daddr);
> +	kfree(flat_buf);
> +}
> +
> +static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	/*
> +	 * Adjust the buffer to point to the beginning of the trace data
> +	 * and update the available trace data.
> +	 */
> +	etr_buf->offset = rrp - etr_buf->hwaddr;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = rwp - rrp;
> +}
> +
> +static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
> +					 u64 offset, size_t len, char **bufpp)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	*bufpp = (char *)flat_buf->vaddr + offset;
> +	/*
> +	 * tmc_etr_buf_get_data already adjusts the length to handle
> +	 * buffer wrapping around.
> +	 */
> +	return len;
> +}
> +
> +static const struct etr_buf_operations etr_flat_buf_ops = {
> +	.alloc = tmc_etr_alloc_flat_buf,
> +	.free = tmc_etr_free_flat_buf,
> +	.sync = tmc_etr_sync_flat_buf,
> +	.get_data = tmc_etr_get_data_flat_buf,
> +};
> +
> +/*
> + * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
> + * appropriately.
> + */
> +static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
> +				struct etr_buf *etr_buf, int node,
> +				void **pages)
> +{
> +	struct etr_sg_table *etr_table;
> +
> +	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
> +					  etr_buf->size, pages);
> +	if (IS_ERR(etr_table))
> +		return -ENOMEM;
> +	etr_buf->hwaddr = etr_table->hwaddr;
> +	etr_buf->mode = ETR_MODE_ETR_SG;
> +	etr_buf->private = etr_table;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	if (etr_table) {
> +		tmc_free_sg_table(etr_table->sg_table);
> +		kfree(etr_table);
> +	}
> +}
> +
> +static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
> +				       size_t len, char **bufpp)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
> +}
> +
> +static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	long r_offset, w_offset;
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +	struct tmc_sg_table *table = etr_table->sg_table;
> +
> +	/* Convert hw address to offset in the buffer */
> +	r_offset = tmc_sg_get_data_page_offset(table, rrp);
> +	if (r_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RRP %llx to offset\n", rrp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	w_offset = tmc_sg_get_data_page_offset(table, rwp);
> +	if (w_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RWP %llx to offset\n", rwp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	etr_buf->offset = r_offset;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
> +				w_offset - r_offset;
> +	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
> +}
> +
> +static const struct etr_buf_operations etr_sg_buf_ops = {
> +	.alloc = tmc_etr_alloc_sg_buf,
> +	.free = tmc_etr_free_sg_buf,
> +	.sync = tmc_etr_sync_sg_buf,
> +	.get_data = tmc_etr_get_data_sg_buf,
> +};
> +
> +static const struct etr_buf_operations *etr_buf_ops[] = {
> +	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
> +	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
> +};
> +
> +static inline int tmc_etr_mode_alloc_buf(int mode,
> +					 struct tmc_drvdata *drvdata,
> +					 struct etr_buf *etr_buf, int node,
> +					 void **pages)
> +{
> +	int rc;
> +
> +	switch (mode) {
> +	case ETR_MODE_FLAT:
> +	case ETR_MODE_ETR_SG:
> +		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
> +		if (!rc)
> +			etr_buf->ops = etr_buf_ops[mode];
> +		return rc;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +/*
> + * tmc_alloc_etr_buf: Allocate a buffer use by ETR.
> + * @drvdata	: ETR device details.
> + * @size	: size of the requested buffer.
> + * @flags	: Required properties for the buffer.
> + * @node	: Node for memory allocations.
> + * @pages	: An optional list of pages.
> + */
> +static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
> +					 ssize_t size, int flags,
> +					 int node, void **pages)
> +{
> +	int rc = -ENOMEM;
> +	bool has_etr_sg, has_iommu;
> +	struct etr_buf *etr_buf;
> +
> +	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
> +	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
> +
> +	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
> +	if (!etr_buf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	etr_buf->size = size;
> +
> +	/*
> +	 * If we have to use an existing list of pages, we cannot reliably
> +	 * use a contiguous DMA memory (even if we have an IOMMU). Otherwise,
> +	 * we use the contiguous DMA memory if at least one of the following
> +	 * conditions is true:
> +	 *  a) The ETR cannot use Scatter-Gather.
> +	 *  b) we have a backing IOMMU
> +	 *  c) The requested memory size is smaller (< 1M).
> +	 *
> +	 * Fallback to available mechanisms.
> +	 *
> +	 */
> +	if (!pages &&
> +	    (!has_etr_sg || has_iommu || size < SZ_1M))
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc && has_etr_sg)
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc) {
> +		kfree(etr_buf);
> +		return ERR_PTR(rc);
> +	}
> +
> +	return etr_buf;
> +}
> +
> +static void tmc_free_etr_buf(struct etr_buf *etr_buf)
> +{
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
> +	etr_buf->ops->free(etr_buf);
> +	kfree(etr_buf);
> +}
> +
> +/*
> + * tmc_etr_buf_get_data: Get the pointer to the trace data at @offset
> + * with a maximum of @len bytes.
> + * Returns: The size of the linear data available at @offset, with *bufpp
> + * updated to point to the buffer.
> + */
> +static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
> +				    u64 offset, size_t len, char **bufpp)
> +{
> +	/* Adjust the length to limit this transaction to end of buffer */
> +	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
> +
> +	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
> +}
> +
> +static inline s64
> +tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
> +{
> +	ssize_t len;
> +	char *bufp;
> +
> +	len = tmc_etr_buf_get_data(etr_buf, offset,
> +				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
> +	if (WARN_ON(len < CORESIGHT_BARRIER_PKT_SIZE))
> +		return -EINVAL;
> +	coresight_insert_barrier_packet(bufp);
> +	return offset + CORESIGHT_BARRIER_PKT_SIZE;
> +}
> +
> +/*
> + * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
> + * Makes sure the trace data is synced to the memory for consumption.
> + * @etr_buf->offset will hold the offset to the beginning of the trace data
> + * within the buffer, with @etr_buf->len bytes to consume.
> + */
> +static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
> +{
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
> +	u64 rrp, rwp;
> +	u32 status;
> +
> +	rrp = tmc_read_rrp(drvdata);
> +	rwp = tmc_read_rwp(drvdata);
> +	status = readl_relaxed(drvdata->base + TMC_STS);
> +	etr_buf->full = status & TMC_STS_FULL;
> +
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
> +
> +	etr_buf->ops->sync(etr_buf, rrp, rwp);
> +
> +	/* Insert barrier packets at the beginning, if there was an overflow */
> +	if (etr_buf->full)
> +		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
> +}
> +
>  static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
>  {
>  	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
> @@ -721,6 +1018,7 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  {
>  	u32 axictl, sts;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
>  	/*
>  	 * If this ETR is connected to a CATU, enable it before we turn
> @@ -733,7 +1031,7 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  	/* Wait for TMCSReady bit to be set */
>  	tmc_wait_for_tmcready(drvdata);
>  
> -	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
> +	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
>  	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
>  
>  	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
> @@ -746,16 +1044,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  		axictl |= TMC_AXICTL_ARCACHE_OS;
>  	}
>  
> +	if (etr_buf->mode == ETR_MODE_ETR_SG) {
> +		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
> +			return;
> +		axictl |= TMC_AXICTL_SCT_GAT_MODE;
> +	}
> +
>  	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
> -	tmc_write_dba(drvdata, drvdata->paddr);
> +	tmc_write_dba(drvdata, etr_buf->hwaddr);
>  	/*
>  	 * If the TMC pointers must be programmed before the session,
>  	 * we have to set it properly (i.e, RRP/RWP to base address and
>  	 * STS to "not full").
>  	 */
>  	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
> -		tmc_write_rrp(drvdata, drvdata->paddr);
> -		tmc_write_rwp(drvdata, drvdata->paddr);
> +		tmc_write_rrp(drvdata, etr_buf->hwaddr);
> +		tmc_write_rwp(drvdata, etr_buf->hwaddr);
>  		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
>  		writel_relaxed(sts, drvdata->base + TMC_STS);
>  	}
> @@ -771,63 +1075,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  }
>  
>  /*
> - * Return the available trace data in the buffer @pos, with a maximum
> - * limit of @len, also updating the @bufpp on where to find it.
> + * Return the available trace data in the buffer (starts at etr_buf->offset,
> + * limited by etr_buf->len) from @pos, with a maximum limit of @len,
> + * also updating the @bufpp on where to find it. Since the trace data
> + * can start anywhere in the buffer, depending on the RRP, we adjust the
> + * @len returned to handle the buffer wrapping around.
>   */
>  ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
> -			    loff_t pos, size_t len, char **bufpp)
> +				loff_t pos, size_t len, char **bufpp)
>  {
> +	s64 offset;
>  	ssize_t actual = len;
> -	char *bufp = drvdata->buf + pos;
> -	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
> -
> -	/* Adjust the len to available size @pos */
> -	if (pos + actual > drvdata->len)
> -		actual = drvdata->len - pos;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
> +	if (pos + actual > etr_buf->len)
> +		actual = etr_buf->len - pos;
>  	if (actual <= 0)
>  		return actual;
>  
> -	/*
> -	 * Since we use a circular buffer, with trace data starting
> -	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
> -	 * wrap the current @pos to within the buffer.
> -	 */
> -	if (bufp >= bufend)
> -		bufp -= drvdata->size;
> -	/*
> -	 * For simplicity, avoid copying over a wrapped around buffer.
> -	 */
> -	if ((bufp + actual) > bufend)
> -		actual = bufend - bufp;
> -	*bufpp = bufp;
> -	return actual;
> +	/* Compute the offset from which we read the data */
> +	offset = etr_buf->offset + pos;
> +	if (offset >= etr_buf->size)
> +		offset -= etr_buf->size;
> +	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
>  }
>  
> -static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
> +static struct etr_buf *
> +tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
>  {
> -	u32 val;
> -	u64 rwp;
> +	return tmc_alloc_etr_buf(drvdata, drvdata->size,
> +				 0, cpu_to_node(0), NULL);
> +}
>  
> -	rwp = tmc_read_rwp(drvdata);
> -	val = readl_relaxed(drvdata->base + TMC_STS);
> +static void
> +tmc_etr_free_sysfs_buf(struct etr_buf *buf)
> +{
> +	if (buf)
> +		tmc_free_etr_buf(buf);
> +}
>  
> -	/*
> -	 * Adjust the buffer to point to the beginning of the trace data
> -	 * and update the available trace data.
> -	 */
> -	if (val & TMC_STS_FULL) {
> -		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
> -		drvdata->len = drvdata->size;
> -		coresight_insert_barrier_packet(drvdata->buf);
> -	} else {
> -		drvdata->buf = drvdata->vaddr;
> -		drvdata->len = rwp - drvdata->paddr;
> -	}
> +static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
> +{
> +	tmc_sync_etr_buf(drvdata);
>  }
>  
>  static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  {
> +
>  	CS_UNLOCK(drvdata->base);
>  
>  	tmc_flush_and_stop(drvdata);
> @@ -836,7 +1130,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  	 * read before the TMC is disabled.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
> -		tmc_etr_dump_hw(drvdata);
> +		tmc_etr_sync_sysfs_buf(drvdata);
> +
>  	tmc_disable_hw(drvdata);
>  
>  	CS_LOCK(drvdata->base);
> @@ -850,34 +1145,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	int ret = 0;
>  	bool used = false;
>  	unsigned long flags;
> -	void __iomem *vaddr = NULL;
> -	dma_addr_t paddr;
>  	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_buf *new_buf = NULL, *free_buf = NULL;
>  
>  
>  	/*
> -	 * If we don't have a buffer release the lock and allocate memory.
> -	 * Otherwise keep the lock and move along.
> +	 * If we are enabling the ETR from disabled state, we need to make
> +	 * sure we have a buffer with the right size. The etr_buf is not reset
> +	 * immediately after we stop the tracing in SYSFS mode as we wait for
> +	 * the user to collect the data. We may be able to reuse the existing
> +	 * buffer, provided the size matches. Any allocation has to be done
> +	 * with the lock released.
>  	 */
>  	spin_lock_irqsave(&drvdata->spinlock, flags);
> -	if (!drvdata->vaddr) {
> +	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
>  		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -
> -		/*
> -		 * Contiguous  memory can't be allocated while a spinlock is
> -		 * held.  As such allocate memory here and free it if a buffer
> -		 * has already been allocated (from a previous session).
> -		 */
> -		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
> -					   &paddr, GFP_KERNEL);
> -		if (!vaddr)
> -			return -ENOMEM;
> +		/* Allocate memory with the spinlock released */
> +		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
> +		if (IS_ERR(new_buf))
> +			return PTR_ERR(new_buf);
>  
>  		/* Let's try again */
>  		spin_lock_irqsave(&drvdata->spinlock, flags);
>  	}
>  
> -	if (drvdata->reading) {
> +	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
>  		ret = -EBUSY;
>  		goto out;
>  	}
> @@ -885,21 +1177,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	/*
>  	 * In sysFS mode we can have multiple writers per sink.  Since this
>  	 * sink is already enabled no memory is needed and the HW need not be
> -	 * touched.
> +	 * touched, even if the buffer size has changed.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
>  		goto out;
>  
>  	/*
> -	 * If drvdata::buf == NULL, use the memory allocated above.
> -	 * Otherwise a buffer still exists from a previous session, so
> -	 * simply use that.
> +	 * If we don't have a buffer or it doesn't match the requested size,
> +	 * use the memory allocated above. Otherwise reuse it.
>  	 */
> -	if (drvdata->buf == NULL) {
> +	if (!drvdata->etr_buf ||
> +	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
>  		used = true;
> -		drvdata->vaddr = vaddr;
> -		drvdata->paddr = paddr;
> -		drvdata->buf = drvdata->vaddr;
> +		free_buf = drvdata->etr_buf;
> +		drvdata->etr_buf = new_buf;
>  	}
>  
>  	drvdata->mode = CS_MODE_SYSFS;
> @@ -908,8 +1199,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free memory outside the spinlock if need be */
> -	if (!used && vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (free_buf)
> +		tmc_etr_free_sysfs_buf(free_buf);
>  
>  	if (!ret)
>  		dev_info(drvdata->dev, "TMC-ETR enabled\n");
> @@ -988,8 +1279,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  		goto out;
>  	}
>  
> -	/* If drvdata::buf is NULL the trace data has been read already */
> -	if (drvdata->buf == NULL) {
> +	/* If drvdata::etr_buf is NULL the trace data has been read already */
> +	if (drvdata->etr_buf == NULL) {
>  		ret = -EINVAL;
>  		goto out;
>  	}
> @@ -1008,8 +1299,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  {
>  	unsigned long flags;
> -	dma_addr_t paddr;
> -	void __iomem *vaddr = NULL;
> +	struct etr_buf *etr_buf = NULL;
>  
>  	/* config types are set a boot time and never change */
>  	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
> @@ -1030,17 +1320,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  		 * The ETR is not tracing and the buffer was just read.
>  		 * As such prepare to free the trace buffer.
>  		 */
> -		vaddr = drvdata->vaddr;
> -		paddr = drvdata->paddr;
> -		drvdata->buf = drvdata->vaddr = NULL;
> +		etr_buf =  drvdata->etr_buf;
> +		drvdata->etr_buf = NULL;
>  	}
>  
>  	drvdata->reading = false;
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free allocated memory out side of the spinlock */
> -	if (vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (etr_buf)
> +		tmc_free_etr_buf(etr_buf);
>  
>  	return 0;
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 74d8f24..6f7bec7 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -56,6 +56,7 @@
>  #define TMC_STS_TMCREADY_BIT	2
>  #define TMC_STS_FULL		BIT(0)
>  #define TMC_STS_TRIGGERED	BIT(1)
> +
>  /*
>   * TMC_AXICTL - 0x110
>   *
> @@ -135,6 +136,35 @@ enum tmc_mem_intf_width {
>  #define CORESIGHT_SOC_600_ETR_CAPS	\
>  	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
>  
> +enum etr_mode {
> +	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
> +	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
> +};
> +
> +struct etr_buf_operations;
> +
> +/**
> + * struct etr_buf - Details of the buffer used by ETR
> + * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
> + * @full	: Trace data overflow
> + * @size	: Size of the buffer.
> + * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
> + * @offset	: Offset of the trace data in the buffer for consumption.
> + * @len		: Available trace data in the buffer (may wrap around to the start).
> + * @ops		: ETR buffer operations for the mode.
> + * @private	: Backend specific information for the buf
> + */
> +struct etr_buf {
> +	enum etr_mode			mode;
> +	bool				full;
> +	ssize_t				size;
> +	dma_addr_t			hwaddr;
> +	unsigned long			offset;
> +	s64				len;
> +	const struct etr_buf_operations	*ops;
> +	void				*private;
> +};
> +
>  /**
>   * struct tmc_drvdata - specifics associated to an TMC component
>   * @base:	memory mapped base address for this component.
> @@ -142,11 +172,10 @@ enum tmc_mem_intf_width {
>   * @csdev:	component vitals needed by the framework.
>   * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
>   * @spinlock:	only one at a time pls.
> - * @buf:	area of memory where trace data get sent.
> - * @paddr:	DMA start location in RAM.
> - * @vaddr:	virtual representation of @paddr.
> - * @size:	trace buffer size.
> - * @len:	size of the available trace.
> + * @buf:	Snapshot of the trace data for ETF/ETB.
> + * @etr_buf:	details of buffer used in TMC-ETR
> + * @len:	size of the available trace for ETF/ETB.
> + * @size:	trace buffer size for this TMC (common for all modes).
>   * @mode:	how this TMC is being used.
>   * @config_type: TMC variant, must be of type @tmc_config_type.
>   * @memwidth:	width of the memory interface databus, in bytes.
> @@ -161,11 +190,12 @@ struct tmc_drvdata {
>  	struct miscdevice	miscdev;
>  	spinlock_t		spinlock;
>  	bool			reading;
> -	char			*buf;
> -	dma_addr_t		paddr;
> -	void __iomem		*vaddr;
> -	u32			size;
> +	union {
> +		char		*buf;		/* TMC ETB */
> +		struct etr_buf	*etr_buf;	/* TMC ETR */
> +	};
>  	u32			len;
> +	u32			size;
>  	u32			mode;
>  	enum tmc_config_type	config_type;
>  	enum tmc_mem_intf_width	memwidth;
> @@ -173,6 +203,15 @@ struct tmc_drvdata {
>  	u32			etr_caps;
>  };
>  
> +struct etr_buf_operations {
> +	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
> +			int node, void **pages);
> +	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
> +	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
> +				char **bufpp);
> +	void (*free)(struct etr_buf *etr_buf);
> +};
> +
>  /**
>   * struct tmc_pages - Collection of pages used for SG.
>   * @nr_pages:		Number of pages in the list.
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread
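
A quick aside on the sync logic in the patch above: tmc_etr_sync_flat_buf() and
tmc_etr_sync_sg_buf() both reduce the RRP/RWP values to a plain circular-buffer
length calculation. The stand-alone, user-space sketch below (hypothetical
names, illustration only, not the driver code) shows the same arithmetic once
the hardware pointers have been converted into offsets within the buffer:

    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Compute the available trace data in a circular buffer of @size bytes,
     * given the read offset (derived from RRP), the write offset (derived
     * from RWP) and the FULL status bit.
     */
    static size_t trace_data_len(size_t r_offset, size_t w_offset,
                                 size_t size, bool full)
    {
        if (full)
            return size;    /* overflow: the whole buffer is valid */
        /* account for the writer wrapping past the end of the buffer */
        return ((w_offset < r_offset) ? size : 0) + w_offset - r_offset;
    }

    int main(void)
    {
        printf("%zu\n", trace_data_len(3072, 1024, 4096, false)); /* 2048 */
        printf("%zu\n", trace_data_len(1024, 3072, 4096, false)); /* 2048 */
        printf("%zu\n", trace_data_len(1024, 1024, 4096, true));  /* 4096 */
        return 0;
    }

tmc_etr_get_sysfs_trace() relies on the same wrap-around when it folds @pos
back into the buffer before calling tmc_etr_buf_get_data().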

* [PATCH v2 16/27] coresight: tmc-etr: Add transparent buffer management
@ 2018-05-07 17:20     ` Mathieu Poirier
  0 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 01, 2018 at 10:10:46AM +0100, Suzuki K Poulose wrote:
> At the moment we always use contiguous memory for TMC ETR tracing
> when used from sysfs. The size of the buffer is fixed at boot time
> and can only be changed by modifying the DT. With the introduction
> of SG support we could support really large buffers in that mode.
> This patch abstracts the buffer used for ETR to switch between a
> contiguous buffer or a SG table depending on the availability of
> the memory.
> 
> This also enables the sysfs mode to use the ETR in SG mode depending
> on the configured trace buffer size. Also, since the ETR will use the
> new infrastructure to manage the buffer, we can get rid of some
> of the members in the tmc_drvdata and clean up the fields a bit.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 451 +++++++++++++++++++-----
>  drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
>  2 files changed, 418 insertions(+), 90 deletions(-)

Good work on cleanly dealing with the different modes of operation from sysFS.
It is that kind of kitchen work I was too lazy to do when I first worked on this
driver.

On a side note, this patch doesn't apply on my CS next branch due to some
refactoring in tmc_enable_etr_sink_sysfs().  No need to worry about it for this
iteration though.

> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index d18043d..fde3fa6 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -23,6 +23,13 @@
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +struct etr_flat_buf {
> +	struct device	*dev;
> +	dma_addr_t	daddr;
> +	void		*vaddr;
> +	size_t		size;
> +};
> +
>  /*
>   * The TMC ETR SG has a page size of 4K. The SG table contains pointers
>   * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
> @@ -666,7 +673,7 @@ tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
>   * @size	- Total size of the data buffer
>   * @pages	- Optional list of page virtual address
>   */
> -static struct etr_sg_table __maybe_unused *
> +static struct etr_sg_table *
>  tmc_init_etr_sg_table(struct device *dev, int node,
>  		  unsigned long size, void **pages)
>  {
> @@ -702,6 +709,296 @@ tmc_init_etr_sg_table(struct device *dev, int node,
>  	return etr_table;
>  }
>  
> +/*
> + * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
> + */
> +static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
> +				  struct etr_buf *etr_buf, int node,
> +				  void **pages)
> +{
> +	struct etr_flat_buf *flat_buf;
> +
> +	/* We cannot reuse existing pages for flat buf */
> +	if (pages)
> +		return -EINVAL;
> +
> +	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
> +	if (!flat_buf)
> +		return -ENOMEM;
> +
> +	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
> +					   &flat_buf->daddr, GFP_KERNEL);
> +	if (!flat_buf->vaddr) {
> +		kfree(flat_buf);
> +		return -ENOMEM;
> +	}
> +
> +	flat_buf->size = etr_buf->size;
> +	flat_buf->dev = drvdata->dev;
> +	etr_buf->hwaddr = flat_buf->daddr;
> +	etr_buf->mode = ETR_MODE_FLAT;
> +	etr_buf->private = flat_buf;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	if (flat_buf && flat_buf->daddr)
> +		dma_free_coherent(flat_buf->dev, flat_buf->size,
> +				  flat_buf->vaddr, flat_buf->daddr);
> +	kfree(flat_buf);
> +}
> +
> +static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	/*
> +	 * Adjust the buffer to point to the beginning of the trace data
> +	 * and update the available trace data.
> +	 */
> +	etr_buf->offset = rrp - etr_buf->hwaddr;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = rwp - rrp;
> +}
> +
> +static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
> +					 u64 offset, size_t len, char **bufpp)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	*bufpp = (char *)flat_buf->vaddr + offset;
> +	/*
> +	 * tmc_etr_buf_get_data already adjusts the length to handle
> +	 * buffer wrapping around.
> +	 */
> +	return len;
> +}
> +
> +static const struct etr_buf_operations etr_flat_buf_ops = {
> +	.alloc = tmc_etr_alloc_flat_buf,
> +	.free = tmc_etr_free_flat_buf,
> +	.sync = tmc_etr_sync_flat_buf,
> +	.get_data = tmc_etr_get_data_flat_buf,
> +};
> +
> +/*
> + * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
> + * appropriately.
> + */
> +static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
> +				struct etr_buf *etr_buf, int node,
> +				void **pages)
> +{
> +	struct etr_sg_table *etr_table;
> +
> +	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
> +					  etr_buf->size, pages);
> +	if (IS_ERR(etr_table))
> +		return -ENOMEM;
> +	etr_buf->hwaddr = etr_table->hwaddr;
> +	etr_buf->mode = ETR_MODE_ETR_SG;
> +	etr_buf->private = etr_table;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	if (etr_table) {
> +		tmc_free_sg_table(etr_table->sg_table);
> +		kfree(etr_table);
> +	}
> +}
> +
> +static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
> +				       size_t len, char **bufpp)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
> +}
> +
> +static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	long r_offset, w_offset;
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +	struct tmc_sg_table *table = etr_table->sg_table;
> +
> +	/* Convert hw address to offset in the buffer */
> +	r_offset = tmc_sg_get_data_page_offset(table, rrp);
> +	if (r_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RRP %llx to offset\n", rrp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	w_offset = tmc_sg_get_data_page_offset(table, rwp);
> +	if (w_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RWP %llx to offset\n", rwp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	etr_buf->offset = r_offset;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
> +				w_offset - r_offset;
> +	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
> +}
> +
> +static const struct etr_buf_operations etr_sg_buf_ops = {
> +	.alloc = tmc_etr_alloc_sg_buf,
> +	.free = tmc_etr_free_sg_buf,
> +	.sync = tmc_etr_sync_sg_buf,
> +	.get_data = tmc_etr_get_data_sg_buf,
> +};
> +
> +static const struct etr_buf_operations *etr_buf_ops[] = {
> +	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
> +	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
> +};
> +
> +static inline int tmc_etr_mode_alloc_buf(int mode,
> +					 struct tmc_drvdata *drvdata,
> +					 struct etr_buf *etr_buf, int node,
> +					 void **pages)
> +{
> +	int rc;
> +
> +	switch (mode) {
> +	case ETR_MODE_FLAT:
> +	case ETR_MODE_ETR_SG:
> +		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
> +		if (!rc)
> +			etr_buf->ops = etr_buf_ops[mode];
> +		return rc;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +/*
> + * tmc_alloc_etr_buf: Allocate a buffer used by ETR.
> + * @drvdata	: ETR device details.
> + * @size	: size of the requested buffer.
> + * @flags	: Required properties for the buffer.
> + * @node	: Node for memory allocations.
> + * @pages	: An optional list of pages.
> + */
> +static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
> +					 ssize_t size, int flags,
> +					 int node, void **pages)
> +{
> +	int rc = -ENOMEM;
> +	bool has_etr_sg, has_iommu;
> +	struct etr_buf *etr_buf;
> +
> +	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
> +	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
> +
> +	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
> +	if (!etr_buf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	etr_buf->size = size;
> +
> +	/*
> +	 * If we have to use an existing list of pages, we cannot reliably
> +	 * use contiguous DMA memory (even if we have an IOMMU). Otherwise,
> +	 * we use contiguous DMA memory if at least one of the following
> +	 * conditions is true:
> +	 *  a) The ETR cannot use Scatter-Gather.
> +	 *  b) We have a backing IOMMU.
> +	 *  c) The requested memory size is small (< 1M).
> +	 *
> +	 * Fallback to available mechanisms.
> +	 *
> +	 */
> +	if (!pages &&
> +	    (!has_etr_sg || has_iommu || size < SZ_1M))
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc && has_etr_sg)
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc) {
> +		kfree(etr_buf);
> +		return ERR_PTR(rc);
> +	}
> +
> +	return etr_buf;
> +}
> +
> +static void tmc_free_etr_buf(struct etr_buf *etr_buf)
> +{
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
> +	etr_buf->ops->free(etr_buf);
> +	kfree(etr_buf);
> +}
> +
> +/*
> + * tmc_etr_buf_get_data: Get the pointer to the trace data at @offset
> + * with a maximum of @len bytes.
> + * Returns: The size of the linear data available at @offset, with *bufpp
> + * updated to point to the buffer.
> + */
> +static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
> +				    u64 offset, size_t len, char **bufpp)
> +{
> +	/* Adjust the length to limit this transaction to end of buffer */
> +	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
> +
> +	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
> +}
> +
> +static inline s64
> +tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
> +{
> +	ssize_t len;
> +	char *bufp;
> +
> +	len = tmc_etr_buf_get_data(etr_buf, offset,
> +				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
> +	if (WARN_ON(len < CORESIGHT_BARRIER_PKT_SIZE))
> +		return -EINVAL;
> +	coresight_insert_barrier_packet(bufp);
> +	return offset + CORESIGHT_BARRIER_PKT_SIZE;
> +}
> +
> +/*
> + * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
> + * Makes sure the trace data is synced to the memory for consumption.
> + * @etr_buf->offset will hold the offset to the beginning of the trace data
> + * within the buffer, with @etr_buf->len bytes to consume.
> + */
> +static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
> +{
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
> +	u64 rrp, rwp;
> +	u32 status;
> +
> +	rrp = tmc_read_rrp(drvdata);
> +	rwp = tmc_read_rwp(drvdata);
> +	status = readl_relaxed(drvdata->base + TMC_STS);
> +	etr_buf->full = status & TMC_STS_FULL;
> +
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
> +
> +	etr_buf->ops->sync(etr_buf, rrp, rwp);
> +
> +	/* Insert barrier packets at the beginning, if there was an overflow */
> +	if (etr_buf->full)
> +		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
> +}
> +
>  static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
>  {
>  	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
> @@ -721,6 +1018,7 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  {
>  	u32 axictl, sts;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
>  	/*
>  	 * If this ETR is connected to a CATU, enable it before we turn
> @@ -733,7 +1031,7 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  	/* Wait for TMCSReady bit to be set */
>  	tmc_wait_for_tmcready(drvdata);
>  
> -	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
> +	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
>  	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
>  
>  	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
> @@ -746,16 +1044,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  		axictl |= TMC_AXICTL_ARCACHE_OS;
>  	}
>  
> +	if (etr_buf->mode == ETR_MODE_ETR_SG) {
> +		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
> +			return;
> +		axictl |= TMC_AXICTL_SCT_GAT_MODE;
> +	}
> +
>  	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
> -	tmc_write_dba(drvdata, drvdata->paddr);
> +	tmc_write_dba(drvdata, etr_buf->hwaddr);
>  	/*
>  	 * If the TMC pointers must be programmed before the session,
>  	 * we have to set it properly (i.e, RRP/RWP to base address and
>  	 * STS to "not full").
>  	 */
>  	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
> -		tmc_write_rrp(drvdata, drvdata->paddr);
> -		tmc_write_rwp(drvdata, drvdata->paddr);
> +		tmc_write_rrp(drvdata, etr_buf->hwaddr);
> +		tmc_write_rwp(drvdata, etr_buf->hwaddr);
>  		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
>  		writel_relaxed(sts, drvdata->base + TMC_STS);
>  	}
> @@ -771,63 +1075,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  }
>  
>  /*
> - * Return the available trace data in the buffer @pos, with a maximum
> - * limit of @len, also updating the @bufpp on where to find it.
> + * Return the available trace data in the buffer (starts at etr_buf->offset,
> + * limited by etr_buf->len) from @pos, with a maximum limit of @len,
> + * also updating the @bufpp on where to find it. Since the trace data
> + * can start anywhere in the buffer, depending on the RRP, we adjust the
> + * @len returned to handle the buffer wrapping around.
>   */
>  ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
> -			    loff_t pos, size_t len, char **bufpp)
> +				loff_t pos, size_t len, char **bufpp)
>  {
> +	s64 offset;
>  	ssize_t actual = len;
> -	char *bufp = drvdata->buf + pos;
> -	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
> -
> -	/* Adjust the len to available size @pos */
> -	if (pos + actual > drvdata->len)
> -		actual = drvdata->len - pos;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
> +	if (pos + actual > etr_buf->len)
> +		actual = etr_buf->len - pos;
>  	if (actual <= 0)
>  		return actual;
>  
> -	/*
> -	 * Since we use a circular buffer, with trace data starting
> -	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
> -	 * wrap the current @pos to within the buffer.
> -	 */
> -	if (bufp >= bufend)
> -		bufp -= drvdata->size;
> -	/*
> -	 * For simplicity, avoid copying over a wrapped around buffer.
> -	 */
> -	if ((bufp + actual) > bufend)
> -		actual = bufend - bufp;
> -	*bufpp = bufp;
> -	return actual;
> +	/* Compute the offset from which we read the data */
> +	offset = etr_buf->offset + pos;
> +	if (offset >= etr_buf->size)
> +		offset -= etr_buf->size;
> +	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
>  }
>  
> -static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
> +static struct etr_buf *
> +tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
>  {
> -	u32 val;
> -	u64 rwp;
> +	return tmc_alloc_etr_buf(drvdata, drvdata->size,
> +				 0, cpu_to_node(0), NULL);
> +}
>  
> -	rwp = tmc_read_rwp(drvdata);
> -	val = readl_relaxed(drvdata->base + TMC_STS);
> +static void
> +tmc_etr_free_sysfs_buf(struct etr_buf *buf)
> +{
> +	if (buf)
> +		tmc_free_etr_buf(buf);
> +}
>  
> -	/*
> -	 * Adjust the buffer to point to the beginning of the trace data
> -	 * and update the available trace data.
> -	 */
> -	if (val & TMC_STS_FULL) {
> -		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
> -		drvdata->len = drvdata->size;
> -		coresight_insert_barrier_packet(drvdata->buf);
> -	} else {
> -		drvdata->buf = drvdata->vaddr;
> -		drvdata->len = rwp - drvdata->paddr;
> -	}
> +static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
> +{
> +	tmc_sync_etr_buf(drvdata);
>  }
>  
>  static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  {
> +
>  	CS_UNLOCK(drvdata->base);
>  
>  	tmc_flush_and_stop(drvdata);
> @@ -836,7 +1130,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  	 * read before the TMC is disabled.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
> -		tmc_etr_dump_hw(drvdata);
> +		tmc_etr_sync_sysfs_buf(drvdata);
> +
>  	tmc_disable_hw(drvdata);
>  
>  	CS_LOCK(drvdata->base);
> @@ -850,34 +1145,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	int ret = 0;
>  	bool used = false;
>  	unsigned long flags;
> -	void __iomem *vaddr = NULL;
> -	dma_addr_t paddr;
>  	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_buf *new_buf = NULL, *free_buf = NULL;
>  
>  
>  	/*
> -	 * If we don't have a buffer release the lock and allocate memory.
> -	 * Otherwise keep the lock and move along.
> +	 * If we are enabling the ETR from disabled state, we need to make
> +	 * sure we have a buffer with the right size. The etr_buf is not reset
> +	 * immediately after we stop the tracing in SYSFS mode as we wait for
> +	 * the user to collect the data. We may be able to reuse the existing
> +	 * buffer, provided the size matches. Any allocation has to be done
> +	 * with the lock released.
>  	 */
>  	spin_lock_irqsave(&drvdata->spinlock, flags);
> -	if (!drvdata->vaddr) {
> +	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
>  		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -
> -		/*
> -		 * Contiguous  memory can't be allocated while a spinlock is
> -		 * held.  As such allocate memory here and free it if a buffer
> -		 * has already been allocated (from a previous session).
> -		 */
> -		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
> -					   &paddr, GFP_KERNEL);
> -		if (!vaddr)
> -			return -ENOMEM;
> +		/* Allocate memory with the spinlock released */
> +		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
> +		if (IS_ERR(new_buf))
> +			return PTR_ERR(new_buf);
>  
>  		/* Let's try again */
>  		spin_lock_irqsave(&drvdata->spinlock, flags);
>  	}
>  
> -	if (drvdata->reading) {
> +	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
>  		ret = -EBUSY;
>  		goto out;
>  	}
> @@ -885,21 +1177,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	/*
>  	 * In sysFS mode we can have multiple writers per sink.  Since this
>  	 * sink is already enabled no memory is needed and the HW need not be
> -	 * touched.
> +	 * touched, even if the buffer size has changed.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
>  		goto out;
>  
>  	/*
> -	 * If drvdata::buf == NULL, use the memory allocated above.
> -	 * Otherwise a buffer still exists from a previous session, so
> -	 * simply use that.
> +	 * If we don't have a buffer or it doesn't match the requested size,
> +	 * use the memory allocated above. Otherwise reuse it.
>  	 */
> -	if (drvdata->buf == NULL) {
> +	if (!drvdata->etr_buf ||
> +	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
>  		used = true;
> -		drvdata->vaddr = vaddr;
> -		drvdata->paddr = paddr;
> -		drvdata->buf = drvdata->vaddr;
> +		free_buf = drvdata->etr_buf;
> +		drvdata->etr_buf = new_buf;
>  	}
>  
>  	drvdata->mode = CS_MODE_SYSFS;
> @@ -908,8 +1199,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free memory outside the spinlock if need be */
> -	if (!used && vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (free_buf)
> +		tmc_etr_free_sysfs_buf(free_buf);
>  
>  	if (!ret)
>  		dev_info(drvdata->dev, "TMC-ETR enabled\n");
> @@ -988,8 +1279,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  		goto out;
>  	}
>  
> -	/* If drvdata::buf is NULL the trace data has been read already */
> -	if (drvdata->buf == NULL) {
> +	/* If drvdata::etr_buf is NULL the trace data has been read already */
> +	if (drvdata->etr_buf == NULL) {
>  		ret = -EINVAL;
>  		goto out;
>  	}
> @@ -1008,8 +1299,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  {
>  	unsigned long flags;
> -	dma_addr_t paddr;
> -	void __iomem *vaddr = NULL;
> +	struct etr_buf *etr_buf = NULL;
>  
>  	/* config types are set a boot time and never change */
>  	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
> @@ -1030,17 +1320,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  		 * The ETR is not tracing and the buffer was just read.
>  		 * As such prepare to free the trace buffer.
>  		 */
> -		vaddr = drvdata->vaddr;
> -		paddr = drvdata->paddr;
> -		drvdata->buf = drvdata->vaddr = NULL;
> +		etr_buf =  drvdata->etr_buf;
> +		drvdata->etr_buf = NULL;
>  	}
>  
>  	drvdata->reading = false;
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free allocated memory out side of the spinlock */
> -	if (vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (etr_buf)
> +		tmc_free_etr_buf(etr_buf);
>  
>  	return 0;
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 74d8f24..6f7bec7 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -56,6 +56,7 @@
>  #define TMC_STS_TMCREADY_BIT	2
>  #define TMC_STS_FULL		BIT(0)
>  #define TMC_STS_TRIGGERED	BIT(1)
> +
>  /*
>   * TMC_AXICTL - 0x110
>   *
> @@ -135,6 +136,35 @@ enum tmc_mem_intf_width {
>  #define CORESIGHT_SOC_600_ETR_CAPS	\
>  	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
>  
> +enum etr_mode {
> +	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
> +	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
> +};
> +
> +struct etr_buf_operations;
> +
> +/**
> + * struct etr_buf - Details of the buffer used by ETR
> + * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
> + * @full	: Trace data overflow
> + * @size	: Size of the buffer.
> + * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
> + * @offset	: Offset of the trace data in the buffer for consumption.
> + * @len		: Available trace data in the buffer (may wrap around to the start).
> + * @ops		: ETR buffer operations for the mode.
> + * @private	: Backend specific information for the buf
> + */
> +struct etr_buf {
> +	enum etr_mode			mode;
> +	bool				full;
> +	ssize_t				size;
> +	dma_addr_t			hwaddr;
> +	unsigned long			offset;
> +	s64				len;
> +	const struct etr_buf_operations	*ops;
> +	void				*private;
> +};
> +
>  /**
>   * struct tmc_drvdata - specifics associated to an TMC component
>   * @base:	memory mapped base address for this component.
> @@ -142,11 +172,10 @@ enum tmc_mem_intf_width {
>   * @csdev:	component vitals needed by the framework.
>   * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
>   * @spinlock:	only one at a time pls.
> - * @buf:	area of memory where trace data get sent.
> - * @paddr:	DMA start location in RAM.
> - * @vaddr:	virtual representation of @paddr.
> - * @size:	trace buffer size.
> - * @len:	size of the available trace.
> + * @buf:	Snapshot of the trace data for ETF/ETB.
> + * @etr_buf:	details of buffer used in TMC-ETR
> + * @len:	size of the available trace for ETF/ETB.
> + * @size:	trace buffer size for this TMC (common for all modes).
>   * @mode:	how this TMC is being used.
>   * @config_type: TMC variant, must be of type @tmc_config_type.
>   * @memwidth:	width of the memory interface databus, in bytes.
> @@ -161,11 +190,12 @@ struct tmc_drvdata {
>  	struct miscdevice	miscdev;
>  	spinlock_t		spinlock;
>  	bool			reading;
> -	char			*buf;
> -	dma_addr_t		paddr;
> -	void __iomem		*vaddr;
> -	u32			size;
> +	union {
> +		char		*buf;		/* TMC ETB */
> +		struct etr_buf	*etr_buf;	/* TMC ETR */
> +	};
>  	u32			len;
> +	u32			size;
>  	u32			mode;
>  	enum tmc_config_type	config_type;
>  	enum tmc_mem_intf_width	memwidth;
> @@ -173,6 +203,15 @@ struct tmc_drvdata {
>  	u32			etr_caps;
>  };
>  
> +struct etr_buf_operations {
> +	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
> +			int node, void **pages);
> +	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
> +	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
> +				char **bufpp);
> +	void (*free)(struct etr_buf *etr_buf);
> +};
> +
>  /**
>   * struct tmc_pages - Collection of pages used for SG.
>   * @nr_pages:		Number of pages in the list.
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread
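
The backend abstraction reviewed above is a classic operations-table dispatch:
tmc_etr_mode_alloc_buf() picks the callbacks out of the mode-indexed
etr_buf_ops[] array, binds them to the buffer only on a successful alloc, and
every later caller goes through etr_buf->ops without caring which backend sits
behind it. A minimal user-space analog — hypothetical names, shown only to
illustrate the shape of the pattern, not the driver code — could look like
this:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal analog of enum etr_mode and struct etr_buf_operations */
    enum buf_mode { MODE_FLAT, MODE_SG, MODE_MAX };

    struct buf;

    struct buf_ops {
        int  (*alloc)(struct buf *b, size_t size);
        void (*free)(struct buf *b);
    };

    struct buf {
        enum buf_mode mode;
        size_t size;
        void *private;
        const struct buf_ops *ops;
    };

    /* "Flat" backend: plain malloc stands in for dma_alloc_coherent() */
    static int flat_alloc(struct buf *b, size_t size)
    {
        b->private = malloc(size);
        return b->private ? 0 : -1;
    }

    static void flat_free(struct buf *b)
    {
        free(b->private);
    }

    static const struct buf_ops flat_ops = {
        .alloc = flat_alloc,
        .free  = flat_free,
    };

    /* SG backend omitted; the patch fills its slot the same way */
    static const struct buf_ops *mode_ops[MODE_MAX] = {
        [MODE_FLAT] = &flat_ops,
    };

    /* Analog of tmc_etr_mode_alloc_buf(): bind the ops only on success */
    static int mode_alloc(enum buf_mode mode, struct buf *b, size_t size)
    {
        int rc;

        if (mode >= MODE_MAX || !mode_ops[mode])
            return -1;
        rc = mode_ops[mode]->alloc(b, size);
        if (!rc) {
            b->mode = mode;
            b->size = size;
            b->ops  = mode_ops[mode];
        }
        return rc;
    }

    int main(void)
    {
        struct buf b = { 0 };

        /* Callers never care which backend serviced the buffer */
        if (!mode_alloc(MODE_FLAT, &b, 4096)) {
            printf("allocated %zu bytes via mode %d\n", b.size, (int)b.mode);
            b.ops->free(&b);
        }
        return 0;
    }

Binding ->ops only after a successful ->alloc() is what lets
tmc_alloc_etr_buf() fall through from FLAT to ETR_SG and still end up with a
consistent buffer.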

* Re: [PATCH v2 17/27] coresight: etr: Add support for save restore buffers
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-07 17:48     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 17:48 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:47AM +0100, Suzuki K Poulose wrote:
> Add support for creating buffers which can be used in save-restore
> mode (e.g., for use by perf). If the TMC-ETR supports the save-restore
> feature, we could support the mode with all buffer backends. However,
> if it doesn't, we should fall back to using the built-in SG mechanism,
> where we can rotate the SG table by making some adjustments in the
> page table.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 142 +++++++++++++++++++++++-
>  drivers/hwtracing/coresight/coresight-tmc.h     |  16 +++
>  2 files changed, 153 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index fde3fa6..25e7feb 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -604,7 +604,7 @@ tmc_etr_sg_table_index_to_daddr(struct tmc_sg_table *sg_table, u32 index)
>   *    which starts @base_offset.
>   * 2) Mark the page at the base_offset + size as LAST.
>   */
> -static int __maybe_unused
> +static int
>  tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
>  			unsigned long base_offset, unsigned long size)
>  {
> @@ -736,6 +736,9 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
>  	flat_buf->size = etr_buf->size;
>  	flat_buf->dev = drvdata->dev;
>  	etr_buf->hwaddr = flat_buf->daddr;
> +	etr_buf->rrp = flat_buf->daddr;
> +	etr_buf->rwp = flat_buf->daddr;
> +	etr_buf->status = 0;
>  	etr_buf->mode = ETR_MODE_FLAT;
>  	etr_buf->private = flat_buf;
>  	return 0;
> @@ -777,11 +780,36 @@ static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
>  	return len;
>  }
>  
> +/*
> + * tmc_etr_restore_flat_buf: Restore the flat buffer pointers.
> + * This is only possible with in-built ETR capability to save-restore
> + * the pointers. The DBA will still point to the original start of the
> + * buffer.
> + */
> +static int tmc_etr_restore_flat_buf(struct etr_buf *etr_buf,
> +				    unsigned long r_offset,
> +				    unsigned long w_offset,
> +				    unsigned long size,
> +				    u32 status,
> +				    bool has_save_restore)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	if (!has_save_restore || !flat_buf || size > flat_buf->size)
> +		return -EINVAL;
> +	etr_buf->rrp = flat_buf->daddr + (r_offset % flat_buf->size);
> +	etr_buf->rwp = flat_buf->daddr + (w_offset % flat_buf->size);
> +	etr_buf->size = size;
> +	etr_buf->status = status;
> +	return 0;
> +}
> +
>  static const struct etr_buf_operations etr_flat_buf_ops = {
>  	.alloc = tmc_etr_alloc_flat_buf,
>  	.free = tmc_etr_free_flat_buf,
>  	.sync = tmc_etr_sync_flat_buf,
>  	.get_data = tmc_etr_get_data_flat_buf,
> +	.restore = tmc_etr_restore_flat_buf,
>  };
>  
>  /*
> @@ -799,6 +827,7 @@ static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
>  	if (IS_ERR(etr_table))
>  		return -ENOMEM;
>  	etr_buf->hwaddr = etr_table->hwaddr;
> +	etr_buf->status = 0;
>  	etr_buf->mode = ETR_MODE_ETR_SG;
>  	etr_buf->private = etr_table;
>  	return 0;
> @@ -825,9 +854,11 @@ static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
>  static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
>  {
>  	long r_offset, w_offset;
> +	unsigned long buf_size;
>  	struct etr_sg_table *etr_table = etr_buf->private;
>  	struct tmc_sg_table *table = etr_table->sg_table;
>  
> +	buf_size = tmc_sg_table_buf_size(table);
>  	/* Convert hw address to offset in the buffer */
>  	r_offset = tmc_sg_get_data_page_offset(table, rrp);
>  	if (r_offset < 0) {
> @@ -849,16 +880,62 @@ static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
>  	if (etr_buf->full)
>  		etr_buf->len = etr_buf->size;
>  	else
> -		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
> +		etr_buf->len = ((w_offset < r_offset) ? buf_size : 0) +
>  				w_offset - r_offset;
>  	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
>  }
>  
> +static int tmc_etr_restore_sg_buf(struct etr_buf *etr_buf,
> +				  unsigned long r_offset,
> +				  unsigned long w_offset,
> +				  unsigned long size,
> +				  u32 __always_unused status,
> +				  bool has_save_restore)
> +{
> +	int rc;
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +	struct device *dev = etr_table->sg_table->dev;
> +
> +	/*
> +	 * It is highly unlikely that we have an ETR with in-built SG and
> +	 * Save-Restore capability and we are not sure if the PTRs will
> +	 * be updated.
> +	 */
> +	if (has_save_restore) {
> +		dev_warn_once(dev,
> +		"Unexpected feature combination of SG and save-restore\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Since we cannot program RRP/RWP different from DBAL, the offsets
> +	 * should match.
> +	 */
> +	if (r_offset != w_offset) {
> +		dev_dbg(dev, "Mismatched RRP/RWP offsets\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Make sure the size is aligned */
> +	size &= ~(ETR_SG_PAGE_SIZE - 1);
> +
> +	rc = tmc_etr_sg_table_rotate(etr_table, w_offset, size);
> +	if (!rc) {
> +		etr_buf->hwaddr = etr_table->hwaddr;
> +		etr_buf->rrp = etr_table->hwaddr;
> +		etr_buf->rwp = etr_table->hwaddr;
> +		etr_buf->size = size;
> +	}
> +
> +	return rc;
> +}
> +
>  static const struct etr_buf_operations etr_sg_buf_ops = {
>  	.alloc = tmc_etr_alloc_sg_buf,
>  	.free = tmc_etr_free_sg_buf,
>  	.sync = tmc_etr_sync_sg_buf,
>  	.get_data = tmc_etr_get_data_sg_buf,
> +	.restore = tmc_etr_restore_sg_buf,
>  };
>  
>  static const struct etr_buf_operations *etr_buf_ops[] = {
> @@ -899,10 +976,42 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
>  {
>  	int rc = -ENOMEM;
>  	bool has_etr_sg, has_iommu;
> +	bool has_flat, has_save_restore;
>  	struct etr_buf *etr_buf;
>  
>  	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
>  	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
> +	has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
> +
> +	/*
> +	 * We can normally use flat DMA buffer provided that the buffer
> +	 * is not used in save restore fashion without hardware support.
> +	 */
> +	has_flat = !(flags & ETR_BUF_F_RESTORE_PTRS) || has_save_restore;
> +
> +	/*
> +	 * To support save-restore on a given ETR we have the following
> +	 * conditions:
> +	 *  1) If the buffer requires save-restore of a pointers as well
> +	 *     as the Status bit, we require ETR support for it and we coul

/coul/could

> +	 *     support all the backends.
> +	 *  2) If the buffer requires only save-restore of pointers, then
> +	 *     we could exploit a circular ETR SG list. None of the other
> +	 *     backends can support it without the ETR feature.
> +	 *
> +	 * If the buffer will be used in a save-restore mode without
> +	 * the ETR support for SAVE_RESTORE, we can only support TMC
> +	 * ETR in-built SG tables which can be rotated to make it work.
> +	 */
> +	if ((flags & ETR_BUF_F_RESTORE_STATUS) && !has_save_restore)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (!has_flat && !has_etr_sg) {
> +		dev_dbg(drvdata->dev,
> +			"No available backends for ETR buffer with flags %x\n",
> +			flags);
> +		return ERR_PTR(-EINVAL);
> +	}
>  
>  	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
>  	if (!etr_buf)
> @@ -922,7 +1031,7 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
>  	 * Fallback to available mechanisms.
>  	 *
>  	 */
> -	if (!pages &&
> +	if (!pages && has_flat &&
>  	    (!has_etr_sg || has_iommu || size < SZ_1M))
>  		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
>  					    etr_buf, node, pages);
> @@ -999,6 +1108,29 @@ static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
>  		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
>  }
>  
> +static int __maybe_unused
> +tmc_restore_etr_buf(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
> +		    unsigned long r_offset, unsigned long w_offset,
> +		    unsigned long size, u32 status)
> +{
> +	bool has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
> +
> +	if (WARN_ON_ONCE(!has_save_restore && etr_buf->mode != ETR_MODE_ETR_SG))
> +		return -EINVAL;
> +	/*
> +	 * If we use a circular SG list without ETR support, we can't
> +	 * support restoring "Full" bit.
> +	 */
> +	if (WARN_ON_ONCE(!has_save_restore && status))
> +		return -EINVAL;
> +	if (status & ~TMC_STS_FULL)
> +		return -EINVAL;
> +	if (etr_buf->ops->restore)
> +		return etr_buf->ops->restore(etr_buf, r_offset, w_offset, size,
> +					      status, has_save_restore);
> +	return -EINVAL;
> +}
> +
>  static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
>  {
>  	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
> @@ -1058,8 +1190,8 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  	 * STS to "not full").
>  	 */
>  	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
> -		tmc_write_rrp(drvdata, etr_buf->hwaddr);
> -		tmc_write_rwp(drvdata, etr_buf->hwaddr);
> +		tmc_write_rrp(drvdata, etr_buf->rrp);
> +		tmc_write_rwp(drvdata, etr_buf->rwp);
>  		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
>  		writel_relaxed(sts, drvdata->base + TMC_STS);
>  	}
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 6f7bec7..1bdfb38 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -141,12 +141,22 @@ enum etr_mode {
>  	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
>  };
>  
> +/* ETR buffer should support save-restore */
> +#define ETR_BUF_F_RESTORE_PTRS		0x1
> +#define ETR_BUF_F_RESTORE_STATUS	0x2
> +
> +#define ETR_BUF_F_RESTORE_MINIMAL	ETR_BUF_F_RESTORE_PTRS
> +#define ETR_BUF_F_RESTORE_FULL		(ETR_BUF_F_RESTORE_PTRS |\
> +					 ETR_BUF_F_RESTORE_STATUS)
>  struct etr_buf_operations;
>  
>  /**
>   * struct etr_buf - Details of the buffer used by ETR
>   * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
>   * @full	: Trace data overflow
> + * @status	: Value for STATUS if the ETR supports save-restore.
> + * @rrp		: Value for RRP{LO:HI} if the ETR supports save-restore
> + * @rwp		: Value for RWP{LO:HI} if the ETR supports save-restore
>   * @size	: Size of the buffer.
>   * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
>   * @offset	: Offset of the trace data in the buffer for consumption.
> @@ -157,6 +167,9 @@ struct etr_buf_operations;
>  struct etr_buf {
>  	enum etr_mode			mode;
>  	bool				full;
> +	u32				status;
> +	dma_addr_t			rrp;
> +	dma_addr_t			rwp;
>  	ssize_t				size;
>  	dma_addr_t			hwaddr;
>  	unsigned long			offset;
> @@ -207,6 +220,9 @@ struct etr_buf_operations {
>  	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
>  			int node, void **pages);
>  	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
> +	int (*restore)(struct etr_buf *etr_buf, unsigned long r_offset,
> +		       unsigned long w_offset, unsigned long size,
> +		       u32 status, bool has_save_restore);
>  	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
>  				char **bufpp);
>  	void (*free)(struct etr_buf *etr_buf);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread
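
As I read this patch, the backend-eligibility rules reduce to two checks:
restoring the STATUS (Full) bit needs hardware save-restore, and a flat buffer
can honour pointer restore only with that same hardware support, while the
built-in SG backend can emulate it by rotating the table. A stand-alone sketch
of that decision — with made-up flag names mirroring ETR_BUF_F_RESTORE_PTRS and
ETR_BUF_F_RESTORE_STATUS, not the driver code — might be:

    #include <stdbool.h>
    #include <stdio.h>

    #define BUF_F_RESTORE_PTRS     0x1
    #define BUF_F_RESTORE_STATUS   0x2

    /* A flat buffer honours pointer restore only with HW save-restore */
    static bool flat_ok(int flags, bool hw_save_restore)
    {
        return !(flags & BUF_F_RESTORE_PTRS) || hw_save_restore;
    }

    /* Can any backend satisfy the requested restore flags? */
    static bool request_ok(int flags, bool hw_save_restore, bool has_sg)
    {
        if ((flags & BUF_F_RESTORE_STATUS) && !hw_save_restore)
            return false;   /* Full bit can only be restored by the HW */
        return flat_ok(flags, hw_save_restore) || has_sg;
    }

    int main(void)
    {
        /* No HW save-restore, but SG present: pointer restore works... */
        printf("%d\n", request_ok(BUF_F_RESTORE_PTRS, false, true));   /* 1 */
        /* ...while restoring the Full bit does not */
        printf("%d\n", request_ok(BUF_F_RESTORE_STATUS, false, true)); /* 0 */
        return 0;
    }

With those two rules, a perf-style request for pointer restore on an ETR
without the SAVE_RESTORE capability is only ever satisfied by the rotated SG
table.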

* [PATCH v2 17/27] coresight: etr: Add support for save restore buffers
@ 2018-05-07 17:48     ` Mathieu Poirier
  0 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 17:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 01, 2018 at 10:10:47AM +0100, Suzuki K Poulose wrote:
> Add support for creating buffers which can be used in save-restore
> mode (e.g., for use by perf). If the TMC-ETR supports the save-restore
> feature, we could support the mode with all buffer backends. However,
> if it doesn't, we should fall back to using the built-in SG mechanism,
> where we can rotate the SG table by making some adjustments in the
> page table.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 142 +++++++++++++++++++++++-
>  drivers/hwtracing/coresight/coresight-tmc.h     |  16 +++
>  2 files changed, 153 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index fde3fa6..25e7feb 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -604,7 +604,7 @@ tmc_etr_sg_table_index_to_daddr(struct tmc_sg_table *sg_table, u32 index)
>   *    which starts @base_offset.
>   * 2) Mark the page at the base_offset + size as LAST.
>   */
> -static int __maybe_unused
> +static int
>  tmc_etr_sg_table_rotate(struct etr_sg_table *etr_table,
>  			unsigned long base_offset, unsigned long size)
>  {
> @@ -736,6 +736,9 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
>  	flat_buf->size = etr_buf->size;
>  	flat_buf->dev = drvdata->dev;
>  	etr_buf->hwaddr = flat_buf->daddr;
> +	etr_buf->rrp = flat_buf->daddr;
> +	etr_buf->rwp = flat_buf->daddr;
> +	etr_buf->status = 0;
>  	etr_buf->mode = ETR_MODE_FLAT;
>  	etr_buf->private = flat_buf;
>  	return 0;
> @@ -777,11 +780,36 @@ static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
>  	return len;
>  }
>  
> +/*
> + * tmc_etr_restore_flat_buf: Restore the flat buffer pointers.
> + * This is only possible with in-built ETR capability to save-restore
> + * the pointers. The DBA will still point to the original start of the
> + * buffer.
> + */
> +static int tmc_etr_restore_flat_buf(struct etr_buf *etr_buf,
> +				    unsigned long r_offset,
> +				    unsigned long w_offset,
> +				    unsigned long size,
> +				    u32 status,
> +				    bool has_save_restore)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	if (!has_save_restore || !flat_buf || size > flat_buf->size)
> +		return -EINVAL;
> +	etr_buf->rrp = flat_buf->daddr + (r_offset % flat_buf->size);
> +	etr_buf->rwp = flat_buf->daddr + (w_offset % flat_buf->size);
> +	etr_buf->size = size;
> +	etr_buf->status = status;
> +	return 0;
> +}
> +
>  static const struct etr_buf_operations etr_flat_buf_ops = {
>  	.alloc = tmc_etr_alloc_flat_buf,
>  	.free = tmc_etr_free_flat_buf,
>  	.sync = tmc_etr_sync_flat_buf,
>  	.get_data = tmc_etr_get_data_flat_buf,
> +	.restore = tmc_etr_restore_flat_buf,
>  };
>  
>  /*
> @@ -799,6 +827,7 @@ static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
>  	if (IS_ERR(etr_table))
>  		return -ENOMEM;
>  	etr_buf->hwaddr = etr_table->hwaddr;
> +	etr_buf->status = 0;
>  	etr_buf->mode = ETR_MODE_ETR_SG;
>  	etr_buf->private = etr_table;
>  	return 0;
> @@ -825,9 +854,11 @@ static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
>  static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
>  {
>  	long r_offset, w_offset;
> +	unsigned long buf_size;
>  	struct etr_sg_table *etr_table = etr_buf->private;
>  	struct tmc_sg_table *table = etr_table->sg_table;
>  
> +	buf_size = tmc_sg_table_buf_size(table);
>  	/* Convert hw address to offset in the buffer */
>  	r_offset = tmc_sg_get_data_page_offset(table, rrp);
>  	if (r_offset < 0) {
> @@ -849,16 +880,62 @@ static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
>  	if (etr_buf->full)
>  		etr_buf->len = etr_buf->size;
>  	else
> -		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
> +		etr_buf->len = ((w_offset < r_offset) ? buf_size : 0) +
>  				w_offset - r_offset;
>  	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
>  }
>  
> +static int tmc_etr_restore_sg_buf(struct etr_buf *etr_buf,
> +				  unsigned long r_offset,
> +				  unsigned long w_offset,
> +				  unsigned long size,
> +				  u32 __always_unused status,
> +				  bool has_save_restore)
> +{
> +	int rc;
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +	struct device *dev = etr_table->sg_table->dev;
> +
> +	/*
> +	 * It is highly unlikely that we have an ETR with in-built SG and
> +	 * Save-Restore capability and we are not sure if the PTRs will
> +	 * be updated.
> +	 */
> +	if (has_save_restore) {
> +		dev_warn_once(dev,
> +		"Unexpected feature combination of SG and save-restore\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Since we cannot program RRP/RWP different from DBAL, the offsets
> +	 * should match.
> +	 */
> +	if (r_offset != w_offset) {
> +		dev_dbg(dev, "Mismatched RRP/RWP offsets\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Make sure the size is aligned */
> +	size &= ~(ETR_SG_PAGE_SIZE - 1);
> +
> +	rc = tmc_etr_sg_table_rotate(etr_table, w_offset, size);
> +	if (!rc) {
> +		etr_buf->hwaddr = etr_table->hwaddr;
> +		etr_buf->rrp = etr_table->hwaddr;
> +		etr_buf->rwp = etr_table->hwaddr;
> +		etr_buf->size = size;
> +	}
> +
> +	return rc;
> +}
> +
>  static const struct etr_buf_operations etr_sg_buf_ops = {
>  	.alloc = tmc_etr_alloc_sg_buf,
>  	.free = tmc_etr_free_sg_buf,
>  	.sync = tmc_etr_sync_sg_buf,
>  	.get_data = tmc_etr_get_data_sg_buf,
> +	.restore = tmc_etr_restore_sg_buf,
>  };
>  
>  static const struct etr_buf_operations *etr_buf_ops[] = {
> @@ -899,10 +976,42 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
>  {
>  	int rc = -ENOMEM;
>  	bool has_etr_sg, has_iommu;
> +	bool has_flat, has_save_restore;
>  	struct etr_buf *etr_buf;
>  
>  	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
>  	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
> +	has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
> +
> +	/*
> +	 * We can normally use flat DMA buffer provided that the buffer
> +	 * is not used in save restore fashion without hardware support.
> +	 */
> +	has_flat = !(flags & ETR_BUF_F_RESTORE_PTRS) || has_save_restore;
> +
> +	/*
> +	 * To support save-restore on a given ETR we have the following
> +	 * conditions:
> +	 *  1) If the buffer requires save-restore of a pointers as well
> +	 *     as the Status bit, we require ETR support for it and we coul

/coul/could

> +	 *     support all the backends.
> +	 *  2) If the buffer requires only save-restore of pointers, then
> +	 *     we could exploit a circular ETR SG list. None of the other
> +	 *     backends can support it without the ETR feature.
> +	 *
> +	 * If the buffer will be used in a save-restore mode without
> +	 * the ETR support for SAVE_RESTORE, we can only support TMC
> +	 * ETR in-built SG tables which can be rotated to make it work.
> +	 */
> +	if ((flags & ETR_BUF_F_RESTORE_STATUS) && !has_save_restore)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (!has_flat && !has_etr_sg) {
> +		dev_dbg(drvdata->dev,
> +			"No available backends for ETR buffer with flags %x\n",
> +			flags);
> +		return ERR_PTR(-EINVAL);
> +	}
>  
>  	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
>  	if (!etr_buf)
> @@ -922,7 +1031,7 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
>  	 * Fallback to available mechanisms.
>  	 *
>  	 */
> -	if (!pages &&
> +	if (!pages && has_flat &&
>  	    (!has_etr_sg || has_iommu || size < SZ_1M))
>  		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
>  					    etr_buf, node, pages);
> @@ -999,6 +1108,29 @@ static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
>  		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
>  }
>  
> +static int __maybe_unused
> +tmc_restore_etr_buf(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
> +		    unsigned long r_offset, unsigned long w_offset,
> +		    unsigned long size, u32 status)
> +{
> +	bool has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
> +
> +	if (WARN_ON_ONCE(!has_save_restore && etr_buf->mode != ETR_MODE_ETR_SG))
> +		return -EINVAL;
> +	/*
> +	 * If we use a circular SG list without ETR support, we can't
> +	 * support restoring "Full" bit.
> +	 */
> +	if (WARN_ON_ONCE(!has_save_restore && status))
> +		return -EINVAL;
> +	if (status & ~TMC_STS_FULL)
> +		return -EINVAL;
> +	if (etr_buf->ops->restore)
> +		return etr_buf->ops->restore(etr_buf, r_offset, w_offset, size,
> +					      status, has_save_restore);
> +	return -EINVAL;
> +}
> +
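For reference, a minimal sketch of how the restore hook is meant to be driven.
The resume-path caller below is hypothetical; only tmc_restore_etr_buf() and the
ETR_BUF_F_* flags come from this patch:

static int example_resume_etr(struct tmc_drvdata *drvdata,
			      struct etr_buf *etr_buf,
			      unsigned long saved_rrp_off,
			      unsigned long saved_rwp_off,
			      unsigned long size, u32 saved_sts)
{
	/*
	 * Hypothetical caller: etr_buf should have been allocated with at
	 * least ETR_BUF_F_RESTORE_PTRS (ETR_BUF_F_RESTORE_FULL to also bring
	 * back the STS.Full bit), so that a backend providing a ->restore()
	 * hook was picked at allocation time.
	 */
	return tmc_restore_etr_buf(drvdata, etr_buf,
				   saved_rrp_off, saved_rwp_off,
				   size, saved_sts);
}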
>  static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
>  {
>  	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
> @@ -1058,8 +1190,8 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  	 * STS to "not full").
>  	 */
>  	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
> -		tmc_write_rrp(drvdata, etr_buf->hwaddr);
> -		tmc_write_rwp(drvdata, etr_buf->hwaddr);
> +		tmc_write_rrp(drvdata, etr_buf->rrp);
> +		tmc_write_rwp(drvdata, etr_buf->rwp);
>  		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
>  		writel_relaxed(sts, drvdata->base + TMC_STS);
>  	}
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 6f7bec7..1bdfb38 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -141,12 +141,22 @@ enum etr_mode {
>  	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
>  };
>  
> +/* ETR buffer should support save-restore */
> +#define ETR_BUF_F_RESTORE_PTRS		0x1
> +#define ETR_BUF_F_RESTORE_STATUS	0x2
> +
> +#define ETR_BUF_F_RESTORE_MINIMAL	ETR_BUF_F_RESTORE_PTRS
> +#define ETR_BUF_F_RESTORE_FULL		(ETR_BUF_F_RESTORE_PTRS |\
> +					 ETR_BUF_F_RESTORE_STATUS)
>  struct etr_buf_operations;
>  
>  /**
>   * struct etr_buf - Details of the buffer used by ETR
>   * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
>   * @full	: Trace data overflow
> + * @status	: Value for STATUS if the ETR supports save-restore.
> + * @rrp		: Value for RRP{LO:HI} if the ETR supports save-restore
> + * @rwp		: Value for RWP{LO:HI} if the ETR supports save-restore
>   * @size	: Size of the buffer.
>   * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
>   * @offset	: Offset of the trace data in the buffer for consumption.
> @@ -157,6 +167,9 @@ struct etr_buf_operations;
>  struct etr_buf {
>  	enum etr_mode			mode;
>  	bool				full;
> +	u32				status;
> +	dma_addr_t			rrp;
> +	dma_addr_t			rwp;
>  	ssize_t				size;
>  	dma_addr_t			hwaddr;
>  	unsigned long			offset;
> @@ -207,6 +220,9 @@ struct etr_buf_operations {
>  	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
>  			int node, void **pages);
>  	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
> +	int (*restore)(struct etr_buf *etr_buf, unsigned long r_offset,
> +		       unsigned long w_offset, unsigned long size,
> +		       u32 status, bool has_save_restore);
>  	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
>  				char **bufpp);
>  	void (*free)(struct etr_buf *etr_buf);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 18/27] coresight: catu: Add support for scatter gather tables
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-07 20:25     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 20:25 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:48AM +0100, Suzuki K Poulose wrote:
> This patch adds the support for setting up a SG table for use
> by the CATU. We reuse the tmc_sg_table to represent the table/data
> pages, even though the table format is different.
> 
> Similar to ETR SG table, CATU uses a 4KB page size for data buffers
> as well as page tables. All table entries are 64bit wide and have
> the following format:
> 
>         63                      12      1  0
>         x-----------------------------------x
>         |        Address [63-12] | SBZ  | V |
>         x-----------------------------------x
> 
> 	Where [V] ->	 0 - Pointer is invalid
> 			 1 - Pointer is Valid
> 
> CATU uses only the first half of the page for data page pointers,
> i.e., a single table page will only have 256 page pointers, addressing
> up to 1MB of data. The second half of a table page contains only two
> pointers at the end of the page (i.e, pointers at index 510 and 511),
> which are used as links to the "Previous" and "Next" page tables
> respectively.
> 
> The first table page has an "Invalid" previous pointer and the
> next pointer entry points to the second page table if there is one.
> Similarly the last table page has an "Invalid" next pointer to
> indicate the end of the table chain.
> 
> We create a circular buffer (i.e, first_table[prev] => last_table
> and last_table[next] => first_table) by default and provide
> helpers to make the buffer linear from a given offset. When we
> set the buffer to linear, we also mark the "pointers" outside
> the given "range" as invalid. We have to do this only for the
> starting and ending tables, as we disconnect the other tables
> by invalidating the links. This will allow the ETR buf to
> be restored from a given offset with any size.
> 
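To put some numbers on the layout described above (a purely illustrative
2MB buffer):

 - 2MB of trace data = 512 data pages of 4KB, needing two table pages.
 - Table 0 covers buffer offsets 0x000000-0x0FFFFF via entries 0-255,
   table 1 covers 0x100000-0x1FFFFF.
 - In the default circular form, entry 510 ("Previous") of table 0 points
   to table 1, and entry 511 ("Next") of table 1 points back to table 0.
 - After making the buffer linear from offset 0, those two link entries
   are simply marked Invalid.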
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-catu.c | 409 +++++++++++++++++++++++++++
>  1 file changed, 409 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
> index 2cd69a6..4cc2928 100644
> --- a/drivers/hwtracing/coresight/coresight-catu.c
> +++ b/drivers/hwtracing/coresight/coresight-catu.c
> @@ -16,10 +16,419 @@
>  
>  #include "coresight-catu.h"
>  #include "coresight-priv.h"
> +#include "coresight-tmc.h"
>  
>  #define csdev_to_catu_drvdata(csdev)	\
>  	dev_get_drvdata(csdev->dev.parent)
>  
> +/*
> + * CATU uses a page size of 4KB for page tables as well as data pages.
> + * Each 64bit entry in the table has the following format.
> + *
> + *	63			12	1  0
> + *	------------------------------------
> + *	|	 Address [63-12] | SBZ	| V|
> + *	------------------------------------
> + *
> + * Where bit[0] V indicates if the address is valid or not.
> + * Each 4K table page has up to 256 data page pointers, taking up to 2K
> + * of space. There are two Link pointers, pointing to the previous and next
> + * table pages respectively at the end of the 4K page. (i.e, entry 510
> + * and 511).
> + *  E.g, a table of two pages could look like :
> + *
> + *                 Table Page 0               Table Page 1
> + * SLADDR ===> x------------------x  x--> x-----------------x
> + * INADDR    ->|  Page 0      | V |  |    | Page 256    | V | <- INADDR+1M
> + *             |------------------|  |    |-----------------|
> + * INADDR+4K ->|  Page 1      | V |  |    |                 |
> + *             |------------------|  |    |-----------------|
> + *             |  Page 2      | V |  |    |                 |
> + *             |------------------|  |    |-----------------|
> + *             |   ...        | V |  |    |    ...          |
> + *             |------------------|  |    |-----------------|
> + * INADDR+1020K|  Page 255    | V |  |    |   Page 511  | V |
> + * SLADDR+2K==>|------------------|  |    |-----------------|
> + *             |  UNUSED      |   |  |    |                 |
> + *             |------------------|  |    |                 |
> + *             |  UNUSED      |   |  |    |                 |
> + *             |------------------|  |    |                 |
> + *             |    ...       |   |  |    |                 |
> + *             |------------------|  |    |-----------------|
> + *             |   IGNORED    | 0 |  |    | Table Page 0| 1 |
> + *             |------------------|  |    |-----------------|
> + *             |  Table Page 1| 1 |--x    | IGNORED     | 0 |
> + *             x------------------x       x-----------------x
> + * SLADDR+4K==>
> + *
> + * The base input address (used by the ETR, programmed in INADDR_{LO,HI})
> + * must be aligned to 1MB (the size addressable by a single page table).
> + * The CATU maps INADDR{LO:HI} to the first page in the table pointed
> + * to by SLADDR{LO:HI} and so on.
> + *
> + */
> +typedef u64 cate_t;
> +
> +#define CATU_PAGE_SHIFT		12
> +#define CATU_PAGE_SIZE		(1UL << CATU_PAGE_SHIFT)
> +#define CATU_PAGES_PER_SYSPAGE	(PAGE_SIZE / CATU_PAGE_SIZE)
> +
> +/* Page pointers are only allocated in the first 2K half */
> +#define CATU_PTRS_PER_PAGE	((CATU_PAGE_SIZE >> 1) / sizeof(cate_t))
> +#define CATU_PTRS_PER_SYSPAGE	(CATU_PAGES_PER_SYSPAGE * CATU_PTRS_PER_PAGE)
> +#define CATU_LINK_PREV		((CATU_PAGE_SIZE / sizeof(cate_t)) - 2)
> +#define CATU_LINK_NEXT		((CATU_PAGE_SIZE / sizeof(cate_t)) - 1)
> +
> +#define CATU_ADDR_SHIFT		12
> +#define CATU_ADDR_MASK		~(((cate_t)1 << CATU_ADDR_SHIFT) - 1)
> +#define CATU_ENTRY_VALID	((cate_t)0x1)
> +#define CATU_ENTRY_INVALID	((cate_t)0)
> +#define CATU_VALID_ENTRY(addr) \
> +	(((cate_t)(addr) & CATU_ADDR_MASK) | CATU_ENTRY_VALID)
> +#define CATU_ENTRY_ADDR(entry)	((cate_t)(entry) & ~((cate_t)CATU_ENTRY_VALID))
> +
> +/*
> + * Index of the CATU entry pointing to the data page within
> + * the table. Each table entry can point to a 4KB page, with
> + * a total of 256 entries in the table, adding up to 1MB per table.
> + *
> + * So, bits 19:12 give you the index of the entry in
> + * the table.
> + */
> +static inline unsigned long catu_offset_to_entry_idx(unsigned long offset)
> +{
> +	return (offset & (SZ_1M - 1)) >> 12;
> +}
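A quick worked example of the index arithmetic (offset chosen only for
illustration):

	/* buffer offset 0x245000 lives in table 2 (0x245000 >> 20 == 2) */
	catu_offset_to_entry_idx(0x245000)
		== (0x245000 & (SZ_1M - 1)) >> 12
		== 0x45000 >> 12
		== 0x45;	/* entry 69 within that table */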
> +
> +static inline void catu_update_state(cate_t *catep, int valid)
> +{
> +	*catep &= ~CATU_ENTRY_VALID;
> +	*catep |= valid ? CATU_ENTRY_VALID : CATU_ENTRY_INVALID;
> +}
> +
> +/*
> + * Update the valid bit for a given range of indices [start, end)
> + * in the given table @table.
> + */
> +static inline void catu_update_state_range(cate_t *table, int start,
> +						 int end, int valid)

Indentation

> +{
> +	int i;
> +	cate_t *pentry = &table[start];
> +	cate_t state = valid ? CATU_ENTRY_VALID : CATU_ENTRY_INVALID;
> +
> +	/* Limit the "end" to maximum range */
> +	if (end > CATU_PTRS_PER_PAGE)
> +		end = CATU_PTRS_PER_PAGE;
> +
> +	for (i = start; i < end; i++, pentry++) {
> +		*pentry &= ~(cate_t)CATU_ENTRY_VALID;
> +		*pentry |= state;
> +	}
> +}
> +
> +/*
> + * Update valid bit for all entries in the range [start, end)
> + */
> +static inline void
> +catu_table_update_offset_range(cate_t *table,
> +			       unsigned long start,
> +			       unsigned long end,
> +			       int valid)
> +{
> +	catu_update_state_range(table,
> +				catu_offset_to_entry_idx(start),
> +				catu_offset_to_entry_idx(end),
> +				valid);
> +}
> +
> +static inline void catu_table_update_prev(cate_t *table, int valid)
> +{
> +	catu_update_state(&table[CATU_LINK_PREV], valid);
> +}
> +
> +static inline void catu_table_update_next(cate_t *table, int valid)
> +{
> +	catu_update_state(&table[CATU_LINK_NEXT], valid);
> +}
> +
> +/*
> + * catu_get_table : Retrieve the table pointers for the given @offset
> + * within the buffer. The buffer is wrapped around to a valid offset.
> + *
> + * Returns : The CPU virtual address for the beginning of the table
> + * containing the data page pointer for @offset. If @daddrp is not NULL,
> + * @daddrp is set to the DMA address of the beginning of the table.
> + */
> +static inline cate_t *catu_get_table(struct tmc_sg_table *catu_table,
> +				     unsigned long offset,
> +				     dma_addr_t *daddrp)
> +{
> +	unsigned long buf_size = tmc_sg_table_buf_size(catu_table);
> +	unsigned int table_nr, pg_idx, pg_offset;
> +	struct tmc_pages *table_pages = &catu_table->table_pages;
> +	void *ptr;
> +
> +	/* Make sure offset is within the range */
> +	offset %= buf_size;
> +
> +	/*
> +	 * Each table can address 1MB and a single kernel page can
> +	 * contain "CATU_PAGES_PER_SYSPAGE" CATU tables.
> +	 */
> +	table_nr = offset >> 20;
> +	/* Find the table page where the table_nr lies in */
> +	pg_idx = table_nr / CATU_PAGES_PER_SYSPAGE;
> +	pg_offset = (table_nr % CATU_PAGES_PER_SYSPAGE) * CATU_PAGE_SIZE;
> +	if (daddrp)
> +		*daddrp = table_pages->daddrs[pg_idx] + pg_offset;
> +	ptr = page_address(table_pages->pages[pg_idx]);
> +	return (cate_t *)((unsigned long)ptr + pg_offset);
> +}
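Worked example for the page bookkeeping above (offset picked only for
illustration):

	/*
	 * offset = 0x245000  =>  table_nr = 2
	 *
	 * With 4KB kernel pages (CATU_PAGES_PER_SYSPAGE == 1):
	 *	pg_idx = 2, pg_offset = 0
	 * With 64KB kernel pages (CATU_PAGES_PER_SYSPAGE == 16):
	 *	pg_idx = 0, pg_offset = 2 * CATU_PAGE_SIZE = 0x2000
	 * i.e. the table starts 8KB into the first table page.
	 */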
> +
> +#ifdef CATU_DEBUG
> +static void catu_dump_table(struct tmc_sg_table *catu_table)
> +{
> +	int i;
> +	cate_t *table;
> +	unsigned long table_end, buf_size, offset = 0;
> +
> +	buf_size = tmc_sg_table_buf_size(catu_table);
> +	dev_dbg(catu_table->dev,
> +		"Dump table %p, tdaddr: %llx\n",
> +		catu_table, catu_table->table_daddr);
> +
> +	while (offset < buf_size) {
> +		table_end = offset + SZ_1M < buf_size ?
> +			    offset + SZ_1M : buf_size;
> +		table = catu_get_table(catu_table, offset, NULL);
> +		for (i = 0; offset < table_end; i++, offset += CATU_PAGE_SIZE)
> +			dev_dbg(catu_table->dev, "%d: %llx\n", i, table[i]);
> +		dev_dbg(catu_table->dev, "Prev : %llx, Next: %llx\n",
> +			table[CATU_LINK_PREV], table[CATU_LINK_NEXT]);
> +		dev_dbg(catu_table->dev, "== End of sub-table ===");
> +	}
> +	dev_dbg(catu_table->dev, "== End of Table ===");
> +}
> +
> +#else
> +static inline void catu_dump_table(struct tmc_sg_table *catu_table)
> +{
> +}
> +#endif

I think this approach is better than peppering the code with #ifdefs as it was
done for ETR.  Please fix that to replicate what you've done here.

> +
> +/*
> + * catu_update_table: Update the start and end tables for the
> + * region [base, base + size) to validate/invalidate the pointers
> + * outside the area.
> + *
> + * CATU expects the table base address (SLADDR) aligned to 4K.
> + * If the @base is not aligned to 1MB, we should mark all the
> + * pointers in the start table before @base "INVALID".
> + * Similarly all pointers in the last table beyond (@base + @size)
> + * should be marked INVALID.
> + * The table page containing the "base" is marked first (by
> + * marking the previous link INVALID) and the table page
> + * containing "base + size" is marked last (by marking next
> + * link INVALID).
> + * By default we have to update the state of pointers
> + * for offsets in the range :
> + *    Start table: [0, ALIGN_DOWN(base))
> + *    End table  : [ALIGN(end + 1), SZ_1M)
> + * But, if the buffer wraps around and ends in the same table
> + * as the "base", the end table range should instead be :
> + *         [ALIGN(end + 1), base)
> + *
> + * Returns the dma_address for the start_table, which can be set as
> + * SLADDR.
> + */
> +static dma_addr_t catu_update_table(struct tmc_sg_table *catu_table,
> +				    u64 base, u64 size, int valid)
> +{
> +	cate_t *start_table, *end_table;
> +	dma_addr_t taddr;
> +	u64 buf_size, end = base + size - 1;
> +	unsigned int start_off = 0;	/* Offset to begin in start_table */
> +	unsigned int end_off = SZ_1M;	/* Offset to end in the end_table */
> +
> +	buf_size = tmc_sg_table_buf_size(catu_table);
> +	if (end > buf_size)
> +		end -= buf_size;
> +
> +	/* Get both the virtual and the DMA address of the first table */
> +	start_table = catu_get_table(catu_table, base, &taddr);
> +	end_table = catu_get_table(catu_table, end, NULL);
> +
> +	/* Update the "PREV" link for the starting table */
> +	catu_table_update_prev(start_table, valid);
> +
> +	/* Update the "NEXT" link only if this is not the start_table */
> +	if (end_table != start_table) {
> +		catu_table_update_next(end_table, valid);
> +	} else if (end < base) {
> +		/*
> +		 * If the buffer has wrapped around and we have got the
> +		 * "end" before "base" in the same table, we need to be
> +		 * extra careful. We only need to invalidate the ptrs
> +		 * in between the "end" and "base".
> +		 */
> +		start_off = ALIGN(end, CATU_PAGE_SIZE);
> +		end_off = 0;
> +	}
> +
> +	/* Update the pointers in the starting table before the "base" */
> +	catu_table_update_offset_range(start_table,
> +				       start_off,
> +				       base,
> +				       valid);
> +	if (end_off)
> +		catu_table_update_offset_range(end_table,
> +					       end,
> +					       end_off,
> +					       valid);
> +
> +	catu_dump_table(catu_table);
> +	return taddr;
> +}
> +
> +/*
> + * catu_set_table : Set the buffer to act as linear buffer
> + * from @base of @size.
> + *
> + * Returns : The DMA address for the table containing base.
> + * This can then be programmed into SLADDR.
> + */
> +static dma_addr_t
> +catu_set_table(struct tmc_sg_table *catu_table, u64 base, u64 size)
> +{
> +	/* Make all the entries outside this range invalid */
> +	dma_addr_t sladdr =  catu_update_table(catu_table, base, size, 0);
> +	/* Sync the changes to memory for CATU */
> +	tmc_sg_table_sync_table(catu_table);
> +	return sladdr;
> +}
> +
> +static void __maybe_unused
> +catu_reset_table(struct tmc_sg_table *catu_table, u64 base, u64 size)
> +{
> +	/* Make all the entries outside this range valid */
> +	(void)catu_update_table(catu_table, base, size, 1);
> +}
> +
> +/*
> + * catu_populate_table : Populate the given CATU table.
> + * The table is always populated as a circular table.
> + * i.e, the "prev" link of the "first" table points to the "last"
> + * table and the "next" link of the "last" table points to the
> + * "first" table. The buffer should be made linear by calling
> + * catu_set_table().
> + */
> +static void
> +catu_populate_table(struct tmc_sg_table *catu_table)
> +{
> +	int i, dpidx, s_dpidx;
> +	unsigned long offset, buf_size, last_offset;
> +	dma_addr_t data_daddr;
> +	dma_addr_t prev_taddr, next_taddr, cur_taddr;
> +	cate_t *table_ptr, *next_table;
> +
> +	buf_size = tmc_sg_table_buf_size(catu_table);
> +	dpidx = s_dpidx = 0;
> +	offset = 0;
> +
> +	table_ptr = catu_get_table(catu_table, 0, &cur_taddr);
> +	/*
> +	 * Use the address of the "last" table as the "prev" link
> +	 * for the first table.
> +	 */
> +	(void)catu_get_table(catu_table, buf_size - 1, &prev_taddr);
> +
> +	while (offset < buf_size) {
> +		/*
> +		 * The @offset is always 1M aligned here and we have an
> +		 * empty table @table_ptr to fill. Each table can address
> +		 * upto 1MB data buffer. The last table may have fewer
> +		 * entries if the buffer size is not aligned.
> +		 */
> +		last_offset = (offset + SZ_1M) < buf_size ?
> +			      (offset + SZ_1M) : buf_size;
> +		for (i = 0; offset < last_offset; i++) {
> +
> +			data_daddr = catu_table->data_pages.daddrs[dpidx] +
> +				     s_dpidx * CATU_PAGE_SIZE;
> +#ifdef CATU_DEBUG
> +			dev_dbg(catu_table->dev,
> +				"[table %5d:%03d] 0x%llx\n",
> +				(offset >> 20), i, data_daddr);
> +#endif

I'm not a fan of adding #ifdefs in the code like this.  I think it is better to
have a wrapper (that resolves to nothing if CATU_DEBUG is not defined) and
handle the output in there. 
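One possible shape for such a wrapper (the catu_dbg() name is made up for the
example):

#ifdef CATU_DEBUG
#define catu_dbg(dev, fmt, ...)	dev_dbg(dev, fmt, ##__VA_ARGS__)
#else
#define catu_dbg(dev, fmt, ...)	do {} while (0)
#endif

so the call sites above become unconditional, e.g.:

	catu_dbg(catu_table->dev, "[table %5d:%03d] 0x%llx\n",
		 (offset >> 20), i, data_daddr);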

> +			table_ptr[i] = CATU_VALID_ENTRY(data_daddr);
> +			offset += CATU_PAGE_SIZE;
> +			/* Move the pointers for data pages */
> +			s_dpidx = (s_dpidx + 1) % CATU_PAGES_PER_SYSPAGE;
> +			if (s_dpidx == 0)
> +				dpidx++;
> +		}
> +
> +		/*
> +		 * If we have finished all the valid entries, fill the rest of
> +		 * the table (i.e, last table page) with invalid entries,
> +		 * to fail the lookups.
> +		 */
> +		if (offset == buf_size)
> +			catu_table_update_offset_range(table_ptr,
> +						       offset - 1, SZ_1M, 0);
> +
> +		/*
> +		 * Find the next table by looking up the table that contains
> +		 * @offset. For the last table, this will return the very
> +		 * first table (as the offset == buf_size, and thus returns
> +		 * the table for offset = 0.)
> +		 */
> +		next_table = catu_get_table(catu_table, offset, &next_taddr);
> +		table_ptr[CATU_LINK_PREV] = CATU_VALID_ENTRY(prev_taddr);
> +		table_ptr[CATU_LINK_NEXT] = CATU_VALID_ENTRY(next_taddr);
> +
> +#ifdef CATU_DEBUG
> +		dev_dbg(catu_table->dev,
> +			"[table%5d]: Cur: 0x%llx Prev: 0x%llx, Next: 0x%llx\n",
> +			(offset >> 20) - 1,  cur_taddr, prev_taddr, next_taddr);
> +#endif
> +
> +		/* Update the prev/next addresses */
> +		prev_taddr = cur_taddr;
> +		cur_taddr = next_taddr;
> +		table_ptr = next_table;
> +	}
> +}
> +
> +static struct tmc_sg_table __maybe_unused *
> +catu_init_sg_table(struct device *catu_dev, int node,
> +		   ssize_t size, void **pages)
> +{
> +	int nr_tpages;
> +	struct tmc_sg_table *catu_table;
> +
> +	/*
> +	 * Each table can address upto 1MB and we can have
> +	 * CATU_PAGES_PER_SYSPAGE tables in a system page.
> +	 */
> +	nr_tpages = DIV_ROUND_UP(size, SZ_1M) / CATU_PAGES_PER_SYSPAGE;
> +	catu_table = tmc_alloc_sg_table(catu_dev, node, nr_tpages,
> +					size >> PAGE_SHIFT, pages);
> +	if (IS_ERR(catu_table))
> +		return catu_table;
> +
> +	catu_populate_table(catu_table);
> +	/* Make the buf linear from offset 0 */
> +	(void)catu_set_table(catu_table, 0, size);
> +
> +	dev_dbg(catu_dev,
> +		"Setup table %p, size %ldKB, %d table pages\n",
> +		catu_table, (unsigned long)size >> 10,  nr_tpages);

I think this should also be wrapped in a special output debug function.

> +	catu_dump_table(catu_table);
> +	return catu_table;
> +}
> +
>  coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
>  coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
>  coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 19/27] coresight: catu: Plug in CATU as a backend for ETR buffer
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-07 22:02     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 22:02 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:49AM +0100, Suzuki K Poulose wrote:
> Now that we can use a CATU with a scatter gather table, add support
> for the TMC ETR to make use of the connected CATU in translate mode.
> This is done by adding CATU as a new buffer mode. CATU's SLADDR must
> always be 4K aligned. Thus the INADDR (base VA) is always 1M aligned
> and we adjust the DBA for the ETR to align to the "offset" within
> the 1MB page.
> 
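Concretely (offsets picked only for illustration), with the scheme described
above:

	/*
	 * ETR to start writing at buffer offset 0x253000:
	 *	SLADDR  = DMA address of the CATU table covering 0x253000
	 *		  (table 2, since 0x253000 >> 20 == 2)
	 *	INADDR  = 0x100000 (always 1MB aligned)
	 *	ETR DBA = INADDR + (0x253000 & (SZ_1M - 1)) = 0x153000
	 */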
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-catu.c    | 189 +++++++++++++++++++++++-
>  drivers/hwtracing/coresight/coresight-catu.h    |  30 ++++
>  drivers/hwtracing/coresight/coresight-tmc-etr.c |  23 ++-
>  drivers/hwtracing/coresight/coresight-tmc.h     |   1 +
>  4 files changed, 235 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
> index 4cc2928..a4792fa 100644
> --- a/drivers/hwtracing/coresight/coresight-catu.c
> +++ b/drivers/hwtracing/coresight/coresight-catu.c
> @@ -22,6 +22,21 @@
>  	dev_get_drvdata(csdev->dev.parent)
>  
>  /*
> + * catu_etr_buf		- CATU buffer descriptor
> + * @catu_table		- SG table for the CATU
> + * @sladdr		- Table base address for CATU
> + * @start_offset	- Current offset where the ETR starts writing
> + *			  within the buffer.
> + * @cur_size		- Current size used by the ETR.
> + */
> +struct catu_etr_buf {
> +	struct tmc_sg_table	*catu_table;
> +	u64			sladdr;
> +	u64			start_offset;
> +	u64			cur_size;
> +};
> +
> +/*
>   * CATU uses a page size of 4KB for page tables as well as data pages.
>   * Each 64bit entry in the table has the following format.
>   *
> @@ -87,6 +102,9 @@ typedef u64 cate_t;
>  	(((cate_t)(addr) & CATU_ADDR_MASK) | CATU_ENTRY_VALID)
>  #define CATU_ENTRY_ADDR(entry)	((cate_t)(entry) & ~((cate_t)CATU_ENTRY_VALID))
>  
> +/* CATU expects the INADDR to be aligned to 1M. */
> +#define CATU_DEFAULT_INADDR	(1ULL << 20)
> +
>  /*
>   * Index into the CATU entry pointing to the page within
>   * the table. Each table entry can point to a 4KB page, with
> @@ -401,7 +419,7 @@ catu_populate_table(struct tmc_sg_table *catu_table)
>  	}
>  }
>  
> -static struct tmc_sg_table __maybe_unused *
> +static struct tmc_sg_table *
>  catu_init_sg_table(struct device *catu_dev, int node,
>  		   ssize_t size, void **pages)
>  {
> @@ -429,6 +447,149 @@ catu_init_sg_table(struct device *catu_dev, int node,
>  	return catu_table;
>  }
>  
> +static void catu_free_etr_buf(struct etr_buf *etr_buf)
> +{
> +	struct catu_etr_buf *catu_buf;
> +
> +	if (!etr_buf || etr_buf->mode != ETR_MODE_CATU || !etr_buf->private)
> +		return;
> +	catu_buf = etr_buf->private;
> +	tmc_free_sg_table(catu_buf->catu_table);
> +	kfree(catu_buf);
> +}
> +
> +static ssize_t catu_get_data_etr_buf(struct etr_buf *etr_buf, u64 offset,
> +				     size_t len, char **bufpp)
> +{
> +	struct catu_etr_buf *catu_buf = etr_buf->private;
> +
> +	return tmc_sg_table_get_data(catu_buf->catu_table, offset, len, bufpp);
> +}
> +
> +static void catu_sync_etr_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	struct catu_etr_buf *catu_buf = etr_buf->private;
> +	s64 r_offset, w_offset;
> +	unsigned long buf_size = tmc_sg_table_buf_size(catu_buf->catu_table);
> +
> +	/*
> +	 * ETR started off at etr_buf->hwaddr which corresponds to
> +	 * start_offset within the trace buffer. Convert the RRP/RWP
> +	 * to offsets within the trace buffer.
> +	 */
> +	r_offset = (s64)rrp - etr_buf->hwaddr + catu_buf->start_offset;
> +	r_offset -= (r_offset > buf_size) ? buf_size : 0;
> +
> +	w_offset = (s64)rwp - etr_buf->hwaddr + catu_buf->start_offset;
> +	w_offset -= (w_offset > buf_size) ? buf_size : 0;
> +
> +	if (!etr_buf->full) {
> +		etr_buf->len = w_offset - r_offset;
> +		if (w_offset < r_offset)
> +			etr_buf->len += buf_size;
> +	} else {
> +		etr_buf->len = etr_buf->size;
> +	}
> +
> +	etr_buf->offset = r_offset;
> +	tmc_sg_table_sync_data_range(catu_buf->catu_table,
> +				     r_offset, etr_buf->len);
> +}
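Continuing the illustrative numbers from earlier, the conversion back from
hardware pointers to buffer offsets works out as:

	/*
	 * start_offset = 0x253000, etr_buf->hwaddr = 0x153000
	 * A hardware RWP of 0x156000 maps back to:
	 *	w_offset = 0x156000 - 0x153000 + 0x253000 = 0x256000
	 * i.e. 12KB of trace written past the starting point.
	 */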
> +
> +static inline void catu_set_etr_buf(struct etr_buf *etr_buf, u64 base, u64 size)
> +{
> +	struct catu_etr_buf *catu_buf = etr_buf->private;
> +
> +	catu_buf->start_offset = base;
> +	catu_buf->cur_size = size;
> +
> +	/*
> +	 * CATU always maps 1MB aligned addresses. ETR should start at
> +	 * the offset within the first table.
> +	 */
> +	etr_buf->hwaddr = CATU_DEFAULT_INADDR + (base & (SZ_1M - 1));
> +	etr_buf->size = size;
> +	etr_buf->rwp = etr_buf->rrp = etr_buf->hwaddr;
> +}
> +
> +static int catu_restore_etr_buf(struct etr_buf *etr_buf,
> +				unsigned long r_offset,
> +				unsigned long w_offset,
> +				unsigned long size,
> +				u32 status,
> +				bool has_save_restore)
> +{
> +	struct catu_etr_buf *catu_buf = etr_buf->private;
> +	u64 end = w_offset + size;
> +	u64 cur_end = catu_buf->start_offset + catu_buf->cur_size;
> +
> +	/*
> +	 * We cannot support rotation without a full table
> +	 * at the end. i.e, the buffer size should be aligned
> +	 * to 1MB.
> +	 */
> +	if (tmc_sg_table_buf_size(catu_buf->catu_table) & (SZ_1M - 1))
> +		return -EINVAL;
> +
> +	/*
> +	 * We don't have to make any changes to the table if the
> +	 * current (start, end) and the new (start, end) are in the
> +	 * same pages respectively.
> +	 */
> +	if ((w_offset ^ catu_buf->start_offset) & ~(CATU_PAGE_SIZE - 1) ||
> +	    (end ^ cur_end) & ~(CATU_PAGE_SIZE - 1)) {
> +		catu_reset_table(catu_buf->catu_table, catu_buf->start_offset,
> +				 catu_buf->cur_size);
> +		catu_buf->sladdr = catu_set_table(catu_buf->catu_table,
> +						  w_offset, size);
> +	}
> +
> +	catu_set_etr_buf(etr_buf, w_offset, size);
> +
> +	return 0;
> +}
> +
> +static int catu_alloc_etr_buf(struct tmc_drvdata *tmc_drvdata,
> +			      struct etr_buf *etr_buf, int node, void **pages)
> +{
> +	struct coresight_device *csdev;
> +	struct device *catu_dev;
> +	struct tmc_sg_table *catu_table;
> +	struct catu_etr_buf *catu_buf;
> +
> +	csdev = tmc_etr_get_catu_device(tmc_drvdata);
> +	if (!csdev)
> +		return -ENODEV;
> +	catu_dev = csdev->dev.parent;
> +	catu_buf = kzalloc(sizeof(*catu_buf), GFP_KERNEL);
> +	if (!catu_buf)
> +		return -ENOMEM;
> +
> +	catu_table = catu_init_sg_table(catu_dev, node, etr_buf->size, pages);
> +	if (IS_ERR(catu_table)) {
> +		kfree(catu_buf);
> +		return PTR_ERR(catu_table);
> +	}
> +
> +	etr_buf->mode = ETR_MODE_CATU;
> +	etr_buf->private = catu_buf;
> +	catu_buf->catu_table = catu_table;
> +
> +	/* By default make the buffer linear from 0 with full size */
> +	catu_set_etr_buf(etr_buf, 0, etr_buf->size);
> +	catu_dump_table(catu_table);
> +
> +	return 0;
> +}
> +
> +const struct etr_buf_operations etr_catu_buf_ops = {
> +	.alloc = catu_alloc_etr_buf,
> +	.free = catu_free_etr_buf,
> +	.sync = catu_sync_etr_buf,
> +	.get_data = catu_get_data_etr_buf,
> +	.restore = catu_restore_etr_buf,
> +};
> +
>  coresight_simple_reg32(struct catu_drvdata, control, CATU_CONTROL);
>  coresight_simple_reg32(struct catu_drvdata, status, CATU_STATUS);
>  coresight_simple_reg32(struct catu_drvdata, mode, CATU_MODE);
> @@ -467,9 +628,11 @@ static inline int catu_wait_for_ready(struct catu_drvdata *drvdata)
>  				 CATU_STATUS, CATU_STATUS_READY, 1);
>  }
>  
> -static int catu_enable_hw(struct catu_drvdata *drvdata, void *__unused)
> +static int catu_enable_hw(struct catu_drvdata *drvdata, void *data)
>  {
>  	u32 control;
> +	u32 mode;
> +	struct etr_buf *etr_buf = data;
>  
>  	if (catu_wait_for_ready(drvdata))
>  		dev_warn(drvdata->dev, "Timeout while waiting for READY\n");
> @@ -481,9 +644,27 @@ static int catu_enable_hw(struct catu_drvdata *drvdata, void *__unused)
>  	}
>  
>  	control |= BIT(CATU_CONTROL_ENABLE);
> -	catu_write_mode(drvdata, CATU_MODE_PASS_THROUGH);
> +
> +	if (etr_buf && etr_buf->mode == ETR_MODE_CATU) {
> +		struct catu_etr_buf *catu_buf = etr_buf->private;
> +
> +		mode = CATU_MODE_TRANSLATE;
> +		catu_write_axictrl(drvdata, CATU_OS_AXICTRL);
> +		catu_write_sladdr(drvdata, catu_buf->sladdr);
> +		catu_write_inaddr(drvdata, CATU_DEFAULT_INADDR);
> +	} else {
> +		mode = CATU_MODE_PASS_THROUGH;
> +		catu_write_sladdr(drvdata, 0);
> +		catu_write_inaddr(drvdata, 0);
> +	}
> +
> +	catu_write_irqen(drvdata, 0);
> +	catu_write_mode(drvdata, mode);
>  	catu_write_control(drvdata, control);
> -	dev_dbg(drvdata->dev, "Enabled in Pass through mode\n");
> +	dev_dbg(drvdata->dev, "Enabled in %s mode\n",
> +		(mode == CATU_MODE_PASS_THROUGH) ?
> +		"Pass through" :
> +		"Translate");
>  	return 0;
>  }
>  
> diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
> index cd58d6f..b673a73 100644
> --- a/drivers/hwtracing/coresight/coresight-catu.h
> +++ b/drivers/hwtracing/coresight/coresight-catu.h
> @@ -29,6 +29,32 @@
>  #define CATU_MODE_PASS_THROUGH	0U
>  #define CATU_MODE_TRANSLATE	1U
>  
> +#define CATU_AXICTRL_ARCACHE_SHIFT	4
> +#define CATU_AXICTRL_ARCACHE_MASK	0xf
> +#define CATU_AXICTRL_ARPROT_MASK	0x3
> +#define CATU_AXICTRL_ARCACHE(arcache)		\
> +	(((arcache) & CATU_AXICTRL_ARCACHE_MASK) << CATU_AXICTRL_ARCACHE_SHIFT)
> +
> +#define CATU_AXICTRL_VAL(arcache, arprot)	\
> +	(CATU_AXICTRL_ARCACHE(arcache) | ((arprot) & CATU_AXICTRL_ARPROT_MASK))
> +
> +#define AXI3_AxCACHE_WB_READ_ALLOC	0x7
> +/*
> + * AXI - ARPROT bits:
> + * See AMBA AXI & ACE Protocol specification (ARM IHI 0022E)
> + * sectionA4.7 Access Permissions.
> + *
> + * Bit 0: 0 - Unprivileged access, 1 - Privileged access
> + * Bit 1: 0 - Secure access, 1 - Non-secure access.
> + * Bit 2: 0 - Data access, 1 - instruction access.
> + *
> + * CATU AXICTRL:ARPROT[2] is res0 as we always access data.
> + */
> +#define CATU_OS_ARPROT			0x2
> +
> +#define CATU_OS_AXICTRL		\
> +	CATU_AXICTRL_VAL(AXI3_AxCACHE_WB_READ_ALLOC, CATU_OS_ARPROT)
> +
>  #define CATU_STATUS_READY	8
>  #define CATU_STATUS_ADRERR	0
>  #define CATU_STATUS_AXIERR	4
> @@ -71,6 +97,8 @@ catu_write_##name(struct catu_drvdata *drvdata, u64 val)		\
>  
>  CATU_REG32(control, CATU_CONTROL);
>  CATU_REG32(mode, CATU_MODE);
> +CATU_REG32(irqen, CATU_IRQEN);
> +CATU_REG32(axictrl, CATU_AXICTRL);
>  CATU_REG_PAIR(sladdr, CATU_SLADDRLO, CATU_SLADDRHI)
>  CATU_REG_PAIR(inaddr, CATU_INADDRLO, CATU_INADDRHI)
>  
> @@ -86,4 +114,6 @@ static inline bool coresight_is_catu_device(struct coresight_device *csdev)
>  	       subtype == CORESIGHT_DEV_SUBTYPE_HELPER_CATU;
>  }
>  
> +extern const struct etr_buf_operations etr_catu_buf_ops;
> +
>  #endif
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 25e7feb..41dde0a 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -941,6 +941,9 @@ static const struct etr_buf_operations etr_sg_buf_ops = {
>  static const struct etr_buf_operations *etr_buf_ops[] = {
>  	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
>  	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
> +#ifdef CONFIG_CORESIGHT_CATU
> +	[ETR_MODE_CATU] = &etr_catu_buf_ops,
> +#endif
>  };
>  
>  static inline int tmc_etr_mode_alloc_buf(int mode,
> @@ -953,6 +956,9 @@ static inline int tmc_etr_mode_alloc_buf(int mode,
>  	switch (mode) {
>  	case ETR_MODE_FLAT:
>  	case ETR_MODE_ETR_SG:
> +#ifdef CONFIG_CORESIGHT_CATU
> +	case ETR_MODE_CATU:
> +#endif

I really wish we could avoid doing something like this (and the above) but every
alternate solution I come up with is either uglier or on par with it...
Unless someone comes up with a bright idea we'll simply have to let it be.

While looking for a solution I noticed that tmc_etr_get_catu_device()
could be moved to coresight-catu.h.  That way we wouldn't have to include
coresight-catu.h every time coresight-tmc.h is present in a file.  
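For what it's worth, a rough sketch of what that move could look like; the
connection walk below assumes the existing coresight_device fields
(nr_outport, conns[].child_dev) and is not taken from this series:

/* In coresight-catu.h */
static inline struct coresight_device *
tmc_etr_get_catu_device(struct tmc_drvdata *drvdata)
{
	int i;
	struct coresight_device *tmp, *etr = drvdata->csdev;

	if (!IS_ENABLED(CONFIG_CORESIGHT_CATU))
		return NULL;

	/* Find the first CATU connected on the ETR output port(s) */
	for (i = 0; i < etr->nr_outport; i++) {
		tmp = etr->conns[i].child_dev;
		if (tmp && coresight_is_catu_device(tmp))
			return tmp;
	}

	return NULL;
}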

>  		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
>  		if (!rc)
>  			etr_buf->ops = etr_buf_ops[mode];
> @@ -977,11 +983,15 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
>  	int rc = -ENOMEM;
>  	bool has_etr_sg, has_iommu;
>  	bool has_flat, has_save_restore;
> +	bool has_sg, has_catu;
>  	struct etr_buf *etr_buf;
>  
>  	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
>  	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
>  	has_save_restore = tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE);
> +	has_catu = !!tmc_etr_get_catu_device(drvdata);
> +
> +	has_sg = has_catu || has_etr_sg;
>  
>  	/*
>  	 * We can normally use flat DMA buffer provided that the buffer
> @@ -1006,7 +1016,7 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
>  	if ((flags & ETR_BUF_F_RESTORE_STATUS) && !has_save_restore)
>  		return ERR_PTR(-EINVAL);
>  
> -	if (!has_flat && !has_etr_sg) {
> +	if (!has_flat && !has_sg) {
>  		dev_dbg(drvdata->dev,
>  			"No available backends for ETR buffer with flags %x\n",
>  			flags);
> @@ -1032,17 +1042,22 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
>  	 *
>  	 */
>  	if (!pages && has_flat &&
> -	    (!has_etr_sg || has_iommu || size < SZ_1M))
> +	    (!has_sg || has_iommu || size < SZ_1M))
>  		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
>  					    etr_buf, node, pages);
>  	if (rc && has_etr_sg)
>  		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
>  					    etr_buf, node, pages);
> +	if (rc && has_catu)
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_CATU, drvdata,
> +					    etr_buf, node, pages);
>  	if (rc) {
>  		kfree(etr_buf);
>  		return ERR_PTR(rc);
>  	}
>  
> +	dev_dbg(drvdata->dev, "allocated buffer of size %ldKB in mode %d\n",
> +		(unsigned long)size >> 10, etr_buf->mode);
>  	return etr_buf;
>  }
>  
> @@ -1136,7 +1151,7 @@ static inline void tmc_etr_enable_catu(struct tmc_drvdata *drvdata)
>  	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
>  
>  	if (catu && helper_ops(catu)->enable)
> -		helper_ops(catu)->enable(catu, NULL);
> +		helper_ops(catu)->enable(catu, drvdata->etr_buf);
>  }
>  
>  static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
> @@ -1144,7 +1159,7 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
>  	struct coresight_device *catu = tmc_etr_get_catu_device(drvdata);
>  
>  	if (catu && helper_ops(catu)->disable)
> -		helper_ops(catu)->disable(catu, NULL);
> +		helper_ops(catu)->disable(catu, drvdata->etr_buf);
>  }
>  
>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 1bdfb38..1f6aa49 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -139,6 +139,7 @@ enum tmc_mem_intf_width {
>  enum etr_mode {
>  	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
>  	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
> +	ETR_MODE_CATU,		/* Use SG mechanism in CATU */
>  };
>  
>  /* ETR buffer should support save-restore */
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-07 22:28     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 22:28 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:51AM +0100, Suzuki K Poulose wrote:
> Convert component enable/disable messages from dev_info to dev_dbg.
> This is required to prevent LOCKDEP splats when operating in perf
> mode where we could be called with locks held to enable a coresight
> path. If someone wants to really see the messages, they can always
> enable it at runtime via dynamic_debug.

I'm also in favor of moving to dev_dbg() - the messages these calls produce are
useless unless serious debugging of the CS framework is happening.  But as Robin
Murphy pointed out, it would be great to fix the problem for real rather than
masking it.

I understand this kind of work would be outside the scope of this set.  As such
I'd take this patch, but the log message would need to be modified to avoid
talking about LOCKDEP splats, if only to make sure nobody thinks the problem has
been fixed.

That being said, I work extensively with the CS framework every day (with
CONFIG_LOCKDEP_SUPPORT=y) and haven't seen any splats.  Perhaps you have a
recipe to reproduce the problem?
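
In any case, for anyone who does want these messages back, a kernel built with
CONFIG_DYNAMIC_DEBUG=y lets them be re-enabled at runtime, for example
(assuming debugfs is mounted in the usual place):

  # echo 'file coresight-tmc-etr.c +p' > /sys/kernel/debug/dynamic_debug/control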

> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-dynamic-replicator.c | 4 ++--
>  drivers/hwtracing/coresight/coresight-etb10.c              | 6 +++---
>  drivers/hwtracing/coresight/coresight-etm3x.c              | 4 ++--
>  drivers/hwtracing/coresight/coresight-etm4x.c              | 4 ++--
>  drivers/hwtracing/coresight/coresight-funnel.c             | 4 ++--
>  drivers/hwtracing/coresight/coresight-replicator.c         | 4 ++--
>  drivers/hwtracing/coresight/coresight-stm.c                | 4 ++--
>  drivers/hwtracing/coresight/coresight-tmc-etf.c            | 8 ++++----
>  drivers/hwtracing/coresight/coresight-tmc-etr.c            | 4 ++--
>  drivers/hwtracing/coresight/coresight-tmc.c                | 4 ++--
>  drivers/hwtracing/coresight/coresight-tpiu.c               | 4 ++--
>  11 files changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-dynamic-replicator.c b/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
> index 043da86..c41d95c 100644
> --- a/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
> +++ b/drivers/hwtracing/coresight/coresight-dynamic-replicator.c
> @@ -64,7 +64,7 @@ static int replicator_enable(struct coresight_device *csdev, int inport,
>  
>  	CS_LOCK(drvdata->base);
>  
> -	dev_info(drvdata->dev, "REPLICATOR enabled\n");
> +	dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
>  	return 0;
>  }
>  
> @@ -83,7 +83,7 @@ static void replicator_disable(struct coresight_device *csdev, int inport,
>  
>  	CS_LOCK(drvdata->base);
>  
> -	dev_info(drvdata->dev, "REPLICATOR disabled\n");
> +	dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
>  }
>  
>  static const struct coresight_ops_link replicator_link_ops = {
> diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
> index 74232e6..d9c2f87 100644
> --- a/drivers/hwtracing/coresight/coresight-etb10.c
> +++ b/drivers/hwtracing/coresight/coresight-etb10.c
> @@ -163,7 +163,7 @@ static int etb_enable(struct coresight_device *csdev, u32 mode)
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  out:
> -	dev_info(drvdata->dev, "ETB enabled\n");
> +	dev_dbg(drvdata->dev, "ETB enabled\n");
>  	return 0;
>  }
>  
> @@ -269,7 +269,7 @@ static void etb_disable(struct coresight_device *csdev)
>  
>  	local_set(&drvdata->mode, CS_MODE_DISABLED);
>  
> -	dev_info(drvdata->dev, "ETB disabled\n");
> +	dev_dbg(drvdata->dev, "ETB disabled\n");
>  }
>  
>  static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu,
> @@ -512,7 +512,7 @@ static void etb_dump(struct etb_drvdata *drvdata)
>  	}
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
> -	dev_info(drvdata->dev, "ETB dumped\n");
> +	dev_dbg(drvdata->dev, "ETB dumped\n");
>  }
>  
>  static int etb_open(struct inode *inode, struct file *file)
> diff --git a/drivers/hwtracing/coresight/coresight-etm3x.c b/drivers/hwtracing/coresight/coresight-etm3x.c
> index 39f42fd..9d4a663 100644
> --- a/drivers/hwtracing/coresight/coresight-etm3x.c
> +++ b/drivers/hwtracing/coresight/coresight-etm3x.c
> @@ -510,7 +510,7 @@ static int etm_enable_sysfs(struct coresight_device *csdev)
>  	drvdata->sticky_enable = true;
>  	spin_unlock(&drvdata->spinlock);
>  
> -	dev_info(drvdata->dev, "ETM tracing enabled\n");
> +	dev_dbg(drvdata->dev, "ETM tracing enabled\n");
>  	return 0;
>  
>  err:
> @@ -613,7 +613,7 @@ static void etm_disable_sysfs(struct coresight_device *csdev)
>  	spin_unlock(&drvdata->spinlock);
>  	cpus_read_unlock();
>  
> -	dev_info(drvdata->dev, "ETM tracing disabled\n");
> +	dev_dbg(drvdata->dev, "ETM tracing disabled\n");
>  }
>  
>  static void etm_disable(struct coresight_device *csdev,
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
> index e84d80b..c9c73c2 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.c
> @@ -274,7 +274,7 @@ static int etm4_enable_sysfs(struct coresight_device *csdev)
>  	drvdata->sticky_enable = true;
>  	spin_unlock(&drvdata->spinlock);
>  
> -	dev_info(drvdata->dev, "ETM tracing enabled\n");
> +	dev_dbg(drvdata->dev, "ETM tracing enabled\n");
>  	return 0;
>  
>  err:
> @@ -387,7 +387,7 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
>  	spin_unlock(&drvdata->spinlock);
>  	cpus_read_unlock();
>  
> -	dev_info(drvdata->dev, "ETM tracing disabled\n");
> +	dev_dbg(drvdata->dev, "ETM tracing disabled\n");
>  }
>  
>  static void etm4_disable(struct coresight_device *csdev,
> diff --git a/drivers/hwtracing/coresight/coresight-funnel.c b/drivers/hwtracing/coresight/coresight-funnel.c
> index 9f8ac0be..18b5361 100644
> --- a/drivers/hwtracing/coresight/coresight-funnel.c
> +++ b/drivers/hwtracing/coresight/coresight-funnel.c
> @@ -72,7 +72,7 @@ static int funnel_enable(struct coresight_device *csdev, int inport,
>  
>  	funnel_enable_hw(drvdata, inport);
>  
> -	dev_info(drvdata->dev, "FUNNEL inport %d enabled\n", inport);
> +	dev_dbg(drvdata->dev, "FUNNEL inport %d enabled\n", inport);
>  	return 0;
>  }
>  
> @@ -96,7 +96,7 @@ static void funnel_disable(struct coresight_device *csdev, int inport,
>  
>  	funnel_disable_hw(drvdata, inport);
>  
> -	dev_info(drvdata->dev, "FUNNEL inport %d disabled\n", inport);
> +	dev_dbg(drvdata->dev, "FUNNEL inport %d disabled\n", inport);
>  }
>  
>  static const struct coresight_ops_link funnel_link_ops = {
> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
> index 3756e71..4f77812 100644
> --- a/drivers/hwtracing/coresight/coresight-replicator.c
> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> @@ -42,7 +42,7 @@ static int replicator_enable(struct coresight_device *csdev, int inport,
>  {
>  	struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>  
> -	dev_info(drvdata->dev, "REPLICATOR enabled\n");
> +	dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
>  	return 0;
>  }
>  
> @@ -51,7 +51,7 @@ static void replicator_disable(struct coresight_device *csdev, int inport,
>  {
>  	struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>  
> -	dev_info(drvdata->dev, "REPLICATOR disabled\n");
> +	dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
>  }
>  
>  static const struct coresight_ops_link replicator_link_ops = {
> diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c
> index 15e7ef38..4c88d99 100644
> --- a/drivers/hwtracing/coresight/coresight-stm.c
> +++ b/drivers/hwtracing/coresight/coresight-stm.c
> @@ -218,7 +218,7 @@ static int stm_enable(struct coresight_device *csdev,
>  	stm_enable_hw(drvdata);
>  	spin_unlock(&drvdata->spinlock);
>  
> -	dev_info(drvdata->dev, "STM tracing enabled\n");
> +	dev_dbg(drvdata->dev, "STM tracing enabled\n");
>  	return 0;
>  }
>  
> @@ -281,7 +281,7 @@ static void stm_disable(struct coresight_device *csdev,
>  		pm_runtime_put(drvdata->dev);
>  
>  		local_set(&drvdata->mode, CS_MODE_DISABLED);
> -		dev_info(drvdata->dev, "STM tracing disabled\n");
> +		dev_dbg(drvdata->dev, "STM tracing disabled\n");
>  	}
>  }
>  
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> index 1dd44fd..0a32734 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> @@ -244,7 +244,7 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode)
>  	if (ret)
>  		return ret;
>  
> -	dev_info(drvdata->dev, "TMC-ETB/ETF enabled\n");
> +	dev_dbg(drvdata->dev, "TMC-ETB/ETF enabled\n");
>  	return 0;
>  }
>  
> @@ -267,7 +267,7 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev)
>  
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
> -	dev_info(drvdata->dev, "TMC-ETB/ETF disabled\n");
> +	dev_dbg(drvdata->dev, "TMC-ETB/ETF disabled\n");
>  }
>  
>  static int tmc_enable_etf_link(struct coresight_device *csdev,
> @@ -286,7 +286,7 @@ static int tmc_enable_etf_link(struct coresight_device *csdev,
>  	drvdata->mode = CS_MODE_SYSFS;
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
> -	dev_info(drvdata->dev, "TMC-ETF enabled\n");
> +	dev_dbg(drvdata->dev, "TMC-ETF enabled\n");
>  	return 0;
>  }
>  
> @@ -306,7 +306,7 @@ static void tmc_disable_etf_link(struct coresight_device *csdev,
>  	drvdata->mode = CS_MODE_DISABLED;
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
> -	dev_info(drvdata->dev, "TMC-ETF disabled\n");
> +	dev_dbg(drvdata->dev, "TMC-ETF disabled\n");
>  }
>  
>  static void *tmc_alloc_etf_buffer(struct coresight_device *csdev, int cpu,
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 41dde0a..1ef0f62 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1350,7 +1350,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  		tmc_etr_free_sysfs_buf(free_buf);
>  
>  	if (!ret)
> -		dev_info(drvdata->dev, "TMC-ETR enabled\n");
> +		dev_dbg(drvdata->dev, "TMC-ETR enabled\n");
>  
>  	return ret;
>  }
> @@ -1393,7 +1393,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
>  
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
> -	dev_info(drvdata->dev, "TMC-ETR disabled\n");
> +	dev_dbg(drvdata->dev, "TMC-ETR disabled\n");
>  }
>  
>  static const struct coresight_ops_sink tmc_etr_sink_ops = {
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
> index 4d41b4b..7adcde3 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc.c
> @@ -92,7 +92,7 @@ static int tmc_read_prepare(struct tmc_drvdata *drvdata)
>  	}
>  
>  	if (!ret)
> -		dev_info(drvdata->dev, "TMC read start\n");
> +		dev_dbg(drvdata->dev, "TMC read start\n");
>  
>  	return ret;
>  }
> @@ -114,7 +114,7 @@ static int tmc_read_unprepare(struct tmc_drvdata *drvdata)
>  	}
>  
>  	if (!ret)
> -		dev_info(drvdata->dev, "TMC read end\n");
> +		dev_dbg(drvdata->dev, "TMC read end\n");
>  
>  	return ret;
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-tpiu.c b/drivers/hwtracing/coresight/coresight-tpiu.c
> index 805f7c2..c7f0827 100644
> --- a/drivers/hwtracing/coresight/coresight-tpiu.c
> +++ b/drivers/hwtracing/coresight/coresight-tpiu.c
> @@ -80,7 +80,7 @@ static int tpiu_enable(struct coresight_device *csdev, u32 mode)
>  
>  	tpiu_enable_hw(drvdata);
>  
> -	dev_info(drvdata->dev, "TPIU enabled\n");
> +	dev_dbg(drvdata->dev, "TPIU enabled\n");
>  	return 0;
>  }
>  
> @@ -106,7 +106,7 @@ static void tpiu_disable(struct coresight_device *csdev)
>  
>  	tpiu_disable_hw(drvdata);
>  
> -	dev_info(drvdata->dev, "TPIU disabled\n");
> +	dev_dbg(drvdata->dev, "TPIU disabled\n");
>  }
>  
>  static const struct coresight_ops_sink tpiu_sink_ops = {
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 24/27] coresight: tmc-etr: Relax collection of trace from sysfs mode
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-07 22:54     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-07 22:54 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:54AM +0100, Suzuki K Poulose wrote:
> Since the ETR now uses mode specific buffers, we can reliably
> provide the trace data captured in sysfs mode, even when the ETR
> is operating in PERF mode.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index a35a12f..7551272 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1439,19 +1439,17 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  		goto out;
>  	}
>  
> -	/* Don't interfere if operated from Perf */
> -	if (drvdata->mode == CS_MODE_PERF) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
> -	/* If sysfs_buf is NULL the trace data has been read already */
> +	/*
> +	 * We can safely allow reads even if the ETR is operating in PERF mode,
> +	 * since the sysfs session is captured in mode specific data.
> +	 * If drvdata::sysfs_buf is NULL the trace data has been read already.
> +	 */
>  	if (!drvdata->sysfs_buf) {
>  		ret = -EINVAL;
>  		goto out;
>  	}
>  
> -	/* Disable the TMC if we are trying to read from a running session */
> +	/* Disable the TMC if we are trying to read from a running session. */

Move that to the previous patch.

>  	if (drvdata->mode == CS_MODE_SYSFS)
>  		tmc_etr_disable_hw(drvdata);
>  
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 05/27] dts: bindings: Document device tree binding for CATU
  2018-05-03 17:42       ` Mathieu Poirier
@ 2018-05-08 15:40         ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-08 15:40 UTC (permalink / raw)
  To: Mathieu Poirier, Rob Herring
  Cc: linux-arm-kernel, linux-kernel, Mike Leach, Robert Walker,
	Mark Rutland, Will Deacon, Robin Murphy, Sudeep Holla,
	Frank Rowand, John Horley, devicetree, Mathieu Poirier



Rob, Mathieu,

On 03/05/18 18:42, Mathieu Poirier wrote:
> On 1 May 2018 at 07:10, Rob Herring <robh@kernel.org> wrote:
>> On Tue, May 01, 2018 at 10:10:35AM +0100, Suzuki K Poulose wrote:
>>> Document CATU device-tree bindings. CATU augments the TMC-ETR
>>> by providing an improved Scatter Gather mechanism for streaming
>>> trace data to non-contiguous system RAM pages.
>>>
>>> Cc: devicetree@vger.kernel.org
>>> Cc: frowand.list@gmail.com
>>> Cc: Rob Herring <robh@kernel.org>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Mathieu Poirier <mathieu.poirier@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>   .../devicetree/bindings/arm/coresight.txt          | 52 ++++++++++++++++++++++
>>>   1 file changed, 52 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
>>> index 15ac8e8..cdd84d0 100644
>>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>>> @@ -39,6 +39,8 @@ its hardware characteristcs.
>>>
>>>                - System Trace Macrocell:
>>>                        "arm,coresight-stm", "arm,primecell"; [1]
>>> +             - Coresight Address Translation Unit (CATU)
>>> +                     "arm, coresight-catu", "arm,primecell";
>>
>> spurious space               ^

Thanks for spotting, will fix it.

>>
>>>
>>>        * reg: physical base address and length of the register
>>>          set(s) of the component.
>>> @@ -86,6 +88,9 @@ its hardware characteristcs.
>>>        * arm,buffer-size: size of contiguous buffer space for TMC ETR
>>>         (embedded trace router)
>>>
>>> +* Optional property for CATU :
>>> +     * interrupts : Exactly one SPI may be listed for reporting the address
>>> +       error
>>
>> Somewhere you need to define the ports for the CATU.

The ports are defined in the section common to all the coresight components.
Would you like them to be documented explicitly just for the CATU?
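
For what it is worth, the kind of node I have in mind under the common port
scheme looks roughly like the below - the base address, interrupt specifier and
labels are only illustrative, and the clocks ("apb_pclk") are omitted for
brevity:

	catu@207e0000 {
		compatible = "arm,coresight-catu", "arm,primecell";
		reg = <0 0x207e0000 0 0x1000>;

		/* Optional: exactly one SPI for reporting address errors */
		interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>;

		/* Input port, fed by the ETR output */
		port {
			catu_in_port: endpoint {
				slave-mode;
				remote-endpoint = <&etr_out_port>;
			};
		};
	};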

>>
>>>
>>>   Example:
>>>
>>> @@ -118,6 +123,35 @@ Example:
>>>                };
>>>        };
>>>
>>> +     etr@20070000 {
>>> +             compatible = "arm,coresight-tmc", "arm,primecell";
>>> +             reg = <0 0x20070000 0 0x1000>;
>>> +
>>> +                     /* input port */
>>> +                     port@0 {
>>> +                             reg =  <0>;
>>> +                             etr_in_port: endpoint {
>>> +                                     slave-mode;
>>> +                                     remote-endpoint = <&replicator2_out_port0>;
>>> +                             };
>>> +                     };
>>> +
>>> +                     /* CATU link represented by output port */
>>> +                     port@1 {
>>> +                             reg = <0>;
>>
>> While common in the Coresight bindings, having unit-address and reg not
>> match is an error. Mathieu and I discussed this a bit as dtc now warns
>> on these.
>>
>> Either reg should be 1 here, or 'ports' needs to be split into input and
>> output ports. My preference would be the former, but Mathieu objected to
>> this not reflecting the h/w numbering.
> 
> Suzuki, as we discuss this is related to your work on revamping CS
> bindings for ACPI.  Until that gets done and to move forward with this
> set I suggest you abide to Rob's request.

Ok, I can change it to <1>, as we don't expect any other output port for an
ETR.
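
So the output port in the example above would then become something like this
(the etr_out_port / catu_in_port labels are only illustrative):

		/* CATU link represented by output port */
		port@1 {
			reg = <1>;
			etr_out_port: endpoint {
				remote-endpoint = <&catu_in_port>;
			};
		};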


Thanks
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 10/27] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-04 22:56         ` Rob Herring
@ 2018-05-08 15:48           ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-08 15:48 UTC (permalink / raw)
  To: Rob Herring, Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, Mike Leach, Robert Walker,
	Mark Rutland, Will Deacon, Robin Murphy, Sudeep Holla,
	Frank Rowand, John Horley, Mathieu Poirier, devicetree

On 04/05/18 23:56, Rob Herring wrote:
> On Thu, May 3, 2018 at 3:32 PM, Mathieu Poirier
> <mathieu.poirier@linaro.org> wrote:
>> On 1 May 2018 at 07:13, Rob Herring <robh@kernel.org> wrote:
>>> On Tue, May 01, 2018 at 10:10:40AM +0100, Suzuki K Poulose wrote:
>>>> We are about to add the support for ETR builtin scatter-gather mode
>>>> for dealing with large amount of trace buffers. However, on some of
>>>> the platforms, using the ETR SG mode can lock up the system due to
>>>> the way the ETR is connected to the memory subsystem.
>>>>
>>>> In SG mode, the ETR performs READ from the scatter-gather table to
>>>> fetch the next page and regular WRITE of trace data. If the READ
>>>> operation doesn't complete (due to the memory subsystem issues,
>>>> which we have seen on a couple of platforms) the trace WRITE
>>>> cannot proceed leading to issues. So, we by default do not
>>>> use the SG mode, unless it is known to be safe on the platform.
>>>> We define a DT property for the TMC node to specify whether we
>>>> have a proper SG mode.


>>>> ---
>>>>   Documentation/devicetree/bindings/arm/coresight.txt | 3 +++
>>>>   drivers/hwtracing/coresight/coresight-tmc.c         | 8 +++++++-
>>>>   2 files changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
>>>> index cdd84d0..7c0c8f0 100644
>>>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>>>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>>>> @@ -88,6 +88,9 @@ its hardware characteristcs.
>>>>        * arm,buffer-size: size of contiguous buffer space for TMC ETR
>>>>         (embedded trace router)
>>>>
>>>> +     * scatter-gather: boolean. Indicates that the TMC-ETR can safely
>>>> +       use the SG mode on this system.
>>>> +
>>>
>>> Needs a vendor prefix.
>>>
>>
>> Thinking further on this, do we need to make it device specific as
>> well - something like "arm,etr-scatter-gather"?  That way we don't
>> have to redefine "scatter-gather" for other ARM devices if they happen
>> to need the same property but for different reasons.
> 
> No. If we had a bunch of cases, then we'd probably want to have just
> 'scatter-gather'.

Does it mean "arm,scatter-gather" ? If we ever wanted to add the device
specific information, I would prefer to go with "arm,tmc-scatter-gather"
and not "etr-scatter-gather". They both could mean different things.

> 
> BTW, if SG had already been supported, then I'd say this is a quirk
> and we should invert this property. Otherwise, you'd be disabling once
> enabled SG and require working platforms to update their dtb. Of
> course, I shouldn't really let the state of an OS driver influence the
> DT binding.
> 

The SG support is added with this series. So, the OS has never made use
of the feature.

Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 12/27] coresight: tmc-etr: Allow commandline option to override SG use
  2018-05-03 20:40     ` Mathieu Poirier
@ 2018-05-08 15:49       ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-08 15:49 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On 03/05/18 21:40, Mathieu Poirier wrote:
> On Tue, May 01, 2018 at 10:10:42AM +0100, Suzuki K Poulose wrote:
>> The Coresight TMC-ETR SG mode could be unsafe on a platform where
>> the ETR is not properly connected to account for READ operations.
>> We use a DT node property to indicate if the system is safe.
>> This patch also provides a command line parameter to "force"
>> the use of SG mode to override the firmware information.
>>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Hi
>>
>> This is more of a debug patch for people who may want to
>> test their platform without too much of hacking. I am not
>> too keen on pushing this patch in.
> 
> I am not either nor do I personally need it to test this feature.  We can leave
> it in for now (and subsequent version) if you need it but we agree that I won't
> queue it to my tree when the time comes.

OK, I was expecting most of us here to share this view. I will drop it from
the series in the next version.
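
(For anyone who still wants to poke at this locally, the override really is
just a module parameter along these lines - sketch only, parameter name made
up:

static bool tmc_etr_force_sg;
module_param(tmc_etr_force_sg, bool, 0444);
MODULE_PARM_DESC(tmc_etr_force_sg,
		 "Force TMC ETR scatter-gather mode even when firmware does not mark it safe");

OR-ed into the DT check in the driver.)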

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 18/27] coresight: catu: Add support for scatter gather tables
  2018-05-07 20:25     ` Mathieu Poirier
@ 2018-05-08 15:56       ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-08 15:56 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On 07/05/18 21:25, Mathieu Poirier wrote:
> On Tue, May 01, 2018 at 10:10:48AM +0100, Suzuki K Poulose wrote:
>> This patch adds the support for setting up a SG table for use
>> by the CATU. We reuse the tmc_sg_table to represent the table/data
>> pages, even though the table format is different.
>>

...

>>
>> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
>> index 2cd69a6..4cc2928 100644
>> --- a/drivers/hwtracing/coresight/coresight-catu.c
>> +++ b/drivers/hwtracing/coresight/coresight-catu.c
>> @@ -16,10 +16,419 @@

...

>> +
>> +/*
>> + * Update the valid bit for a given range of indices [start, end)
>> + * in the given table @table.
>> + */
>> +static inline void catu_update_state_range(cate_t *table, int start,
>> +						 int end, int valid)
> 
> Indentation
> 

...

>> +#ifdef CATU_DEBUG
>> +static void catu_dump_table(struct tmc_sg_table *catu_table)
>> +{
>> +	int i;
>> +	cate_t *table;
>> +	unsigned long table_end, buf_size, offset = 0;
>> +
>> +	buf_size = tmc_sg_table_buf_size(catu_table);
>> +	dev_dbg(catu_table->dev,
>> +		"Dump table %p, tdaddr: %llx\n",
>> +		catu_table, catu_table->table_daddr);
>> +
>> +	while (offset < buf_size) {
>> +		table_end = offset + SZ_1M < buf_size ?
>> +			    offset + SZ_1M : buf_size;
>> +		table = catu_get_table(catu_table, offset, NULL);
>> +		for (i = 0; offset < table_end; i++, offset += CATU_PAGE_SIZE)
>> +			dev_dbg(catu_table->dev, "%d: %llx\n", i, table[i]);
>> +		dev_dbg(catu_table->dev, "Prev : %llx, Next: %llx\n",
>> +			table[CATU_LINK_PREV], table[CATU_LINK_NEXT]);
>> +		dev_dbg(catu_table->dev, "== End of sub-table ===");
>> +	}
>> +	dev_dbg(catu_table->dev, "== End of Table ===");
>> +}
>> +
>> +#else
>> +static inline void catu_dump_table(struct tmc_sg_table *catu_table)
>> +{
>> +}
>> +#endif
> 
> I think this approach is better than peppering the code with #ifdefs as it was
> done for ETR.  Please fix that to replicate what you've done here.
> 

OK

>> +
>> +/*
>> + * catu_populate_table : Populate the given CATU table.
>> + * The table is always populated as a circular table.
>> + * i.e, the "prev" link of the "first" table points to the "last"
>> + * table and the "next" link of the "last" table points to the
>> + * "first" table. The buffer should be made linear by calling
>> + * catu_set_table().
>> + */
>> +static void
>> +catu_populate_table(struct tmc_sg_table *catu_table)
>> +{

...

>> +	while (offset < buf_size) {
>> +		/*
>> +		 * The @offset is always 1M aligned here and we have an
>> +		 * empty table @table_ptr to fill. Each table can address
>> +		 * upto 1MB data buffer. The last table may have fewer
>> +		 * entries if the buffer size is not aligned.
>> +		 */
>> +		last_offset = (offset + SZ_1M) < buf_size ?
>> +			      (offset + SZ_1M) : buf_size;
>> +		for (i = 0; offset < last_offset; i++) {
>> +
>> +			data_daddr = catu_table->data_pages.daddrs[dpidx] +
>> +				     s_dpidx * CATU_PAGE_SIZE;
>> +#ifdef CATU_DEBUG
>> +			dev_dbg(catu_table->dev,
>> +				"[table %5d:%03d] 0x%llx\n",
>> +				(offset >> 20), i, data_daddr);
>> +#endif
> 
> I'm not a fan of adding #ifdefs in the code like this.  I think it is better to
> have a wrapper (that resolves to nothing if CATU_DEBUG is not defined) and
> handle the output in there.
> 


>> +
>> +	catu_populate_table(catu_table);
>> +	/* Make the buf linear from offset 0 */
>> +	(void)catu_set_table(catu_table, 0, size);
>> +
>> +	dev_dbg(catu_dev,
>> +		"Setup table %p, size %ldKB, %d table pages\n",
>> +		catu_table, (unsigned long)size >> 10,  nr_tpages);
> 
> I think this should also be wrapped in a special output debug function.
> 

I could do something like :

#ifdef CATU_DEBUG
#define catu_dbg(fmt, ...)	dev_dbg(fmt, __VA_ARGS__)
#else
#define catu_dbg(fmt, ...)	do { } while (0)
#endif

And use catu_dbg() for the sprinkled prints.
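
For instance, the print in catu_populate_table() that currently sits under
#ifdef CATU_DEBUG would then simply become (sketch):

			catu_dbg(catu_table->dev,
				 "[table %5d:%03d] 0x%llx\n",
				 (offset >> 20), i, data_daddr);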

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 18/27] coresight: catu: Add support for scatter gather tables
  2018-05-08 15:56       ` Suzuki K Poulose
@ 2018-05-08 16:13         ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-08 16:13 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, Mike Leach, Robert Walker,
	Mark Rutland, Will Deacon, Robin Murphy, Sudeep Holla,
	Frank Rowand, Rob Herring, John Horley

On 8 May 2018 at 09:56, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> On 07/05/18 21:25, Mathieu Poirier wrote:
>>
>> On Tue, May 01, 2018 at 10:10:48AM +0100, Suzuki K Poulose wrote:
>>>
>>> This patch adds the support for setting up a SG table for use
>>> by the CATU. We reuse the tmc_sg_table to represent the table/data
>>> pages, even though the table format is different.
>>>
>
> ...
>
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-catu.c
>>> b/drivers/hwtracing/coresight/coresight-catu.c
>>> index 2cd69a6..4cc2928 100644
>>> --- a/drivers/hwtracing/coresight/coresight-catu.c
>>> +++ b/drivers/hwtracing/coresight/coresight-catu.c
>>> @@ -16,10 +16,419 @@
>
>
> ...
>
>>> +
>>> +/*
>>> + * Update the valid bit for a given range of indices [start, end)
>>> + * in the given table @table.
>>> + */
>>> +static inline void catu_update_state_range(cate_t *table, int start,
>>> +                                                int end, int valid)
>>
>>
>> Indentation
>>
>
> ...
>
>>> +#ifdef CATU_DEBUG
>>> +static void catu_dump_table(struct tmc_sg_table *catu_table)
>>> +{
>>> +       int i;
>>> +       cate_t *table;
>>> +       unsigned long table_end, buf_size, offset = 0;
>>> +
>>> +       buf_size = tmc_sg_table_buf_size(catu_table);
>>> +       dev_dbg(catu_table->dev,
>>> +               "Dump table %p, tdaddr: %llx\n",
>>> +               catu_table, catu_table->table_daddr);
>>> +
>>> +       while (offset < buf_size) {
>>> +               table_end = offset + SZ_1M < buf_size ?
>>> +                           offset + SZ_1M : buf_size;
>>> +               table = catu_get_table(catu_table, offset, NULL);
>>> +               for (i = 0; offset < table_end; i++, offset +=
>>> CATU_PAGE_SIZE)
>>> +                       dev_dbg(catu_table->dev, "%d: %llx\n", i,
>>> table[i]);
>>> +               dev_dbg(catu_table->dev, "Prev : %llx, Next: %llx\n",
>>> +                       table[CATU_LINK_PREV], table[CATU_LINK_NEXT]);
>>> +               dev_dbg(catu_table->dev, "== End of sub-table ===");
>>> +       }
>>> +       dev_dbg(catu_table->dev, "== End of Table ===");
>>> +}
>>> +
>>> +#else
>>> +static inline void catu_dump_table(struct tmc_sg_table *catu_table)
>>> +{
>>> +}
>>> +#endif
>>
>>
>> I think this approach is better than peppering the code with #ifdefs as it
>> was
>> done for ETR.  Please fix that to replicate what you've done here.
>>
>
> OK
>
>>> +
>>> +/*
>>> + * catu_populate_table : Populate the given CATU table.
>>> + * The table is always populated as a circular table.
>>> + * i.e, the "prev" link of the "first" table points to the "last"
>>> + * table and the "next" link of the "last" table points to the
>>> + * "first" table. The buffer should be made linear by calling
>>> + * catu_set_table().
>>> + */
>>> +static void
>>> +catu_populate_table(struct tmc_sg_table *catu_table)
>>> +{
>
>
> ...
>
>>> +       while (offset < buf_size) {
>>> +               /*
>>> +                * The @offset is always 1M aligned here and we have an
>>> +                * empty table @table_ptr to fill. Each table can address
>>> +                * upto 1MB data buffer. The last table may have fewer
>>> +                * entries if the buffer size is not aligned.
>>> +                */
>>> +               last_offset = (offset + SZ_1M) < buf_size ?
>>> +                             (offset + SZ_1M) : buf_size;
>>> +               for (i = 0; offset < last_offset; i++) {
>>> +
>>> +                       data_daddr = catu_table->data_pages.daddrs[dpidx]
>>> +
>>> +                                    s_dpidx * CATU_PAGE_SIZE;
>>> +#ifdef CATU_DEBUG
>>> +                       dev_dbg(catu_table->dev,
>>> +                               "[table %5d:%03d] 0x%llx\n",
>>> +                               (offset >> 20), i, data_daddr);
>>> +#endif
>>
>>
>> I'm not a fan of adding #ifdefs in the code like this.  I think it is
>> better to
>> have a wrapper (that resolves to nothing if CATU_DEBUG is not defined) and
>> handle the output in there.
>>
>
>
>>> +
>>> +       catu_populate_table(catu_table);
>>> +       /* Make the buf linear from offset 0 */
>>> +       (void)catu_set_table(catu_table, 0, size);
>>> +
>>> +       dev_dbg(catu_dev,
>>> +               "Setup table %p, size %ldKB, %d table pages\n",
>>> +               catu_table, (unsigned long)size >> 10,  nr_tpages);
>>
>>
>> I think this should also be wrapped in a special output debug function.
>>
>
> I could do something like :
>
> #ifdef CATU_DEBUG
> #define catu_dbg(fmt, ...)      dev_dbg(fmt, __VA_ARGS__)
> #else
> #define catu_dbg(fmt, ...)      do { } while (0)
> #endif
>
> And use catu_dbg() for the sprinkled prints.

Yes, that is exactly what I had in mind.

>
> Cheers
> Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 19/27] coresight: catu: Plug in CATU as a backend for ETR buffer
  2018-05-07 22:02     ` Mathieu Poirier
@ 2018-05-08 16:21       ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-08 16:21 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On 07/05/18 23:02, Mathieu Poirier wrote:
> On Tue, May 01, 2018 at 10:10:49AM +0100, Suzuki K Poulose wrote:
>> Now that we can use a CATU with a scatter gather table, add support
>> for the TMC ETR to make use of the connected CATU in translate mode.
>> This is done by adding CATU as new buffer mode. CATU's SLADDR must
>> always be 4K aligned. Thus the INADDR (base VA) is always 1M aligned
>> and we adjust the DBA for the ETR to align to the "offset" within
>> the 1MB page.


>> diff --git a/drivers/hwtracing/coresight/coresight-catu.h b/drivers/hwtracing/coresight/coresight-catu.h
>> index cd58d6f..b673a73 100644
>> --- a/drivers/hwtracing/coresight/coresight-catu.h
>> +++ b/drivers/hwtracing/coresight/coresight-catu.h
>> @@ -29,6 +29,32 @@

>>   
>> +extern const struct etr_buf_operations etr_catu_buf_ops;
>> +

>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> index 25e7feb..41dde0a 100644
>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> @@ -941,6 +941,9 @@ static const struct etr_buf_operations etr_sg_buf_ops = {
>>   static const struct etr_buf_operations *etr_buf_ops[] = {
>>   	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
>>   	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
>> +#ifdef CONFIG_CORESIGHT_CATU
>> +	[ETR_MODE_CATU] = &etr_catu_buf_ops,
>> +#endif

...

>>   static inline int tmc_etr_mode_alloc_buf(int mode,
>> @@ -953,6 +956,9 @@ static inline int tmc_etr_mode_alloc_buf(int mode,
>>   	switch (mode) {
>>   	case ETR_MODE_FLAT:
>>   	case ETR_MODE_ETR_SG:
>> +#ifdef CONFIG_CORESIGHT_CATU
>> +	case ETR_MODE_CATU:
>> +#endif
> 
> I really wish we could avoid doing something like this (and the above) but every
> alternate solution I come up with is either uglier or on par with it...
> Unless someone comes up with a bright idea we'll simply have to let it be.

We could do a little trick in coresight-catu.h:

#ifdef CONFIG_CORESIGHT_CATU
extern const struct etr_buf_operations etr_catu_buf_ops;
#else
static const struct etr_buf_operations etr_catu_buf_ops;
#endif

And then add the following check to get rid of the #ifdef above in tmc_etr_mode_alloc_buf().

		if (etr_buf_ops[mode]->alloc)
			rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
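
With that (and the [ETR_MODE_CATU] initialiser in etr_buf_ops[] made
unconditional), the switch could end up looking roughly like this - sketch
only, parameter names taken from the call above:

	switch (mode) {
	case ETR_MODE_FLAT:
	case ETR_MODE_ETR_SG:
	case ETR_MODE_CATU:
		if (etr_buf_ops[mode]->alloc)
			rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf,
						      node, pages);
		break;
	default:
		rc = -EINVAL;
	}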

>>   		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);

> 
> While looking for a solution I noticed that tmc_etr_get_catu_device()
> could be moved to coresight-catu.h.  That way we wouldn't have to include
> coresight-catu.h every time coresight-tmc.h is present in a file.

Yes, we could do that. I don't remember if there was a specific reason behind it.
Maybe it is a leftover from the rebases and how the CATU link evolved. I
will clean it up.


Cheers
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 23/27] coresight: tmc-etr: Handle driver mode specific ETR buffers
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-08 17:18     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-08 17:18 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:53AM +0100, Suzuki K Poulose wrote:
> Since the ETR could be driven either by SYSFS or by perf, it
> becomes complicated how we deal with the buffers used for each
> of these modes. The ETR driver cannot simply free the current
> attached buffer without knowing the provider (i.e, sysfs vs perf).
> 
> To solve this issue, we provide:
> 1) the driver-mode specific etr buffer to be retained in the drvdata
> 2) the etr_buf for a session should be passed on when enabling the
>    hardware, which will be stored in drvdata->etr_buf. This will be
>    replaced (not free'd) as soon as the hardware is disabled, after
>    necessary sync operation.
> 
> The advantages of this are :
> 
> 1) The common code path doesn't need to worry about how to dispose
>    an existing buffer, if it is about to start a new session with a
>    different buffer, possibly in a different mode.
> 2) The driver mode can control its buffers and can get access to the
>    saved session even when the hardware is operating in a different
>    mode. (e.g, we can still access a trace buffer from a sysfs mode
>    even if the etr is now used in perf mode, without disrupting the
>    current session.)
> 
> Towards this, we introduce a sysfs specific data which will hold the
> etr_buf used for sysfs mode of operation, controlled solely by the
> sysfs mode handling code.

Thinking further on this... I toyed with the idea of doing the same thing when
working on the original driver and decided against it.  Do we really have a case
where users would want to use sysfs and perf alternately?  To me this looks
overdesigned.

If we are going to go that way we need to enact the same behavior for ETB10 and
ETF...  And take it out of this set as it is already substantial enough.

> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 59 ++++++++++++++++---------
>  drivers/hwtracing/coresight/coresight-tmc.h     |  2 +
>  2 files changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 1ef0f62..a35a12f 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1162,10 +1162,15 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
>  		helper_ops(catu)->disable(catu, drvdata->etr_buf);
>  }
>  
> -static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
> +static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata,
> +			      struct etr_buf *etr_buf)
>  {
>  	u32 axictl, sts;
> -	struct etr_buf *etr_buf = drvdata->etr_buf;
> +
> +	/* Callers should provide an appropriate buffer for use */
> +	if (WARN_ON(!etr_buf || drvdata->etr_buf))
> +		return;
> +	drvdata->etr_buf = etr_buf;
>  
>  	/*
>  	 * If this ETR is connected to a CATU, enable it before we turn
> @@ -1227,13 +1232,16 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>   * also updating the @bufpp on where to find it. Since the trace data
>   * starts at anywhere in the buffer, depending on the RRP, we adjust the
>   * @len returned to handle buffer wrapping around.
> + *
> + * We are protected here by drvdata->reading != 0, which ensures the
> + * sysfs_buf stays alive.
>   */
>  ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
>  				loff_t pos, size_t len, char **bufpp)
>  {
>  	s64 offset;
>  	ssize_t actual = len;
> -	struct etr_buf *etr_buf = drvdata->etr_buf;
> +	struct etr_buf *etr_buf = drvdata->sysfs_buf;
>  
>  	if (pos + actual > etr_buf->len)
>  		actual = etr_buf->len - pos;
> @@ -1263,7 +1271,14 @@ tmc_etr_free_sysfs_buf(struct etr_buf *buf)
>  
>  static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
>  {
> -	tmc_sync_etr_buf(drvdata);
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
> +
> +	if (WARN_ON(drvdata->sysfs_buf != etr_buf)) {
> +		tmc_etr_free_sysfs_buf(drvdata->sysfs_buf);
> +		drvdata->sysfs_buf = NULL;
> +	} else {
> +		tmc_sync_etr_buf(drvdata);
> +	}
>  }
>  
>  static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
> @@ -1285,6 +1300,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  
>  	/* Disable CATU device if this ETR is connected to one */
>  	tmc_etr_disable_catu(drvdata);
> +	/* Reset the ETR buf used by hardware */
> +	drvdata->etr_buf = NULL;
>  }
>  
>  static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
> @@ -1293,7 +1310,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	bool used = false;
>  	unsigned long flags;
>  	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> -	struct etr_buf *new_buf = NULL, *free_buf = NULL;
> +	struct etr_buf *sysfs_buf = NULL, *new_buf = NULL, *free_buf = NULL;
>  
>  
>  	/*
> @@ -1305,7 +1322,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	 * with the lock released.
>  	 */
>  	spin_lock_irqsave(&drvdata->spinlock, flags);
> -	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
> +	sysfs_buf = READ_ONCE(drvdata->sysfs_buf);
> +	if (!sysfs_buf || (sysfs_buf->size != drvdata->size)) {
>  		spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  		/* Allocate memory with the spinlock released */
>  		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
> @@ -1333,15 +1351,16 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	 * If we don't have a buffer or it doesn't match the requested size,
>  	 * use the memory allocated above. Otherwise reuse it.
>  	 */
> -	if (!drvdata->etr_buf ||
> -	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
> +	sysfs_buf = READ_ONCE(drvdata->sysfs_buf);
> +	if (!sysfs_buf ||
> +	    (new_buf && sysfs_buf->size != new_buf->size)) {
>  		used = true;
> -		free_buf = drvdata->etr_buf;
> -		drvdata->etr_buf = new_buf;
> +		free_buf = sysfs_buf;
> +		drvdata->sysfs_buf = new_buf;
>  	}
>  
>  	drvdata->mode = CS_MODE_SYSFS;
> -	tmc_etr_enable_hw(drvdata);
> +	tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
>  out:
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
> @@ -1426,13 +1445,13 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  		goto out;
>  	}
>  
> -	/* If drvdata::etr_buf is NULL the trace data has been read already */
> -	if (drvdata->etr_buf == NULL) {
> +	/* If sysfs_buf is NULL the trace data has been read already */
> +	if (!drvdata->sysfs_buf) {
>  		ret = -EINVAL;
>  		goto out;
>  	}
>  
> -	/* Disable the TMC if need be */
> +	/* Disable the TMC if we are trying to read from a running session */
>  	if (drvdata->mode == CS_MODE_SYSFS)
>  		tmc_etr_disable_hw(drvdata);
>  
> @@ -1446,7 +1465,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  {
>  	unsigned long flags;
> -	struct etr_buf *etr_buf = NULL;
> +	struct etr_buf *sysfs_buf = NULL;
>  
>  	/* config types are set a boot time and never change */
>  	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
> @@ -1461,22 +1480,22 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  		 * buffer. Since the tracer is still enabled drvdata::buf can't
>  		 * be NULL.
>  		 */
> -		tmc_etr_enable_hw(drvdata);
> +		tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
>  	} else {
>  		/*
>  		 * The ETR is not tracing and the buffer was just read.
>  		 * As such prepare to free the trace buffer.
>  		 */
> -		etr_buf =  drvdata->etr_buf;
> -		drvdata->etr_buf = NULL;
> +		sysfs_buf = drvdata->sysfs_buf;
> +		drvdata->sysfs_buf = NULL;
>  	}
>  
>  	drvdata->reading = false;
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free allocated memory out side of the spinlock */
> -	if (etr_buf)
> -		tmc_free_etr_buf(etr_buf);
> +	if (sysfs_buf)
> +		tmc_etr_free_sysfs_buf(sysfs_buf);
>  
>  	return 0;
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 76a89a6..185dc12 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -197,6 +197,7 @@ struct etr_buf {
>   * @trigger_cntr: amount of words to store after a trigger.
>   * @etr_caps:	Bitmask of capabilities of the TMC ETR, inferred from the
>   *		device configuration register (DEVID)
> + * @sysfs_data:	SYSFS buffer for ETR.
>   */
>  struct tmc_drvdata {
>  	void __iomem		*base;
> @@ -216,6 +217,7 @@ struct tmc_drvdata {
>  	enum tmc_mem_intf_width	memwidth;
>  	u32			trigger_cntr;
>  	u32			etr_caps;
> +	struct etr_buf		*sysfs_buf;
>  };
>  
>  struct etr_buf_operations {
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 10/27] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-08 15:48           ` Suzuki K Poulose
@ 2018-05-08 17:34             ` Rob Herring
  -1 siblings, 0 replies; 134+ messages in thread
From: Rob Herring @ 2018-05-08 17:34 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Mathieu Poirier, linux-arm-kernel, linux-kernel, Mike Leach,
	Robert Walker, Mark Rutland, Will Deacon, Robin Murphy,
	Sudeep Holla, Frank Rowand, John Horley, Mathieu Poirier,
	devicetree

On Tue, May 8, 2018 at 10:48 AM, Suzuki K Poulose
<Suzuki.Poulose@arm.com> wrote:
> On 04/05/18 23:56, Rob Herring wrote:
>>
>> On Thu, May 3, 2018 at 3:32 PM, Mathieu Poirier
>> <mathieu.poirier@linaro.org> wrote:
>>>
>>> On 1 May 2018 at 07:13, Rob Herring <robh@kernel.org> wrote:
>>>>
>>>> On Tue, May 01, 2018 at 10:10:40AM +0100, Suzuki K Poulose wrote:
>>>>>
>>>>> We are about to add the support for ETR builtin scatter-gather mode
>>>>> for dealing with large amount of trace buffers. However, on some of
>>>>> the platforms, using the ETR SG mode can lock up the system due to
>>>>> the way the ETR is connected to the memory subsystem.
>>>>>
>>>>> In SG mode, the ETR performs READ from the scatter-gather table to
>>>>> fetch the next page and regular WRITE of trace data. If the READ
>>>>> operation doesn't complete (due to memory subsystem issues,
>>>>> which we have seen on a couple of platforms), the trace WRITE
>>>>> cannot proceed, leading to issues. So, by default we do not
>>>>> use the SG mode, unless it is known to be safe on the platform.
>>>>> We define a DT property for the TMC node to specify whether we
>>>>> have a proper SG mode.
>
>
>
>>>>> ---
>>>>>   Documentation/devicetree/bindings/arm/coresight.txt | 3 +++
>>>>>   drivers/hwtracing/coresight/coresight-tmc.c         | 8 +++++++-
>>>>>   2 files changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt
>>>>> b/Documentation/devicetree/bindings/arm/coresight.txt
>>>>> index cdd84d0..7c0c8f0 100644
>>>>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>>>>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>>>>> @@ -88,6 +88,9 @@ its hardware characteristcs.
>>>>>        * arm,buffer-size: size of contiguous buffer space for TMC ETR
>>>>>         (embedded trace router)
>>>>>
>>>>> +     * scatter-gather: boolean. Indicates that the TMC-ETR can safely
>>>>> +       use the SG mode on this system.
>>>>> +
>>>>
>>>>
>>>> Needs a vendor prefix.
>>>>
>>>
>>> Thinking further on this, do we need to make it device specific as
>>> well - something like "arm,etr-scatter-gather"?  That way we don't
>>> have to redefine "scatter-gather" for other ARM devices if they happen
>>> to need the same property but for different reasons.
>>
>>
>> No. If we had a bunch of cases, then we'd probably want to have just
>> 'scatter-gather'.
>
>
> Does it mean "arm,scatter-gather" ?

Yes. Use that.

> If we ever wanted to add device-specific
> information, I would prefer to go with "arm,tmc-scatter-gather" and not
> "etr-scatter-gather", as the two could mean different things.
>
>>
>> BTW, if SG had already been supported, then I'd say this is a quirk
>> and we should invert this property. Otherwise, you'd be disabling once
>> enabled SG and require working platforms to update their dtb. Of
>> course, I shouldn't really let the state of an OS driver influence the
>> DT binding.
>>
>
> The SG support is added with this series. So, the OS has never made use
> of the feature.

Linux never did, but other OSs use DT, hence why I said "an OS
driver", not "the OS driver". But in reality, I'd guess only Linux has
Coresight support at all.

Rob

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 25/27] coresight: etr_buf: Add helper for padding an area of trace data
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-08 17:34     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-08 17:34 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:55AM +0100, Suzuki K Poulose wrote:
> This patch adds a helper to insert barrier packets for a given
> size (aligned to packet size) at a given offset in an etr_buf. This
> will be used later for perf mode when we try to start in the
> middle of an SG buffer.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 53 ++++++++++++++++++++++---
>  1 file changed, 47 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 7551272..8159e84 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1083,18 +1083,59 @@ static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
>  	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
>  }
>  
> +/*
> + * tmc_etr_buf_insert_barrier_packets: Insert barrier packets at @offset, up to
> + * @size bytes in the given buffer. @size should be aligned to the barrier
> + * packet size.
> + *
> + * Returns the new @offset after filling the barriers on success. Otherwise
> + * returns error.
> + */
>  static inline s64
> -tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
> +tmc_etr_buf_insert_barrier_packets(struct etr_buf *etr_buf,
> +				   u64 offset, u64 size)
>  {
>  	ssize_t len;
>  	char *bufp;
>  
> -	len = tmc_etr_buf_get_data(etr_buf, offset,
> -				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
> -	if (WARN_ON(len <= CORESIGHT_BARRIER_PKT_SIZE))
> +	if (size < CORESIGHT_BARRIER_PKT_SIZE)
>  		return -EINVAL;
> -	coresight_insert_barrier_packet(bufp);
> -	return offset + CORESIGHT_BARRIER_PKT_SIZE;
> +	/*
> +	 * Normally the size should be aligned to the frame size
> +	 * of the ETR. Even if it isn't, the decoder looks for a
> +	 * barrier packet at a frame size aligned offset. So align
> +	 * the buffer to frame size first and then fill barrier
> +	 * packets.
> +	 */
> +	do {
> +		len = tmc_etr_buf_get_data(etr_buf, offset, size, &bufp);
> +		if (WARN_ON(len <= 0))
> +			return -EINVAL;
> +		/*
> +		 * We are guaranteed that @bufp will point to a linear range
> +		 * of @len bytes, where @len <= @size.
> +		 */
> +		size -= len;
> +		offset += len;
> +		while (len >= CORESIGHT_BARRIER_PKT_SIZE) {
> +			coresight_insert_barrier_packet(bufp);
> +			bufp += CORESIGHT_BARRIER_PKT_SIZE;
> +			len -= CORESIGHT_BARRIER_PKT_SIZE;
> +		}
> +
> +		/* If we reached the end of the buffer, wrap around */
> +		if (offset == etr_buf->size)
> +			offset -= etr_buf->size;
> +	} while (size);
> +
> +	return offset;
> +}
> +
> +static inline s64
> +tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
> +{
> +	return tmc_etr_buf_insert_barrier_packets(etr_buf, offset,
> +					  CORESIGHT_BARRIER_PKT_SIZE);

Indentation

>  }
>  
>  /*
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 26/27] coresight: perf: Remove reset_buffer call back for sinks
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-08 19:42     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-08 19:42 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:56AM +0100, Suzuki K Poulose wrote:
> Right now we issue the update_buffer() and reset_buffer() callbacks
> in succession when we stop tracing an event. update_buffer() is
> supposed to check the status of the buffer and make sure the ring buffer
> is updated with the trace data. We store information about the
> size of the data collected only to be consumed by the reset_buffer
> callback, which always follows update_buffer. This was originally
> designed for handling future IPs which could trigger a buffer overflow
> interrupt. This patch gets rid of the reset_buffer callback altogether
> and performs those actions in update_buffer, making it return the size
> collected. We can always add support for handling the overflow
> interrupt case later.
> 
> This removes a not-so-pretty hack (storing the new head in the
> size field for snapshot mode) and cleans things up a little.

IPs with overflow interrupts will be arriving shortly, so it is not like the
future is uncertain - they are coming.  Right now the logic is there - I don't
see a real need to consolidate things only to split it again in the near future.

I agree the part about overloading buf->data_size with the head of the ring
buffer when operating in snapshot mode isn't pretty (though well documented).
If anything, that can be improved, e.g. add a buf->head, and things will be clearer.
Once again this could be part of another patchset.
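Something along these lines would do it; just a sketch of the idea, not
code from this series:

	/* struct cs_buffers grows a dedicated field */
	local64_t	head;	/* rb head when operating in snapshot mode */

	/* update_buffer() stores the head instead of overloading data_size */
	if (buf->snapshot)
		local64_set(&buf->head, (cur * PAGE_SIZE) + offset);
	else
		local_add(to_read, &buf->data_size);

	/* reset_buffer() then reads back whichever applies */
	if (buf->snapshot) {
		handle->head = local64_read(&buf->head);
		size = buf->nr_pages << PAGE_SHIFT;
	} else {
		size = local_xchg(&buf->data_size, 0);
	}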

> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-etb10.c    | 56 +++++------------------
>  drivers/hwtracing/coresight/coresight-etm-perf.c |  9 +---
>  drivers/hwtracing/coresight/coresight-tmc-etf.c  | 58 +++++-------------------
>  include/linux/coresight.h                        |  5 +-
>  4 files changed, 26 insertions(+), 102 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
> index d9c2f87..b13712a 100644
> --- a/drivers/hwtracing/coresight/coresight-etb10.c
> +++ b/drivers/hwtracing/coresight/coresight-etb10.c
> @@ -322,37 +322,7 @@ static int etb_set_buffer(struct coresight_device *csdev,
>  	return ret;
>  }
>  
> -static unsigned long etb_reset_buffer(struct coresight_device *csdev,
> -				      struct perf_output_handle *handle,
> -				      void *sink_config)
> -{
> -	unsigned long size = 0;
> -	struct cs_buffers *buf = sink_config;
> -
> -	if (buf) {
> -		/*
> -		 * In snapshot mode ->data_size holds the new address of the
> -		 * ring buffer's head.  The size itself is the whole address
> -		 * range since we want the latest information.
> -		 */
> -		if (buf->snapshot)
> -			handle->head = local_xchg(&buf->data_size,
> -						  buf->nr_pages << PAGE_SHIFT);
> -
> -		/*
> -		 * Tell the tracer PMU how much we got in this run and if
> -		 * something went wrong along the way.  Nobody else can use
> -		 * this cs_buffers instance until we are done.  As such
> -		 * resetting parameters here and squaring off with the ring
> -		 * buffer API in the tracer PMU is fine.
> -		 */
> -		size = local_xchg(&buf->data_size, 0);
> -	}
> -
> -	return size;
> -}
> -
> -static void etb_update_buffer(struct coresight_device *csdev,
> +static unsigned long etb_update_buffer(struct coresight_device *csdev,
>  			      struct perf_output_handle *handle,
>  			      void *sink_config)
>  {
> @@ -361,13 +331,13 @@ static void etb_update_buffer(struct coresight_device *csdev,
>  	u8 *buf_ptr;
>  	const u32 *barrier;
>  	u32 read_ptr, write_ptr, capacity;
> -	u32 status, read_data, to_read;
> -	unsigned long offset;
> +	u32 status, read_data;
> +	unsigned long offset, to_read;
>  	struct cs_buffers *buf = sink_config;
>  	struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>  
>  	if (!buf)
> -		return;
> +		return 0;
>  
>  	capacity = drvdata->buffer_depth * ETB_FRAME_SIZE_WORDS;
>  
> @@ -472,18 +442,17 @@ static void etb_update_buffer(struct coresight_device *csdev,
>  	writel_relaxed(0x0, drvdata->base + ETB_RAM_WRITE_POINTER);
>  
>  	/*
> -	 * In snapshot mode all we have to do is communicate to
> -	 * perf_aux_output_end() the address of the current head.  In full
> -	 * trace mode the same function expects a size to move rb->aux_head
> -	 * forward.
> +	 * In snapshot mode we have to update the handle->head to point
> +	 * to the new location.
>  	 */
> -	if (buf->snapshot)
> -		local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
> -	else
> -		local_add(to_read, &buf->data_size);
> -
> +	if (buf->snapshot) {
> +		handle->head = (cur * PAGE_SIZE) + offset;
> +		to_read = buf->nr_pages << PAGE_SHIFT;
> +	}
>  	etb_enable_hw(drvdata);
>  	CS_LOCK(drvdata->base);
> +
> +	return to_read;
>  }
>  
>  static const struct coresight_ops_sink etb_sink_ops = {
> @@ -492,7 +461,6 @@ static const struct coresight_ops_sink etb_sink_ops = {
>  	.alloc_buffer	= etb_alloc_buffer,
>  	.free_buffer	= etb_free_buffer,
>  	.set_buffer	= etb_set_buffer,
> -	.reset_buffer	= etb_reset_buffer,
>  	.update_buffer	= etb_update_buffer,
>  };
>  
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 4e5ed65..5096def 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -342,15 +342,8 @@ static void etm_event_stop(struct perf_event *event, int mode)
>  		if (!sink_ops(sink)->update_buffer)
>  			return;
>  
> -		sink_ops(sink)->update_buffer(sink, handle,
> +		size = sink_ops(sink)->update_buffer(sink, handle,
>  					      event_data->snk_config);
> -
> -		if (!sink_ops(sink)->reset_buffer)
> -			return;
> -
> -		size = sink_ops(sink)->reset_buffer(sink, handle,
> -						    event_data->snk_config);
> -
>  		perf_aux_output_end(handle, size);
>  	}
>  
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> index 0a32734..75ef5c4 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> @@ -360,36 +360,7 @@ static int tmc_set_etf_buffer(struct coresight_device *csdev,
>  	return ret;
>  }
>  
> -static unsigned long tmc_reset_etf_buffer(struct coresight_device *csdev,
> -					  struct perf_output_handle *handle,
> -					  void *sink_config)
> -{
> -	long size = 0;
> -	struct cs_buffers *buf = sink_config;
> -
> -	if (buf) {
> -		/*
> -		 * In snapshot mode ->data_size holds the new address of the
> -		 * ring buffer's head.  The size itself is the whole address
> -		 * range since we want the latest information.
> -		 */
> -		if (buf->snapshot)
> -			handle->head = local_xchg(&buf->data_size,
> -						  buf->nr_pages << PAGE_SHIFT);
> -		/*
> -		 * Tell the tracer PMU how much we got in this run and if
> -		 * something went wrong along the way.  Nobody else can use
> -		 * this cs_buffers instance until we are done.  As such
> -		 * resetting parameters here and squaring off with the ring
> -		 * buffer API in the tracer PMU is fine.
> -		 */
> -		size = local_xchg(&buf->data_size, 0);
> -	}
> -
> -	return size;
> -}
> -
> -static void tmc_update_etf_buffer(struct coresight_device *csdev,
> +static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
>  				  struct perf_output_handle *handle,
>  				  void *sink_config)
>  {
> @@ -398,17 +369,17 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
>  	const u32 *barrier;
>  	u32 *buf_ptr;
>  	u64 read_ptr, write_ptr;
> -	u32 status, to_read;
> -	unsigned long offset;
> +	u32 status;
> +	unsigned long offset, to_read;
>  	struct cs_buffers *buf = sink_config;
>  	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>  
>  	if (!buf)
> -		return;
> +		return 0;
>  
>  	/* This shouldn't happen */
>  	if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
> -		return;
> +		return 0;
>  
>  	CS_UNLOCK(drvdata->base);
>  
> @@ -497,18 +468,14 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
>  		}
>  	}
>  
> -	/*
> -	 * In snapshot mode all we have to do is communicate to
> -	 * perf_aux_output_end() the address of the current head.  In full
> -	 * trace mode the same function expects a size to move rb->aux_head
> -	 * forward.
> -	 */
> -	if (buf->snapshot)
> -		local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
> -	else
> -		local_add(to_read, &buf->data_size);
> -
> +	/* In snapshot mode we have to update the head */
> +	if (buf->snapshot) {
> +		handle->head = (cur * PAGE_SIZE) + offset;
> +		to_read = buf->nr_pages << PAGE_SHIFT;
> +	}
>  	CS_LOCK(drvdata->base);
> +
> +	return to_read;
>  }
>  
>  static const struct coresight_ops_sink tmc_etf_sink_ops = {
> @@ -517,7 +484,6 @@ static const struct coresight_ops_sink tmc_etf_sink_ops = {
>  	.alloc_buffer	= tmc_alloc_etf_buffer,
>  	.free_buffer	= tmc_free_etf_buffer,
>  	.set_buffer	= tmc_set_etf_buffer,
> -	.reset_buffer	= tmc_reset_etf_buffer,
>  	.update_buffer	= tmc_update_etf_buffer,
>  };
>  
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index c0e1568..41b3729 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -212,10 +212,7 @@ struct coresight_ops_sink {
>  	int (*set_buffer)(struct coresight_device *csdev,
>  			  struct perf_output_handle *handle,
>  			  void *sink_config);
> -	unsigned long (*reset_buffer)(struct coresight_device *csdev,
> -				      struct perf_output_handle *handle,
> -				      void *sink_config);
> -	void (*update_buffer)(struct coresight_device *csdev,
> +	unsigned long (*update_buffer)(struct coresight_device *csdev,
>  			      struct perf_output_handle *handle,
>  			      void *sink_config);
>  };
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 23/27] coresight: tmc-etr: Handle driver mode specific ETR buffers
  2018-05-08 17:18     ` Mathieu Poirier
@ 2018-05-08 21:51       ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-08 21:51 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On 05/08/2018 06:18 PM, Mathieu Poirier wrote:
> On Tue, May 01, 2018 at 10:10:53AM +0100, Suzuki K Poulose wrote:
>> Since the ETR could be driven either by SYSFS or by perf, it
>> becomes complicated to manage the buffers used for each
>> of these modes. The ETR driver cannot simply free the current
>> attached buffer without knowing the provider (i.e, sysfs vs perf).
>>
>> To solve this issue, we provide:
>> 1) the driver-mode specific etr buffer to be retained in the drvdata
>> 2) the etr_buf for a session should be passed on when enabling the
>>     hardware, which will be stored in drvdata->etr_buf. This will be
>>     replaced (not free'd) as soon as the hardware is disabled, after
>>     necessary sync operation.
>>
>> The advantages of this are :
>>
>> 1) The common code path doesn't need to worry about how to dispose
>>     an existing buffer, if it is about to start a new session with a
>>     different buffer, possibly in a different mode.
>> 2) The driver mode can control its buffers and can get access to the
>>     saved session even when the hardware is operating in a different
>>     mode. (e.g, we can still access a trace buffer from a sysfs mode
>>     even if the etr is now used in perf mode, without disrupting the
>>     current session.)
>>
>> Towards this, we introduce a sysfs specific data which will hold the
>> etr_buf used for sysfs mode of operation, controlled solely by the
>> sysfs mode handling code.
> 
> Thinking further on this... I toyed with the idea of doing the same thing when
> working on the original driver and decided against it.  Do we really have a case
> where users would want to use sysFS and perf alternatively?  To me this looks
> overdesigned.
> 
> If we are going to go that way we need to enact the same behavior for ETB10 and
> ETF...  And take it out of this set as it is already substantial enough.

The difference between ETB10/ETF and ETR is the usage of the buffer. The
former uses an internal buffer and we always have to copy it out to an
external buffer for consumption. This external buffer is actually
separate for each mode, i.e. sysfs and perf. Also, the data is copied
out right after we disable the HW. This ensures that interleaved use of
the two modes doesn't corrupt each other's data.

However, the ETR doesn't have an internal buffer and uses System
RAM. That brings in the problem of both modes sharing the one "buffer"
described by the drvdata. So, eventually, either mode could write to
the buffer allocated by the other mode before it is consumed by the
end user (via a sysfs read or perf). That brings in the challenge of
managing the buffer safely, switching it back and forth
(with the right size and pages) for each mode without any interference.
It also implies that one mode must be able to safely free the left-over
buffer from the previous mode (which could potentially be linked to
other data structures maintained by that mode). And that makes it more
complex, e.g. we must leave the sysfs trace data available for collection
while perf grabs the ETR for its own use. The perf mode
might not know the mode of the existing buffer and thus wouldn't know
how to free it properly.

This is why we need per-mode buffers, each managed by its own
mode, i.e. allocated, used and, more importantly, freed
appropriately.
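
To put the above in code terms, here is a rough sketch (the fields are
the ones this series adds to tmc_drvdata; the calls only illustrate the
flow):

	struct tmc_drvdata {
		...
		struct etr_buf	*etr_buf;	/* buffer of the active session */
		struct etr_buf	*sysfs_buf;	/* owned and reclaimed by sysfs mode */
		void		*perf_data;	/* owned and reclaimed by perf mode */
	};

	/* Each mode hands its own buffer to the hardware when enabling it;
	 * drvdata->etr_buf only tracks the buffer of the running session. */
	tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);	/* sysfs path */
	tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);	/* perf path */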

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 27/27] coresight: etm-perf: Add support for ETR backend
  2018-05-01  9:10   ` Suzuki K Poulose
@ 2018-05-08 22:04     ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-08 22:04 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On Tue, May 01, 2018 at 10:10:57AM +0100, Suzuki K Poulose wrote:
> Add necessary support for using ETR as a sink in ETM perf tracing.
> We try to make the best use of the available buffer modes to
> avoid software double buffering.
> 
> We can use the perf ring buffer for ETR directly if all of the
> conditions below are met :
>  1) ETR is DMA coherent
>  2) perf is used in snapshot mode. In full tracing mode, we cannot
>     guarantee that the ETR will stop before it overwrites the data
>     at the beginning of the trace buffer leading to loss of trace
>     data. (The buffer which is being consumed by the perf is still
>     hidden from the ETR).
>  3) ETR supports save-restore with a scatter-gather mechanism
>     which can use a given set of pages, so we use the perf ring buffer
>     directly. If we have an in-built TMC ETR Scatter Gather unit,
>     we make use of a circular SG list to restart from a given head.
>     However, we need to align the starting offset to 4K in this case.
>     With CATU and ETR Save restore feature, we don't have to necessarily
>     align the head of the buffer.
> 
> If the ETR doesn't support any of these, we fall back to software
> double buffering.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> 
> Note: The conditions above need some rethink.
> 
> For (1): We always sync the buffer for the CPU before we update the
> pointers. So, we should be safe here and should be able to remove
> this condition.
> 
> (2) is a bit more of a problem, as the ETR (without SFIFO_2 mode)
> doesn't stop writing out the trace buffer, even though we exclude
> the part of the ring buffer currently consumed by perf, leading
> to loss of data. Also, since we don't have an interrupt (without
> SFIFO_2), we can't wake up the userspace reliably to consume
> the data.
> 
> One possible option is to use an hrtimer to wake up the userspace
> early enough, using a low wakeup mark. But that doesn't necessarily
> guarantee that the ETR will not wrap around overwriting the data,
> as we can't modify the ETR pointers, unless we disable it, which
> could again potentially cause data loss in Circular Buffer mode.
> We may still be able to detect if there was a data loss by checking
> how far the userspace has consumed the data.

I thought about timers before but as you point out it comes with a wealth of
problems.  Not having a buffer overflow interrupt is just a HW limitation we
need to live with until something better comes along.

> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 387 +++++++++++++++++++++++-
>  drivers/hwtracing/coresight/coresight-tmc.h     |   2 +
>  2 files changed, 386 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 8159e84..3e9ba02 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -31,6 +31,32 @@ struct etr_flat_buf {
>  };
>  
>  /*
> + * etr_perf_buffer - Perf buffer used for ETR
> + * @etr_buf		- Actual buffer used by the ETR
> + * @snapshot		- Perf session mode
> + * @head		- handle->head at the beginning of the session.
> + * @nr_pages		- Number of pages in the ring buffer.
> + * @pages		- Pages in the ring buffer.
> + * @flags		- Capabilities of the hardware buffer used in the
> + *			  session. If flags == 0, we use software double
> + *			  buffering.
> + */
> +struct etr_perf_buffer {
> +	struct etr_buf		*etr_buf;
> +	bool			snapshot;
> +	unsigned long		head;
> +	int			nr_pages;
> +	void			**pages;
> +	u32			flags;
> +};
> +
> +/* Convert the perf index to an offset within the ETR buffer */
> +#define PERF_IDX2OFF(idx, buf)	((idx) % ((buf)->nr_pages << PAGE_SHIFT))
> +
> +/* Lower limit for ETR hardware buffer in double buffering mode */
> +#define TMC_ETR_PERF_MIN_BUF_SIZE	SZ_1M
> +
> +/*
>   * The TMC ETR SG has a page size of 4K. The SG table contains pointers
>   * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
>   * 4K (i.e, 16KB or 64KB). This implies that a single OS page could
> @@ -1164,7 +1190,7 @@ static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
>  		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
>  }
>  
> -static int __maybe_unused
> +static int
>  tmc_restore_etr_buf(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
>  		    unsigned long r_offset, unsigned long w_offset,
>  		    unsigned long size, u32 status)
> @@ -1415,10 +1441,361 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	return ret;
>  }
>  
> +/*
> + * tmc_etr_setup_perf_buf: Allocate ETR buffer for use by perf. We try to
> + * use perf ring buffer pages for the ETR when we can. In the worst case
> + * we fallback to software double buffering. The size of the hardware buffer
> + * in this case is dependent on the size configured via sysfs, if we can't
> + * match the perf ring buffer size. We scale down the size by half until
> + * it reaches a limit of 1M, beyond which we give up.
> + */
> +static struct etr_perf_buffer *
> +tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages,
> +		       void **pages, bool snapshot)
> +{
> +	int i;
> +	struct etr_buf *etr_buf;
> +	struct etr_perf_buffer *etr_perf;
> +	unsigned long size;
> +	unsigned long buf_flags[] = {
> +					ETR_BUF_F_RESTORE_FULL,
> +					ETR_BUF_F_RESTORE_MINIMAL,
> +					0,
> +				    };
> +
> +	etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node);
> +	if (!etr_perf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	size = nr_pages << PAGE_SHIFT;
> +	/*
> +	 * TODO: We need to refine the following rule.
> +	 *
> +	 * We can use the perf ring buffer for ETR only if it is coherent
> +	 * and used in snapshot mode.
> +	 *
> +	 * The ETR (without SFIFO_2 mode) cannot stop writing when a
> +	 * certain limit is reached, nor can it interrupt driver.
> +	 * We can protect the data which is being consumed by the
> +	 * userspace, by hiding it from the ETR's tables. So, we could
> +	 * potentially lose the trace data only for the current session
> +	 * if the ETR wraps around.
> +	 */
> +	if (tmc_etr_has_cap(drvdata, TMC_ETR_COHERENT) && snapshot) {
> +		for (i = 0; buf_flags[i]; i++) {
> +			etr_buf = tmc_alloc_etr_buf(drvdata, size,
> +						 buf_flags[i], node, pages);

Indentation

> +			if (!IS_ERR(etr_buf)) {
> +				etr_perf->flags = buf_flags[i];
> +				goto done;
> +			}
> +		}
> +	}
> +
> +	/*
> +	 * We have to now fallback to software double buffering.
> +	 * The tricky decision is choosing a size for the hardware buffer.
> +	 * We could start with drvdata->size (configurable via sysfs) and
> +	 * scale it down until we can allocate the data.
> +	 */
> +	etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);

The above comment doesn't match the code.  We start with a buffer size equal to
what was requested and then fall back to drvdata->size if something goes wrong.

I don't see why drvdata->size gets involved at all.  I would simply try to
reduce @size until we get a successful allocation.
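
For example (a sketch only, reusing the helpers from this patch):

	do {
		etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
		if (!IS_ERR(etr_buf))
			goto done;
		size /= 2;
	} while (size >= TMC_ETR_PERF_MIN_BUF_SIZE);

without ever looking at drvdata->size.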

> +	if (!IS_ERR(etr_buf))
> +		goto done;
> +	size = drvdata->size;
> +	do {
> +		etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
> +		if (!IS_ERR(etr_buf))
> +			goto done;
> +		size /= 2;
> +	} while (size >= TMC_ETR_PERF_MIN_BUF_SIZE);
> +
> +	kfree(etr_perf);
> +	return ERR_PTR(-ENOMEM);
> +
> +done:
> +	etr_perf->etr_buf = etr_buf;
> +	return etr_perf;
> +}
> +
> +
> +static void *tmc_etr_alloc_perf_buffer(struct coresight_device *csdev,
> +					int cpu, void **pages, int nr_pages,
> +					bool snapshot)
> +{
> +	struct etr_perf_buffer *etr_perf;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +
> +	if (cpu == -1)
> +		cpu = smp_processor_id();
> +
> +	etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu),
> +					     nr_pages, pages, snapshot);
> +	if (IS_ERR(etr_perf)) {
> +		dev_dbg(drvdata->dev, "Unable to allocate ETR buffer\n");
> +		return NULL;
> +	}
> +
> +	etr_perf->snapshot = snapshot;
> +	etr_perf->nr_pages = nr_pages;
> +	etr_perf->pages = pages;
> +
> +	return etr_perf;
> +}
> +
> +static void tmc_etr_free_perf_buffer(void *config)
> +{
> +	struct etr_perf_buffer *etr_perf = config;
> +
> +	if (etr_perf->etr_buf)
> +		tmc_free_etr_buf(etr_perf->etr_buf);
> +	kfree(etr_perf);
> +}
> +
> +/*
> + * Pad the etr buffer with barrier packets to align the head to 4K aligned
> + * offset. This is required for ETR SG backed buffers, so that we can rotate
> + * the buffer easily and avoid a software double buffering.
> + */
> +static long tmc_etr_pad_perf_buffer(struct etr_perf_buffer *etr_perf, long head)
> +{
> +	long new_head;
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +
> +	head = PERF_IDX2OFF(head, etr_perf);
> +	new_head = ALIGN(head, SZ_4K);
> +	if (head == new_head)
> +		return head;
> +	/*
> +	 * If the padding is not aligned to barrier packet size
> +	 * we can't do much.
> +	 */
> +	if ((new_head - head) % CORESIGHT_BARRIER_PKT_SIZE)
> +		return -EINVAL;
> +	return tmc_etr_buf_insert_barrier_packets(etr_buf, head,
> +						  new_head - head);
> +}
> +
> +static int tmc_etr_set_perf_buffer(struct coresight_device *csdev,
> +				   struct perf_output_handle *handle,
> +				   void *config)
> +{
> +	int rc;
> +	unsigned long flags;
> +	long head, new_head;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_perf_buffer *etr_perf = config;
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +
> +	etr_perf->head = handle->head;
> +	head = PERF_IDX2OFF(etr_perf->head, etr_perf);
> +	switch (etr_perf->flags) {
> +	case ETR_BUF_F_RESTORE_MINIMAL:
> +		new_head = tmc_etr_pad_perf_buffer(etr_perf, head);
> +		if (new_head < 0)
> +			return new_head;
> +		if (head != new_head) {
> +			rc = perf_aux_output_skip(handle, new_head - head);
> +			if (rc)
> +				return rc;
> +			etr_perf->head = handle->head;
> +			head = new_head;
> +		}
> +		/* Fall through */
> +	case ETR_BUF_F_RESTORE_FULL:
> +		rc = tmc_restore_etr_buf(drvdata, etr_buf,
> +					 head, head, handle->size, 0);
> +		break;
> +	case 0:
> +		/* Nothing to do here. */
> +		rc = 0;
> +		break;
> +	default:
> +		dev_warn(drvdata->dev, "Unexpected flags in etr_perf buffer\n");
> +		WARN_ON(1);
> +		rc = -EINVAL;
> +	}
> +
> +	/*
> +	 * This sink is going to be used in perf mode. No other session can
> +	 * grab it from us. So set the perf mode specific data here. This will
> +	 * be released just before we disable the sink from update_buffer call
> +	 * back.
> +	 */
> +	if (!rc) {
> +		spin_lock_irqsave(&drvdata->spinlock, flags);
> +		if (WARN_ON(drvdata->perf_data))
> +			rc = -EBUSY;
> +		else
> +			drvdata->perf_data = etr_perf;
> +		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +	}
> +	return rc;
> +}
> +
> +/*
> + * tmc_etr_sync_perf_buffer: Copy the actual trace data from the hardware
> + * buffer to the perf ring buffer.
> + */
> +static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf)
> +{
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +	long bytes, to_copy;
> +	unsigned long head = etr_perf->head;
> +	unsigned long pg_idx, pg_offset, src_offset;
> +	char **dst_pages, *src_buf;
> +
> +	head = PERF_IDX2OFF(etr_perf->head, etr_perf);
> +	pg_idx = head >> PAGE_SHIFT;
> +	pg_offset = head & (PAGE_SIZE - 1);
> +	dst_pages = (char **)etr_perf->pages;
> +	src_offset = etr_buf->offset;
> +	to_copy = etr_buf->len;
> +
> +	while (to_copy > 0) {
> +		/*
> +		 * We can copy minimum of :
> +		 *  1) what is available in the source buffer,
> +		 *  2) what is available in the source buffer, before it
> +		 *     wraps around.
> +		 *  3) what is available in the destination page.
> +		 * in one iteration.
> +		 */
> +		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
> +					     &src_buf);
> +		if (WARN_ON_ONCE(bytes <= 0))
> +			break;
> +		if (PAGE_SIZE - pg_offset <  bytes)
> +			bytes = PAGE_SIZE - pg_offset;
> +
> +		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
> +		to_copy -= bytes;
> +		/* Move destination pointers */
> +		pg_offset += bytes;
> +		if (pg_offset == PAGE_SIZE) {
> +			pg_offset = 0;
> +			if (++pg_idx == etr_perf->nr_pages)
> +				pg_idx = 0;
> +		}
> +
> +		/* Move source pointers */
> +		src_offset += bytes;
> +		if (src_offset >= etr_buf->size)
> +			src_offset -= etr_buf->size;
> +	}
> +}
> +
> +/*
> + * XXX: What is the expected behavior here in the following cases ?
> + *  1) Full trace mode, without double buffering : What should be the size
> + *     reported back when the buffer is full and has wrapped around. Ideally,
> + *     we should report for the lost trace to make sure the "head" in the ring
> + *     buffer comes back to the position as in the trace buffer, rather than
> + *     returning "total size" of the buffer.

I agree with the above strategy as there isn't much else to do.  But do we
actually have a DMA coherent ETR SG or CATU?  From the documentation available
to me I don't see it for ETR SG and I don't have the one for CATU.  My hope
would be to get an IP with an overflow interrupt _before_ one that is DMA
coherent.


> + * 2) In snapshot mode, should we always return "full buffer size" ?

Snapshot mode is currently broken, something I intend to fix shortly.  Until
then and to follow what is done for other IPs I think it is best to return the
full size.

> + */
> +static unsigned long
> +tmc_etr_update_perf_buffer(struct coresight_device *csdev,
> +			   struct perf_output_handle *handle,
> +			   void *config)
> +{
> +	bool double_buffer, lost = false;
> +	unsigned long flags, offset, size = 0;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_perf_buffer *etr_perf = config;
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +
> +	double_buffer = (etr_perf->flags == 0);
> +
> +	spin_lock_irqsave(&drvdata->spinlock, flags);
> +	if (WARN_ON(drvdata->perf_data != etr_perf)) {
> +		lost = true;
> +		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +		goto out;
> +	}
> +
> +	CS_UNLOCK(drvdata->base);
> +
> +	tmc_flush_and_stop(drvdata);
> +
> +	tmc_sync_etr_buf(drvdata);
> +	CS_UNLOCK(drvdata->base);
> +	/* Reset perf specific data */
> +	drvdata->perf_data = NULL;
> +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +
> +	offset = etr_buf->offset + etr_buf->len;
> +	if (offset > etr_buf->size)
> +		offset -= etr_buf->size;
> +
> +	if (double_buffer) {
> +		/*
> +		 * If we use software double buffering, update the ring buffer.
> +		 * And the size is what we have in the hardware buffer.
> +		 */
> +		size = etr_buf->len;
> +		tmc_etr_sync_perf_buffer(etr_perf);
> +	} else {
> +		/*
> +		 * If the hardware uses perf ring buffer the size of the data
> +		 * we have is from the old-head to the current head of the
> +		 * buffer. This also means in non-snapshot mode, we have lost
> +		 * one-full-buffer-size worth data, if the buffer wraps around.
> +		 */
> +		unsigned long old_head;
> +
> +		old_head = PERF_IDX2OFF(etr_perf->head, etr_perf);
> +		size = (offset - old_head + etr_buf->size) % etr_buf->size;
> +	}
> +
> +	/*
> +	 * Update handle->head in snapshot mode. Also update the size to the
> +	 * hardware buffer size if there was an overflow.
> +	 */
> +	if (etr_perf->snapshot) {
> +		if (double_buffer)
> +			handle->head += size;
> +		else
> +			handle->head = offset;
> +		if (etr_buf->full)
> +			size = etr_buf->size;
> +	}
> +
> +	lost |= etr_buf->full;
> +out:
> +	if (lost)
> +		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	return size;
> +}
> +
>  static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
>  {
> -	/* We don't support perf mode yet ! */
> -	return -EINVAL;
> +	int rc = 0;
> +	unsigned long flags;
> +	struct etr_perf_buffer *etr_perf;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +
> +	spin_lock_irqsave(&drvdata->spinlock, flags);
> +	/*
> +	 * There can be only one writer per sink in perf mode. If the sink
> +	 * is already open in SYSFS mode, we can't use it.
> +	 */
> +	if (drvdata->mode != CS_MODE_DISABLED) {
> +		rc = -EBUSY;
> +		goto unlock_out;
> +	}
> +
> +	etr_perf = drvdata->perf_data;
> +	if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) {
> +		rc = -EINVAL;
> +		goto unlock_out;
> +	}
> +
> +	drvdata->mode = CS_MODE_PERF;
> +	tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);
> +
> +unlock_out:
> +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +	return rc;
>  }
>  
>  static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
> @@ -1459,6 +1836,10 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
>  static const struct coresight_ops_sink tmc_etr_sink_ops = {
>  	.enable		= tmc_enable_etr_sink,
>  	.disable	= tmc_disable_etr_sink,
> +	.alloc_buffer	= tmc_etr_alloc_perf_buffer,
> +	.update_buffer	= tmc_etr_update_perf_buffer,
> +	.set_buffer	= tmc_etr_set_perf_buffer,
> +	.free_buffer	= tmc_etr_free_perf_buffer,
>  };
>  
>  const struct coresight_ops tmc_etr_cs_ops = {
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 185dc12..aa42f5d 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -197,6 +197,7 @@ struct etr_buf {
>   * @trigger_cntr: amount of words to store after a trigger.
>   * @etr_caps:	Bitmask of capabilities of the TMC ETR, inferred from the
>   *		device configuration register (DEVID)
> + * @perf_data:	PERF buffer for ETR.
>   * @sysfs_data:	SYSFS buffer for ETR.
>   */
>  struct tmc_drvdata {
> @@ -218,6 +219,7 @@ struct tmc_drvdata {
>  	u32			trigger_cntr;
>  	u32			etr_caps;
>  	struct etr_buf		*sysfs_buf;
> +	void			*perf_data;
>  };
>  
>  struct etr_buf_operations {
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 27/27] coresight: etm-perf: Add support for ETR backend
@ 2018-05-08 22:04     ` Mathieu Poirier
  0 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-08 22:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 01, 2018 at 10:10:57AM +0100, Suzuki K Poulose wrote:
> Add necessary support for using ETR as a sink in ETM perf tracing.
> We try to make the best use of the available buffer modes to
> avoid software double buffering.
> 
> We can use the perf ring buffer for ETR directly if all of the
> conditions below are met :
>  1) ETR is DMA coherent
>  2) perf is used in snapshot mode. In full tracing mode, we cannot
>     guarantee that the ETR will stop before it overwrites the data
>     at the beginning of the trace buffer leading to loss of trace
>     data. (The buffer which is being consumed by the perf is still
>     hidden from the ETR).
>  3) ETR supports save-restore with a scatter-gather mechanism
>     which can use a given set of pages, so we use the perf ring buffer
>     directly. If we have an in-built TMC ETR Scatter Gather unit,
>     we make use of a circular SG list to restart from a given head.
>     However, we need to align the starting offset to 4K in this case.
>     With CATU and ETR Save restore feature, we don't have to necessarily
>     align the head of the buffer.
> 
> If the ETR doesn't support any of these, we fall back to software
> double buffering.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> 
> Note: The conditions above need some rethink.
> 
> For (1): We always sync the buffer for the CPU before we update the
> pointers. So, we should be safe here and should be able to remove
> this condition.
> 
> (2) is a bit more of a problem, as the ETR (without SFIFO_2 mode)
> doesn't stop writing out the trace buffer, even though we exclude
> the part of the ring buffer currently consumed by perf, leading
> to loss of data. Also, since we don't have an interrupt (without
> SFIFO_2), we can't wake up the userspace reliably to consume
> the data.
> 
> One possible option is to use an hrtimer to wake up the userspace
> early enough, using a low wakeup mark. But that doesn't necessarily
> guarantee that the ETR will not wrap around overwriting the data,
> as we can't modify the ETR pointers, unless we disable it, which
> could again potentially cause data loss in Circular Buffer mode.
> We may still be able to detect if there was a data loss by checking
> how far the userspace has consumed the data.

I thought about timers before but as you point out it comes with a wealth of
problems.  Not having a buffer overflow interrupt is just a HW limitation we
need to live with until something better comes along.

> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 387 +++++++++++++++++++++++-
>  drivers/hwtracing/coresight/coresight-tmc.h     |   2 +
>  2 files changed, 386 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 8159e84..3e9ba02 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -31,6 +31,32 @@ struct etr_flat_buf {
>  };
>  
>  /*
> + * etr_perf_buffer - Perf buffer used for ETR
> + * @etr_buf		- Actual buffer used by the ETR
> + * @snapshot		- Perf session mode
> + * @head		- handle->head at the beginning of the session.
> + * @nr_pages		- Number of pages in the ring buffer.
> + * @pages		- Pages in the ring buffer.
> + * @flags		- Capabilities of the hardware buffer used in the
> + *			  session. If flags == 0, we use software double
> + *			  buffering.
> + */
> +struct etr_perf_buffer {
> +	struct etr_buf		*etr_buf;
> +	bool			snapshot;
> +	unsigned long		head;
> +	int			nr_pages;
> +	void			**pages;
> +	u32			flags;
> +};
> +
> +/* Convert the perf index to an offset within the ETR buffer */
> +#define PERF_IDX2OFF(idx, buf)	((idx) % ((buf)->nr_pages << PAGE_SHIFT))
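
(As an aside: PERF_IDX2OFF() simply wraps the free-running perf AUX index
into the ring buffer, e.g. with nr_pages = 4 and 4K pages, an index of
0x5000 maps to offset 0x5000 % 0x4000 = 0x1000.)
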
> +
> +/* Lower limit for ETR hardware buffer in double buffering mode */
> +#define TMC_ETR_PERF_MIN_BUF_SIZE	SZ_1M
> +
> +/*
>   * The TMC ETR SG has a page size of 4K. The SG table contains pointers
>   * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
>   * 4K (i.e, 16KB or 64KB). This implies that a single OS page could
> @@ -1164,7 +1190,7 @@ static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
>  		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
>  }
>  
> -static int __maybe_unused
> +static int
>  tmc_restore_etr_buf(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
>  		    unsigned long r_offset, unsigned long w_offset,
>  		    unsigned long size, u32 status)
> @@ -1415,10 +1441,361 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	return ret;
>  }
>  
> +/*
> + * tmc_etr_setup_perf_buf: Allocate ETR buffer for use by perf. We try to
> + * use perf ring buffer pages for the ETR when we can. In the worst case
> + * we fall back to software double buffering. The size of the hardware buffer
> + * in this case is dependent on the size configured via sysfs, if we can't
> + * match the perf ring buffer size. We scale down the size by half until
> + * it reaches a limit of 1M, beyond which we give up.
> + */
> +static struct etr_perf_buffer *
> +tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages,
> +		       void **pages, bool snapshot)
> +{
> +	int i;
> +	struct etr_buf *etr_buf;
> +	struct etr_perf_buffer *etr_perf;
> +	unsigned long size;
> +	unsigned long buf_flags[] = {
> +					ETR_BUF_F_RESTORE_FULL,
> +					ETR_BUF_F_RESTORE_MINIMAL,
> +					0,
> +				    };
> +
> +	etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node);
> +	if (!etr_perf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	size = nr_pages << PAGE_SHIFT;
> +	/*
> +	 * TODO: We need to refine the following rule.
> +	 *
> +	 * We can use the perf ring buffer for ETR only if it is coherent
> +	 * and used in snapshot mode.
> +	 *
> +	 * The ETR (without SFIFO_2 mode) cannot stop writing when a
> +	 * certain limit is reached, nor can it interrupt the driver.
> +	 * We can protect the data which is being consumed by the
> +	 * userspace by hiding it from the ETR's tables. So, we could
> +	 * potentially lose the trace data only for the current session
> +	 * if the ETR wraps around.
> +	 */
> +	if (tmc_etr_has_cap(drvdata, TMC_ETR_COHERENT) && snapshot) {
> +		for (i = 0; buf_flags[i]; i++) {
> +			etr_buf = tmc_alloc_etr_buf(drvdata, size,
> +						 buf_flags[i], node, pages)

Indentation

> +			if (!IS_ERR(etr_buf)) {
> +				etr_perf->flags = buf_flags[i];
> +				goto done;
> +			}
> +		}
> +	}
> +
> +	/*
> +	 * We have to now fallback to software double buffering.
> +	 * The tricky decision is choosing a size for the hardware buffer.
> +	 * We could start with drvdata->size (configurable via sysfs) and
> +	 * scale it down until we can allocate the data.
> +	 */
> +	etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);

The above comment doesn't match the code.  We start with a buffer size equal to
what was requested and then fall back to drvdata->size if something goes wrong.

I don't see why drvdata->size gets involved at all.  I would simply try to
reduce @size until we get a successful allocation.
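
Something along these lines perhaps (untested, just to illustrate the
idea, reusing only the helpers already introduced by this patch):

	/* Keep halving @size, but never go below the 1M lower limit */
	etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
	while (IS_ERR(etr_buf) && (size / 2) >= TMC_ETR_PERF_MIN_BUF_SIZE) {
		size /= 2;
		etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
	}

	if (IS_ERR(etr_buf)) {
		kfree(etr_perf);
		return ERR_PTR(-ENOMEM);
	}
	/* success: fall through to "done" */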

> +	if (!IS_ERR(etr_buf))
> +		goto done;
> +	size = drvdata->size;
> +	do {
> +		etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL);
> +		if (!IS_ERR(etr_buf))
> +			goto done;
> +		size /= 2;
> +	} while (size >= TMC_ETR_PERF_MIN_BUF_SIZE);
> +
> +	kfree(etr_perf);
> +	return ERR_PTR(-ENOMEM);
> +
> +done:
> +	etr_perf->etr_buf = etr_buf;
> +	return etr_perf;
> +}
> +
> +
> +static void *tmc_etr_alloc_perf_buffer(struct coresight_device *csdev,
> +					int cpu, void **pages, int nr_pages,
> +					bool snapshot)
> +{
> +	struct etr_perf_buffer *etr_perf;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +
> +	if (cpu == -1)
> +		cpu = smp_processor_id();
> +
> +	etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu),
> +					     nr_pages, pages, snapshot);
> +	if (IS_ERR(etr_perf)) {
> +		dev_dbg(drvdata->dev, "Unable to allocate ETR buffer\n");
> +		return NULL;
> +	}
> +
> +	etr_perf->snapshot = snapshot;
> +	etr_perf->nr_pages = nr_pages;
> +	etr_perf->pages = pages;
> +
> +	return etr_perf;
> +}
> +
> +static void tmc_etr_free_perf_buffer(void *config)
> +{
> +	struct etr_perf_buffer *etr_perf = config;
> +
> +	if (etr_perf->etr_buf)
> +		tmc_free_etr_buf(etr_perf->etr_buf);
> +	kfree(etr_perf);
> +}
> +
> +/*
> + * Pad the etr buffer with barrier packets to align the head to 4K aligned
> + * offset. This is required for ETR SG backed buffers, so that we can rotate
> + * the buffer easily and avoid a software double buffering.
> + */
> +static long tmc_etr_pad_perf_buffer(struct etr_perf_buffer *etr_perf, long head)
> +{
> +	long new_head;
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +
> +	head = PERF_IDX2OFF(head, etr_perf);
> +	new_head = ALIGN(head, SZ_4K);
> +	if (head == new_head)
> +		return head;
> +	/*
> +	 * If the padding is not aligned to barrier packet size
> +	 * we can't do much.
> +	 */
> +	if ((new_head - head) % CORESIGHT_BARRIER_PKT_SIZE)
> +		return -EINVAL;
> +	return tmc_etr_buf_insert_barrier_packets(etr_buf, head,
> +						  new_head - head);
> +}
> +
> +static int tmc_etr_set_perf_buffer(struct coresight_device *csdev,
> +				   struct perf_output_handle *handle,
> +				   void *config)
> +{
> +	int rc;
> +	unsigned long flags;
> +	long head, new_head;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_perf_buffer *etr_perf = config;
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +
> +	etr_perf->head = handle->head;
> +	head = PERF_IDX2OFF(etr_perf->head, etr_perf);
> +	switch (etr_perf->flags) {
> +	case ETR_BUF_F_RESTORE_MINIMAL:
> +		new_head = tmc_etr_pad_perf_buffer(etr_perf, head);
> +		if (new_head < 0)
> +			return new_head;
> +		if (head != new_head) {
> +			rc = perf_aux_output_skip(handle, new_head - head);
> +			if (rc)
> +				return rc;
> +			etr_perf->head = handle->head;
> +			head = new_head;
> +		}
> +		/* Fall through */
> +	case ETR_BUF_F_RESTORE_FULL:
> +		rc = tmc_restore_etr_buf(drvdata, etr_buf,
> +					 head, head, handle->size, 0);
> +		break;
> +	case 0:
> +		/* Nothing to do here. */
> +		rc = 0;
> +		break;
> +	default:
> +		dev_warn(drvdata->dev, "Unexpected flags in etr_perf buffer\n");
> +		WARN_ON(1);
> +		rc = -EINVAL;
> +	}
> +
> +	/*
> +	 * This sink is going to be used in perf mode. No other session can
> +	 * grab it from us. So set the perf mode specific data here. This will
> +	 * be released just before we disable the sink from the update_buffer
> +	 * callback.
> +	 */
> +	if (!rc) {
> +		spin_lock_irqsave(&drvdata->spinlock, flags);
> +		if (WARN_ON(drvdata->perf_data))
> +			rc = -EBUSY;
> +		else
> +			drvdata->perf_data = etr_perf;
> +		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +	}
> +	return rc;
> +}
> +
> +/*
> + * tmc_etr_sync_perf_buffer: Copy the actual trace data from the hardware
> + * buffer to the perf ring buffer.
> + */
> +static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf)
> +{
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +	long bytes, to_copy;
> +	unsigned long head = etr_perf->head;
> +	unsigned long pg_idx, pg_offset, src_offset;
> +	char **dst_pages, *src_buf;
> +
> +	head = PERF_IDX2OFF(etr_perf->head, etr_perf);
> +	pg_idx = head >> PAGE_SHIFT;
> +	pg_offset = head & (PAGE_SIZE - 1);
> +	dst_pages = (char **)etr_perf->pages;
> +	src_offset = etr_buf->offset;
> +	to_copy = etr_buf->len;
> +
> +	while (to_copy > 0) {
> +		/*
> +		 * In one iteration, we can copy the minimum of:
> +		 *  1) what is left to copy,
> +		 *  2) what is available in the source buffer before it
> +		 *     wraps around,
> +		 *  3) what is available in the destination page.
> +		 */
> +		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
> +					     &src_buf);
> +		if (WARN_ON_ONCE(bytes <= 0))
> +			break;
> +		if (PAGE_SIZE - pg_offset <  bytes)
> +			bytes = PAGE_SIZE - pg_offset;
> +
> +		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
> +		to_copy -= bytes;
> +		/* Move destination pointers */
> +		pg_offset += bytes;
> +		if (pg_offset == PAGE_SIZE) {
> +			pg_offset = 0;
> +			if (++pg_idx == etr_perf->nr_pages)
> +				pg_idx = 0;
> +		}
> +
> +		/* Move source pointers */
> +		src_offset += bytes;
> +		if (src_offset >= etr_buf->size)
> +			src_offset -= etr_buf->size;
> +	}
> +}
> +
> +/*
> + * XXX: What is the expected behavior here in the following cases ?
> + *  1) Full trace mode, without double buffering : What should be the size
> + *     reported back when the buffer is full and has wrapped around. Ideally,
> + *     we should account for the lost trace to make sure the "head" in the ring
> + *     buffer comes back to the same position as in the trace buffer, rather than
> + *     returning the "total size" of the buffer.

I agree with the above strategy as there isn't much else to do.  But do we
actually have a DMA coherent ETR SG or CATU?  From the documentation available
to me I don't see it for ETR SG and I don't have the one for CATU.  My hope
would be to get an IP with an overflow interrupt _before_ one that is DMA
coherent.


> + * 2) In snapshot mode, should we always return "full buffer size" ?

Snapshot mode is currently broken, something I intend to fix shortly.  Until
then and to follow what is done for other IPs I think it is best to return the
full size.

> + */
> +static unsigned long
> +tmc_etr_update_perf_buffer(struct coresight_device *csdev,
> +			   struct perf_output_handle *handle,
> +			   void *config)
> +{
> +	bool double_buffer, lost = false;
> +	unsigned long flags, offset, size = 0;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_perf_buffer *etr_perf = config;
> +	struct etr_buf *etr_buf = etr_perf->etr_buf;
> +
> +	double_buffer = (etr_perf->flags == 0);
> +
> +	spin_lock_irqsave(&drvdata->spinlock, flags);
> +	if (WARN_ON(drvdata->perf_data != etr_perf)) {
> +		lost = true;
> +		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +		goto out;
> +	}
> +
> +	CS_UNLOCK(drvdata->base);
> +
> +	tmc_flush_and_stop(drvdata);
> +
> +	tmc_sync_etr_buf(drvdata);
> +	CS_LOCK(drvdata->base);
> +	/* Reset perf specific data */
> +	drvdata->perf_data = NULL;
> +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +
> +	offset = etr_buf->offset + etr_buf->len;
> +	if (offset > etr_buf->size)
> +		offset -= etr_buf->size;
> +
> +	if (double_buffer) {
> +		/*
> +		 * If we use software double buffering, update the ring buffer.
> +		 * And the size is what we have in the hardware buffer.
> +		 */
> +		size = etr_buf->len;
> +		tmc_etr_sync_perf_buffer(etr_perf);
> +	} else {
> +		/*
> +		 * If the hardware uses the perf ring buffer, the size of the data
> +		 * we have is from the old head to the current head of the
> +		 * buffer. This also means that in non-snapshot mode we have lost
> +		 * a full buffer's worth of data if the buffer wraps around.
> +		 */
> +		unsigned long old_head;
> +
> +		old_head = PERF_IDX2OFF(etr_perf->head, etr_perf);
> +		size = (offset - old_head + etr_buf->size) % etr_buf->size;
> +	}
> +
> +	/*
> +	 * Update handle->head in snapshot mode. Also update the size to the
> +	 * hardware buffer size if there was an overflow.
> +	 */
> +	if (etr_perf->snapshot) {
> +		if (double_buffer)
> +			handle->head += size;
> +		else
> +			handle->head = offset;
> +		if (etr_buf->full)
> +			size = etr_buf->size;
> +	}
> +
> +	lost |= etr_buf->full;
> +out:
> +	if (lost)
> +		perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
> +	return size;
> +}
> +
>  static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
>  {
> -	/* We don't support perf mode yet ! */
> -	return -EINVAL;
> +	int rc = 0;
> +	unsigned long flags;
> +	struct etr_perf_buffer *etr_perf;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +
> +	spin_lock_irqsave(&drvdata->spinlock, flags);
> +	/*
> +	 * There can be only one writer per sink in perf mode. If the sink
> +	 * is already open in SYSFS mode, we can't use it.
> +	 */
> +	if (drvdata->mode != CS_MODE_DISABLED) {
> +		rc = -EBUSY;
> +		goto unlock_out;
> +	}
> +
> +	etr_perf = drvdata->perf_data;
> +	if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) {
> +		rc = -EINVAL;
> +		goto unlock_out;
> +	}
> +
> +	drvdata->mode = CS_MODE_PERF;
> +	tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);
> +
> +unlock_out:
> +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +	return rc;
>  }
>  
>  static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
> @@ -1459,6 +1836,10 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
>  static const struct coresight_ops_sink tmc_etr_sink_ops = {
>  	.enable		= tmc_enable_etr_sink,
>  	.disable	= tmc_disable_etr_sink,
> +	.alloc_buffer	= tmc_etr_alloc_perf_buffer,
> +	.update_buffer	= tmc_etr_update_perf_buffer,
> +	.set_buffer	= tmc_etr_set_perf_buffer,
> +	.free_buffer	= tmc_etr_free_perf_buffer,
>  };
>  
>  const struct coresight_ops tmc_etr_cs_ops = {
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 185dc12..aa42f5d 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -197,6 +197,7 @@ struct etr_buf {
>   * @trigger_cntr: amount of words to store after a trigger.
>   * @etr_caps:	Bitmask of capabilities of the TMC ETR, inferred from the
>   *		device configuration register (DEVID)
> + * @perf_data:	PERF buffer for ETR.
>   * @sysfs_data:	SYSFS buffer for ETR.
>   */
>  struct tmc_drvdata {
> @@ -218,6 +219,7 @@ struct tmc_drvdata {
>  	u32			trigger_cntr;
>  	u32			etr_caps;
>  	struct etr_buf		*sysfs_buf;
> +	void			*perf_data;
>  };
>  
>  struct etr_buf_operations {
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 23/27] coresight: tmc-etr: Handle driver mode specific ETR buffers
  2018-05-08 21:51       ` Suzuki K Poulose
@ 2018-05-09 17:12         ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-09 17:12 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, Mike Leach, Robert Walker,
	Mark Rutland, Will Deacon, Robin Murphy, Sudeep Holla,
	Frank Rowand, Rob Herring, John Horley

On 8 May 2018 at 15:51, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> On 05/08/2018 06:18 PM, Mathieu Poirier wrote:
>>
>> On Tue, May 01, 2018 at 10:10:53AM +0100, Suzuki K Poulose wrote:
>>>
>>> Since the ETR could be driven either by SYSFS or by perf, it
>>> becomes complicated to manage the buffers used for each
>>> of these modes. The ETR driver cannot simply free the current
>>> attached buffer without knowing the provider (i.e, sysfs vs perf).
>>>
>>> To solve this issue, we provide:
>>> 1) the driver-mode specific etr buffer to be retained in the drvdata
>>> 2) the etr_buf for a session should be passed on when enabling the
>>>     hardware, which will be stored in drvdata->etr_buf. This will be
>>>     replaced (not free'd) as soon as the hardware is disabled, after
>>>     necessary sync operation.
>>>
>>> The advantages of this are :
>>>
>>> 1) The common code path doesn't need to worry about how to dispose
>>>     an existing buffer, if it is about to start a new session with a
>>>     different buffer, possibly in a different mode.
>>> 2) The driver mode can control its buffers and can get access to the
>>>     saved session even when the hardware is operating in a different
>>>     mode. (e.g, we can still access a trace buffer from a sysfs mode
>>>     even if the etr is now used in perf mode, without disrupting the
>>>     current session.)
>>>
>>> Towards this, we introduce a sysfs specific data which will hold the
>>> etr_buf used for sysfs mode of operation, controlled solely by the
>>> sysfs mode handling code.
>>
>>
>> Thinking further on this... I toyed with the idea of doing the same thing
>> when
>> working on the original driver and decided against it.  Do we really have
>> a case
>> where users would want to use sysFS and perf alternately?  To me this
>> looks
>> overdesigned.
>>
>> If we are going to go that way we need to enact the same behavior for
>> ETB10 and
>> ETF...  And take it out of this set as it is already substantial enough.
>
>
> The difference between ETB10/ETF and ETR is the usage of the buffer. The
> former uses an internal buffer and we always have to copy it out to an
> external buffer for consumption. Now this external buffer is actually
> separate for each mode, i.e. sysfs and perf. Also the data is copied
> out right after we disable the HW. This ensures that interleaved
> use of the modes doesn't corrupt each other's data.

Hi Suzuki,

When I wrote my original comment I was under the impression that
ETB10/ETF's drvdata->buf was used for both sysFS and perf, but after
going back to the code I find it isn't the case.  As such a user can
run sysFS and perf sessions alternately without destroying the results
acquired from the previous trace scenario.  This is also what your
patch is providing, enacting the same (desired) behaviour we currently
have.

I'm good with this one.
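
Just to summarise the ownership model as I understand it, a simplified
sketch (field names from the patches; locking, error handling and the
actual diff details elided):

	/* sysfs enable: buffer owned and freed only by the sysfs code */
	tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);

	/* perf enable: buffer owned and freed only by the perf code */
	tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);

	/*
	 * In both cases drvdata->etr_buf only tracks whatever buffer the
	 * hardware is currently writing to; on disable it is synced and
	 * detached, never freed on behalf of the other mode.
	 */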

Mathieu

>
> However, the ETR doesn't have an internal buffer and uses the System RAM.
> That brings in the problem of one mode using the "buffer" as
> described by the drvdata. So, eventually either mode could write to
> the buffer allocated by the other mode before it is consumed by the
> end user (via sysfs read or perf). That brings in the challenge of
> managing the buffer safely, switching the buffer back and forth
> (with the right size and pages) for each mode without any interference.
> That also implies that one mode must be able to free the left-over
> buffer from the previous mode safely (which could be potentially linked
> to other data structures maintained by the mode). And that makes it
> more complex. e.g., we must leave the sysfs trace data around for
> collection while perf could grab the ETR for its own usage. The perf mode
> might not know the mode of the existing buffer and thus wouldn't know
> how to free it properly.
>
> This is why we need buffers per mode which can be managed by
> each mode, i.e., allocated, used and, more importantly, freed
> appropriately.
>
> Cheers
> Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg
  2018-05-02 13:52       ` Robin Murphy
@ 2018-05-10 13:36         ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-10 13:36 UTC (permalink / raw)
  To: Robin Murphy, Kim Phillips
  Cc: linux-arm-kernel, linux-kernel, mathieu.poirier, mike.leach,
	robert.walker, mark.rutland, will.deacon, sudeep.holla,
	frowand.list, robh, john.horley

On 02/05/18 14:52, Robin Murphy wrote:
> On 02/05/18 04:55, Kim Phillips wrote:
>> On Tue, 1 May 2018 10:10:51 +0100
>> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>
>>> Convert component enable/disable messages from dev_info to dev_dbg.
>>> This is required to prevent LOCKDEP splats when operating in perf
>>> mode where we could be called with locks held to enable a coresight
>>
>> Can we see the splats?  Doesn't lockdep turn itself off if it starts
>> triggering too many splats?
> 
> Without some very careful and robust reasoning for why the condition
> being reported by lockdep could not actually occur in practice, "avoiding
> the splats" is far, far less important than "avoiding the potential
> deadlock that they are reporting".
> 
>>> path. If someone wants to really see the messages, they can always
>>> enable it at runtime via dynamic_debug.
>>
>> Won't the splats still occur when the messages are enabled with
>> dynamic_debug?
>>
>> So in effect this patch only tries to mitigate the splats, all the
>> while making things harder for regular users that now have to recompile
>> their kernels, in exchange for a very small convenience for kernel
>> developers that happen to see a splat or two with DEBUG_LOCKDEP set?
> 
> FWIW, if "regular users" means people running distro kernels, then chances are that they probably have DYNAMIC_DEBUG set already (100% of my local sample of 2 - Ubuntu x86_64 and Arch aarch64 - certainly do). Either way, though, this particular log spam really does only look vaguely useful to people debugging the coresight stack itself, so anyone going out of their way to turn it on has surely already gone beyond regular use (even if they're just reproducing an issue with additional logging at the request of kernel developers, rather than debugging it themselves).
> 
> Reducing the scope for possible deadlock from the general case to just
> debugging scenarios is certainly not a bad thing, but as you say I think
> we need a closer look at the underlying issue to know whether even
> dev_dbg() is wise.



Sorry for the delay, here is what it looks like on 4.17. The original
version where I added this change was slightly different; it had to
do with triggering prints from a perf context, which could be holding
other locks/semaphores (CPU hotplug). I should have captured the log for
the commit description. I will see if I can get a better splat with the
older version.
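
(For reference, the conversion itself is just of the form:

	-	dev_info(drvdata->dev, "TMC-ETR disabled\n");
	+	dev_dbg(drvdata->dev, "TMC-ETR disabled\n");

so the messages remain available via dynamic_debug rather than being
printed unconditionally on the enable/disable path.)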

Anyway, the following splat is only triggered when you enable printing
from a perf call path. As people have already observed here, the
prints are too invasive and only helpful for debugging. We cannot move
the prints out of the path as there is no safer place outside, either.
Both sysfs mode and perf mode use the same code path. But with perf,
you might be holding some additional semaphores/locks, which is what triggers
the splat. Below, stack #1 is from the perf context.



[ 1207.472310] ======================================================
[ 1207.478434] WARNING: possible circular locking dependency detected
[ 1207.484563] 4.17.0-rc3-00027-g9b9372f #73 Not tainted
[ 1207.489568] ------------------------------------------------------
[ 1207.495694] bash/2334 is trying to acquire lock:
[ 1207.500272] 000000004a592304 (&mm->mmap_sem){++++}, at: __might_fault+0x3c/0x88
[ 1207.507555]
[ 1207.507555] but task is already holding lock:
[ 1207.513339] 0000000008ac668a (&sb->s_type->i_mutex_key#3){++++}, at: iterate_dir+0x68/0x1a8
[ 1207.521652]
[ 1207.521652] which lock already depends on the new lock.
[ 1207.521652]
[ 1207.529761]
[ 1207.529761] the existing dependency chain (in reverse order) is:
[ 1207.537177]
[ 1207.537177] -> #5 (&sb->s_type->i_mutex_key#3){++++}:
[ 1207.543686]        down_write+0x48/0xa0
[ 1207.547496]        start_creating+0x54/0x118
[ 1207.551734]        debugfs_create_dir+0x14/0x110
[ 1207.556319]        opp_debug_register+0x78/0x110
[ 1207.560903]        _add_opp_dev+0x54/0x98
[ 1207.564884]        dev_pm_opp_get_opp_table+0x94/0x178
[ 1207.569982]        dev_pm_opp_add+0x20/0x68
[ 1207.574138]        scpi_dvfs_add_opps_to_device+0x80/0x108
[ 1207.579581]        scpi_cpufreq_init+0x50/0x2c0
[ 1207.584076]        cpufreq_online+0xc4/0x6e0
[ 1207.588313]        cpufreq_add_dev+0xa8/0xb8
[ 1207.592551]        subsys_interface_register+0xa4/0xf8
[ 1207.597649]        cpufreq_register_driver+0x17c/0x258
[ 1207.602747]        scpi_cpufreq_probe+0x30/0x70
[ 1207.607244]        platform_drv_probe+0x58/0xc0
[ 1207.611740]        driver_probe_device+0x2d4/0x478
[ 1207.616493]        __device_attach_driver+0xac/0x158
[ 1207.621418]        bus_for_each_drv+0x70/0xc8
[ 1207.625740]        __device_attach+0xdc/0x160
[ 1207.630063]        device_initial_probe+0x10/0x18
[ 1207.634729]        bus_probe_device+0x94/0xa0
[ 1207.639055]        device_add+0x308/0x5e8
[ 1207.643035]        platform_device_add+0x110/0x298
[ 1207.647789]        platform_device_register_full+0x10c/0x130
[ 1207.653404]        scpi_clocks_probe+0xe4/0x160
[ 1207.657901]        platform_drv_probe+0x58/0xc0
[ 1207.662396]        driver_probe_device+0x2d4/0x478
[ 1207.667149]        __device_attach_driver+0xac/0x158
[ 1207.672073]        bus_for_each_drv+0x70/0xc8
[ 1207.676397]        __device_attach+0xdc/0x160
[ 1207.680714]        device_initial_probe+0x10/0x18
[ 1207.685371]        bus_probe_device+0x94/0xa0
[ 1207.689685]        device_add+0x308/0x5e8
[ 1207.693654]        of_device_add+0x44/0x60
[ 1207.697709]        of_platform_device_create_pdata+0x80/0xe0
[ 1207.703311]        of_platform_bus_create+0x170/0x458
[ 1207.708313]        of_platform_populate+0x7c/0x130
[ 1207.713055]        devm_of_platform_populate+0x50/0xb0
[ 1207.718144]        scpi_probe+0x3c0/0x480
[ 1207.722113]        platform_drv_probe+0x58/0xc0
[ 1207.726598]        driver_probe_device+0x2d4/0x478
[ 1207.731341]        __device_attach_driver+0xac/0x158
[ 1207.736255]        bus_for_each_drv+0x70/0xc8
[ 1207.740568]        __device_attach+0xdc/0x160
[ 1207.744881]        device_initial_probe+0x10/0x18
[ 1207.749537]        bus_probe_device+0x94/0xa0
[ 1207.753851]        deferred_probe_work_func+0x58/0x180
[ 1207.758938]        process_one_work+0x228/0x410
[ 1207.763422]        worker_thread+0x25c/0x460
[ 1207.767651]        kthread+0x100/0x130
[ 1207.771363]        ret_from_fork+0x10/0x18
[ 1207.775415]
[ 1207.775415] -> #4 (opp_table_lock){+.+.}:
[ 1207.780864]        __mutex_lock+0x8c/0x8e8
[ 1207.784921]        mutex_lock_nested+0x1c/0x28
[ 1207.789321]        dev_pm_opp_get_opp_table+0x28/0x178
[ 1207.794409]        dev_pm_opp_add+0x20/0x68
[ 1207.798552]        scpi_dvfs_add_opps_to_device+0x80/0x108
[ 1207.803983]        scpi_cpufreq_init+0x50/0x2c0
[ 1207.808468]        cpufreq_online+0xc4/0x6e0
[ 1207.812695]        cpufreq_add_dev+0xa8/0xb8
[ 1207.816922]        subsys_interface_register+0xa4/0xf8
[ 1207.822009]        cpufreq_register_driver+0x17c/0x258
[ 1207.827097]        scpi_cpufreq_probe+0x30/0x70
[ 1207.831583]        platform_drv_probe+0x58/0xc0
[ 1207.836069]        driver_probe_device+0x2d4/0x478
[ 1207.840812]        __device_attach_driver+0xac/0x158
[ 1207.845726]        bus_for_each_drv+0x70/0xc8
[ 1207.850038]        __device_attach+0xdc/0x160
[ 1207.854351]        device_initial_probe+0x10/0x18
[ 1207.859009]        bus_probe_device+0x94/0xa0
[ 1207.863323]        device_add+0x308/0x5e8
[ 1207.867293]        platform_device_add+0x110/0x298
[ 1207.872036]        platform_device_register_full+0x10c/0x130
[ 1207.877639]        scpi_clocks_probe+0xe4/0x160
[ 1207.882125]        platform_drv_probe+0x58/0xc0
[ 1207.886610]        driver_probe_device+0x2d4/0x478
[ 1207.891353]        __device_attach_driver+0xac/0x158
[ 1207.896268]        bus_for_each_drv+0x70/0xc8
[ 1207.900581]        __device_attach+0xdc/0x160
[ 1207.904893]        device_initial_probe+0x10/0x18
[ 1207.909550]        bus_probe_device+0x94/0xa0
[ 1207.913865]        device_add+0x308/0x5e8
[ 1207.917833]        of_device_add+0x44/0x60
[ 1207.921888]        of_platform_device_create_pdata+0x80/0xe0
[ 1207.927490]        of_platform_bus_create+0x170/0x458
[ 1207.932491]        of_platform_populate+0x7c/0x130
[ 1207.937234]        devm_of_platform_populate+0x50/0xb0
[ 1207.942322]        scpi_probe+0x3c0/0x480
[ 1207.946292]        platform_drv_probe+0x58/0xc0
[ 1207.950777]        driver_probe_device+0x2d4/0x478
[ 1207.955520]        __device_attach_driver+0xac/0x158
[ 1207.960434]        bus_for_each_drv+0x70/0xc8
[ 1207.964747]        __device_attach+0xdc/0x160
[ 1207.969059]        device_initial_probe+0x10/0x18
[ 1207.973716]        bus_probe_device+0x94/0xa0
[ 1207.978029]        deferred_probe_work_func+0x58/0x180
[ 1207.983115]        process_one_work+0x228/0x410
[ 1207.987600]        worker_thread+0x25c/0x460
[ 1207.991827]        kthread+0x100/0x130
[ 1207.995538]        ret_from_fork+0x10/0x18
[ 1207.999590]
[ 1207.999590] -> #3 (subsys mutex#9){+.+.}:
[ 1208.005041]        __mutex_lock+0x8c/0x8e8
[ 1208.009098]        mutex_lock_nested+0x1c/0x28
[ 1208.013497]        subsys_interface_register+0x54/0xf8
[ 1208.018584]        cpufreq_register_driver+0x17c/0x258
[ 1208.023672]        scpi_cpufreq_probe+0x30/0x70
[ 1208.028157]        platform_drv_probe+0x58/0xc0
[ 1208.032643]        driver_probe_device+0x2d4/0x478
[ 1208.037386]        __device_attach_driver+0xac/0x158
[ 1208.042300]        bus_for_each_drv+0x70/0xc8
[ 1208.046613]        __device_attach+0xdc/0x160
[ 1208.050926]        device_initial_probe+0x10/0x18
[ 1208.055583]        bus_probe_device+0x94/0xa0
[ 1208.059897]        device_add+0x308/0x5e8
[ 1208.063867]        platform_device_add+0x110/0x298
[ 1208.068610]        platform_device_register_full+0x10c/0x130
[ 1208.074214]        scpi_clocks_probe+0xe4/0x160
[ 1208.078699]        platform_drv_probe+0x58/0xc0
[ 1208.083184]        driver_probe_device+0x2d4/0x478
[ 1208.087928]        __device_attach_driver+0xac/0x158
[ 1208.092842]        bus_for_each_drv+0x70/0xc8
[ 1208.097155]        __device_attach+0xdc/0x160
[ 1208.101468]        device_initial_probe+0x10/0x18
[ 1208.106124]        bus_probe_device+0x94/0xa0
[ 1208.110439]        device_add+0x308/0x5e8
[ 1208.114407]        of_device_add+0x44/0x60
[ 1208.118462]        of_platform_device_create_pdata+0x80/0xe0
[ 1208.124064]        of_platform_bus_create+0x170/0x458
[ 1208.129065]        of_platform_populate+0x7c/0x130
[ 1208.133808]        devm_of_platform_populate+0x50/0xb0
[ 1208.138896]        scpi_probe+0x3c0/0x480
[ 1208.142866]        platform_drv_probe+0x58/0xc0
[ 1208.147351]        driver_probe_device+0x2d4/0x478
[ 1208.152095]        __device_attach_driver+0xac/0x158
[ 1208.157009]        bus_for_each_drv+0x70/0xc8
[ 1208.161322]        __device_attach+0xdc/0x160
[ 1208.165635]        device_initial_probe+0x10/0x18
[ 1208.170292]        bus_probe_device+0x94/0xa0
[ 1208.174605]        deferred_probe_work_func+0x58/0x180
[ 1208.179691]        process_one_work+0x228/0x410
[ 1208.184176]        worker_thread+0x25c/0x460
[ 1208.188403]        kthread+0x100/0x130
[ 1208.192114]        ret_from_fork+0x10/0x18
[ 1208.196166]
[ 1208.196166] -> #2 (cpu_hotplug_lock.rw_sem){++++}:
[ 1208.202389]        cpus_read_lock+0x4c/0xc0
[ 1208.206531]        etm_setup_aux+0x50/0x230
[ 1208.210675]        rb_alloc_aux+0x20c/0x2e0
[ 1208.214816]        perf_mmap+0x3fc/0x670
[ 1208.218699]        mmap_region+0x38c/0x5a0
[ 1208.222754]        do_mmap+0x320/0x410
[ 1208.226466]        vm_mmap_pgoff+0xe4/0x110
[ 1208.230608]        ksys_mmap_pgoff+0xc0/0x230
[ 1208.234923]        sys_mmap+0x18/0x28
[ 1208.238548]        el0_svc_naked+0x30/0x34
[ 1208.242600]
[ 1208.242600] -> #1 (&event->mmap_mutex){+.+.}:
[ 1208.248392]        __mutex_lock+0x8c/0x8e8
[ 1208.252448]        mutex_lock_nested+0x1c/0x28
[ 1208.256848]        perf_mmap+0x150/0x670
[ 1208.260731]        mmap_region+0x38c/0x5a0
[ 1208.264786]        do_mmap+0x320/0x410
[ 1208.268497]        vm_mmap_pgoff+0xe4/0x110
[ 1208.272638]        ksys_mmap_pgoff+0xc0/0x230
[ 1208.276952]        sys_mmap+0x18/0x28
[ 1208.280577]        el0_svc_naked+0x30/0x34
[ 1208.284629]
[ 1208.284629] -> #0 (&mm->mmap_sem){++++}:
[ 1208.289991]        lock_acquire+0x44/0x60
[ 1208.293961]        __might_fault+0x60/0x88
[ 1208.298017]        filldir64+0xd0/0x340
[ 1208.301815]        dcache_readdir+0x110/0x178
[ 1208.306128]        iterate_dir+0x9c/0x1a8
[ 1208.310097]        ksys_getdents64+0x8c/0x178
[ 1208.314411]        sys_getdents64+0xc/0x18
[ 1208.318465]        el0_svc_naked+0x30/0x34
[ 1208.322518]
[ 1208.322518] other info that might help us debug this:
[ 1208.322518]
[ 1208.330443] Chain exists of:
[ 1208.330443]   &mm->mmap_sem --> opp_table_lock --> &sb->s_type->i_mutex_key#3
[ 1208.330443]
[ 1208.341831]  Possible unsafe locking scenario:
[ 1208.341831]
[ 1208.347691]        CPU0                    CPU1
[ 1208.352172]        ----                    ----
[ 1208.356653]   lock(&sb->s_type->i_mutex_key#3);
[ 1208.361145]                                lock(opp_table_lock);
[ 1208.367094]                                lock(&sb->s_type->i_mutex_key#3);
[ 1208.374079]   lock(&mm->mmap_sem);
[ 1208.377449]
[ 1208.377449]  *** DEADLOCK ***
[ 1208.377449]
[ 1208.383312] 1 lock held by bash/2334:
[ 1208.386934]  #0: 0000000008ac668a (&sb->s_type->i_mutex_key#3){++++}, at: iterate_dir+0x68/0x1a8
[ 1208.395653]
[ 1208.395653] stack backtrace:
[ 1208.399970] CPU: 4 PID: 2334 Comm: bash Not tainted 4.17.0-rc3-00027-g9b9372f #73
[ 1208.407378] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 28 2017
[ 1208.418053] Call trace:
[ 1208.420475]  dump_backtrace+0x0/0x1d0
[ 1208.424100]  show_stack+0x14/0x20
[ 1208.427382]  dump_stack+0xb8/0xf4
[ 1208.430665]  print_circular_bug.isra.20+0x1d4/0x2e0
[ 1208.435494]  __lock_acquire+0x14c8/0x19c0
[ 1208.439463]  lock_acquire+0x44/0x60
[ 1208.442917]  __might_fault+0x60/0x88
[ 1208.446456]  filldir64+0xd0/0x340
[ 1208.449736]  dcache_readdir+0x110/0x178
[ 1208.453533]  iterate_dir+0x9c/0x1a8
[ 1208.456986]  ksys_getdents64+0x8c/0x178
[ 1208.460783]  sys_getdents64+0xc/0x18
[ 1208.464321]  el0_svc_naked+0x30/0x34
[ 1397.521749] replicator_disable:86: coresight-dynamic-replicator 20120000.replicator: REPLICATOR disabled
[ 1397.531166] tmc_disable_etr_sink:1833: coresight-tmc 20070000.etr: TMC-ETR disabled
[ 1397.539439] replicator_disable:86: coresight-dynamic-replicator 20120000.replicator: REPLICATOR disabled
[ 1397.548850] tmc_disable_etr_sink:1833: coresight-tmc 20070000.etr: TMC-ETR disabled
[ 1397.557650] replicator_disable:86: coresight-dynamic-replicator 20120000.replicator: REPLICATOR disabled
[ 1397.567060] tmc_disable_etr_sink:1833: coresight-tmc 20070000.etr: TMC-ETR disabled
[ 1397.575416] replicator_disable:86: coresight-dynamic-replicator 20120000.replicator: REPLICATOR disabled
[ 1397.584820] tmc_disable_etr_sink:1833: coresight-tmc 20070000.etr: TMC-ETR disabled
[ 1397.593708] replicator_disable:86: coresight-dynamic-replicator 20120000.replicator: REPLICATOR disabled
[ 1397.603104] tmc_disable_etr_sink:1833: coresight-tmc 20070000.etr: TMC-ETR disabled



Cheers
Suzuki

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 05/27] dts: bindings: Document device tree binding for CATU
  2018-05-08 15:40         ` Suzuki K Poulose
@ 2018-05-11 16:05           ` Rob Herring
  -1 siblings, 0 replies; 134+ messages in thread
From: Rob Herring @ 2018-05-11 16:05 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Mathieu Poirier, linux-arm-kernel, linux-kernel, Mike Leach,
	Robert Walker, Mark Rutland, Will Deacon, Robin Murphy,
	Sudeep Holla, Frank Rowand, John Horley, devicetree,
	Mathieu Poirier

On Tue, May 8, 2018 at 10:40 AM, Suzuki K Poulose
<Suzuki.Poulose@arm.com> wrote:
>
>
> Rob, Mathieu,
>
>
> On 03/05/18 18:42, Mathieu Poirier wrote:
>>
>> On 1 May 2018 at 07:10, Rob Herring <robh@kernel.org> wrote:
>>>
>>> On Tue, May 01, 2018 at 10:10:35AM +0100, Suzuki K Poulose wrote:
>>>>
>>>> Document CATU device-tree bindings. CATU augments the TMC-ETR
>>>> by providing an improved Scatter Gather mechanism for streaming
>>>> trace data to non-contiguous system RAM pages.
>>>>
>>>> Cc: devicetree@vger.kernel.org
>>>> Cc: frowand.list@gmail.com
>>>> Cc: Rob Herring <robh@kernel.org>
>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>> Cc: Mathieu Poirier <mathieu.poirier@arm.com>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> ---
>>>>   .../devicetree/bindings/arm/coresight.txt          | 52
>>>> ++++++++++++++++++++++
>>>>   1 file changed, 52 insertions(+)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt
>>>> b/Documentation/devicetree/bindings/arm/coresight.txt
>>>> index 15ac8e8..cdd84d0 100644
>>>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>>>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>>>> @@ -39,6 +39,8 @@ its hardware characteristcs.
>>>>
>>>>                - System Trace Macrocell:
>>>>                        "arm,coresight-stm", "arm,primecell"; [1]
>>>> +             - Coresight Address Translation Unit (CATU)
>>>> +                     "arm, coresight-catu", "arm,primecell";
>>>
>>>
>>> spurious space               ^
>
>
> Thanks for spotting, will fix it.
>
>>>
>>>>
>>>>        * reg: physical base address and length of the register
>>>>          set(s) of the component.
>>>> @@ -86,6 +88,9 @@ its hardware characteristcs.
>>>>        * arm,buffer-size: size of contiguous buffer space for TMC ETR
>>>>         (embedded trace router)
>>>>
>>>> +* Optional property for CATU :
>>>> +     * interrupts : Exactly one SPI may be listed for reporting the
>>>> address
>>>> +       error
>>>
>>>
>>> Somewhere you need to define the ports for the CATU.
>
>
> The ports are defined in common for all the coresight components. Would you
> like it to be added just for the CATU?

Yeah, that's probably how we got into this problem with the port
numbering in the first place.


>>>>   Example:
>>>>
>>>> @@ -118,6 +123,35 @@ Example:
>>>>                };
>>>>        };
>>>>
>>>> +     etr@20070000 {
>>>> +             compatible = "arm,coresight-tmc", "arm,primecell";
>>>> +             reg = <0 0x20070000 0 0x1000>;
>>>> +
>>>> +                     /* input port */
>>>> +                     port@0 {
>>>> +                             reg =  <0>;
>>>> +                             etr_in_port: endpoint {
>>>> +                                     slave-mode;
>>>> +                                     remote-endpoint =
>>>> <&replicator2_out_port0>;
>>>> +                             };
>>>> +                     };
>>>> +
>>>> +                     /* CATU link represented by output port */
>>>> +                     port@1 {
>>>> +                             reg = <0>;
>>>
>>>
>>> While common in the Coresight bindings, having unit-address and reg not
>>> match is an error. Mathieu and I discussed this a bit as dtc now warns
>>> on these.
>>>
>>> Either reg should be 1 here, or 'ports' needs to be split into input and
>>> output ports. My preference would be the former, but Mathieu objected to
>>> this not reflecting the h/w numbering.
>>
>>
>> Suzuki, as we discuss this is related to your work on revamping CS
>> bindings for ACPI.  Until that gets done and to move forward with this
>> set I suggest you abide to Rob's request.
>
>
> Ok, I can change it to <1>, as we don't expect any other output port for an
> ETR.

Better let Mathieu confirm he's okay with the first option because he
wasn't okay with changing the port reg when we discussed. But maybe
that was just on existing things like TPIU.

Rob

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 26/27] coresight: perf: Remove reset_buffer call back for sinks
  2018-05-08 19:42     ` Mathieu Poirier
@ 2018-05-11 16:35       ` Suzuki K Poulose
  -1 siblings, 0 replies; 134+ messages in thread
From: Suzuki K Poulose @ 2018-05-11 16:35 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, mike.leach, robert.walker,
	mark.rutland, will.deacon, robin.murphy, sudeep.holla,
	frowand.list, robh, john.horley

On 08/05/18 20:42, Mathieu Poirier wrote:
> On Tue, May 01, 2018 at 10:10:56AM +0100, Suzuki K Poulose wrote:
>> Right now we issue update_buffer() and reset_buffer() callbacks
>> in succession when we stop tracing an event. The update_buffer is
>> supposed to check the status of the buffer and make sure the ring buffer
>> is updated with the trace data. And we store information about the
>> size of the data collected only to be consumed by the reset_buffer
>> callback which always follows the update_buffer. This was originally
>> designed for handling future IPs which could trigger a buffer overflow
>> interrupt. This patch gets rid of the reset_buffer callback altogether
>> and performs the actions in update_buffer, making it return the size
>> collected. We can always add the support for handling the overflow
>> interrupt case later.
>>
>> This removes a not-so-pretty hack (storing the new head in the
>> size field for snapshot mode) and cleans it up a little bit.
> 
> IPs with overflow interrupts will be arriving shortly, so it is not like the
> future is uncertain - they are coming.  Right now the logic is there - I don't
> see a real need to consolidate things only to split them again in the near future.
> 
> I agree the part about overloading buf->data_size with the head of the ring
> buffer when operating in snapshot mode isn't pretty (though well documented).
> If anything, that can be improved, i.e. add a buf->head and things will be clear.
> Once again this could be part of another patchset.
> 

Mathieu,

I am not sure how this was supposed to be used in conjunction with the overflow
handling. Could you please help me here? I might be able to retain
the callback and possibly improve it with the changes for the etr_buf infrastructure.

Cheers
Suzuki


>>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   drivers/hwtracing/coresight/coresight-etb10.c    | 56 +++++------------------
>>   drivers/hwtracing/coresight/coresight-etm-perf.c |  9 +---
>>   drivers/hwtracing/coresight/coresight-tmc-etf.c  | 58 +++++-------------------
>>   include/linux/coresight.h                        |  5 +-
>>   4 files changed, 26 insertions(+), 102 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
>> index d9c2f87..b13712a 100644
>> --- a/drivers/hwtracing/coresight/coresight-etb10.c
>> +++ b/drivers/hwtracing/coresight/coresight-etb10.c
>> @@ -322,37 +322,7 @@ static int etb_set_buffer(struct coresight_device *csdev,
>>   	return ret;
>>   }
>>   
>> -static unsigned long etb_reset_buffer(struct coresight_device *csdev,
>> -				      struct perf_output_handle *handle,
>> -				      void *sink_config)
>> -{
>> -	unsigned long size = 0;
>> -	struct cs_buffers *buf = sink_config;
>> -
>> -	if (buf) {
>> -		/*
>> -		 * In snapshot mode ->data_size holds the new address of the
>> -		 * ring buffer's head.  The size itself is the whole address
>> -		 * range since we want the latest information.
>> -		 */
>> -		if (buf->snapshot)
>> -			handle->head = local_xchg(&buf->data_size,
>> -						  buf->nr_pages << PAGE_SHIFT);
>> -
>> -		/*
>> -		 * Tell the tracer PMU how much we got in this run and if
>> -		 * something went wrong along the way.  Nobody else can use
>> -		 * this cs_buffers instance until we are done.  As such
>> -		 * resetting parameters here and squaring off with the ring
>> -		 * buffer API in the tracer PMU is fine.
>> -		 */
>> -		size = local_xchg(&buf->data_size, 0);
>> -	}
>> -
>> -	return size;
>> -}
>> -
>> -static void etb_update_buffer(struct coresight_device *csdev,
>> +static unsigned long etb_update_buffer(struct coresight_device *csdev,
>>   			      struct perf_output_handle *handle,
>>   			      void *sink_config)
>>   {
>> @@ -361,13 +331,13 @@ static void etb_update_buffer(struct coresight_device *csdev,
>>   	u8 *buf_ptr;
>>   	const u32 *barrier;
>>   	u32 read_ptr, write_ptr, capacity;
>> -	u32 status, read_data, to_read;
>> -	unsigned long offset;
>> +	u32 status, read_data;
>> +	unsigned long offset, to_read;
>>   	struct cs_buffers *buf = sink_config;
>>   	struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>>   
>>   	if (!buf)
>> -		return;
>> +		return 0;
>>   
>>   	capacity = drvdata->buffer_depth * ETB_FRAME_SIZE_WORDS;
>>   
>> @@ -472,18 +442,17 @@ static void etb_update_buffer(struct coresight_device *csdev,
>>   	writel_relaxed(0x0, drvdata->base + ETB_RAM_WRITE_POINTER);
>>   
>>   	/*
>> -	 * In snapshot mode all we have to do is communicate to
>> -	 * perf_aux_output_end() the address of the current head.  In full
>> -	 * trace mode the same function expects a size to move rb->aux_head
>> -	 * forward.
>> +	 * In snapshot mode we have to update the handle->head to point
>> +	 * to the new location.
>>   	 */
>> -	if (buf->snapshot)
>> -		local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
>> -	else
>> -		local_add(to_read, &buf->data_size);
>> -
>> +	if (buf->snapshot) {
>> +		handle->head = (cur * PAGE_SIZE) + offset;
>> +		to_read = buf->nr_pages << PAGE_SHIFT;
>> +	}
>>   	etb_enable_hw(drvdata);
>>   	CS_LOCK(drvdata->base);
>> +
>> +	return to_read;
>>   }
>>   
>>   static const struct coresight_ops_sink etb_sink_ops = {
>> @@ -492,7 +461,6 @@ static const struct coresight_ops_sink etb_sink_ops = {
>>   	.alloc_buffer	= etb_alloc_buffer,
>>   	.free_buffer	= etb_free_buffer,
>>   	.set_buffer	= etb_set_buffer,
>> -	.reset_buffer	= etb_reset_buffer,
>>   	.update_buffer	= etb_update_buffer,
>>   };
>>   
>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> index 4e5ed65..5096def 100644
>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> @@ -342,15 +342,8 @@ static void etm_event_stop(struct perf_event *event, int mode)
>>   		if (!sink_ops(sink)->update_buffer)
>>   			return;
>>   
>> -		sink_ops(sink)->update_buffer(sink, handle,
>> +		size = sink_ops(sink)->update_buffer(sink, handle,
>>   					      event_data->snk_config);
>> -
>> -		if (!sink_ops(sink)->reset_buffer)
>> -			return;
>> -
>> -		size = sink_ops(sink)->reset_buffer(sink, handle,
>> -						    event_data->snk_config);
>> -
>>   		perf_aux_output_end(handle, size);
>>   	}
>>   
>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
>> index 0a32734..75ef5c4 100644
>> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
>> @@ -360,36 +360,7 @@ static int tmc_set_etf_buffer(struct coresight_device *csdev,
>>   	return ret;
>>   }
>>   
>> -static unsigned long tmc_reset_etf_buffer(struct coresight_device *csdev,
>> -					  struct perf_output_handle *handle,
>> -					  void *sink_config)
>> -{
>> -	long size = 0;
>> -	struct cs_buffers *buf = sink_config;
>> -
>> -	if (buf) {
>> -		/*
>> -		 * In snapshot mode ->data_size holds the new address of the
>> -		 * ring buffer's head.  The size itself is the whole address
>> -		 * range since we want the latest information.
>> -		 */
>> -		if (buf->snapshot)
>> -			handle->head = local_xchg(&buf->data_size,
>> -						  buf->nr_pages << PAGE_SHIFT);
>> -		/*
>> -		 * Tell the tracer PMU how much we got in this run and if
>> -		 * something went wrong along the way.  Nobody else can use
>> -		 * this cs_buffers instance until we are done.  As such
>> -		 * resetting parameters here and squaring off with the ring
>> -		 * buffer API in the tracer PMU is fine.
>> -		 */
>> -		size = local_xchg(&buf->data_size, 0);
>> -	}
>> -
>> -	return size;
>> -}
>> -
>> -static void tmc_update_etf_buffer(struct coresight_device *csdev,
>> +static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
>>   				  struct perf_output_handle *handle,
>>   				  void *sink_config)
>>   {
>> @@ -398,17 +369,17 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
>>   	const u32 *barrier;
>>   	u32 *buf_ptr;
>>   	u64 read_ptr, write_ptr;
>> -	u32 status, to_read;
>> -	unsigned long offset;
>> +	u32 status;
>> +	unsigned long offset, to_read;
>>   	struct cs_buffers *buf = sink_config;
>>   	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>>   
>>   	if (!buf)
>> -		return;
>> +		return 0;
>>   
>>   	/* This shouldn't happen */
>>   	if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
>> -		return;
>> +		return 0;
>>   
>>   	CS_UNLOCK(drvdata->base);
>>   
>> @@ -497,18 +468,14 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
>>   		}
>>   	}
>>   
>> -	/*
>> -	 * In snapshot mode all we have to do is communicate to
>> -	 * perf_aux_output_end() the address of the current head.  In full
>> -	 * trace mode the same function expects a size to move rb->aux_head
>> -	 * forward.
>> -	 */
>> -	if (buf->snapshot)
>> -		local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
>> -	else
>> -		local_add(to_read, &buf->data_size);
>> -
>> +	/* In snapshot mode we have to update the head */
>> +	if (buf->snapshot) {
>> +		handle->head = (cur * PAGE_SIZE) + offset;
>> +		to_read = buf->nr_pages << PAGE_SHIFT;
>> +	}
>>   	CS_LOCK(drvdata->base);
>> +
>> +	return to_read;
>>   }
>>   
>>   static const struct coresight_ops_sink tmc_etf_sink_ops = {
>> @@ -517,7 +484,6 @@ static const struct coresight_ops_sink tmc_etf_sink_ops = {
>>   	.alloc_buffer	= tmc_alloc_etf_buffer,
>>   	.free_buffer	= tmc_free_etf_buffer,
>>   	.set_buffer	= tmc_set_etf_buffer,
>> -	.reset_buffer	= tmc_reset_etf_buffer,
>>   	.update_buffer	= tmc_update_etf_buffer,
>>   };
>>   
>> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
>> index c0e1568..41b3729 100644
>> --- a/include/linux/coresight.h
>> +++ b/include/linux/coresight.h
>> @@ -212,10 +212,7 @@ struct coresight_ops_sink {
>>   	int (*set_buffer)(struct coresight_device *csdev,
>>   			  struct perf_output_handle *handle,
>>   			  void *sink_config);
>> -	unsigned long (*reset_buffer)(struct coresight_device *csdev,
>> -				      struct perf_output_handle *handle,
>> -				      void *sink_config);
>> -	void (*update_buffer)(struct coresight_device *csdev,
>> +	unsigned long (*update_buffer)(struct coresight_device *csdev,
>>   			      struct perf_output_handle *handle,
>>   			      void *sink_config);
>>   };
>> -- 
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 05/27] dts: bindings: Document device tree binding for CATU
  2018-05-11 16:05           ` Rob Herring
@ 2018-05-14 14:42             ` Mathieu Poirier
  -1 siblings, 0 replies; 134+ messages in thread
From: Mathieu Poirier @ 2018-05-14 14:42 UTC (permalink / raw)
  To: Rob Herring
  Cc: Suzuki K Poulose, linux-arm-kernel, linux-kernel, Mike Leach,
	Robert Walker, Mark Rutland, Will Deacon, Robin Murphy,
	Sudeep Holla, Frank Rowand, John Horley, devicetree,
	Mathieu Poirier

On Fri, May 11, 2018 at 11:05:56AM -0500, Rob Herring wrote:
> On Tue, May 8, 2018 at 10:40 AM, Suzuki K Poulose
> <Suzuki.Poulose@arm.com> wrote:
> >
> >
> > Rob, Mathieu,
> >
> >
> > On 03/05/18 18:42, Mathieu Poirier wrote:
> >>
> >> On 1 May 2018 at 07:10, Rob Herring <robh@kernel.org> wrote:
> >>>
> >>> On Tue, May 01, 2018 at 10:10:35AM +0100, Suzuki K Poulose wrote:
> >>>>
> >>>> Document CATU device-tree bindings. CATU augments the TMC-ETR
> >>>> by providing an improved Scatter Gather mechanism for streaming
> >>>> trace data to non-contiguous system RAM pages.
> >>>>
> >>>> Cc: devicetree@vger.kernel.org
> >>>> Cc: frowand.list@gmail.com
> >>>> Cc: Rob Herring <robh@kernel.org>
> >>>> Cc: Mark Rutland <mark.rutland@arm.com>
> >>>> Cc: Mathieu Poirier <mathieu.poirier@arm.com>
> >>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>>> ---
> >>>>   .../devicetree/bindings/arm/coresight.txt          | 52
> >>>> ++++++++++++++++++++++
> >>>>   1 file changed, 52 insertions(+)
> >>>>
> >>>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt
> >>>> b/Documentation/devicetree/bindings/arm/coresight.txt
> >>>> index 15ac8e8..cdd84d0 100644
> >>>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
> >>>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
> >>>> @@ -39,6 +39,8 @@ its hardware characteristcs.
> >>>>
> >>>>                - System Trace Macrocell:
> >>>>                        "arm,coresight-stm", "arm,primecell"; [1]
> >>>> +             - Coresight Address Translation Unit (CATU)
> >>>> +                     "arm, coresight-catu", "arm,primecell";
> >>>
> >>>
> >>> spurious space               ^
> >
> >
> > Thanks for spotting, will fix it.
> >
> >>>
> >>>>
> >>>>        * reg: physical base address and length of the register
> >>>>          set(s) of the component.
> >>>> @@ -86,6 +88,9 @@ its hardware characteristcs.
> >>>>        * arm,buffer-size: size of contiguous buffer space for TMC ETR
> >>>>         (embedded trace router)
> >>>>
> >>>> +* Optional property for CATU :
> >>>> +     * interrupts : Exactly one SPI may be listed for reporting the
> >>>> address
> >>>> +       error
> >>>
> >>>
> >>> Somewhere you need to define the ports for the CATU.
> >
> >
> > The ports are defined in common for all the coresight components. Would you
> > like it to be added just for the CATU?
> 
> Yeah, that's probably how we got into this problem with the port
> numbering in the first place.
> 
> 
> >>>>   Example:
> >>>>
> >>>> @@ -118,6 +123,35 @@ Example:
> >>>>                };
> >>>>        };
> >>>>
> >>>> +     etr@20070000 {
> >>>> +             compatible = "arm,coresight-tmc", "arm,primecell";
> >>>> +             reg = <0 0x20070000 0 0x1000>;
> >>>> +
> >>>> +                     /* input port */
> >>>> +                     port@0 {
> >>>> +                             reg =  <0>;
> >>>> +                             etr_in_port: endpoint {
> >>>> +                                     slave-mode;
> >>>> +                                     remote-endpoint =
> >>>> <&replicator2_out_port0>;
> >>>> +                             };
> >>>> +                     };
> >>>> +
> >>>> +                     /* CATU link represented by output port */
> >>>> +                     port@1 {
> >>>> +                             reg = <0>;
> >>>
> >>>
> >>> While common in the Coresight bindings, having unit-address and reg not
> >>> match is an error. Mathieu and I discussed this a bit as dtc now warns
> >>> on these.
> >>>
> >>> Either reg should be 1 here, or 'ports' needs to be split into input and
> >>> output ports. My preference would be the former, but Mathieu objected to
> >>> this not reflecting the h/w numbering.
> >>
> >>
> >> Suzuki, as we discuss this is related to your work on revamping CS
> >> bindings for ACPI.  Until that gets done and to move forward with this
> >> set I suggest you abide to Rob's request.
> >
> >
> > Ok, I can change it to <1>, as we don't expect any other output port for an
> > ETR.
> 
> Better let Mathieu confirm he's okay with the first option because he
> wasn't okay with changing the port reg when we discussed. But maybe
> that was just on existing things like TPIU.

I'm good with this one as it is a new component and doesn't change anything
that people could be relying on.
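
For illustration only, the numbering being agreed on here might look roughly
like the sketch below. This is a sketch, not the posted patch: the CATU base
address, its interrupt specifier, the endpoint labels and the "ports" wrapper
are placeholders, and clocks and other required primecell properties are
omitted for brevity.

	etr@20070000 {
		compatible = "arm,coresight-tmc", "arm,primecell";
		reg = <0 0x20070000 0 0x1000>;

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			/* input port, numbering unchanged */
			port@0 {
				reg = <0>;
				etr_in_port: endpoint {
					slave-mode;
					remote-endpoint = <&replicator2_out_port0>;
				};
			};

			/* CATU link represented by output port;
			 * unit-address and reg now match
			 */
			port@1 {
				reg = <1>;
				etr_out_port: endpoint {
					remote-endpoint = <&catu_in_port>;
				};
			};
		};
	};

	/* Hypothetical CATU node; address and interrupt number are made up */
	catu@207e0000 {
		compatible = "arm,coresight-catu", "arm,primecell";
		reg = <0 0x207e0000 0 0x1000>;
		interrupts = <0 4 4>;	/* the one SPI for address errors */

		port {
			catu_in_port: endpoint {
				slave-mode;
				remote-endpoint = <&etr_out_port>;
			};
		};
	};

Only the output port number changes; the input port keeps reg = <0>, so
existing connections are untouched.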

> 
> Rob

^ permalink raw reply	[flat|nested] 134+ messages in thread

end of thread, other threads:[~2018-05-14 14:42 UTC | newest]

Thread overview: 134+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-01  9:10 [PATCH v2 00/27] coresight: TMC ETR backend support for perf Suzuki K Poulose
2018-05-01  9:10 ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 01/27] coresight: ETM: Add support for ARM Cortex-A73 Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 02/27] coresight: Cleanup device subtype struct Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 03/27] coresight: Add helper device type Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-03 17:00   ` Mathieu Poirier
2018-05-03 17:00     ` Mathieu Poirier
2018-05-05  9:56     ` Suzuki K Poulose
2018-05-05  9:56       ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 04/27] coresight: Introduce support for Coresight Addrss Translation Unit Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-03 17:31   ` Mathieu Poirier
2018-05-03 17:31     ` Mathieu Poirier
2018-05-03 20:25     ` Mathieu Poirier
2018-05-03 20:25       ` Mathieu Poirier
2018-05-05 10:03       ` Suzuki K Poulose
2018-05-05 10:03         ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 05/27] dts: bindings: Document device tree binding for CATU Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01 13:10   ` Rob Herring
2018-05-01 13:10     ` Rob Herring
2018-05-03 17:42     ` Mathieu Poirier
2018-05-03 17:42       ` Mathieu Poirier
2018-05-08 15:40       ` Suzuki K Poulose
2018-05-08 15:40         ` Suzuki K Poulose
2018-05-11 16:05         ` Rob Herring
2018-05-11 16:05           ` Rob Herring
2018-05-14 14:42           ` Mathieu Poirier
2018-05-14 14:42             ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 06/27] coresight: tmc etr: Disallow perf mode temporarily Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 07/27] coresight: tmc: Hide trace buffer handling for file read Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-03 19:50   ` Mathieu Poirier
2018-05-03 19:50     ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 08/27] coresight: tmc-etr: Do not clean trace buffer Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 09/27] coresight: Add helper for inserting synchronization packets Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 10/27] dts: bindings: Restrict coresight tmc-etr scatter-gather mode Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01 13:13   ` Rob Herring
2018-05-01 13:13     ` Rob Herring
2018-05-03 20:32     ` Mathieu Poirier
2018-05-03 20:32       ` Mathieu Poirier
2018-05-04 22:56       ` Rob Herring
2018-05-04 22:56         ` Rob Herring
2018-05-08 15:48         ` Suzuki K Poulose
2018-05-08 15:48           ` Suzuki K Poulose
2018-05-08 17:34           ` Rob Herring
2018-05-08 17:34             ` Rob Herring
2018-05-01  9:10 ` [PATCH v2 11/27] dts: juno: Add scatter-gather support for all revisions Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 12/27] coresight: tmc-etr: Allow commandline option to override SG use Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-03 20:40   ` Mathieu Poirier
2018-05-03 20:40     ` Mathieu Poirier
2018-05-08 15:49     ` Suzuki K Poulose
2018-05-08 15:49       ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 13/27] coresight: Add generic TMC sg table framework Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-04 17:35   ` Mathieu Poirier
2018-05-04 17:35     ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 14/27] coresight: Add support for TMC ETR SG unit Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 15/27] coresight: tmc-etr: Make SG table circular Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 16/27] coresight: tmc-etr: Add transparent buffer management Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-07 17:20   ` Mathieu Poirier
2018-05-07 17:20     ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 17/27] coresight: etr: Add support for save restore buffers Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-07 17:48   ` Mathieu Poirier
2018-05-07 17:48     ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 18/27] coresight: catu: Add support for scatter gather tables Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-07 20:25   ` Mathieu Poirier
2018-05-07 20:25     ` Mathieu Poirier
2018-05-08 15:56     ` Suzuki K Poulose
2018-05-08 15:56       ` Suzuki K Poulose
2018-05-08 16:13       ` Mathieu Poirier
2018-05-08 16:13         ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 19/27] coresight: catu: Plug in CATU as a backend for ETR buffer Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-07 22:02   ` Mathieu Poirier
2018-05-07 22:02     ` Mathieu Poirier
2018-05-08 16:21     ` Suzuki K Poulose
2018-05-08 16:21       ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 20/27] coresight: tmc: Add configuration support for trace buffer size Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-02  3:55   ` Kim Phillips
2018-05-02  3:55     ` Kim Phillips
2018-05-02  8:25     ` Robert Walker
2018-05-02  8:25       ` Robert Walker
2018-05-02 13:52     ` Robin Murphy
2018-05-02 13:52       ` Robin Murphy
2018-05-10 13:36       ` Suzuki K Poulose
2018-05-10 13:36         ` Suzuki K Poulose
2018-05-07 22:28   ` Mathieu Poirier
2018-05-07 22:28     ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 22/27] coresight: tmc-etr: Track if the device is coherent Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 23/27] coresight: tmc-etr: Handle driver mode specific ETR buffers Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-08 17:18   ` Mathieu Poirier
2018-05-08 17:18     ` Mathieu Poirier
2018-05-08 21:51     ` Suzuki K Poulose
2018-05-08 21:51       ` Suzuki K Poulose
2018-05-09 17:12       ` Mathieu Poirier
2018-05-09 17:12         ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 24/27] coresight: tmc-etr: Relax collection of trace from sysfs mode Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-07 22:54   ` Mathieu Poirier
2018-05-07 22:54     ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 25/27] coresight: etr_buf: Add helper for padding an area of trace data Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-08 17:34   ` Mathieu Poirier
2018-05-08 17:34     ` Mathieu Poirier
2018-05-01  9:10 ` [PATCH v2 26/27] coresight: perf: Remove reset_buffer call back for sinks Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-08 19:42   ` Mathieu Poirier
2018-05-08 19:42     ` Mathieu Poirier
2018-05-11 16:35     ` Suzuki K Poulose
2018-05-11 16:35       ` Suzuki K Poulose
2018-05-01  9:10 ` [PATCH v2 27/27] coresight: etm-perf: Add support for ETR backend Suzuki K Poulose
2018-05-01  9:10   ` Suzuki K Poulose
2018-05-08 22:04   ` Mathieu Poirier
2018-05-08 22:04     ` Mathieu Poirier
