* [PATCH 00/11] coresight: tmc-etr Transparent buffer management
@ 2018-05-18 16:39 ` Suzuki K Poulose
  0 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

This series is split out of the Coresight ETR perf support patches posted
here [0]. The CATU support and perf backend support will be posted as
separate series, for easier management and review of the patches.

This series adds support for the TMC ETR Scatter-Gather mode, allowing
physically non-contiguous buffers to hold the trace data. It also adds
a layer to handle the buffer management transparently, independent of
the underlying mode used by the TMC ETR. The layer chooses the ETR mode
based on several parameters (size, re-use of a given set of pages,
presence of an SMMU, etc.).

Finally, we add a sysfs parameter to tune the buffer size for the ETR
in sysfs mode.

During testing, we found that if the TMC ETR is not properly connected
to the memory subsystem, the ETR could lock up the system while waiting
for its "read" transactions to complete in scatter-gather mode. So, we
do not use this mode on a system unless it is known to be safe, which
is indicated by the DT property "arm,scatter-gather".
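
For illustration, such a property would sit in the platform's TMC ETR node;
in this sketch everything except the "arm,scatter-gather" line (node name,
addresses, clocks) is a placeholder, not taken from a real device tree:

```dts
etr@20070000 {
	compatible = "arm,coresight-tmc", "arm,primecell";
	reg = <0 0x20070000 0 0x1000>;	/* placeholder address */
	clocks = <&soc_smc50mhz>;	/* placeholder clock */
	clock-names = "apb_pclk";
	/* Present only when ETR scatter-gather is safe to use. */
	arm,scatter-gather;
};
```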

Applies on top of v4.17-rc4.

Changes since v2 in [0] :
 - Split the series in [0]
 - Address comments on v2
 - Rename DT property "scatter-gather" to "arm,scatter-gather"
 - Add ETM PID for Cortex-A35, use macros to make the listing easier

[0] - http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/574875.html

Suzuki K Poulose (11):
  coresight: ETM: Add support for Arm Cortex-A73 and Cortex-A35
  coresight: tmc: Hide trace buffer handling for file read
  coresight: tmc-etr: Do not clean trace buffer
  coresight: tmc-etr: Disallow perf mode
  coresight: Add helper for inserting synchronization packets
  dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  dts: juno: Add scatter-gather support for all revisions
  coresight: Add generic TMC sg table framework
  coresight: Add support for TMC ETR SG unit
  coresight: tmc-etr: Add transparent buffer management
  coresight: tmc: Add configuration support for trace buffer size

 .../ABI/testing/sysfs-bus-coresight-devices-tmc    |    8 +
 .../devicetree/bindings/arm/coresight.txt          |    5 +-
 arch/arm64/boot/dts/arm/juno-base.dtsi             |    1 +
 drivers/hwtracing/coresight/coresight-etb10.c      |   12 +-
 drivers/hwtracing/coresight/coresight-etm4x.c      |   32 +-
 drivers/hwtracing/coresight/coresight-priv.h       |   10 +-
 drivers/hwtracing/coresight/coresight-tmc-etf.c    |   45 +-
 drivers/hwtracing/coresight/coresight-tmc-etr.c    | 1032 ++++++++++++++++++--
 drivers/hwtracing/coresight/coresight-tmc.c        |   83 +-
 drivers/hwtracing/coresight/coresight-tmc.h        |  111 ++-
 drivers/hwtracing/coresight/coresight.c            |    3 +-
 11 files changed, 1169 insertions(+), 173 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/11] coresight: ETM: Add support for Arm Cortex-A73 and Cortex-A35
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

Add the ETM PIDs of the Arm Cortex-A73 and Cortex-A35 CPUs to the white
list of ETMs. While at it, also add a description of the CPU to which
each ETM belongs, to make it easier to identify the ETM devices.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm4x.c | 32 +++++++++++++--------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index cf364a5..fe5b41c 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -1034,7 +1034,8 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
 	}
 
 	pm_runtime_put(&adev->dev);
-	dev_info(dev, "%s initialized\n", (char *)id->data);
+	dev_info(dev, "CPU%d: %s initialized\n",
+			drvdata->cpu, (char *)id->data);
 
 	if (boot_enable) {
 		coresight_enable(drvdata->csdev);
@@ -1052,23 +1053,20 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
 	return ret;
 }
 
+#define ETM4_AMBA_ID(cpu, pid)			\
+	{					\
+		.id	= pid,			\
+		.mask	= 0x000fffff,		\
+		.data	= #cpu " ETM v4.x",	\
+	}
+
 static const struct amba_id etm4_ids[] = {
-	{       /* ETM 4.0 - Cortex-A53  */
-		.id	= 0x000bb95d,
-		.mask	= 0x000fffff,
-		.data	= "ETM 4.0",
-	},
-	{       /* ETM 4.0 - Cortex-A57 */
-		.id	= 0x000bb95e,
-		.mask	= 0x000fffff,
-		.data	= "ETM 4.0",
-	},
-	{       /* ETM 4.0 - A72, Maia, HiSilicon */
-		.id = 0x000bb95a,
-		.mask = 0x000fffff,
-		.data = "ETM 4.0",
-	},
-	{ 0, 0},
+	ETM4_AMBA_ID(Cortex-A53, 0x000bb95d),
+	ETM4_AMBA_ID(Cortex-A57, 0x000bb95e),
+	ETM4_AMBA_ID(Cortex-A72, 0x000bb95a),
+	ETM4_AMBA_ID(Cortex-A73, 0x000bb959),
+	ETM4_AMBA_ID(Cortex-A35, 0x000bb9da),
+	{},
 };
 
 static struct amba_driver etm4x_driver = {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/11] coresight: tmc: Hide trace buffer handling for file read
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

At the moment we adjust the buffer pointers for reading the trace
data via the misc device in code common to the ETF/ETB and ETR. Since
we are going to change how we manage the buffer for the ETR, let us
move the buffer manipulation to the respective driver files, hiding
it from the common code. We do so by adding type-specific helpers
that, for a given length at a file position, return the length of the
available data and a pointer into the buffer.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 18 +++++++++++
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 34 ++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-tmc.c     | 41 ++++++++++++++-----------
 drivers/hwtracing/coresight/coresight-tmc.h     |  4 +++
 4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index e2513b7..e5edf46 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -120,6 +120,24 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
 	CS_LOCK(drvdata->base);
 }
 
+/*
+ * Return the available trace data in the buffer from @pos, with
+ * a maximum limit of @len, updating the @bufpp on where to
+ * find it.
+ */
+ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata,
+				loff_t pos, size_t len, char **bufpp)
+{
+	ssize_t actual = len;
+
+	/* Adjust the len to available size @pos */
+	if (pos + actual > drvdata->len)
+		actual = drvdata->len - pos;
+	if (actual > 0)
+		*bufpp = drvdata->buf + pos;
+	return actual;
+}
+
 static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
 {
 	int ret = 0;
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 68fbc8f..d3c2b04 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -69,6 +69,40 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 	CS_LOCK(drvdata->base);
 }
 
+/*
+ * Return the available trace data in the buffer @pos, with a maximum
+ * limit of @len, also updating the @bufpp on where to find it.
+ */
+ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
+				loff_t pos, size_t len, char **bufpp)
+{
+	ssize_t actual = len;
+	char *bufp = drvdata->buf + pos;
+	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
+
+	/* Adjust the len to available size @pos */
+	if (pos + actual > drvdata->len)
+		actual = drvdata->len - pos;
+
+	if (actual <= 0)
+		return actual;
+
+	/*
+	 * Since we use a circular buffer, with trace data starting
+	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
+	 * wrap the current @pos to within the buffer.
+	 */
+	if (bufp >= bufend)
+		bufp -= drvdata->size;
+	/*
+	 * For simplicity, avoid copying over a wrapped around buffer.
+	 */
+	if ((bufp + actual) > bufend)
+		actual = bufend - bufp;
+	*bufpp = bufp;
+	return actual;
+}
+
 static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
 {
 	const u32 *barrier;
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 0ea04f5..93c5bfc 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -131,35 +131,40 @@ static int tmc_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static inline ssize_t tmc_get_sysfs_trace(struct tmc_drvdata *drvdata,
+					  loff_t pos, size_t len, char **bufpp)
+{
+	switch (drvdata->config_type) {
+	case TMC_CONFIG_TYPE_ETB:
+	case TMC_CONFIG_TYPE_ETF:
+		return tmc_etb_get_sysfs_trace(drvdata, pos, len, bufpp);
+	case TMC_CONFIG_TYPE_ETR:
+		return tmc_etr_get_sysfs_trace(drvdata, pos, len, bufpp);
+	}
+
+	return -EINVAL;
+}
+
 static ssize_t tmc_read(struct file *file, char __user *data, size_t len,
 			loff_t *ppos)
 {
+	char *bufp;
+	ssize_t actual;
 	struct tmc_drvdata *drvdata = container_of(file->private_data,
 						   struct tmc_drvdata, miscdev);
-	char *bufp = drvdata->buf + *ppos;
+	actual = tmc_get_sysfs_trace(drvdata, *ppos, len, &bufp);
+	if (actual <= 0)
+		return 0;
 
-	if (*ppos + len > drvdata->len)
-		len = drvdata->len - *ppos;
-
-	if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) {
-		if (bufp == (char *)(drvdata->vaddr + drvdata->size))
-			bufp = drvdata->vaddr;
-		else if (bufp > (char *)(drvdata->vaddr + drvdata->size))
-			bufp -= drvdata->size;
-		if ((bufp + len) > (char *)(drvdata->vaddr + drvdata->size))
-			len = (char *)(drvdata->vaddr + drvdata->size) - bufp;
-	}
-
-	if (copy_to_user(data, bufp, len)) {
+	if (copy_to_user(data, bufp, actual)) {
 		dev_dbg(drvdata->dev, "%s: copy_to_user failed\n", __func__);
 		return -EFAULT;
 	}
 
-	*ppos += len;
+	*ppos += actual;
+	dev_dbg(drvdata->dev, "%zu bytes copied\n", actual);
 
-	dev_dbg(drvdata->dev, "%s: %zu bytes copied, %d bytes left\n",
-		__func__, len, (int)(drvdata->len - *ppos));
-	return len;
+	return actual;
 }
 
 static int tmc_release(struct inode *inode, struct file *file)
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 8df7a81..73f944d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -183,10 +183,14 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata);
 extern const struct coresight_ops tmc_etb_cs_ops;
 extern const struct coresight_ops tmc_etf_cs_ops;
 
+ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata,
+				loff_t pos, size_t len, char **bufpp);
 /* ETR functions */
 int tmc_read_prepare_etr(struct tmc_drvdata *drvdata);
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata);
 extern const struct coresight_ops tmc_etr_cs_ops;
+ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
+				loff_t pos, size_t len, char **bufpp);
 
 
 #define TMC_REG_PAIR(name, lo_off, hi_off)				\
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/11] coresight: tmc-etr: Do not clean trace buffer
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

We zero out the entire trace buffer used for the ETR before it is
enabled, to help with debugging. With the addition of scatter-gather
mode, the buffer could be bigger and non-contiguous, making this step
costly.

Get rid of this step; if someone needs it for debugging, it can always
be added back as and when needed.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index d3c2b04..c73bcb3 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -24,9 +24,6 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
 
-	/* Zero out the memory to help with debug */
-	memset(drvdata->vaddr, 0, drvdata->size);
-
 	CS_UNLOCK(drvdata->base);
 
 	/* Wait for TMCSReady bit to be set */
@@ -352,9 +349,8 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 	if (drvdata->mode == CS_MODE_SYSFS) {
 		/*
 		 * The trace run will continue with the same allocated trace
-		 * buffer. The trace buffer is cleared in tmc_etr_enable_hw(),
-		 * so we don't have to explicitly clear it. Also, since the
-		 * tracer is still enabled drvdata::buf can't be NULL.
+		 * buffer. Since the tracer is still enabled drvdata::buf can't
+		 * be NULL.
 		 */
 		tmc_etr_enable_hw(drvdata);
 	} else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/11] coresight: tmc-etr: Disallow perf mode
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

We don't support the ETR in perf mode yet, so don't even try to enable
the hardware, even by mistake.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 28 ++-----------------------
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index c73bcb3..6c5e8d1 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -223,32 +223,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 
 static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
 {
-	int ret = 0;
-	unsigned long flags;
-	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
-
-	spin_lock_irqsave(&drvdata->spinlock, flags);
-	if (drvdata->reading) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/*
-	 * In Perf mode there can be only one writer per sink.  There
-	 * is also no need to continue if the ETR is already operated
-	 * from sysFS.
-	 */
-	if (drvdata->mode != CS_MODE_DISABLED) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	drvdata->mode = CS_MODE_PERF;
-	tmc_etr_enable_hw(drvdata);
-out:
-	spin_unlock_irqrestore(&drvdata->spinlock, flags);
-
-	return ret;
+	/* We don't support perf mode yet ! */
+	return -EINVAL;
 }
 
 static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/11] coresight: Add helper for inserting synchronization packets
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose, Mike Leach

Right now we open-code filling the trace buffer with synchronization
packets, when the circular buffer wraps around, in several different
drivers. Move this to a common place. While at it, clean up the
barrier_pkt array to strip off the trailing '\0'.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-etb10.c   | 12 ++++-------
 drivers/hwtracing/coresight/coresight-priv.h    | 10 ++++++++-
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 27 ++++++++-----------------
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 13 +-----------
 drivers/hwtracing/coresight/coresight.c         |  3 +--
 5 files changed, 23 insertions(+), 42 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index 580cd38..74232e6 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -202,7 +202,6 @@ static void etb_dump_hw(struct etb_drvdata *drvdata)
 	bool lost = false;
 	int i;
 	u8 *buf_ptr;
-	const u32 *barrier;
 	u32 read_data, depth;
 	u32 read_ptr, write_ptr;
 	u32 frame_off, frame_endoff;
@@ -233,19 +232,16 @@ static void etb_dump_hw(struct etb_drvdata *drvdata)
 
 	depth = drvdata->buffer_depth;
 	buf_ptr = drvdata->buf;
-	barrier = barrier_pkt;
 	for (i = 0; i < depth; i++) {
 		read_data = readl_relaxed(drvdata->base +
 					  ETB_RAM_READ_DATA_REG);
-		if (lost && *barrier) {
-			read_data = *barrier;
-			barrier++;
-		}
-
 		*(u32 *)buf_ptr = read_data;
 		buf_ptr += 4;
 	}
 
+	if (lost)
+		coresight_insert_barrier_packet(drvdata->buf);
+
 	if (frame_off) {
 		buf_ptr -= (frame_endoff * 4);
 		for (i = 0; i < frame_endoff; i++) {
@@ -454,7 +450,7 @@ static void etb_update_buffer(struct coresight_device *csdev,
 		buf_ptr = buf->data_pages[cur] + offset;
 		read_data = readl_relaxed(drvdata->base +
 					  ETB_RAM_READ_DATA_REG);
-		if (lost && *barrier) {
+		if (lost && i < CORESIGHT_BARRIER_PKT_SIZE) {
 			read_data = *barrier;
 			barrier++;
 		}
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index f1d0e21d..2bb0a15 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -64,7 +64,8 @@ static DEVICE_ATTR_RO(name)
 #define coresight_simple_reg64(type, name, lo_off, hi_off)		\
 	__coresight_simple_func(type, NULL, name, lo_off, hi_off)
 
-extern const u32 barrier_pkt[5];
+extern const u32 barrier_pkt[4];
+#define CORESIGHT_BARRIER_PKT_SIZE (sizeof(barrier_pkt))
 
 enum etm_addr_type {
 	ETM_ADDR_TYPE_NONE,
@@ -98,6 +99,13 @@ struct cs_buffers {
 	void			**data_pages;
 };
 
+static inline void coresight_insert_barrier_packet(void *buf)
+{
+	if (buf)
+		memcpy(buf, barrier_pkt, CORESIGHT_BARRIER_PKT_SIZE);
+}
+
+
 static inline void CS_LOCK(void __iomem *addr)
 {
 	do {
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index e5edf46..f30e5d8 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -43,39 +43,28 @@ static void tmc_etb_enable_hw(struct tmc_drvdata *drvdata)
 
 static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)
 {
-	bool lost = false;
 	char *bufp;
-	const u32 *barrier;
-	u32 read_data, status;
+	u32 read_data, lost;
 	int i;
 
-	/*
-	 * Get a hold of the status register and see if a wrap around
-	 * has occurred.
-	 */
-	status = readl_relaxed(drvdata->base + TMC_STS);
-	if (status & TMC_STS_FULL)
-		lost = true;
-
+	/* Check if the buffer wrapped around. */
+	lost = readl_relaxed(drvdata->base + TMC_STS) & TMC_STS_FULL;
 	bufp = drvdata->buf;
 	drvdata->len = 0;
-	barrier = barrier_pkt;
 	while (1) {
 		for (i = 0; i < drvdata->memwidth; i++) {
 			read_data = readl_relaxed(drvdata->base + TMC_RRD);
 			if (read_data == 0xFFFFFFFF)
-				return;
-
-			if (lost && *barrier) {
-				read_data = *barrier;
-				barrier++;
-			}
-
+				goto done;
 			memcpy(bufp, &read_data, 4);
 			bufp += 4;
 			drvdata->len += 4;
 		}
 	}
+done:
+	if (lost)
+		coresight_insert_barrier_packet(drvdata->buf);
+	return;
 }
 
 static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 6c5e8d1..9780798 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -102,9 +102,7 @@ ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
 
 static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
 {
-	const u32 *barrier;
 	u32 val;
-	u32 *temp;
 	u64 rwp;
 
 	rwp = tmc_read_rwp(drvdata);
@@ -117,16 +115,7 @@ static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
 	if (val & TMC_STS_FULL) {
 		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
 		drvdata->len = drvdata->size;
-
-		barrier = barrier_pkt;
-		temp = (u32 *)drvdata->buf;
-
-		while (*barrier) {
-			*temp = *barrier;
-			temp++;
-			barrier++;
-		}
-
+		coresight_insert_barrier_packet(drvdata->buf);
 	} else {
 		drvdata->buf = drvdata->vaddr;
 		drvdata->len = rwp - drvdata->paddr;
diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
index 389c4ba..0dcfe25 100644
--- a/drivers/hwtracing/coresight/coresight.c
+++ b/drivers/hwtracing/coresight/coresight.c
@@ -58,8 +58,7 @@ static struct list_head *stm_path;
  * beginning of the data collected in a buffer.  That way the decoder knows that
  * it needs to look for another sync sequence.
  */
-const u32 barrier_pkt[5] = {0x7fffffff, 0x7fffffff,
-			    0x7fffffff, 0x7fffffff, 0x0};
+const u32 barrier_pkt[4] = {0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff};
 
 static int coresight_id_match(struct device *dev, void *data)
 {
-- 
2.7.4

* [PATCH 06/11] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose, Mike Leach,
	John Horley, Robert Walker, devicetree

We are about to add support for the ETR built-in scatter-gather mode
for dealing with large trace buffers. However, on some of the
platforms, using the ETR SG mode can lock up the system due to the
way the ETR is connected to the memory subsystem.

In SG mode, the ETR performs READs from the scatter-gather table to
fetch the next page and regular WRITEs of trace data. If the READ
operation doesn't complete (due to memory subsystem issues, which we
have seen on a couple of platforms), the trace WRITE cannot proceed,
leading to a lock-up. So, by default, we do not use the SG mode
unless it is known to be safe on the platform. We define a DT
property for the TMC node to specify whether the SG mode is properly
integrated.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: John Horley <john.horley@arm.com>
Cc: Robert Walker <robert.walker@arm.com>
Cc: devicetree@vger.kernel.org
Cc: frowand.list@gmail.com
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 Documentation/devicetree/bindings/arm/coresight.txt | 2 ++
 drivers/hwtracing/coresight/coresight-tmc.c         | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
index 15ac8e8..603d3c6 100644
--- a/Documentation/devicetree/bindings/arm/coresight.txt
+++ b/Documentation/devicetree/bindings/arm/coresight.txt
@@ -86,6 +86,8 @@ its hardware characteristcs.
 	* arm,buffer-size: size of contiguous buffer space for TMC ETR
 	 (embedded trace router)
 
+	* arm,scatter-gather: boolean. Indicates that the TMC-ETR can safely
+	  use the SG mode on this system.
 
 Example:
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 93c5bfc..7d8331d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -20,6 +20,7 @@
 #include <linux/err.h>
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
+#include <linux/property.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 #include <linux/dma-mapping.h>
@@ -304,6 +305,12 @@ const struct attribute_group *coresight_tmc_groups[] = {
 	NULL,
 };
 
+static inline bool tmc_etr_can_use_sg(struct tmc_drvdata *drvdata)
+{
+	return fwnode_property_present(drvdata->dev->fwnode,
+				       "arm,scatter-gather");
+}
+
 /* Detect and initialise the capabilities of a TMC ETR */
 static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
 			     u32 devid, void *dev_caps)
@@ -313,7 +320,7 @@ static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
 	/* Set the unadvertised capabilities */
 	tmc_etr_init_caps(drvdata, (u32)(unsigned long)dev_caps);
 
-	if (!(devid & TMC_DEVID_NOSCAT))
+	if (!(devid & TMC_DEVID_NOSCAT) && tmc_etr_can_use_sg(drvdata))
 		tmc_etr_set_cap(drvdata, TMC_ETR_SG);
 
 	/* Check if the AXI address width is available */
-- 
2.7.4

* [PATCH 07/11] dts: juno: Add scatter-gather support for all revisions
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose, Liviu Dudau,
	Lorenzo Pieralisi

Advertise that the ETR scatter-gather mode is properly integrated
on all revisions of the Juno board.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/boot/dts/arm/juno-base.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi
index eb749c5..6ce9090 100644
--- a/arch/arm64/boot/dts/arm/juno-base.dtsi
+++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
@@ -198,6 +198,7 @@
 		clocks = <&soc_smc50mhz>;
 		clock-names = "apb_pclk";
 		power-domains = <&scpi_devpd 0>;
+		arm,scatter-gather;
 		port {
 			etr_in_port: endpoint {
 				slave-mode;
-- 
2.7.4

* [PATCH 08/11] coresight: Add generic TMC sg table framework
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

This patch introduces a generic SG table data structure and
associated operations. An SG table can be used to map a set
of data pages where the trace data could be stored by the TMC
ETR. The information about the data pages could be stored in
different formats, depending on the type of the underlying
SG mechanism (e.g., TMC ETR SG vs. CoreSight CATU). The generic
structure provides bookkeeping of the pages used for the data
as well as the table contents. The table should be filled in by
the user of the infrastructure.

A table can be created by specifying the number of data pages
as well as the number of table pages required to hold the
pointers, where the latter could differ for different types of
tables. The pages are mapped with the appropriate DMA data
direction (i.e., DMA_TO_DEVICE for table pages and
DMA_FROM_DEVICE for data pages). The framework can optionally
accept a set of pre-allocated data pages (e.g., a perf ring
buffer) and map them accordingly. The table and data pages are
vmap'ed to allow easier access by the drivers. The framework
also provides helpers to sync the data written to the pages in
the appropriate direction.

This will be later used by the TMC ETR SG unit and CATU.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
 - Address code style issues, more comments
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 290 ++++++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-tmc.h     |  50 ++++
 2 files changed, 340 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 9780798..1e844f8 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,9 +17,299 @@
 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include <linux/slab.h>
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+/*
+ * tmc_pages_get_offset:  Go through all the pages in the tmc_pages
+ * and map the device address @addr to an offset within the virtual
+ * contiguous buffer.
+ */
+static long
+tmc_pages_get_offset(struct tmc_pages *tmc_pages, dma_addr_t addr)
+{
+	int i;
+	dma_addr_t page_start;
+
+	for (i = 0; i < tmc_pages->nr_pages; i++) {
+		page_start = tmc_pages->daddrs[i];
+		if (addr >= page_start && addr < (page_start + PAGE_SIZE))
+			return i * PAGE_SIZE + (addr - page_start);
+	}
+
+	return -EINVAL;
+}
+
+/*
+ * tmc_pages_free : Unmap and free the pages used by tmc_pages.
+ * If the pages were not allocated in tmc_pages_alloc(), we would
+ * simply drop the refcount.
+ */
+static void tmc_pages_free(struct tmc_pages *tmc_pages,
+			   struct device *dev, enum dma_data_direction dir)
+{
+	int i;
+
+	for (i = 0; i < tmc_pages->nr_pages; i++) {
+		if (tmc_pages->daddrs && tmc_pages->daddrs[i])
+			dma_unmap_page(dev, tmc_pages->daddrs[i],
+					 PAGE_SIZE, dir);
+		if (tmc_pages->pages && tmc_pages->pages[i])
+			__free_page(tmc_pages->pages[i]);
+	}
+
+	kfree(tmc_pages->pages);
+	kfree(tmc_pages->daddrs);
+	tmc_pages->pages = NULL;
+	tmc_pages->daddrs = NULL;
+	tmc_pages->nr_pages = 0;
+}
+
+/*
+ * tmc_pages_alloc : Allocate and map pages for a given @tmc_pages.
+ * If @pages is not NULL, the list of page virtual addresses are
+ * used as the data pages. The pages are then dma_map'ed for @dev
+ * with dma_direction @dir.
+ *
+ * Returns 0 upon success, else the error number.
+ */
+static int tmc_pages_alloc(struct tmc_pages *tmc_pages,
+			   struct device *dev, int node,
+			   enum dma_data_direction dir, void **pages)
+{
+	int i, nr_pages;
+	dma_addr_t paddr;
+	struct page *page;
+
+	nr_pages = tmc_pages->nr_pages;
+	tmc_pages->daddrs = kcalloc(nr_pages, sizeof(*tmc_pages->daddrs),
+					 GFP_KERNEL);
+	if (!tmc_pages->daddrs)
+		return -ENOMEM;
+	tmc_pages->pages = kcalloc(nr_pages, sizeof(*tmc_pages->pages),
+					 GFP_KERNEL);
+	if (!tmc_pages->pages) {
+		kfree(tmc_pages->daddrs);
+		tmc_pages->daddrs = NULL;
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nr_pages; i++) {
+		if (pages && pages[i]) {
+			page = virt_to_page(pages[i]);
+			/* Hold a refcount on the page */
+			get_page(page);
+		} else {
+			page = alloc_pages_node(node,
+						GFP_KERNEL | __GFP_ZERO, 0);
+		}
+		paddr = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
+		if (dma_mapping_error(dev, paddr))
+			goto err;
+		tmc_pages->daddrs[i] = paddr;
+		tmc_pages->pages[i] = page;
+	}
+	return 0;
+err:
+	tmc_pages_free(tmc_pages, dev, dir);
+	return -ENOMEM;
+}
+
+static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
+{
+	if (WARN_ON(!sg_table->data_pages.pages[0]))
+		return 0;
+	return sg_table->table_daddr;
+}
+
+static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
+{
+	if (WARN_ON(!sg_table->data_pages.pages[0]))
+		return NULL;
+	return sg_table->table_vaddr;
+}
+
+static inline void *
+tmc_sg_table_data_vaddr(struct tmc_sg_table *sg_table)
+{
+	if (WARN_ON(!sg_table->data_pages.nr_pages))
+		return 0;
+	return sg_table->data_vaddr;
+}
+
+static inline long
+tmc_sg_get_data_page_offset(struct tmc_sg_table *sg_table, dma_addr_t addr)
+{
+	return tmc_pages_get_offset(&sg_table->data_pages, addr);
+}
+
+static inline void tmc_free_table_pages(struct tmc_sg_table *sg_table)
+{
+	if (sg_table->table_vaddr)
+		vunmap(sg_table->table_vaddr);
+	tmc_pages_free(&sg_table->table_pages, sg_table->dev, DMA_TO_DEVICE);
+}
+
+static void tmc_free_data_pages(struct tmc_sg_table *sg_table)
+{
+	if (sg_table->data_vaddr)
+		vunmap(sg_table->data_vaddr);
+	tmc_pages_free(&sg_table->data_pages, sg_table->dev, DMA_FROM_DEVICE);
+}
+
+void tmc_free_sg_table(struct tmc_sg_table *sg_table)
+{
+	tmc_free_table_pages(sg_table);
+	tmc_free_data_pages(sg_table);
+}
+
+/*
+ * Alloc pages for the table. Since this will be used by the device,
+ * allocate the pages closer to the device (i.e, dev_to_node(dev)
+ * rather than the CPU node).
+ */
+static int tmc_alloc_table_pages(struct tmc_sg_table *sg_table)
+{
+	int rc;
+	struct tmc_pages *table_pages = &sg_table->table_pages;
+
+	rc = tmc_pages_alloc(table_pages, sg_table->dev,
+			     dev_to_node(sg_table->dev),
+			     DMA_TO_DEVICE, NULL);
+	if (rc)
+		return rc;
+	sg_table->table_vaddr = vmap(table_pages->pages,
+				     table_pages->nr_pages,
+				     VM_MAP,
+				     PAGE_KERNEL);
+	if (!sg_table->table_vaddr)
+		rc = -ENOMEM;
+	else
+		sg_table->table_daddr = table_pages->daddrs[0];
+	return rc;
+}
+
+static int tmc_alloc_data_pages(struct tmc_sg_table *sg_table, void **pages)
+{
+	int rc;
+
+	/* Allocate data pages on the node requested by the caller */
+	rc = tmc_pages_alloc(&sg_table->data_pages,
+			     sg_table->dev, sg_table->node,
+			     DMA_FROM_DEVICE, pages);
+	if (!rc) {
+		sg_table->data_vaddr = vmap(sg_table->data_pages.pages,
+					    sg_table->data_pages.nr_pages,
+					    VM_MAP,
+					    PAGE_KERNEL);
+		if (!sg_table->data_vaddr)
+			rc = -ENOMEM;
+	}
+	return rc;
+}
+
+/*
+ * tmc_alloc_sg_table: Allocate and setup dma pages for the TMC SG table
+ * and data buffers. TMC writes to the data buffers and reads from the SG
+ * Table pages.
+ *
+ * @dev		- Device to which page should be DMA mapped.
+ * @node	- Numa node for mem allocations
+ * @nr_tpages	- Number of pages for the table entries.
+ * @nr_dpages	- Number of pages for Data buffer.
+ * @pages	- Optional list of virtual address of pages.
+ */
+struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
+					int node,
+					int nr_tpages,
+					int nr_dpages,
+					void **pages)
+{
+	long rc;
+	struct tmc_sg_table *sg_table;
+
+	sg_table = kzalloc(sizeof(*sg_table), GFP_KERNEL);
+	if (!sg_table)
+		return ERR_PTR(-ENOMEM);
+	sg_table->data_pages.nr_pages = nr_dpages;
+	sg_table->table_pages.nr_pages = nr_tpages;
+	sg_table->node = node;
+	sg_table->dev = dev;
+
+	rc  = tmc_alloc_data_pages(sg_table, pages);
+	if (!rc)
+		rc = tmc_alloc_table_pages(sg_table);
+	if (rc) {
+		tmc_free_sg_table(sg_table);
+		kfree(sg_table);
+		return ERR_PTR(rc);
+	}
+
+	return sg_table;
+}
+
+/*
+ * tmc_sg_table_sync_data_range: Sync the data buffer written
+ * by the device from @offset upto a @size bytes.
+ */
+void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
+				  u64 offset, u64 size)
+{
+	int i, index, start;
+	int npages = DIV_ROUND_UP(size, PAGE_SIZE);
+	struct device *dev = table->dev;
+	struct tmc_pages *data = &table->data_pages;
+
+	start = offset >> PAGE_SHIFT;
+	for (i = start; i < (start + npages); i++) {
+		index = i % data->nr_pages;
+		dma_sync_single_for_cpu(dev, data->daddrs[index],
+					PAGE_SIZE, DMA_FROM_DEVICE);
+	}
+}
+
+/* tmc_sg_sync_table: Sync the page table */
+void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table)
+{
+	int i;
+	struct device *dev = sg_table->dev;
+	struct tmc_pages *table_pages = &sg_table->table_pages;
+
+	for (i = 0; i < table_pages->nr_pages; i++)
+		dma_sync_single_for_device(dev, table_pages->daddrs[i],
+					   PAGE_SIZE, DMA_TO_DEVICE);
+}
+
+/*
+ * tmc_sg_table_get_data: Get the buffer pointer for data @offset
+ * in the SG buffer. The @bufpp is updated to point to the buffer.
+ * Returns :
+ *	the length of linear data available at @offset.
+ *	or
+ *	<= 0 if no data is available.
+ */
+ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
+			      u64 offset, size_t len, char **bufpp)
+{
+	size_t size;
+	int pg_idx = offset >> PAGE_SHIFT;
+	int pg_offset = offset & (PAGE_SIZE - 1);
+	struct tmc_pages *data_pages = &sg_table->data_pages;
+
+	size = tmc_sg_table_buf_size(sg_table);
+	if (offset >= size)
+		return -EINVAL;
+
+	/* Make sure we don't go beyond the end */
+	len = (len < (size - offset)) ? len : size - offset;
+	/* Respect the page boundaries */
+	len = (len < (PAGE_SIZE - pg_offset)) ? len : (PAGE_SIZE - pg_offset);
+	if (len > 0)
+		*bufpp = page_address(data_pages->pages[pg_idx]) + pg_offset;
+	return len;
+}
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 73f944d..19a765c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -18,6 +18,7 @@
 #ifndef _CORESIGHT_TMC_H
 #define _CORESIGHT_TMC_H
 
+#include <linux/dma-mapping.h>
 #include <linux/miscdevice.h>
 
 #define TMC_RSZ			0x004
@@ -171,6 +172,38 @@ struct tmc_drvdata {
 	u32			etr_caps;
 };
 
+/**
+ * struct tmc_pages - Collection of pages used for SG.
+ * @nr_pages:		Number of pages in the list.
+ * @daddrs:		Array of DMA addresses of the pages.
+ * @pages:		Array of pages for the buffer.
+ */
+struct tmc_pages {
+	int nr_pages;
+	dma_addr_t	*daddrs;
+	struct page	**pages;
+};
+
+/**
+ * struct tmc_sg_table - Generic SG table for TMC
+ * @dev:		Device for DMA allocations
+ * @table_vaddr:	Contiguous Virtual address for PageTable
+ * @data_vaddr:		Contiguous Virtual address for Data Buffer
+ * @table_daddr:	DMA address of the PageTable base
+ * @node:		Node for Page allocations
+ * @table_pages:	List of pages & dma address for Table
+ * @data_pages:		List of pages & dma address for Data
+ */
+struct tmc_sg_table {
+	struct device *dev;
+	void *table_vaddr;
+	void *data_vaddr;
+	dma_addr_t table_daddr;
+	int node;
+	struct tmc_pages table_pages;
+	struct tmc_pages data_pages;
+};
+
 /* Generic functions */
 void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata);
 void tmc_flush_and_stop(struct tmc_drvdata *drvdata);
@@ -226,4 +259,21 @@ static inline bool tmc_etr_has_cap(struct tmc_drvdata *drvdata, u32 cap)
 	return !!(drvdata->etr_caps & cap);
 }
 
+struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
+					int node,
+					int nr_tpages,
+					int nr_dpages,
+					void **pages);
+void tmc_free_sg_table(struct tmc_sg_table *sg_table);
+void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table);
+void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
+				  u64 offset, u64 size);
+ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
+			      u64 offset, size_t len, char **bufpp);
+static inline unsigned long
+tmc_sg_table_buf_size(struct tmc_sg_table *sg_table)
+{
+	return sg_table->data_pages.nr_pages << PAGE_SHIFT;
+}
+
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/11] coresight: Add support for TMC ETR SG unit
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

This patch adds support for setting up an SG table used by the
TMC ETR built-in SG unit. The TMC ETR uses 4K table pages that
hold pointers to the 4K data pages, with the last entry of each
table pointing to the next table page, forming a chain. The two
LSBs of each entry determine its type, one of:

 Normal - Points to a 4KB data page.
 Last   - Points to a 4KB data page, but is the last entry in the
          page table.
 Link   - Points to another 4KB table page with pointers to data.

The code handles a system page size different from 4K (e.g., 16KB
or 64KB), so a single system page may end up holding multiple ETR
SG tables, and likewise for the data pages.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 263 ++++++++++++++++++++++++
 1 file changed, 263 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 1e844f8..7ab0fd1 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -22,6 +22,87 @@
 #include "coresight-tmc.h"
 
 /*
+ * The TMC ETR SG has a page size of 4K. The SG table contains pointers
+ * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
+ * 4K (e.g., 16KB or 64KB). This implies that a single OS page could
+ * contain more than one SG buffer or table page.
+ *
+ * A table entry has the following format:
+ *
+ * ---Bit31------------Bit4-------Bit1-----Bit0--
+ * |     Address[39:12]    | SBZ |  Entry Type  |
+ * ----------------------------------------------
+ *
+ * Address: Bits [39:12] of a physical page address. Bits [11:0] are
+ *	    always zero.
+ *
+ * Entry type:
+ *	b00 - Reserved.
+ *	b01 - Last entry in the table, points to a 4K page buffer.
+ *	b10 - Normal entry, points to a 4K page buffer.
+ *	b11 - Link. The address points to the base of next table.
+ */
+
+typedef u32 sgte_t;
+
+#define ETR_SG_PAGE_SHIFT		12
+#define ETR_SG_PAGE_SIZE		(1UL << ETR_SG_PAGE_SHIFT)
+#define ETR_SG_PAGES_PER_SYSPAGE	(PAGE_SIZE / ETR_SG_PAGE_SIZE)
+#define ETR_SG_PTRS_PER_PAGE		(ETR_SG_PAGE_SIZE / sizeof(sgte_t))
+#define ETR_SG_PTRS_PER_SYSPAGE		(PAGE_SIZE / sizeof(sgte_t))
+
+#define ETR_SG_ET_MASK			0x3
+#define ETR_SG_ET_LAST			0x1
+#define ETR_SG_ET_NORMAL		0x2
+#define ETR_SG_ET_LINK			0x3
+
+#define ETR_SG_ADDR_SHIFT		4
+
+#define ETR_SG_ENTRY(addr, type) \
+	(sgte_t)((((addr) >> ETR_SG_PAGE_SHIFT) << ETR_SG_ADDR_SHIFT) | \
+		 ((type) & ETR_SG_ET_MASK))
+
+#define ETR_SG_ADDR(entry) \
+	(((dma_addr_t)(entry) >> ETR_SG_ADDR_SHIFT) << ETR_SG_PAGE_SHIFT)
+#define ETR_SG_ET(entry)		((entry) & ETR_SG_ET_MASK)
+
+/*
+ * struct etr_sg_table : ETR SG Table
+ * @sg_table:		Generic SG Table holding the data/table pages.
+ * @hwaddr:		hardware address used by the TMC, which is the base
+ *			address of the table.
+ */
+struct etr_sg_table {
+	struct tmc_sg_table	*sg_table;
+	dma_addr_t		hwaddr;
+};
+
+/*
+ * tmc_etr_sg_table_entries: Total number of table entries required to map
+ * @nr_pages system pages.
+ *
+ * We need to map @nr_pages * ETR_SG_PAGES_PER_SYSPAGE 4K data pages.
+ * Each 4K table page can map (ETR_SG_PTRS_PER_PAGE - 1) data page
+ * pointers, with the last entry pointing to the next table page.
+ * If we spill over to a new page for mapping 1 entry, we could as
+ * well replace the link entry of the previous page with the last entry.
+ */
+static inline unsigned long __attribute_const__
+tmc_etr_sg_table_entries(int nr_pages)
+{
+	unsigned long nr_sgpages = nr_pages * ETR_SG_PAGES_PER_SYSPAGE;
+	unsigned long nr_sglinks = nr_sgpages / (ETR_SG_PTRS_PER_PAGE - 1);
+	/*
+	 * If we spill over to a new page for 1 entry, we could as well
+	 * make it the LAST entry in the previous page, skipping the Link
+	 * address.
+	 */
+	if (nr_sglinks && (nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1) < 2))
+		nr_sglinks--;
+	return nr_sgpages + nr_sglinks;
+}
+
+/*
  * tmc_pages_get_offset:  Go through all the pages in the tmc_pages
  * and map the device address @addr to an offset within the virtual
  * contiguous buffer.
@@ -310,6 +391,188 @@ ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
 	return len;
 }
 
+#ifdef ETR_SG_DEBUG
+/* Map a dma address to virtual address */
+static unsigned long
+tmc_sg_daddr_to_vaddr(struct tmc_sg_table *sg_table,
+		      dma_addr_t addr, bool table)
+{
+	long offset;
+	unsigned long base;
+	struct tmc_pages *tmc_pages;
+
+	if (table) {
+		tmc_pages = &sg_table->table_pages;
+		base = (unsigned long)sg_table->table_vaddr;
+	} else {
+		tmc_pages = &sg_table->data_pages;
+		base = (unsigned long)sg_table->data_vaddr;
+	}
+
+	offset = tmc_pages_get_offset(tmc_pages, addr);
+	if (offset < 0)
+		return 0;
+	return base + offset;
+}
+
+/* Dump the given sg_table */
+static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table)
+{
+	sgte_t *ptr;
+	int i = 0;
+	dma_addr_t addr;
+	struct tmc_sg_table *sg_table = etr_table->sg_table;
+
+	ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
+					      etr_table->hwaddr, true);
+	while (ptr) {
+		addr = ETR_SG_ADDR(*ptr);
+		switch (ETR_SG_ET(*ptr)) {
+		case ETR_SG_ET_NORMAL:
+			dev_dbg(sg_table->dev,
+				"%05d: %p\t:[N] 0x%llx\n", i, ptr, addr);
+			ptr++;
+			break;
+		case ETR_SG_ET_LINK:
+			dev_dbg(sg_table->dev,
+				"%05d: *** %p\t:{L} 0x%llx ***\n",
+				 i, ptr, addr);
+			ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table,
+							      addr, true);
+			break;
+		case ETR_SG_ET_LAST:
+			dev_dbg(sg_table->dev,
+				"%05d: ### %p\t:[L] 0x%llx ###\n",
+				 i, ptr, addr);
+			return;
+		default:
+			dev_dbg(sg_table->dev,
+				"%05d: xxx %p\t:[INVALID] 0x%llx xxx\n",
+				 i, ptr, addr);
+			return;
+		}
+		i++;
+	}
+	dev_dbg(sg_table->dev, "******* End of Table *****\n");
+}
+#else
+static inline void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table) {}
+#endif
+
+/*
+ * Populate the SG Table page table entries from table/data
+ * pages allocated. Each Data page has ETR_SG_PAGES_PER_SYSPAGE SG pages.
+ * So does a Table page. So we keep track of indices of the tables
+ * in each system page and move the pointers accordingly.
+ */
+#define INC_IDX_ROUND(idx, size) ((idx) = ((idx) + 1) % (size))
+static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
+{
+	dma_addr_t paddr;
+	int i, type, nr_entries;
+	int tpidx = 0; /* index to the current system table_page */
+	int sgtidx = 0;	/* index to the sg_table within the current syspage */
+	int sgtentry = 0; /* the entry within the sg_table */
+	int dpidx = 0; /* index to the current system data_page */
+	int spidx = 0; /* index to the SG page within the current data page */
+	sgte_t *ptr; /* pointer to the table entry to fill */
+	struct tmc_sg_table *sg_table = etr_table->sg_table;
+	dma_addr_t *table_daddrs = sg_table->table_pages.daddrs;
+	dma_addr_t *data_daddrs = sg_table->data_pages.daddrs;
+
+	nr_entries = tmc_etr_sg_table_entries(sg_table->data_pages.nr_pages);
+	/*
+	 * Use the contiguous virtual address of the table to update entries.
+	 */
+	ptr = sg_table->table_vaddr;
+	/*
+	 * Fill all the entries, except the last entry to avoid special
+	 * checks within the loop.
+	 */
+	for (i = 0; i < nr_entries - 1; i++) {
+		if (sgtentry == ETR_SG_PTRS_PER_PAGE - 1) {
+			/*
+			 * Last entry in a sg_table page is a link address to
+			 * the next table page. If this sg_table is the last
+			 * one in the system page, it links to the first
+			 * sg_table in the next system page. Otherwise, it
+			 * links to the next sg_table page within the system
+			 * page.
+			 */
+			if (sgtidx == ETR_SG_PAGES_PER_SYSPAGE - 1) {
+				paddr = table_daddrs[tpidx + 1];
+			} else {
+				paddr = table_daddrs[tpidx] +
+					(ETR_SG_PAGE_SIZE * (sgtidx + 1));
+			}
+			type = ETR_SG_ET_LINK;
+		} else {
+			/*
+			 * Update the indices to the data_pages to point to the
+			 * next sg_page in the data buffer.
+			 */
+			type = ETR_SG_ET_NORMAL;
+			paddr = data_daddrs[dpidx] + spidx * ETR_SG_PAGE_SIZE;
+			if (!INC_IDX_ROUND(spidx, ETR_SG_PAGES_PER_SYSPAGE))
+				dpidx++;
+		}
+		*ptr++ = ETR_SG_ENTRY(paddr, type);
+		/*
+		 * Move to the next table pointer, moving the table page index
+		 * if necessary
+		 */
+		if (!INC_IDX_ROUND(sgtentry, ETR_SG_PTRS_PER_PAGE)) {
+			if (!INC_IDX_ROUND(sgtidx, ETR_SG_PAGES_PER_SYSPAGE))
+				tpidx++;
+		}
+	}
+
+	/* Set up the last entry, which is always a data pointer */
+	paddr = data_daddrs[dpidx] + spidx * ETR_SG_PAGE_SIZE;
+	*ptr++ = ETR_SG_ENTRY(paddr, ETR_SG_ET_LAST);
+}
+
+/*
+ * tmc_init_etr_sg_table: Allocate a TMC ETR SG table and a data buffer
+ * of @size, and populate the table.
+ *
+ * @dev		- Device pointer for the TMC
+ * @node	- NUMA node where the memory should be allocated
+ * @size	- Total size of the data buffer
+ * @pages	- Optional list of page virtual address
+ */
+static struct etr_sg_table __maybe_unused *
+tmc_init_etr_sg_table(struct device *dev, int node,
+		  unsigned long size, void **pages)
+{
+	int nr_entries, nr_tpages;
+	int nr_dpages = size >> PAGE_SHIFT;
+	struct tmc_sg_table *sg_table;
+	struct etr_sg_table *etr_table;
+
+	etr_table = kzalloc(sizeof(*etr_table), GFP_KERNEL);
+	if (!etr_table)
+		return ERR_PTR(-ENOMEM);
+	nr_entries = tmc_etr_sg_table_entries(nr_dpages);
+	nr_tpages = DIV_ROUND_UP(nr_entries, ETR_SG_PTRS_PER_SYSPAGE);
+
+	sg_table = tmc_alloc_sg_table(dev, node, nr_tpages, nr_dpages, pages);
+	if (IS_ERR(sg_table)) {
+		kfree(etr_table);
+		return ERR_CAST(sg_table);
+	}
+
+	etr_table->sg_table = sg_table;
+	/* TMC should use table base address for DBA */
+	etr_table->hwaddr = sg_table->table_daddr;
+	tmc_etr_sg_table_populate(etr_table);
+	/* Sync the table pages for the HW */
+	tmc_sg_table_sync_table(sg_table);
+	tmc_etr_sg_table_dump(etr_table);
+
+	return etr_table;
+}
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

+ * @node	- NUMA node where the memory should be allocated
+ * @size	- Total size of the data buffer
+ * @pages	- Optional list of page virtual address
+ */
+static struct etr_sg_table __maybe_unused *
+tmc_init_etr_sg_table(struct device *dev, int node,
+		  unsigned long size, void **pages)
+{
+	int nr_entries, nr_tpages;
+	int nr_dpages = size >> PAGE_SHIFT;
+	struct tmc_sg_table *sg_table;
+	struct etr_sg_table *etr_table;
+
+	etr_table = kzalloc(sizeof(*etr_table), GFP_KERNEL);
+	if (!etr_table)
+		return ERR_PTR(-ENOMEM);
+	nr_entries = tmc_etr_sg_table_entries(nr_dpages);
+	nr_tpages = DIV_ROUND_UP(nr_entries, ETR_SG_PTRS_PER_SYSPAGE);
+
+	sg_table = tmc_alloc_sg_table(dev, node, nr_tpages, nr_dpages, pages);
+	if (IS_ERR(sg_table)) {
+		kfree(etr_table);
+		return ERR_CAST(sg_table);
+	}
+
+	etr_table->sg_table = sg_table;
+	/* TMC should use table base address for DBA */
+	etr_table->hwaddr = sg_table->table_daddr;
+	tmc_etr_sg_table_populate(etr_table);
+	/* Sync the table pages for the HW */
+	tmc_sg_table_sync_table(sg_table);
+	tmc_etr_sg_table_dump(etr_table);
+
+	return etr_table;
+}
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/11] coresight: tmc-etr: Add transparent buffer management
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

At the moment we always use contiguous memory for TMC ETR tracing
when used from sysfs. The size of the buffer is fixed at boot time
and can only be changed by modifying the DT. With the introduction
of SG support we could support really large buffers in that mode.
This patch abstracts the buffer used for ETR to switch between a
contiguous buffer and an SG table, depending on the availability of
the memory.

This also enables the sysfs mode to use the ETR in SG mode, depending
on the configured trace buffer size. Also, since the ETR will use the
new infrastructure to manage the buffer, we can get rid of some
of the members in tmc_drvdata and clean up the fields a bit.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 450 +++++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
 2 files changed, 418 insertions(+), 89 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 7ab0fd1..143afba 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,10 +17,18 @@
 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include <linux/iommu.h>
 #include <linux/slab.h>
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+struct etr_flat_buf {
+	struct device	*dev;
+	dma_addr_t	daddr;
+	void		*vaddr;
+	size_t		size;
+};
+
 /*
  * The TMC ETR SG has a page size of 4K. The SG table contains pointers
  * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
@@ -541,7 +549,7 @@ static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
  * @size	- Total size of the data buffer
  * @pages	- Optional list of page virtual address
  */
-static struct etr_sg_table __maybe_unused *
+static struct etr_sg_table *
 tmc_init_etr_sg_table(struct device *dev, int node,
 		  unsigned long size, void **pages)
 {
@@ -573,16 +581,307 @@ tmc_init_etr_sg_table(struct device *dev, int node,
 	return etr_table;
 }
 
+/*
+ * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
+ */
+static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
+				  struct etr_buf *etr_buf, int node,
+				  void **pages)
+{
+	struct etr_flat_buf *flat_buf;
+
+	/* We cannot reuse existing pages for flat buf */
+	if (pages)
+		return -EINVAL;
+
+	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
+	if (!flat_buf)
+		return -ENOMEM;
+
+	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
+					   &flat_buf->daddr, GFP_KERNEL);
+	if (!flat_buf->vaddr) {
+		kfree(flat_buf);
+		return -ENOMEM;
+	}
+
+	flat_buf->size = etr_buf->size;
+	flat_buf->dev = drvdata->dev;
+	etr_buf->hwaddr = flat_buf->daddr;
+	etr_buf->mode = ETR_MODE_FLAT;
+	etr_buf->private = flat_buf;
+	return 0;
+}
+
+static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	if (flat_buf && flat_buf->daddr)
+		dma_free_coherent(flat_buf->dev, flat_buf->size,
+				  flat_buf->vaddr, flat_buf->daddr);
+	kfree(flat_buf);
+}
+
+static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	/*
+	 * Adjust the buffer to point to the beginning of the trace data
+	 * and update the available trace data.
+	 */
+	etr_buf->offset = rrp - etr_buf->hwaddr;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = rwp - rrp;
+}
+
+static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
+					 u64 offset, size_t len, char **bufpp)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	*bufpp = (char *)flat_buf->vaddr + offset;
+	/*
+	 * tmc_etr_buf_get_data already adjusts the length to handle
+	 * buffer wrapping around.
+	 */
+	return len;
+}
+
+static const struct etr_buf_operations etr_flat_buf_ops = {
+	.alloc = tmc_etr_alloc_flat_buf,
+	.free = tmc_etr_free_flat_buf,
+	.sync = tmc_etr_sync_flat_buf,
+	.get_data = tmc_etr_get_data_flat_buf,
+};
+
+/*
+ * tmc_etr_alloc_sg_buf: Allocate an SG buffer for @etr_buf and set up
+ * its parameters appropriately.
+ */
+static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
+				struct etr_buf *etr_buf, int node,
+				void **pages)
+{
+	struct etr_sg_table *etr_table;
+
+	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
+					  etr_buf->size, pages);
+	if (IS_ERR(etr_table))
+		return -ENOMEM;
+	etr_buf->hwaddr = etr_table->hwaddr;
+	etr_buf->mode = ETR_MODE_ETR_SG;
+	etr_buf->private = etr_table;
+	return 0;
+}
+
+static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	if (etr_table) {
+		tmc_free_sg_table(etr_table->sg_table);
+		kfree(etr_table);
+	}
+}
+
+static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
+				       size_t len, char **bufpp)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
+}
+
+static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	long r_offset, w_offset;
+	struct etr_sg_table *etr_table = etr_buf->private;
+	struct tmc_sg_table *table = etr_table->sg_table;
+
+	/* Convert hw address to offset in the buffer */
+	r_offset = tmc_sg_get_data_page_offset(table, rrp);
+	if (r_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RRP %llx to offset\n", rrp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	w_offset = tmc_sg_get_data_page_offset(table, rwp);
+	if (w_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RWP %llx to offset\n", rwp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	etr_buf->offset = r_offset;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
+				w_offset - r_offset;
+	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
+}
+
+static const struct etr_buf_operations etr_sg_buf_ops = {
+	.alloc = tmc_etr_alloc_sg_buf,
+	.free = tmc_etr_free_sg_buf,
+	.sync = tmc_etr_sync_sg_buf,
+	.get_data = tmc_etr_get_data_sg_buf,
+};
+
+static const struct etr_buf_operations *etr_buf_ops[] = {
+	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
+	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
+};
+
+static inline int tmc_etr_mode_alloc_buf(int mode,
+					 struct tmc_drvdata *drvdata,
+					 struct etr_buf *etr_buf, int node,
+					 void **pages)
+{
+	int rc;
+
+	switch (mode) {
+	case ETR_MODE_FLAT:
+	case ETR_MODE_ETR_SG:
+		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
+		if (!rc)
+			etr_buf->ops = etr_buf_ops[mode];
+		return rc;
+	default:
+		return -EINVAL;
+	}
+}
+
+/*
+ * tmc_alloc_etr_buf: Allocate a buffer used by the ETR.
+ * @drvdata	: ETR device details.
+ * @size	: size of the requested buffer.
+ * @flags	: Required properties for the buffer.
+ * @node	: Node for memory allocations.
+ * @pages	: An optional list of pages.
+ */
+static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
+					 ssize_t size, int flags,
+					 int node, void **pages)
+{
+	int rc = -ENOMEM;
+	bool has_etr_sg, has_iommu;
+	struct etr_buf *etr_buf;
+
+	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
+	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
+
+	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
+	if (!etr_buf)
+		return ERR_PTR(-ENOMEM);
+
+	etr_buf->size = size;
+
+	/*
+	 * If we have to use an existing list of pages, we cannot reliably
+	 * use contiguous DMA memory (even if we have an IOMMU). Otherwise,
+	 * we use contiguous DMA memory if at least one of the following
+	 * conditions is true:
+	 *  a) The ETR cannot use Scatter-Gather.
+	 *  b) We have a backing IOMMU.
+	 *  c) The requested memory size is small (< 1M).
+	 *
+	 * Fall back to the other available mechanism if the preferred
+	 * one fails.
+	 */
+	if (!pages &&
+	    (!has_etr_sg || has_iommu || size < SZ_1M))
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
+					    etr_buf, node, pages);
+	if (rc && has_etr_sg)
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
+					    etr_buf, node, pages);
+	if (rc) {
+		kfree(etr_buf);
+		return ERR_PTR(rc);
+	}
+
+	return etr_buf;
+}
+
+static void tmc_free_etr_buf(struct etr_buf *etr_buf)
+{
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
+	etr_buf->ops->free(etr_buf);
+	kfree(etr_buf);
+}
+
+/*
+ * tmc_etr_buf_get_data: Get a pointer to the trace data at @offset,
+ * with a maximum of @len bytes.
+ * Returns: the size of the linear data available at @offset, with *bufpp
+ * updated to point to the buffer.
+ */
+static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
+				    u64 offset, size_t len, char **bufpp)
+{
+	/* Adjust the length to limit this transaction to end of buffer */
+	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
+
+	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
+}
+
+static inline s64
+tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
+{
+	ssize_t len;
+	char *bufp;
+
+	len = tmc_etr_buf_get_data(etr_buf, offset,
+				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
+	if (WARN_ON(len < CORESIGHT_BARRIER_PKT_SIZE))
+		return -EINVAL;
+	coresight_insert_barrier_packet(bufp);
+	return offset + CORESIGHT_BARRIER_PKT_SIZE;
+}
+
+/*
+ * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
+ * Makes sure the trace data is synced to the memory for consumption.
+ * @etr_buf->offset will hold the offset to the beginning of the trace data
+ * within the buffer, with @etr_buf->len bytes to consume.
+ */
+static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
+{
+	struct etr_buf *etr_buf = drvdata->etr_buf;
+	u64 rrp, rwp;
+	u32 status;
+
+	rrp = tmc_read_rrp(drvdata);
+	rwp = tmc_read_rwp(drvdata);
+	status = readl_relaxed(drvdata->base + TMC_STS);
+	etr_buf->full = status & TMC_STS_FULL;
+
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
+
+	etr_buf->ops->sync(etr_buf, rrp, rwp);
+
+	/* Insert barrier packets at the beginning, if there was an overflow */
+	if (etr_buf->full)
+		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
+}
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
 	CS_UNLOCK(drvdata->base);
 
 	/* Wait for TMCSReady bit to be set */
 	tmc_wait_for_tmcready(drvdata);
 
-	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
+	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
 	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
 
 	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
@@ -595,16 +894,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 		axictl |= TMC_AXICTL_ARCACHE_OS;
 	}
 
+	if (etr_buf->mode == ETR_MODE_ETR_SG) {
+		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
+			return;
+		axictl |= TMC_AXICTL_SCT_GAT_MODE;
+	}
+
 	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
-	tmc_write_dba(drvdata, drvdata->paddr);
+	tmc_write_dba(drvdata, etr_buf->hwaddr);
 	/*
 	 * If the TMC pointers must be programmed before the session,
 	 * we have to set it properly (i.e, RRP/RWP to base address and
 	 * STS to "not full").
 	 */
 	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
-		tmc_write_rrp(drvdata, drvdata->paddr);
-		tmc_write_rwp(drvdata, drvdata->paddr);
+		tmc_write_rrp(drvdata, etr_buf->hwaddr);
+		tmc_write_rwp(drvdata, etr_buf->hwaddr);
 		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
 		writel_relaxed(sts, drvdata->base + TMC_STS);
 	}
@@ -620,63 +925,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 }
 
 /*
- * Return the available trace data in the buffer @pos, with a maximum
- * limit of @len, also updating the @bufpp on where to find it.
+ * Return the available trace data in the buffer (starts at etr_buf->offset,
+ * limited by etr_buf->len) from @pos, with a maximum limit of @len,
+ * also updating @bufpp on where to find it. Since the trace data can
+ * start anywhere in the buffer, depending on the RRP, we adjust the
+ * @len returned to handle the buffer wrapping around.
  */
 ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
 				loff_t pos, size_t len, char **bufpp)
 {
+	s64 offset;
 	ssize_t actual = len;
-	char *bufp = drvdata->buf + pos;
-	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
-
-	/* Adjust the len to available size @pos */
-	if (pos + actual > drvdata->len)
-		actual = drvdata->len - pos;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
+	if (pos + actual > etr_buf->len)
+		actual = etr_buf->len - pos;
 	if (actual <= 0)
 		return actual;
 
-	/*
-	 * Since we use a circular buffer, with trace data starting
-	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
-	 * wrap the current @pos to within the buffer.
-	 */
-	if (bufp >= bufend)
-		bufp -= drvdata->size;
-	/*
-	 * For simplicity, avoid copying over a wrapped around buffer.
-	 */
-	if ((bufp + actual) > bufend)
-		actual = bufend - bufp;
-	*bufpp = bufp;
-	return actual;
+	/* Compute the offset from which we read the data */
+	offset = etr_buf->offset + pos;
+	if (offset >= etr_buf->size)
+		offset -= etr_buf->size;
+	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
 }
 
-static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
+static struct etr_buf *
+tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
 {
-	u32 val;
-	u64 rwp;
+	return tmc_alloc_etr_buf(drvdata, drvdata->size,
+				 0, cpu_to_node(0), NULL);
+}
 
-	rwp = tmc_read_rwp(drvdata);
-	val = readl_relaxed(drvdata->base + TMC_STS);
+static void
+tmc_etr_free_sysfs_buf(struct etr_buf *buf)
+{
+	if (buf)
+		tmc_free_etr_buf(buf);
+}
 
-	/*
-	 * Adjust the buffer to point to the beginning of the trace data
-	 * and update the available trace data.
-	 */
-	if (val & TMC_STS_FULL) {
-		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
-		drvdata->len = drvdata->size;
-		coresight_insert_barrier_packet(drvdata->buf);
-	} else {
-		drvdata->buf = drvdata->vaddr;
-		drvdata->len = rwp - drvdata->paddr;
-	}
+static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
+{
+	tmc_sync_etr_buf(drvdata);
 }
 
 static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 {
+
 	CS_UNLOCK(drvdata->base);
 
 	tmc_flush_and_stop(drvdata);
@@ -685,7 +980,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	 * read before the TMC is disabled.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
-		tmc_etr_dump_hw(drvdata);
+		tmc_etr_sync_sysfs_buf(drvdata);
+
 	tmc_disable_hw(drvdata);
 
 	CS_LOCK(drvdata->base);
@@ -696,34 +992,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	int ret = 0;
 	bool used = false;
 	unsigned long flags;
-	void __iomem *vaddr = NULL;
-	dma_addr_t paddr;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct etr_buf *new_buf = NULL, *free_buf = NULL;
 
 
 	/*
-	 * If we don't have a buffer release the lock and allocate memory.
-	 * Otherwise keep the lock and move along.
+	 * If we are enabling the ETR from disabled state, we need to make
+	 * sure we have a buffer with the right size. The etr_buf is not reset
+	 * immediately after we stop the tracing in SYSFS mode as we wait for
+	 * the user to collect the data. We may be able to reuse the existing
+	 * buffer, provided the size matches. Any allocation has to be done
+	 * with the lock released.
 	 */
 	spin_lock_irqsave(&drvdata->spinlock, flags);
-	if (!drvdata->vaddr) {
+	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
 		spin_unlock_irqrestore(&drvdata->spinlock, flags);
-
-		/*
-		 * Contiguous  memory can't be allocated while a spinlock is
-		 * held.  As such allocate memory here and free it if a buffer
-		 * has already been allocated (from a previous session).
-		 */
-		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
-					   &paddr, GFP_KERNEL);
-		if (!vaddr)
-			return -ENOMEM;
+		/* Allocate memory with the spinlock released */
+		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
+		if (IS_ERR(new_buf))
+			return PTR_ERR(new_buf);
 
 		/* Let's try again */
 		spin_lock_irqsave(&drvdata->spinlock, flags);
 	}
 
-	if (drvdata->reading) {
+	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -731,21 +1024,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	/*
 	 * In sysFS mode we can have multiple writers per sink.  Since this
 	 * sink is already enabled no memory is needed and the HW need not be
-	 * touched.
+	 * touched, even if the buffer size has changed.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
 		goto out;
 
 	/*
-	 * If drvdata::buf == NULL, use the memory allocated above.
-	 * Otherwise a buffer still exists from a previous session, so
-	 * simply use that.
+	 * If we don't have a buffer or it doesn't match the requested size,
+	 * use the memory allocated above. Otherwise reuse it.
 	 */
-	if (drvdata->buf == NULL) {
+	if (!drvdata->etr_buf ||
+	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
 		used = true;
-		drvdata->vaddr = vaddr;
-		drvdata->paddr = paddr;
-		drvdata->buf = drvdata->vaddr;
+		free_buf = drvdata->etr_buf;
+		drvdata->etr_buf = new_buf;
 	}
 
 	drvdata->mode = CS_MODE_SYSFS;
@@ -754,8 +1046,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free memory outside the spinlock if need be */
-	if (!used && vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (free_buf)
+		tmc_etr_free_sysfs_buf(free_buf);
 
 	if (!ret)
 		dev_info(drvdata->dev, "TMC-ETR enabled\n");
@@ -834,8 +1126,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	/* If drvdata::buf is NULL the trace data has been read already */
-	if (drvdata->buf == NULL) {
+	/* If drvdata::etr_buf is NULL the trace data has been read already */
+	if (drvdata->etr_buf == NULL) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -854,8 +1146,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 {
 	unsigned long flags;
-	dma_addr_t paddr;
-	void __iomem *vaddr = NULL;
+	struct etr_buf *etr_buf = NULL;
 
 	/* config types are set a boot time and never change */
 	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
@@ -876,17 +1167,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 		 * The ETR is not tracing and the buffer was just read.
 		 * As such prepare to free the trace buffer.
 		 */
-		vaddr = drvdata->vaddr;
-		paddr = drvdata->paddr;
-		drvdata->buf = drvdata->vaddr = NULL;
+		etr_buf = drvdata->etr_buf;
+		drvdata->etr_buf = NULL;
 	}
 
 	drvdata->reading = false;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free allocated memory outside of the spinlock */
-	if (vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (etr_buf)
+		tmc_free_etr_buf(etr_buf);
 
 	return 0;
 }
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 19a765c..c00643c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -55,6 +55,7 @@
 #define TMC_STS_TMCREADY_BIT	2
 #define TMC_STS_FULL		BIT(0)
 #define TMC_STS_TRIGGERED	BIT(1)
+
 /*
  * TMC_AXICTL - 0x110
  *
@@ -134,6 +135,35 @@ enum tmc_mem_intf_width {
 #define CORESIGHT_SOC_600_ETR_CAPS	\
 	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
 
+enum etr_mode {
+	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
+	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
+};
+
+struct etr_buf_operations;
+
+/**
+ * struct etr_buf - Details of the buffer used by ETR
+ * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
+ * @full	: Trace data overflow
+ * @size	: Size of the buffer.
+ * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
+ * @offset	: Offset of the trace data in the buffer for consumption.
+ * @len		: Available trace data @buf (may wrap around to the beginning).
+ * @ops		: ETR buffer operations for the mode.
+ * @private	: Backend specific information for the buf
+ */
+struct etr_buf {
+	enum etr_mode			mode;
+	bool				full;
+	ssize_t				size;
+	dma_addr_t			hwaddr;
+	unsigned long			offset;
+	s64				len;
+	const struct etr_buf_operations	*ops;
+	void				*private;
+};
+
 /**
  * struct tmc_drvdata - specifics associated to an TMC component
  * @base:	memory mapped base address for this component.
@@ -141,11 +171,10 @@ enum tmc_mem_intf_width {
  * @csdev:	component vitals needed by the framework.
  * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
  * @spinlock:	only one at a time pls.
- * @buf:	area of memory where trace data get sent.
- * @paddr:	DMA start location in RAM.
- * @vaddr:	virtual representation of @paddr.
- * @size:	trace buffer size.
- * @len:	size of the available trace.
+ * @buf:	Snapshot of the trace data for ETF/ETB.
+ * @etr_buf:	details of buffer used in TMC-ETR
+ * @len:	size of the available trace for ETF/ETB.
+ * @size:	trace buffer size for this TMC (common for all modes).
  * @mode:	how this TMC is being used.
  * @config_type: TMC variant, must be of type @tmc_config_type.
  * @memwidth:	width of the memory interface databus, in bytes.
@@ -160,11 +189,12 @@ struct tmc_drvdata {
 	struct miscdevice	miscdev;
 	spinlock_t		spinlock;
 	bool			reading;
-	char			*buf;
-	dma_addr_t		paddr;
-	void __iomem		*vaddr;
-	u32			size;
+	union {
+		char		*buf;		/* TMC ETB */
+		struct etr_buf	*etr_buf;	/* TMC ETR */
+	};
 	u32			len;
+	u32			size;
 	u32			mode;
 	enum tmc_config_type	config_type;
 	enum tmc_mem_intf_width	memwidth;
@@ -172,6 +202,15 @@ struct tmc_drvdata {
 	u32			etr_caps;
 };
 
+struct etr_buf_operations {
+	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
+			int node, void **pages);
+	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
+	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
+				char **bufpp);
+	void (*free)(struct etr_buf *etr_buf);
+};
+
 /**
  * struct tmc_pages - Collection of pages used for SG.
  * @nr_pages:		Number of pages in the list.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/11] coresight: tmc-etr: Add transparent buffer management
@ 2018-05-18 16:39   ` Suzuki K Poulose
  0 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

At the moment we always use contiguous memory for TMC ETR tracing
when used from sysfs. The size of the buffer is fixed at boot time
and can only be changed by modifiying the DT. With the introduction
of SG support we could support really large buffers in that mode.
This patch abstracts the buffer used for ETR to switch between a
contiguous buffer or a SG table depending on the availability of
the memory.

This also enables the sysfs mode to use the ETR in SG mode depending
on configured the trace buffer size. Also, since ETR will use the
new infrastructure to manage the buffer, we can get rid of some
of the members in the tmc_drvdata and clean up the fields a bit.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 450 +++++++++++++++++++-----
 drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
 2 files changed, 418 insertions(+), 89 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 7ab0fd1..143afba 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,10 +17,18 @@
 
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include <linux/iommu.h>
 #include <linux/slab.h>
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+struct etr_flat_buf {
+	struct device	*dev;
+	dma_addr_t	daddr;
+	void		*vaddr;
+	size_t		size;
+};
+
 /*
  * The TMC ETR SG has a page size of 4K. The SG table contains pointers
  * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
@@ -541,7 +549,7 @@ static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
  * @size	- Total size of the data buffer
  * @pages	- Optional list of page virtual address
  */
-static struct etr_sg_table __maybe_unused *
+static struct etr_sg_table *
 tmc_init_etr_sg_table(struct device *dev, int node,
 		  unsigned long size, void **pages)
 {
@@ -573,16 +581,307 @@ tmc_init_etr_sg_table(struct device *dev, int node,
 	return etr_table;
 }
 
+/*
+ * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
+ */
+static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
+				  struct etr_buf *etr_buf, int node,
+				  void **pages)
+{
+	struct etr_flat_buf *flat_buf;
+
+	/* We cannot reuse existing pages for flat buf */
+	if (pages)
+		return -EINVAL;
+
+	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
+	if (!flat_buf)
+		return -ENOMEM;
+
+	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
+					   &flat_buf->daddr, GFP_KERNEL);
+	if (!flat_buf->vaddr) {
+		kfree(flat_buf);
+		return -ENOMEM;
+	}
+
+	flat_buf->size = etr_buf->size;
+	flat_buf->dev = drvdata->dev;
+	etr_buf->hwaddr = flat_buf->daddr;
+	etr_buf->mode = ETR_MODE_FLAT;
+	etr_buf->private = flat_buf;
+	return 0;
+}
+
+static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	if (flat_buf && flat_buf->daddr)
+		dma_free_coherent(flat_buf->dev, flat_buf->size,
+				  flat_buf->vaddr, flat_buf->daddr);
+	kfree(flat_buf);
+}
+
+static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	/*
+	 * Adjust the buffer to point to the beginning of the trace data
+	 * and update the available trace data.
+	 */
+	etr_buf->offset = rrp - etr_buf->hwaddr;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = rwp - rrp;
+}
+
+static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
+					 u64 offset, size_t len, char **bufpp)
+{
+	struct etr_flat_buf *flat_buf = etr_buf->private;
+
+	*bufpp = (char *)flat_buf->vaddr + offset;
+	/*
+	 * tmc_etr_buf_get_data already adjusts the length to handle
+	 * buffer wrapping around.
+	 */
+	return len;
+}
+
+static const struct etr_buf_operations etr_flat_buf_ops = {
+	.alloc = tmc_etr_alloc_flat_buf,
+	.free = tmc_etr_free_flat_buf,
+	.sync = tmc_etr_sync_flat_buf,
+	.get_data = tmc_etr_get_data_flat_buf,
+};
+
+/*
+ * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
+ * appropriately.
+ */
+static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
+				struct etr_buf *etr_buf, int node,
+				void **pages)
+{
+	struct etr_sg_table *etr_table;
+
+	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
+					  etr_buf->size, pages);
+	if (IS_ERR(etr_table))
+		return -ENOMEM;
+	etr_buf->hwaddr = etr_table->hwaddr;
+	etr_buf->mode = ETR_MODE_ETR_SG;
+	etr_buf->private = etr_table;
+	return 0;
+}
+
+static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	if (etr_table) {
+		tmc_free_sg_table(etr_table->sg_table);
+		kfree(etr_table);
+	}
+}
+
+static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
+				       size_t len, char **bufpp)
+{
+	struct etr_sg_table *etr_table = etr_buf->private;
+
+	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
+}
+
+static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
+{
+	long r_offset, w_offset;
+	struct etr_sg_table *etr_table = etr_buf->private;
+	struct tmc_sg_table *table = etr_table->sg_table;
+
+	/* Convert hw address to offset in the buffer */
+	r_offset = tmc_sg_get_data_page_offset(table, rrp);
+	if (r_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RRP %llx to offset\n", rrp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	w_offset = tmc_sg_get_data_page_offset(table, rwp);
+	if (w_offset < 0) {
+		dev_warn(table->dev,
+			 "Unable to map RWP %llx to offset\n", rwp);
+		etr_buf->len = 0;
+		return;
+	}
+
+	etr_buf->offset = r_offset;
+	if (etr_buf->full)
+		etr_buf->len = etr_buf->size;
+	else
+		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
+				w_offset - r_offset;
+	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
+}
+
+static const struct etr_buf_operations etr_sg_buf_ops = {
+	.alloc = tmc_etr_alloc_sg_buf,
+	.free = tmc_etr_free_sg_buf,
+	.sync = tmc_etr_sync_sg_buf,
+	.get_data = tmc_etr_get_data_sg_buf,
+};
+
+static const struct etr_buf_operations *etr_buf_ops[] = {
+	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
+	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
+};
+
+static inline int tmc_etr_mode_alloc_buf(int mode,
+					 struct tmc_drvdata *drvdata,
+					 struct etr_buf *etr_buf, int node,
+					 void **pages)
+{
+	int rc;
+
+	switch (mode) {
+	case ETR_MODE_FLAT:
+	case ETR_MODE_ETR_SG:
+		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
+		if (!rc)
+			etr_buf->ops = etr_buf_ops[mode];
+		return rc;
+	default:
+		return -EINVAL;
+	}
+}
+
+/*
+ * tmc_alloc_etr_buf: Allocate a buffer for use by the ETR.
+ * @drvdata	: ETR device details.
+ * @size	: size of the requested buffer.
+ * @flags	: Required properties for the buffer.
+ * @node	: Node for memory allocations.
+ * @pages	: An optional list of pages.
+ */
+static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
+					 ssize_t size, int flags,
+					 int node, void **pages)
+{
+	int rc = -ENOMEM;
+	bool has_etr_sg, has_iommu;
+	struct etr_buf *etr_buf;
+
+	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
+	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
+
+	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
+	if (!etr_buf)
+		return ERR_PTR(-ENOMEM);
+
+	etr_buf->size = size;
+
+	/*
+	 * If we have to use an existing list of pages, we cannot reliably
+	 * use contiguous DMA memory (even if we have an IOMMU). Otherwise,
+	 * we use contiguous DMA memory if at least one of the following
+	 * conditions is true:
+	 *  a) The ETR cannot use Scatter-Gather.
+	 *  b) We have a backing IOMMU.
+	 *  c) The requested memory size is small (< 1M).
+	 *
+	 * Fall back to the available mechanisms.
+	 */
+	if (!pages &&
+	    (!has_etr_sg || has_iommu || size < SZ_1M))
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
+					    etr_buf, node, pages);
+	if (rc && has_etr_sg)
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
+					    etr_buf, node, pages);
+	if (rc) {
+		kfree(etr_buf);
+		return ERR_PTR(rc);
+	}
+
+	return etr_buf;
+}
+
+static void tmc_free_etr_buf(struct etr_buf *etr_buf)
+{
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
+	etr_buf->ops->free(etr_buf);
+	kfree(etr_buf);
+}
+
+/*
+ * tmc_etr_buf_get_data: Get a pointer to the trace data at @offset,
+ * with a maximum of @len bytes.
+ * Returns: The size of the linear data available at @offset, with *bufpp
+ * updated to point to the buffer.
+ */
+static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
+				    u64 offset, size_t len, char **bufpp)
+{
+	/* Adjust the length to limit this transaction to end of buffer */
+	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
+
+	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
+}
+
+static inline s64
+tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
+{
+	ssize_t len;
+	char *bufp;
+
+	len = tmc_etr_buf_get_data(etr_buf, offset,
+				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
+	if (WARN_ON(len < CORESIGHT_BARRIER_PKT_SIZE))
+		return -EINVAL;
+	coresight_insert_barrier_packet(bufp);
+	return offset + CORESIGHT_BARRIER_PKT_SIZE;
+}
+
+/*
+ * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
+ * Makes sure the trace data is synced to the memory for consumption.
+ * @etr_buf->offset will hold the offset to the beginning of the trace data
+ * within the buffer, with @etr_buf->len bytes to consume.
+ */
+static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
+{
+	struct etr_buf *etr_buf = drvdata->etr_buf;
+	u64 rrp, rwp;
+	u32 status;
+
+	rrp = tmc_read_rrp(drvdata);
+	rwp = tmc_read_rwp(drvdata);
+	status = readl_relaxed(drvdata->base + TMC_STS);
+	etr_buf->full = status & TMC_STS_FULL;
+
+	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
+
+	etr_buf->ops->sync(etr_buf, rrp, rwp);
+
+	/* Insert barrier packets at the beginning, if there was an overflow */
+	if (etr_buf->full)
+		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
+}
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl, sts;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
 	CS_UNLOCK(drvdata->base);
 
 	/* Wait for TMCSReady bit to be set */
 	tmc_wait_for_tmcready(drvdata);
 
-	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
+	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
 	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
 
 	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
@@ -595,16 +894,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 		axictl |= TMC_AXICTL_ARCACHE_OS;
 	}
 
+	if (etr_buf->mode == ETR_MODE_ETR_SG) {
+		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
+			return;
+		axictl |= TMC_AXICTL_SCT_GAT_MODE;
+	}
+
 	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
-	tmc_write_dba(drvdata, drvdata->paddr);
+	tmc_write_dba(drvdata, etr_buf->hwaddr);
 	/*
 	 * If the TMC pointers must be programmed before the session,
 	 * we have to set it properly (i.e, RRP/RWP to base address and
 	 * STS to "not full").
 	 */
 	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
-		tmc_write_rrp(drvdata, drvdata->paddr);
-		tmc_write_rwp(drvdata, drvdata->paddr);
+		tmc_write_rrp(drvdata, etr_buf->hwaddr);
+		tmc_write_rwp(drvdata, etr_buf->hwaddr);
 		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
 		writel_relaxed(sts, drvdata->base + TMC_STS);
 	}
@@ -620,63 +925,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 }
 
 /*
- * Return the available trace data in the buffer @pos, with a maximum
- * limit of @len, also updating the @bufpp on where to find it.
+ * Return the available trace data in the buffer (starts at etr_buf->offset,
+ * limited by etr_buf->len) from @pos, with a maximum limit of @len,
+ * also updating @bufpp on where to find it. Since the trace data
+ * can start anywhere in the buffer, depending on the RRP, we adjust
+ * the @len returned to handle the buffer wrapping around.
  */
 ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
 				loff_t pos, size_t len, char **bufpp)
 {
+	s64 offset;
 	ssize_t actual = len;
-	char *bufp = drvdata->buf + pos;
-	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
-
-	/* Adjust the len to available size @pos */
-	if (pos + actual > drvdata->len)
-		actual = drvdata->len - pos;
+	struct etr_buf *etr_buf = drvdata->etr_buf;
 
+	if (pos + actual > etr_buf->len)
+		actual = etr_buf->len - pos;
 	if (actual <= 0)
 		return actual;
 
-	/*
-	 * Since we use a circular buffer, with trace data starting
-	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
-	 * wrap the current @pos to within the buffer.
-	 */
-	if (bufp >= bufend)
-		bufp -= drvdata->size;
-	/*
-	 * For simplicity, avoid copying over a wrapped around buffer.
-	 */
-	if ((bufp + actual) > bufend)
-		actual = bufend - bufp;
-	*bufpp = bufp;
-	return actual;
+	/* Compute the offset from which we read the data */
+	offset = etr_buf->offset + pos;
+	if (offset >= etr_buf->size)
+		offset -= etr_buf->size;
+	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
 }
 
-static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
+static struct etr_buf *
+tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
 {
-	u32 val;
-	u64 rwp;
+	return tmc_alloc_etr_buf(drvdata, drvdata->size,
+				 0, cpu_to_node(0), NULL);
+}
 
-	rwp = tmc_read_rwp(drvdata);
-	val = readl_relaxed(drvdata->base + TMC_STS);
+static void
+tmc_etr_free_sysfs_buf(struct etr_buf *buf)
+{
+	if (buf)
+		tmc_free_etr_buf(buf);
+}
 
-	/*
-	 * Adjust the buffer to point to the beginning of the trace data
-	 * and update the available trace data.
-	 */
-	if (val & TMC_STS_FULL) {
-		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
-		drvdata->len = drvdata->size;
-		coresight_insert_barrier_packet(drvdata->buf);
-	} else {
-		drvdata->buf = drvdata->vaddr;
-		drvdata->len = rwp - drvdata->paddr;
-	}
+static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
+{
+	tmc_sync_etr_buf(drvdata);
 }
 
 static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 {
 	CS_UNLOCK(drvdata->base);
 
 	tmc_flush_and_stop(drvdata);
@@ -685,7 +980,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	 * read before the TMC is disabled.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
-		tmc_etr_dump_hw(drvdata);
+		tmc_etr_sync_sysfs_buf(drvdata);
+
 	tmc_disable_hw(drvdata);
 
 	CS_LOCK(drvdata->base);
@@ -696,34 +992,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	int ret = 0;
 	bool used = false;
 	unsigned long flags;
-	void __iomem *vaddr = NULL;
-	dma_addr_t paddr;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	struct etr_buf *new_buf = NULL, *free_buf = NULL;
 
 
 	/*
-	 * If we don't have a buffer release the lock and allocate memory.
-	 * Otherwise keep the lock and move along.
+	 * If we are enabling the ETR from disabled state, we need to make
+	 * sure we have a buffer with the right size. The etr_buf is not reset
+	 * immediately after we stop the tracing in SYSFS mode as we wait for
+	 * the user to collect the data. We may be able to reuse the existing
+	 * buffer, provided the size matches. Any allocation has to be done
+	 * with the lock released.
 	 */
 	spin_lock_irqsave(&drvdata->spinlock, flags);
-	if (!drvdata->vaddr) {
+	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
 		spin_unlock_irqrestore(&drvdata->spinlock, flags);
-
-		/*
-		 * Contiguous  memory can't be allocated while a spinlock is
-		 * held.  As such allocate memory here and free it if a buffer
-		 * has already been allocated (from a previous session).
-		 */
-		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
-					   &paddr, GFP_KERNEL);
-		if (!vaddr)
-			return -ENOMEM;
+		/* Allocate memory with the spinlock released */
+		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
+		if (IS_ERR(new_buf))
+			return PTR_ERR(new_buf);
 
 		/* Let's try again */
 		spin_lock_irqsave(&drvdata->spinlock, flags);
 	}
 
-	if (drvdata->reading) {
+	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -731,21 +1024,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	/*
 	 * In sysFS mode we can have multiple writers per sink.  Since this
 	 * sink is already enabled no memory is needed and the HW need not be
-	 * touched.
+	 * touched, even if the buffer size has changed.
 	 */
 	if (drvdata->mode == CS_MODE_SYSFS)
 		goto out;
 
 	/*
-	 * If drvdata::buf == NULL, use the memory allocated above.
-	 * Otherwise a buffer still exists from a previous session, so
-	 * simply use that.
+	 * If we don't have a buffer or it doesn't match the requested size,
+	 * use the memory allocated above. Otherwise reuse it.
 	 */
-	if (drvdata->buf == NULL) {
+	if (!drvdata->etr_buf ||
+	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
 		used = true;
-		drvdata->vaddr = vaddr;
-		drvdata->paddr = paddr;
-		drvdata->buf = drvdata->vaddr;
+		free_buf = drvdata->etr_buf;
+		drvdata->etr_buf = new_buf;
 	}
 
 	drvdata->mode = CS_MODE_SYSFS;
@@ -754,8 +1046,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free memory outside the spinlock if need be */
-	if (!used && vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (free_buf)
+		tmc_etr_free_sysfs_buf(free_buf);
 
 	if (!ret)
 		dev_info(drvdata->dev, "TMC-ETR enabled\n");
@@ -834,8 +1126,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	/* If drvdata::buf is NULL the trace data has been read already */
-	if (drvdata->buf == NULL) {
+	/* If drvdata::etr_buf is NULL the trace data has been read already */
+	if (drvdata->etr_buf == NULL) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -854,8 +1146,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 {
 	unsigned long flags;
-	dma_addr_t paddr;
-	void __iomem *vaddr = NULL;
+	struct etr_buf *etr_buf = NULL;
 
 	/* config types are set a boot time and never change */
 	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
@@ -876,17 +1167,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 		 * The ETR is not tracing and the buffer was just read.
 		 * As such prepare to free the trace buffer.
 		 */
-		vaddr = drvdata->vaddr;
-		paddr = drvdata->paddr;
-		drvdata->buf = drvdata->vaddr = NULL;
+		etr_buf = drvdata->etr_buf;
+		drvdata->etr_buf = NULL;
 	}
 
 	drvdata->reading = false;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	/* Free allocated memory outside of the spinlock */
-	if (vaddr)
-		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
+	if (etr_buf)
+		tmc_free_etr_buf(etr_buf);
 
 	return 0;
 }
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 19a765c..c00643c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -55,6 +55,7 @@
 #define TMC_STS_TMCREADY_BIT	2
 #define TMC_STS_FULL		BIT(0)
 #define TMC_STS_TRIGGERED	BIT(1)
+
 /*
  * TMC_AXICTL - 0x110
  *
@@ -134,6 +135,35 @@ enum tmc_mem_intf_width {
 #define CORESIGHT_SOC_600_ETR_CAPS	\
 	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
 
+enum etr_mode {
+	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
+	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
+};
+
+struct etr_buf_operations;
+
+/**
+ * struct etr_buf - Details of the buffer used by ETR
+ * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
+ * @full	: Trace data overflow
+ * @size	: Size of the buffer.
+ * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
+ * @offset	: Offset of the trace data in the buffer for consumption.
+ * @len		: Available trace data @buf (may wrap around to the beginning).
+ * @ops		: ETR buffer operations for the mode.
+ * @private	: Backend specific information for the buf
+ */
+struct etr_buf {
+	enum etr_mode			mode;
+	bool				full;
+	ssize_t				size;
+	dma_addr_t			hwaddr;
+	unsigned long			offset;
+	s64				len;
+	const struct etr_buf_operations	*ops;
+	void				*private;
+};
+
 /**
  * struct tmc_drvdata - specifics associated to an TMC component
  * @base:	memory mapped base address for this component.
@@ -141,11 +171,10 @@ enum tmc_mem_intf_width {
  * @csdev:	component vitals needed by the framework.
  * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
  * @spinlock:	only one at a time pls.
- * @buf:	area of memory where trace data get sent.
- * @paddr:	DMA start location in RAM.
- * @vaddr:	virtual representation of @paddr.
- * @size:	trace buffer size.
- * @len:	size of the available trace.
+ * @buf:	Snapshot of the trace data for ETF/ETB.
+ * @etr_buf:	details of buffer used in TMC-ETR
+ * @len:	size of the available trace for ETF/ETB.
+ * @size:	trace buffer size for this TMC (common for all modes).
  * @mode:	how this TMC is being used.
  * @config_type: TMC variant, must be of type @tmc_config_type.
  * @memwidth:	width of the memory interface databus, in bytes.
@@ -160,11 +189,12 @@ struct tmc_drvdata {
 	struct miscdevice	miscdev;
 	spinlock_t		spinlock;
 	bool			reading;
-	char			*buf;
-	dma_addr_t		paddr;
-	void __iomem		*vaddr;
-	u32			size;
+	union {
+		char		*buf;		/* TMC ETB */
+		struct etr_buf	*etr_buf;	/* TMC ETR */
+	};
 	u32			len;
+	u32			size;
 	u32			mode;
 	enum tmc_config_type	config_type;
 	enum tmc_mem_intf_width	memwidth;
@@ -172,6 +202,15 @@ struct tmc_drvdata {
 	u32			etr_caps;
 };
 
+struct etr_buf_operations {
+	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
+			int node, void **pages);
+	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
+	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
+				char **bufpp);
+	void (*free)(struct etr_buf *etr_buf);
+};
+
 /**
  * struct tmc_pages - Collection of pages used for SG.
  * @nr_pages:		Number of pages in the list.
-- 
2.7.4
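[Editorial note] The wrap-around arithmetic in tmc_etr_sync_sg_buf() above can be illustrated with a small standalone sketch. This is a hypothetical userspace helper, not the kernel code: given read/write offsets into a circular buffer of `size` bytes, it returns the amount of trace data available for consumption.

```c
#include <assert.h>

/*
 * Userspace sketch (hypothetical helper, not the kernel code) of the
 * length computation in tmc_etr_sync_sg_buf(): if the buffer overflowed,
 * everything is valid; otherwise the available data runs from the read
 * offset to the write offset, possibly wrapping around the end.
 */
static long trace_data_len(long r_offset, long w_offset, long size, int full)
{
	/* On overflow the whole buffer holds valid trace data */
	if (full)
		return size;
	/* If the writer wrapped behind the reader, add one full lap */
	return ((w_offset < r_offset) ? size : 0) + w_offset - r_offset;
}
```

With a 4KB buffer, a reader at offset 4000 and a writer at offset 100 yields 196 bytes, matching the `((w_offset < r_offset) ? etr_buf->size : 0) + w_offset - r_offset` expression in the patch.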

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/11] coresight: tmc: Add configuration support for trace buffer size
  2018-05-18 16:39 ` Suzuki K Poulose
@ 2018-05-18 16:39   ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-18 16:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, mathieu.poirier, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland, Suzuki K Poulose

Now that we can dynamically switch between contiguous memory and
an SG table depending on the trace buffer size, provide support
for selecting an appropriate buffer size.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 .../ABI/testing/sysfs-bus-coresight-devices-tmc    |  8 ++++++
 .../devicetree/bindings/arm/coresight.txt          |  3 +-
 drivers/hwtracing/coresight/coresight-tmc.c        | 33 ++++++++++++++++++++++
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
index 4fe677e..ea78714 100644
--- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
@@ -83,3 +83,11 @@ KernelVersion:	4.7
 Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
 Description:	(R) Indicates the capabilities of the Coresight TMC.
 		The value is read directly from the DEVID register, 0xFC8,
+
+What:		/sys/bus/coresight/devices/<memory_map>.tmc/buffer_size
+Date:		August 2018
+KernelVersion:	4.18
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Size of the trace buffer for TMC-ETR when used in SYSFS
+		mode. Writable only for TMC-ETR configurations. The value
+		should be aligned to the kernel page size.
diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
index 603d3c6..9aa30a1 100644
--- a/Documentation/devicetree/bindings/arm/coresight.txt
+++ b/Documentation/devicetree/bindings/arm/coresight.txt
@@ -84,7 +84,8 @@ its hardware characteristcs.
 * Optional property for TMC:
 
 	* arm,buffer-size: size of contiguous buffer space for TMC ETR
-	 (embedded trace router)
+	  (embedded trace router). This property is obsolete. The buffer size
+	  can be configured dynamically via the buffer_size property in sysfs.
 
 	* arm,scatter-gather: boolean. Indicates that the TMC-ETR can safely
 	  use the SG mode on this system.
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 7d8331d..57b6621 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -285,8 +285,41 @@ static ssize_t trigger_cntr_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(trigger_cntr);
 
+static ssize_t buffer_size_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	return sprintf(buf, "%#x\n", drvdata->size);
+}
+
+static ssize_t buffer_size_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t size)
+{
+	int ret;
+	unsigned long val;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	/* Only permitted for TMC-ETRs */
+	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
+		return -EPERM;
+
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;
+	/* The buffer size should be page aligned */
+	if (val & (PAGE_SIZE - 1))
+		return -EINVAL;
+	drvdata->size = val;
+	return size;
+}
+
+static DEVICE_ATTR_RW(buffer_size);
+
 static struct attribute *coresight_tmc_attrs[] = {
 	&dev_attr_trigger_cntr.attr,
+	&dev_attr_buffer_size.attr,
 	NULL,
 };
 
-- 
2.7.4
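[Editorial note] The page-alignment test in buffer_size_store() above can be sketched standalone. The 4K PAGE_SIZE is an assumption for illustration; the kernel uses its build-time value, and the helper name is made up for the example.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL	/* illustrative; the kernel's value is build-time */

/* Mirror of the check in buffer_size_store(): a size is accepted only
 * if it is a whole multiple of the page size. Masking with
 * (PAGE_SIZE - 1) extracts the sub-page remainder, so any nonzero
 * result means the size is misaligned. */
static int buffer_size_valid(unsigned long val)
{
	return (val & (PAGE_SIZE - 1)) == 0;
}
```

This bitmask form only works because PAGE_SIZE is a power of two; for arbitrary granules a modulo test would be needed instead.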

* Re: [PATCH 07/11] dts: juno: Add scatter-gather support for all revisions
  2018-05-18 16:39   ` Suzuki K Poulose
@ 2018-05-23 17:39     ` Mathieu Poirier
  -1 siblings, 0 replies; 38+ messages in thread
From: Mathieu Poirier @ 2018-05-23 17:39 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, Linux Kernel Mailing List, Rob Herring,
	Sudeep Holla, Frank Rowand, coresight, Mark Rutland, Liviu Dudau,
	Lorenzo Pieralisi

On 18 May 2018 at 10:39, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> Advertise that the scatter-gather is properly integrated on
> all revisions of Juno board.
>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Liviu Dudau <liviu.dudau@arm.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

> ---
>  arch/arm64/boot/dts/arm/juno-base.dtsi | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi
> index eb749c5..6ce9090 100644
> --- a/arch/arm64/boot/dts/arm/juno-base.dtsi
> +++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
> @@ -198,6 +198,7 @@
>                 clocks = <&soc_smc50mhz>;
>                 clock-names = "apb_pclk";
>                 power-domains = <&scpi_devpd 0>;
> +               arm,scatter-gather;
>                 port {
>                         etr_in_port: endpoint {
>                                 slave-mode;
> --
> 2.7.4
>

* Re: [PATCH 06/11] dts: bindings: Restrict coresight tmc-etr scatter-gather mode
  2018-05-18 16:39   ` Suzuki K Poulose
@ 2018-05-23 18:18     ` Rob Herring
  -1 siblings, 0 replies; 38+ messages in thread
From: Rob Herring @ 2018-05-23 18:18 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, mathieu.poirier, sudeep.holla,
	frowand.list, coresight, mark.rutland, Mike Leach, John Horley,
	Robert Walker, devicetree

On Fri, May 18, 2018 at 05:39:22PM +0100, Suzuki K Poulose wrote:
> We are about to add the support for ETR builtin scatter-gather mode
> for dealing with large amount of trace buffers. However, on some of
> the platforms, using the ETR SG mode can lock up the system due to
> the way the ETR is connected to the memory subsystem.
> 
> In SG mode, the ETR performs READ from the scatter-gather table to
> fetch the next page and regular WRITE of trace data. If the READ
> operation doesn't complete(due to the memory subsystem issues,
> which we have seen on a couple of platforms) the trace WRITE
> cannot proceed leading to issues. So, we by default do not
> use the SG mode, unless it is known to be safe on the platform.
> We define a DT property for the TMC node to specify whether we
> have a proper SG mode.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: John Horley <john.horley@arm.com>
> Cc: Robert Walker <robert.walker@arm.com>
> Cc: devicetree@vger.kernel.org
> Cc: frowand.list@gmail.com
> Cc: Rob Herring <robh@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  Documentation/devicetree/bindings/arm/coresight.txt | 2 ++
>  drivers/hwtracing/coresight/coresight-tmc.c         | 9 ++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)

Reviewed-by: Rob Herring <robh@kernel.org>
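[Editorial note] The gating reviewed here can be sketched as a simple capability-mask test: the driver records TMC_ETR_SG in a per-device mask only when the DT advertises "arm,scatter-gather", and every SG-mode path checks the mask before enabling the mode. The flag value and structure below are made up for the example.

```c
#include <assert.h>

/* Illustrative sketch (not the kernel's definitions) of capability
 * gating: the SG capability is set only when the DT property is
 * present, and callers test the mask before using SG mode. */
#define TMC_ETR_SG	(1U << 0)

struct etr_caps_demo {
	unsigned int etr_caps;
};

/* Record the SG capability if the DT carried "arm,scatter-gather" */
static void demo_set_sg_cap(struct etr_caps_demo *d, int dt_has_sg)
{
	if (dt_has_sg)
		d->etr_caps |= TMC_ETR_SG;
}

/* Test a capability bit, as the driver does before enabling SG mode */
static int demo_has_cap(const struct etr_caps_demo *d, unsigned int cap)
{
	return !!(d->etr_caps & cap);
}
```

On platforms without the property the mask test fails and the driver falls back to a contiguous buffer, avoiding the lock-up described in the commit message.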

* Re: [PATCH 08/11] coresight: Add generic TMC sg table framework
  2018-05-18 16:39   ` Suzuki K Poulose
@ 2018-05-23 20:25     ` Mathieu Poirier
  -1 siblings, 0 replies; 38+ messages in thread
From: Mathieu Poirier @ 2018-05-23 20:25 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland

On Fri, May 18, 2018 at 05:39:24PM +0100, Suzuki K Poulose wrote:
> This patch introduces a generic sg table data structure and
> associated operations. An SG table can be used to map a set
> of Data pages where the trace data could be stored by the TMC
> ETR. The information about the data pages could be stored in
> different formats, depending on the type of the underlying
> SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic
> structure provides book keeping of the pages used for the data
> as well as the table contents. The table should be filled by
> the user of the infrastructure.
> 
> A table can be created by specifying the number of data pages
> as well as the number of table pages required to hold the
> pointers, where the latter could be different for different
> types of tables. The pages are mapped in the appropriate dma
> data direction mode (i.e, DMA_TO_DEVICE for table pages
> and DMA_FROM_DEVICE for data pages).  The framework can optionally
> accept a set of allocated data pages (e.g, perf ring buffer) and
> map them accordingly. The table and data pages are vmap'ed to allow
> easier access by the drivers. The framework also provides helpers to
> sync the data written to the pages with appropriate directions.
> 
> This will be later used by the TMC ETR SG unit and CATU.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v1:
>  - Address code style issues, more comments
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 290 ++++++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-tmc.h     |  50 ++++
>  2 files changed, 340 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 9780798..1e844f8 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -17,9 +17,299 @@
>  
>  #include <linux/coresight.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/slab.h>
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +/*
> + * tmc_pages_get_offset:  Go through all the pages in the tmc_pages
> + * and map the device address @addr to an offset within the virtual
> + * contiguous buffer.
> + */
> +static long
> +tmc_pages_get_offset(struct tmc_pages *tmc_pages, dma_addr_t addr)
> +{
> +	int i;
> +	dma_addr_t page_start;
> +
> +	for (i = 0; i < tmc_pages->nr_pages; i++) {
> +		page_start = tmc_pages->daddrs[i];
> +		if (addr >= page_start && addr < (page_start + PAGE_SIZE))
> +			return i * PAGE_SIZE + (addr - page_start);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +/*
> + * tmc_pages_free : Unmap and free the pages used by tmc_pages.
> + * If the pages were not allocated in tmc_pages_alloc(), we would
> + * simply drop the refcount.
> + */
> +static void tmc_pages_free(struct tmc_pages *tmc_pages,
> +			   struct device *dev, enum dma_data_direction dir)
> +{
> +	int i;
> +
> +	for (i = 0; i < tmc_pages->nr_pages; i++) {
> +		if (tmc_pages->daddrs && tmc_pages->daddrs[i])
> +			dma_unmap_page(dev, tmc_pages->daddrs[i],
> +					 PAGE_SIZE, dir);
> +		if (tmc_pages->pages && tmc_pages->pages[i])
> +			__free_page(tmc_pages->pages[i]);
> +	}
> +
> +	kfree(tmc_pages->pages);
> +	kfree(tmc_pages->daddrs);
> +	tmc_pages->pages = NULL;
> +	tmc_pages->daddrs = NULL;
> +	tmc_pages->nr_pages = 0;
> +}
> +
> +/*
> + * tmc_pages_alloc : Allocate and map pages for a given @tmc_pages.
> + * If @pages is not NULL, the list of page virtual addresses are
> + * used as the data pages. The pages are then dma_map'ed for @dev
> + * with dma_direction @dir.
> + *
> + * Returns 0 upon success, else the error number.
> + */
> +static int tmc_pages_alloc(struct tmc_pages *tmc_pages,
> +			   struct device *dev, int node,
> +			   enum dma_data_direction dir, void **pages)
> +{
> +	int i, nr_pages;
> +	dma_addr_t paddr;
> +	struct page *page;
> +
> +	nr_pages = tmc_pages->nr_pages;
> +	tmc_pages->daddrs = kcalloc(nr_pages, sizeof(*tmc_pages->daddrs),
> +					 GFP_KERNEL);
> +	if (!tmc_pages->daddrs)
> +		return -ENOMEM;
> +	tmc_pages->pages = kcalloc(nr_pages, sizeof(*tmc_pages->pages),
> +					 GFP_KERNEL);
> +	if (!tmc_pages->pages) {
> +		kfree(tmc_pages->daddrs);
> +		tmc_pages->daddrs = NULL;
> +		return -ENOMEM;
> +	}
> +
> +	for (i = 0; i < nr_pages; i++) {
> +		if (pages && pages[i]) {
> +			page = virt_to_page(pages[i]);
> +			/* Hold a refcount on the page */
> +			get_page(page);
> +		} else {
> +			page = alloc_pages_node(node,
> +						GFP_KERNEL | __GFP_ZERO, 0);
> +		}
> +		if (!page)
> +			goto err;
> +		paddr = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
> +		if (dma_mapping_error(dev, paddr))
> +			goto err;
> +		tmc_pages->daddrs[i] = paddr;
> +		tmc_pages->pages[i] = page;
> +	}
> +	return 0;
> +err:
> +	tmc_pages_free(tmc_pages, dev, dir);
> +	return -ENOMEM;
> +}
> +
> +static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
> +{
> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
> +		return 0;
> +	return sg_table->table_daddr;
> +}
> +
> +static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
> +{
> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
> +		return NULL;
> +	return sg_table->table_vaddr;
> +}

The above two functions deal with DMA'able and virtual addresses for the table
page buffer.  Yet the test in the WARN_ON is done on the data page array.
Shouldn't this be sg_table->table_pages.pages[0] instead?

If not please add a comment justifying your position so that someone else
looking at the code doesn't end up thinking the same a year from now.

> +
> +static inline void *
> +tmc_sg_table_data_vaddr(struct tmc_sg_table *sg_table)
> +{
> +	if (WARN_ON(!sg_table->data_pages.nr_pages))
> +		return NULL;
> +	return sg_table->data_vaddr;
> +}

I see that tmc_sg_table_base_vaddr() and tmc_sg_table_data_vaddr() are both
returning the virtual address of the contiguous buffer for table and data
respectively.  Yet there is a discrepancy in the naming convention.  I would
have expected tmc_sg_table_base_vaddr() and tmc_sg_data_base_vaddr() so that
there is a little symmetry between them.


Otherwise looks good to me.

> +
> +static inline long
> +tmc_sg_get_data_page_offset(struct tmc_sg_table *sg_table, dma_addr_t addr)
> +{
> +	return tmc_pages_get_offset(&sg_table->data_pages, addr);
> +}
> +
> +static inline void tmc_free_table_pages(struct tmc_sg_table *sg_table)
> +{
> +	if (sg_table->table_vaddr)
> +		vunmap(sg_table->table_vaddr);
> +	tmc_pages_free(&sg_table->table_pages, sg_table->dev, DMA_TO_DEVICE);
> +}
> +
> +static void tmc_free_data_pages(struct tmc_sg_table *sg_table)
> +{
> +	if (sg_table->data_vaddr)
> +		vunmap(sg_table->data_vaddr);
> +	tmc_pages_free(&sg_table->data_pages, sg_table->dev, DMA_FROM_DEVICE);
> +}
> +
> +void tmc_free_sg_table(struct tmc_sg_table *sg_table)
> +{
> +	tmc_free_table_pages(sg_table);
> +	tmc_free_data_pages(sg_table);
> +}
> +
> +/*
> + * Alloc pages for the table. Since this will be used by the device,
> + * allocate the pages closer to the device (i.e, dev_to_node(dev)
> + * rather than the CPU node).
> + */
> +static int tmc_alloc_table_pages(struct tmc_sg_table *sg_table)
> +{
> +	int rc;
> +	struct tmc_pages *table_pages = &sg_table->table_pages;
> +
> +	rc = tmc_pages_alloc(table_pages, sg_table->dev,
> +			     dev_to_node(sg_table->dev),
> +			     DMA_TO_DEVICE, NULL);
> +	if (rc)
> +		return rc;
> +	sg_table->table_vaddr = vmap(table_pages->pages,
> +				     table_pages->nr_pages,
> +				     VM_MAP,
> +				     PAGE_KERNEL);
> +	if (!sg_table->table_vaddr)
> +		rc = -ENOMEM;
> +	else
> +		sg_table->table_daddr = table_pages->daddrs[0];
> +	return rc;
> +}
> +
> +static int tmc_alloc_data_pages(struct tmc_sg_table *sg_table, void **pages)
> +{
> +	int rc;
> +
> +	/* Allocate data pages on the node requested by the caller */
> +	rc = tmc_pages_alloc(&sg_table->data_pages,
> +			     sg_table->dev, sg_table->node,
> +			     DMA_FROM_DEVICE, pages);
> +	if (!rc) {
> +		sg_table->data_vaddr = vmap(sg_table->data_pages.pages,
> +					    sg_table->data_pages.nr_pages,
> +					    VM_MAP,
> +					    PAGE_KERNEL);
> +		if (!sg_table->data_vaddr)
> +			rc = -ENOMEM;
> +	}
> +	return rc;
> +}
> +
> +/*
> + * tmc_alloc_sg_table: Allocate and setup dma pages for the TMC SG table
> + * and data buffers. TMC writes to the data buffers and reads from the SG
> + * Table pages.
> + *
> + * @dev		- Device to which page should be DMA mapped.
> + * @node	- Numa node for mem allocations
> + * @nr_tpages	- Number of pages for the table entries.
> + * @nr_dpages	- Number of pages for Data buffer.
> + * @pages	- Optional list of virtual address of pages.
> + */
> +struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
> +					int node,
> +					int nr_tpages,
> +					int nr_dpages,
> +					void **pages)
> +{
> +	long rc;
> +	struct tmc_sg_table *sg_table;
> +
> +	sg_table = kzalloc(sizeof(*sg_table), GFP_KERNEL);
> +	if (!sg_table)
> +		return ERR_PTR(-ENOMEM);
> +	sg_table->data_pages.nr_pages = nr_dpages;
> +	sg_table->table_pages.nr_pages = nr_tpages;
> +	sg_table->node = node;
> +	sg_table->dev = dev;
> +
> +	rc  = tmc_alloc_data_pages(sg_table, pages);
> +	if (!rc)
> +		rc = tmc_alloc_table_pages(sg_table);
> +	if (rc) {
> +		tmc_free_sg_table(sg_table);
> +		kfree(sg_table);
> +		return ERR_PTR(rc);
> +	}
> +
> +	return sg_table;
> +}
> +
> +/*
> + * tmc_sg_table_sync_data_range: Sync the data buffer written
> + * by the device from @offset upto a @size bytes.
> + */
> +void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
> +				  u64 offset, u64 size)
> +{
> +	int i, index, start;
> +	int npages = DIV_ROUND_UP(size, PAGE_SIZE);
> +	struct device *dev = table->dev;
> +	struct tmc_pages *data = &table->data_pages;
> +
> +	start = offset >> PAGE_SHIFT;
> +	for (i = start; i < (start + npages); i++) {
> +		index = i % data->nr_pages;
> +		dma_sync_single_for_cpu(dev, data->daddrs[index],
> +					PAGE_SIZE, DMA_FROM_DEVICE);
> +	}
> +}
> +
> +/* tmc_sg_sync_table: Sync the page table */
> +void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table)
> +{
> +	int i;
> +	struct device *dev = sg_table->dev;
> +	struct tmc_pages *table_pages = &sg_table->table_pages;
> +
> +	for (i = 0; i < table_pages->nr_pages; i++)
> +		dma_sync_single_for_device(dev, table_pages->daddrs[i],
> +					   PAGE_SIZE, DMA_TO_DEVICE);
> +}
> +
> +/*
> + * tmc_sg_table_get_data: Get the buffer pointer for data @offset
> + * in the SG buffer. The @bufpp is updated to point to the buffer.
> + * Returns :
> + *	the length of linear data available at @offset.
> + *	or
> + *	<= 0 if no data is available.
> + */
> +ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
> +			      u64 offset, size_t len, char **bufpp)
> +{
> +	size_t size;
> +	int pg_idx = offset >> PAGE_SHIFT;
> +	int pg_offset = offset & (PAGE_SIZE - 1);
> +	struct tmc_pages *data_pages = &sg_table->data_pages;
> +
> +	size = tmc_sg_table_buf_size(sg_table);
> +	if (offset >= size)
> +		return -EINVAL;
> +
> +	/* Make sure we don't go beyond the end */
> +	len = (len < (size - offset)) ? len : size - offset;
> +	/* Respect the page boundaries */
> +	len = (len < (PAGE_SIZE - pg_offset)) ? len : (PAGE_SIZE - pg_offset);
> +	if (len > 0)
> +		*bufpp = page_address(data_pages->pages[pg_idx]) + pg_offset;
> +	return len;
> +}
> +
>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  {
>  	u32 axictl, sts;
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 73f944d..19a765c 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -18,6 +18,7 @@
>  #ifndef _CORESIGHT_TMC_H
>  #define _CORESIGHT_TMC_H
>  
> +#include <linux/dma-mapping.h>
>  #include <linux/miscdevice.h>
>  
>  #define TMC_RSZ			0x004
> @@ -171,6 +172,38 @@ struct tmc_drvdata {
>  	u32			etr_caps;
>  };
>  
> +/**
> + * struct tmc_pages - Collection of pages used for SG.
> + * @nr_pages:		Number of pages in the list.
> + * @daddrs:		Array of DMA'able page address.
> + * @pages:		Array pages for the buffer.
> + */
> +struct tmc_pages {
> +	int nr_pages;
> +	dma_addr_t	*daddrs;
> +	struct page	**pages;
> +};
> +
> +/*
> + * struct tmc_sg_table - Generic SG table for TMC
> + * @dev:		Device for DMA allocations
> + * @table_vaddr:	Contiguous Virtual address for PageTable
> + * @data_vaddr:		Contiguous Virtual address for Data Buffer
> + * @table_daddr:	DMA address of the PageTable base
> + * @node:		Node for Page allocations
> + * @table_pages:	List of pages & dma address for Table
> + * @data_pages:		List of pages & dma address for Data
> + */
> +struct tmc_sg_table {
> +	struct device *dev;
> +	void *table_vaddr;
> +	void *data_vaddr;
> +	dma_addr_t table_daddr;
> +	int node;
> +	struct tmc_pages table_pages;
> +	struct tmc_pages data_pages;
> +};
> +
>  /* Generic functions */
>  void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata);
>  void tmc_flush_and_stop(struct tmc_drvdata *drvdata);
> @@ -226,4 +259,21 @@ static inline bool tmc_etr_has_cap(struct tmc_drvdata *drvdata, u32 cap)
>  	return !!(drvdata->etr_caps & cap);
>  }
>  
> +struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev,
> +					int node,
> +					int nr_tpages,
> +					int nr_dpages,
> +					void **pages);
> +void tmc_free_sg_table(struct tmc_sg_table *sg_table);
> +void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table);
> +void tmc_sg_table_sync_data_range(struct tmc_sg_table *table,
> +				  u64 offset, u64 size);
> +ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table,
> +			      u64 offset, size_t len, char **bufpp);
> +static inline unsigned long
> +tmc_sg_table_buf_size(struct tmc_sg_table *sg_table)
> +{
> +	return sg_table->data_pages.nr_pages << PAGE_SHIFT;
> +}
> +
>  #endif
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 10/11] coresight: tmc-etr: Add transparent buffer management
  2018-05-18 16:39   ` Suzuki K Poulose
@ 2018-05-24 19:56     ` Mathieu Poirier
  -1 siblings, 0 replies; 38+ messages in thread
From: Mathieu Poirier @ 2018-05-24 19:56 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland

On Fri, May 18, 2018 at 05:39:26PM +0100, Suzuki K Poulose wrote:
> At the moment we always use contiguous memory for TMC ETR tracing
> when used from sysfs. The size of the buffer is fixed at boot time
> and can only be changed by modifying the DT. With the introduction
> of SG support we could support really large buffers in that mode.
> This patch abstracts the buffer used for ETR to switch between a
> contiguous buffer or a SG table depending on the availability of
> the memory.
> 
> This also enables the sysfs mode to use the ETR in SG mode depending
> on the configured trace buffer size. Also, since ETR will use the
> new infrastructure to manage the buffer, we can get rid of some
> of the members in the tmc_drvdata and clean up the fields a bit.

Upon first reading this changelog I thought this patch does way too many things
but after looking at the content it isn't the case.  We could try to split the
patch by moving the introduction of the SG operations to another patch but it
would save about 60 lines, which is hardly worth it.  

As it stands now it is almost guaranteed other reviewers will ask you to split
your work.  Perhaps rephrasing the changelog to concentrate on the global idea
of what the patch does will help (just my personal opinion).

> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 450 +++++++++++++++++++-----
>  drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
>  2 files changed, 418 insertions(+), 89 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 7ab0fd1..143afba 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -17,10 +17,18 @@
>  
>  #include <linux/coresight.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
>  #include <linux/slab.h>
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +struct etr_flat_buf {
> +	struct device	*dev;
> +	dma_addr_t	daddr;
> +	void		*vaddr;
> +	size_t		size;
> +};
> +
>  /*
>   * The TMC ETR SG has a page size of 4K. The SG table contains pointers
>   * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
> @@ -541,7 +549,7 @@ static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
>   * @size	- Total size of the data buffer
>   * @pages	- Optional list of page virtual address
>   */
> -static struct etr_sg_table __maybe_unused *
> +static struct etr_sg_table *
>  tmc_init_etr_sg_table(struct device *dev, int node,
>  		  unsigned long size, void **pages)
>  {
> @@ -573,16 +581,307 @@ tmc_init_etr_sg_table(struct device *dev, int node,
>  	return etr_table;
>  }
>  
> +/*
> + * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
> + */
> +static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
> +				  struct etr_buf *etr_buf, int node,
> +				  void **pages)
> +{
> +	struct etr_flat_buf *flat_buf;
> +
> +	/* We cannot reuse existing pages for flat buf */
> +	if (pages)
> +		return -EINVAL;

Perfect.

> +
> +	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
> +	if (!flat_buf)
> +		return -ENOMEM;
> +
> +	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
> +					   &flat_buf->daddr, GFP_KERNEL);

Indentation of the second line.

> +	if (!flat_buf->vaddr) {
> +		kfree(flat_buf);
> +		return -ENOMEM;
> +	}
> +
> +	flat_buf->size = etr_buf->size;
> +	flat_buf->dev = drvdata->dev;
> +	etr_buf->hwaddr = flat_buf->daddr;
> +	etr_buf->mode = ETR_MODE_FLAT;
> +	etr_buf->private = flat_buf;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	if (flat_buf && flat_buf->daddr)
> +		dma_free_coherent(flat_buf->dev, flat_buf->size,
> +				  flat_buf->vaddr, flat_buf->daddr);
> +	kfree(flat_buf);
> +}
> +
> +static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	/*
> +	 * Adjust the buffer to point to the beginning of the trace data
> +	 * and update the available trace data.
> +	 */
> +	etr_buf->offset = rrp - etr_buf->hwaddr;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = rwp - rrp;
> +}
> +
> +static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
> +					 u64 offset, size_t len, char **bufpp)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	*bufpp = (char *)flat_buf->vaddr + offset;
> +	/*
> +	 * tmc_etr_buf_get_data already adjusts the length to handle
> +	 * buffer wrapping around.
> +	 */
> +	return len;
> +}
> +
> +static const struct etr_buf_operations etr_flat_buf_ops = {
> +	.alloc = tmc_etr_alloc_flat_buf,
> +	.free = tmc_etr_free_flat_buf,
> +	.sync = tmc_etr_sync_flat_buf,
> +	.get_data = tmc_etr_get_data_flat_buf,
> +};
> +
> +/*
> + * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
> + * appropriately.
> + */
> +static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
> +				struct etr_buf *etr_buf, int node,
> +				void **pages)
> +{
> +	struct etr_sg_table *etr_table;
> +
> +	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
> +					  etr_buf->size, pages);
> +	if (IS_ERR(etr_table))
> +		return -ENOMEM;
> +	etr_buf->hwaddr = etr_table->hwaddr;
> +	etr_buf->mode = ETR_MODE_ETR_SG;
> +	etr_buf->private = etr_table;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	if (etr_table) {
> +		tmc_free_sg_table(etr_table->sg_table);
> +		kfree(etr_table);
> +	}
> +}
> +
> +static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
> +				       size_t len, char **bufpp)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
> +}
> +
> +static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	long r_offset, w_offset;
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +	struct tmc_sg_table *table = etr_table->sg_table;
> +
> +	/* Convert hw address to offset in the buffer */
> +	r_offset = tmc_sg_get_data_page_offset(table, rrp);
> +	if (r_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RRP %llx to offset\n", rrp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	w_offset = tmc_sg_get_data_page_offset(table, rwp);
> +	if (w_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RWP %llx to offset\n", rwp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	etr_buf->offset = r_offset;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
> +				w_offset - r_offset;
> +	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
> +}
> +
> +static const struct etr_buf_operations etr_sg_buf_ops = {
> +	.alloc = tmc_etr_alloc_sg_buf,
> +	.free = tmc_etr_free_sg_buf,
> +	.sync = tmc_etr_sync_sg_buf,
> +	.get_data = tmc_etr_get_data_sg_buf,
> +};
> +
> +static const struct etr_buf_operations *etr_buf_ops[] = {
> +	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
> +	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
> +};
> +
> +static inline int tmc_etr_mode_alloc_buf(int mode,
> +					 struct tmc_drvdata *drvdata,
> +					 struct etr_buf *etr_buf, int node,
> +					 void **pages)
> +{
> +	int rc;
> +
> +	switch (mode) {
> +	case ETR_MODE_FLAT:
> +	case ETR_MODE_ETR_SG:
> +		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
> +		if (!rc)
> +			etr_buf->ops = etr_buf_ops[mode];
> +		return rc;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +/*
> + * tmc_alloc_etr_buf: Allocate a buffer use by ETR.
> + * @drvdata	: ETR device details.
> + * @size	: size of the requested buffer.
> + * @flags	: Required properties for the buffer.
> + * @node	: Node for memory allocations.
> + * @pages	: An optional list of pages.
> + */
> +static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
> +					 ssize_t size, int flags,
> +					 int node, void **pages)
> +{
> +	int rc = -ENOMEM;
> +	bool has_etr_sg, has_iommu;
> +	struct etr_buf *etr_buf;
> +
> +	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
> +	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
> +
> +	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
> +	if (!etr_buf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	etr_buf->size = size;
> +
> +	/*
> +	 * If we have to use an existing list of pages, we cannot reliably
> +	 * use a contiguous DMA memory (even if we have an IOMMU). Otherwise,
> +	 * we use the contiguous DMA memory if at least one of the following
> +	 * conditions is true:
> +	 *  a) The ETR cannot use Scatter-Gather.
> +	 *  b) we have a backing IOMMU
> +	 *  c) The requested memory size is smaller (< 1M).
> +	 *
> +	 * Fallback to available mechanisms.
> +	 *
> +	 */
> +	if (!pages &&
> +	    (!has_etr_sg || has_iommu || size < SZ_1M))
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc && has_etr_sg)
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc) {
> +		kfree(etr_buf);
> +		return ERR_PTR(rc);
> +	}
> +
> +	return etr_buf;
> +}
> +
> +static void tmc_free_etr_buf(struct etr_buf *etr_buf)
> +{
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
> +	etr_buf->ops->free(etr_buf);
> +	kfree(etr_buf);
> +}
> +
> +/*
> + * tmc_etr_buf_get_data: Get the pointer the trace data at @offset
> + * with a maximum of @len bytes.
> + * Returns: The size of the linear data available @pos, with *bufpp
> + * updated to point to the buffer.
> + */
> +static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
> +				    u64 offset, size_t len, char **bufpp)
> +{
> +	/* Adjust the length to limit this transaction to end of buffer */
> +	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
> +
> +	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
> +}
> +
> +static inline s64
> +tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
> +{
> +	ssize_t len;
> +	char *bufp;
> +
> +	len = tmc_etr_buf_get_data(etr_buf, offset,
> +				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
> +	if (WARN_ON(len <= CORESIGHT_BARRIER_PKT_SIZE))
> +		return -EINVAL;
> +	coresight_insert_barrier_packet(bufp);
> +	return offset + CORESIGHT_BARRIER_PKT_SIZE;
> +}
> +
> +/*
> + * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
> + * Makes sure the trace data is synced to the memory for consumption.
> + * @etr_buf->offset will hold the offset to the beginning of the trace data
> + * within the buffer, with @etr_buf->len bytes to consume.
> + */
> +static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
> +{
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
> +	u64 rrp, rwp;
> +	u32 status;
> +
> +	rrp = tmc_read_rrp(drvdata);
> +	rwp = tmc_read_rwp(drvdata);
> +	status = readl_relaxed(drvdata->base + TMC_STS);
> +	etr_buf->full = status & TMC_STS_FULL;
> +
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
> +
> +	etr_buf->ops->sync(etr_buf, rrp, rwp);
> +
> +	/* Insert barrier packets at the beginning, if there was an overflow */
> +	if (etr_buf->full)
> +		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
> +}
> +
>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  {
>  	u32 axictl, sts;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
>  	CS_UNLOCK(drvdata->base);
>  
>  	/* Wait for TMCSReady bit to be set */
>  	tmc_wait_for_tmcready(drvdata);
>  
> -	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
> +	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
>  	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
>  
>  	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
> @@ -595,16 +894,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  		axictl |= TMC_AXICTL_ARCACHE_OS;
>  	}
>  
> +	if (etr_buf->mode == ETR_MODE_ETR_SG) {
> +		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
> +			return;
> +		axictl |= TMC_AXICTL_SCT_GAT_MODE;
> +	}
> +
>  	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
> -	tmc_write_dba(drvdata, drvdata->paddr);
> +	tmc_write_dba(drvdata, etr_buf->hwaddr);
>  	/*
>  	 * If the TMC pointers must be programmed before the session,
>  	 * we have to set it properly (i.e, RRP/RWP to base address and
>  	 * STS to "not full").
>  	 */
>  	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
> -		tmc_write_rrp(drvdata, drvdata->paddr);
> -		tmc_write_rwp(drvdata, drvdata->paddr);
> +		tmc_write_rrp(drvdata, etr_buf->hwaddr);
> +		tmc_write_rwp(drvdata, etr_buf->hwaddr);
>  		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
>  		writel_relaxed(sts, drvdata->base + TMC_STS);
>  	}
> @@ -620,63 +925,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  }
>  
>  /*
> - * Return the available trace data in the buffer @pos, with a maximum
> - * limit of @len, also updating the @bufpp on where to find it.
> + * Return the available trace data in the buffer (starts at etr_buf->offset,
> + * limited by etr_buf->len) from @pos, with a maximum limit of @len,
> + * also updating the @bufpp on where to find it. Since the trace data
> + * starts at anywhere in the buffer, depending on the RRP, we adjust the
> + * @len returned to handle buffer wrapping around.
>   */
>  ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
>  				loff_t pos, size_t len, char **bufpp)
>  {
> +	s64 offset;
>  	ssize_t actual = len;
> -	char *bufp = drvdata->buf + pos;
> -	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
> -
> -	/* Adjust the len to available size @pos */
> -	if (pos + actual > drvdata->len)
> -		actual = drvdata->len - pos;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
> +	if (pos + actual > etr_buf->len)
> +		actual = etr_buf->len - pos;
>  	if (actual <= 0)
>  		return actual;
>  
> -	/*
> -	 * Since we use a circular buffer, with trace data starting
> -	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
> -	 * wrap the current @pos to within the buffer.
> -	 */
> -	if (bufp >= bufend)
> -		bufp -= drvdata->size;
> -	/*
> -	 * For simplicity, avoid copying over a wrapped around buffer.
> -	 */
> -	if ((bufp + actual) > bufend)
> -		actual = bufend - bufp;
> -	*bufpp = bufp;
> -	return actual;
> +	/* Compute the offset from which we read the data */
> +	offset = etr_buf->offset + pos;
> +	if (offset >= etr_buf->size)
> +		offset -= etr_buf->size;
> +	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
>  }
>  
> -static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
> +static struct etr_buf *
> +tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
>  {
> -	u32 val;
> -	u64 rwp;
> +	return tmc_alloc_etr_buf(drvdata, drvdata->size,
> +				 0, cpu_to_node(0), NULL);
> +}
>  
> -	rwp = tmc_read_rwp(drvdata);
> -	val = readl_relaxed(drvdata->base + TMC_STS);
> +static void
> +tmc_etr_free_sysfs_buf(struct etr_buf *buf)
> +{
> +	if (buf)
> +		tmc_free_etr_buf(buf);
> +}
>  
> -	/*
> -	 * Adjust the buffer to point to the beginning of the trace data
> -	 * and update the available trace data.
> -	 */
> -	if (val & TMC_STS_FULL) {
> -		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
> -		drvdata->len = drvdata->size;
> -		coresight_insert_barrier_packet(drvdata->buf);
> -	} else {
> -		drvdata->buf = drvdata->vaddr;
> -		drvdata->len = rwp - drvdata->paddr;
> -	}
> +static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
> +{
> +	tmc_sync_etr_buf(drvdata);
>  }
>  
>  static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  {
> +
>  	CS_UNLOCK(drvdata->base);
>  
>  	tmc_flush_and_stop(drvdata);
> @@ -685,7 +980,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  	 * read before the TMC is disabled.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
> -		tmc_etr_dump_hw(drvdata);
> +		tmc_etr_sync_sysfs_buf(drvdata);
> +
>  	tmc_disable_hw(drvdata);
>  
>  	CS_LOCK(drvdata->base);
> @@ -696,34 +992,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	int ret = 0;
>  	bool used = false;
>  	unsigned long flags;
> -	void __iomem *vaddr = NULL;
> -	dma_addr_t paddr;
>  	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_buf *new_buf = NULL, *free_buf = NULL;
>  
>  
>  	/*
> -	 * If we don't have a buffer release the lock and allocate memory.
> -	 * Otherwise keep the lock and move along.
> +	 * If we are enabling the ETR from disabled state, we need to make
> +	 * sure we have a buffer with the right size. The etr_buf is not reset
> +	 * immediately after we stop the tracing in SYSFS mode as we wait for
> +	 * the user to collect the data. We may be able to reuse the existing
> +	 * buffer, provided the size matches. Any allocation has to be done
> +	 * with the lock released.
>  	 */
>  	spin_lock_irqsave(&drvdata->spinlock, flags);
> -	if (!drvdata->vaddr) {
> +	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
>  		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -
> -		/*
> -		 * Contiguous  memory can't be allocated while a spinlock is
> -		 * held.  As such allocate memory here and free it if a buffer
> -		 * has already been allocated (from a previous session).
> -		 */
> -		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
> -					   &paddr, GFP_KERNEL);
> -		if (!vaddr)
> -			return -ENOMEM;
> +		/* Allocate memory with the spinlock released */
> +		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
> +		if (IS_ERR(new_buf))
> +			return PTR_ERR(new_buf);
>  
>  		/* Let's try again */
>  		spin_lock_irqsave(&drvdata->spinlock, flags);
>  	}
>  
> -	if (drvdata->reading) {
> +	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
>  		ret = -EBUSY;
>  		goto out;
>  	}
> @@ -731,21 +1024,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	/*
>  	 * In sysFS mode we can have multiple writers per sink.  Since this
>  	 * sink is already enabled no memory is needed and the HW need not be
> -	 * touched.
> +	 * touched, even if the buffer size has changed.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
>  		goto out;
>  
>  	/*
> -	 * If drvdata::buf == NULL, use the memory allocated above.
> -	 * Otherwise a buffer still exists from a previous session, so
> -	 * simply use that.
> +	 * If we don't have a buffer or it doesn't match the requested size,
> +	 * use the memory allocated above. Otherwise reuse it.
>  	 */
> -	if (drvdata->buf == NULL) {
> +	if (!drvdata->etr_buf ||
> +	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
>  		used = true;

With the introduction of the free_buf variable, this is no longer needed.

> -		drvdata->vaddr = vaddr;
> -		drvdata->paddr = paddr;
> -		drvdata->buf = drvdata->vaddr;
> +		free_buf = drvdata->etr_buf;
> +		drvdata->etr_buf = new_buf;
>  	}
>  
>  	drvdata->mode = CS_MODE_SYSFS;
> @@ -754,8 +1046,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free memory outside the spinlock if need be */
> -	if (!used && vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (free_buf)
> +		tmc_etr_free_sysfs_buf(free_buf);
>  
>  	if (!ret)
>  		dev_info(drvdata->dev, "TMC-ETR enabled\n");
> @@ -834,8 +1126,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  		goto out;
>  	}
>  
> -	/* If drvdata::buf is NULL the trace data has been read already */
> -	if (drvdata->buf == NULL) {
> +	/* If drvdata::etr_buf is NULL the trace data has been read already */
> +	if (drvdata->etr_buf == NULL) {

As I pointed out during my last review, this hunk doesn't apply on my next
branch, so I can't review this part.

>  		ret = -EINVAL;
>  		goto out;
>  	}
> @@ -854,8 +1146,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  {
>  	unsigned long flags;
> -	dma_addr_t paddr;
> -	void __iomem *vaddr = NULL;
> +	struct etr_buf *etr_buf = NULL;
>  
>  	/* config types are set a boot time and never change */
>  	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
> @@ -876,17 +1167,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  		 * The ETR is not tracing and the buffer was just read.
>  		 * As such prepare to free the trace buffer.
>  		 */
> -		vaddr = drvdata->vaddr;
> -		paddr = drvdata->paddr;
> -		drvdata->buf = drvdata->vaddr = NULL;
> +		etr_buf =  drvdata->etr_buf;
> +		drvdata->etr_buf = NULL;
>  	}
>  
>  	drvdata->reading = false;
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free allocated memory out side of the spinlock */
> -	if (vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (etr_buf)
> +		tmc_free_etr_buf(etr_buf);
>  
>  	return 0;
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 19a765c..c00643c 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -55,6 +55,7 @@
>  #define TMC_STS_TMCREADY_BIT	2
>  #define TMC_STS_FULL		BIT(0)
>  #define TMC_STS_TRIGGERED	BIT(1)
> +
>  /*
>   * TMC_AXICTL - 0x110
>   *
> @@ -134,6 +135,35 @@ enum tmc_mem_intf_width {
>  #define CORESIGHT_SOC_600_ETR_CAPS	\
>  	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
>  
> +enum etr_mode {
> +	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
> +	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
> +};
> +
> +struct etr_buf_operations;
> +
> +/**
> + * struct etr_buf - Details of the buffer used by ETR
> + * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
> + * @full	: Trace data overflow
> + * @size	: Size of the buffer.
> + * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
> + * @offset	: Offset of the trace data in the buffer for consumption.
> + * @len		: Available trace data @buf (may round up to the beginning).
> + * @ops		: ETR buffer operations for the mode.
> + * @private	: Backend specific information for the buf
> + */
> +struct etr_buf {
> +	enum etr_mode			mode;
> +	bool				full;
> +	ssize_t				size;
> +	dma_addr_t			hwaddr;
> +	unsigned long			offset;
> +	s64				len;
> +	const struct etr_buf_operations	*ops;
> +	void				*private;
> +};
> +
>  /**
>   * struct tmc_drvdata - specifics associated to an TMC component
>   * @base:	memory mapped base address for this component.
> @@ -141,11 +171,10 @@ enum tmc_mem_intf_width {
>   * @csdev:	component vitals needed by the framework.
>   * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
>   * @spinlock:	only one at a time pls.
> - * @buf:	area of memory where trace data get sent.
> - * @paddr:	DMA start location in RAM.
> - * @vaddr:	virtual representation of @paddr.
> - * @size:	trace buffer size.
> - * @len:	size of the available trace.
> + * @buf:	Snapshot of the trace data for ETF/ETB.
> + * @etr_buf:	details of buffer used in TMC-ETR
> + * @len:	size of the available trace for ETF/ETB.
> + * @size:	trace buffer size for this TMC (common for all modes).
>   * @mode:	how this TMC is being used.
>   * @config_type: TMC variant, must be of type @tmc_config_type.
>   * @memwidth:	width of the memory interface databus, in bytes.
> @@ -160,11 +189,12 @@ struct tmc_drvdata {
>  	struct miscdevice	miscdev;
>  	spinlock_t		spinlock;
>  	bool			reading;
> -	char			*buf;
> -	dma_addr_t		paddr;
> -	void __iomem		*vaddr;
> -	u32			size;
> +	union {
> +		char		*buf;		/* TMC ETB */
> +		struct etr_buf	*etr_buf;	/* TMC ETR */
> +	};
>  	u32			len;
> +	u32			size;
>  	u32			mode;
>  	enum tmc_config_type	config_type;
>  	enum tmc_mem_intf_width	memwidth;
> @@ -172,6 +202,15 @@ struct tmc_drvdata {
>  	u32			etr_caps;
>  };
>  
> +struct etr_buf_operations {
> +	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
> +			int node, void **pages);
> +	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
> +	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
> +				char **bufpp);
> +	void (*free)(struct etr_buf *etr_buf);
> +};
> +
>  /**
>   * struct tmc_pages - Collection of pages used for SG.
>   * @nr_pages:		Number of pages in the list.
> -- 
> 2.7.4
> 


* [PATCH 10/11] coresight: tmc-etr: Add transparent buffer management
@ 2018-05-24 19:56     ` Mathieu Poirier
  0 siblings, 0 replies; 38+ messages in thread
From: Mathieu Poirier @ 2018-05-24 19:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 18, 2018 at 05:39:26PM +0100, Suzuki K Poulose wrote:
> At the moment we always use contiguous memory for TMC ETR tracing
> when used from sysfs. The size of the buffer is fixed at boot time
> and can only be changed by modifying the DT. With the introduction
> of SG support we could support really large buffers in that mode.
> This patch abstracts the buffer used for ETR to switch between a
> contiguous buffer or a SG table depending on the availability of
> the memory.
> 
> This also enables the sysfs mode to use the ETR in SG mode, depending
> on the configured trace buffer size. Also, since the ETR will use the
> new infrastructure to manage the buffer, we can get rid of some
> of the members in the tmc_drvdata and clean up the fields a bit.

Upon first reading this changelog I thought this patch did way too many things,
but after looking at the content that isn't the case.  We could try to split the
patch by moving the introduction of the SG operations to another patch, but it
would only save about 60 lines, which is hardly worth it.

As it stands now it is almost guaranteed other reviewers will ask you to split
your work.  Perhaps rephrasing the changelog to concentrate on the global idea
of what the patch does would help (just my personal opinion).

> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 450 +++++++++++++++++++-----
>  drivers/hwtracing/coresight/coresight-tmc.h     |  57 ++-
>  2 files changed, 418 insertions(+), 89 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 7ab0fd1..143afba 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -17,10 +17,18 @@
>  
>  #include <linux/coresight.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
>  #include <linux/slab.h>
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> +struct etr_flat_buf {
> +	struct device	*dev;
> +	dma_addr_t	daddr;
> +	void		*vaddr;
> +	size_t		size;
> +};
> +
>  /*
>   * The TMC ETR SG has a page size of 4K. The SG table contains pointers
>   * to 4KB buffers. However, the OS may use a PAGE_SIZE different from
> @@ -541,7 +549,7 @@ static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table)
>   * @size	- Total size of the data buffer
>   * @pages	- Optional list of page virtual address
>   */
> -static struct etr_sg_table __maybe_unused *
> +static struct etr_sg_table *
>  tmc_init_etr_sg_table(struct device *dev, int node,
>  		  unsigned long size, void **pages)
>  {
> @@ -573,16 +581,307 @@ tmc_init_etr_sg_table(struct device *dev, int node,
>  	return etr_table;
>  }
>  
> +/*
> + * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer.
> + */
> +static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
> +				  struct etr_buf *etr_buf, int node,
> +				  void **pages)
> +{
> +	struct etr_flat_buf *flat_buf;
> +
> +	/* We cannot reuse existing pages for flat buf */
> +	if (pages)
> +		return -EINVAL;

Perfect.

> +
> +	flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL);
> +	if (!flat_buf)
> +		return -ENOMEM;
> +
> +	flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size,
> +					   &flat_buf->daddr, GFP_KERNEL);

Indendation of the second line.

> +	if (!flat_buf->vaddr) {
> +		kfree(flat_buf);
> +		return -ENOMEM;
> +	}
> +
> +	flat_buf->size = etr_buf->size;
> +	flat_buf->dev = drvdata->dev;
> +	etr_buf->hwaddr = flat_buf->daddr;
> +	etr_buf->mode = ETR_MODE_FLAT;
> +	etr_buf->private = flat_buf;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	if (flat_buf && flat_buf->daddr)
> +		dma_free_coherent(flat_buf->dev, flat_buf->size,
> +				  flat_buf->vaddr, flat_buf->daddr);
> +	kfree(flat_buf);
> +}
> +
> +static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	/*
> +	 * Adjust the buffer to point to the beginning of the trace data
> +	 * and update the available trace data.
> +	 */
> +	etr_buf->offset = rrp - etr_buf->hwaddr;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = rwp - rrp;
> +}
> +
> +static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
> +					 u64 offset, size_t len, char **bufpp)
> +{
> +	struct etr_flat_buf *flat_buf = etr_buf->private;
> +
> +	*bufpp = (char *)flat_buf->vaddr + offset;
> +	/*
> +	 * tmc_etr_buf_get_data already adjusts the length to handle
> +	 * buffer wrapping around.
> +	 */
> +	return len;
> +}
> +
> +static const struct etr_buf_operations etr_flat_buf_ops = {
> +	.alloc = tmc_etr_alloc_flat_buf,
> +	.free = tmc_etr_free_flat_buf,
> +	.sync = tmc_etr_sync_flat_buf,
> +	.get_data = tmc_etr_get_data_flat_buf,
> +};
> +
> +/*
> + * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
> + * appropriately.
> + */
> +static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata,
> +				struct etr_buf *etr_buf, int node,
> +				void **pages)
> +{
> +	struct etr_sg_table *etr_table;
> +
> +	etr_table = tmc_init_etr_sg_table(drvdata->dev, node,
> +					  etr_buf->size, pages);
> +	if (IS_ERR(etr_table))
> +		return -ENOMEM;
> +	etr_buf->hwaddr = etr_table->hwaddr;
> +	etr_buf->mode = ETR_MODE_ETR_SG;
> +	etr_buf->private = etr_table;
> +	return 0;
> +}
> +
> +static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	if (etr_table) {
> +		tmc_free_sg_table(etr_table->sg_table);
> +		kfree(etr_table);
> +	}
> +}
> +
> +static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset,
> +				       size_t len, char **bufpp)
> +{
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +
> +	return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp);
> +}
> +
> +static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> +{
> +	long r_offset, w_offset;
> +	struct etr_sg_table *etr_table = etr_buf->private;
> +	struct tmc_sg_table *table = etr_table->sg_table;
> +
> +	/* Convert hw address to offset in the buffer */
> +	r_offset = tmc_sg_get_data_page_offset(table, rrp);
> +	if (r_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RRP %llx to offset\n", rrp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	w_offset = tmc_sg_get_data_page_offset(table, rwp);
> +	if (w_offset < 0) {
> +		dev_warn(table->dev,
> +			 "Unable to map RWP %llx to offset\n", rwp);
> +		etr_buf->len = 0;
> +		return;
> +	}
> +
> +	etr_buf->offset = r_offset;
> +	if (etr_buf->full)
> +		etr_buf->len = etr_buf->size;
> +	else
> +		etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) +
> +				w_offset - r_offset;
> +	tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len);
> +}
> +
> +static const struct etr_buf_operations etr_sg_buf_ops = {
> +	.alloc = tmc_etr_alloc_sg_buf,
> +	.free = tmc_etr_free_sg_buf,
> +	.sync = tmc_etr_sync_sg_buf,
> +	.get_data = tmc_etr_get_data_sg_buf,
> +};
> +
> +static const struct etr_buf_operations *etr_buf_ops[] = {
> +	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
> +	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
> +};
> +
> +static inline int tmc_etr_mode_alloc_buf(int mode,
> +					 struct tmc_drvdata *drvdata,
> +					 struct etr_buf *etr_buf, int node,
> +					 void **pages)
> +{
> +	int rc;
> +
> +	switch (mode) {
> +	case ETR_MODE_FLAT:
> +	case ETR_MODE_ETR_SG:
> +		rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages);
> +		if (!rc)
> +			etr_buf->ops = etr_buf_ops[mode];
> +		return rc;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +/*
> + * tmc_alloc_etr_buf: Allocate a buffer use by ETR.
> + * @drvdata	: ETR device details.
> + * @size	: size of the requested buffer.
> + * @flags	: Required properties for the buffer.
> + * @node	: Node for memory allocations.
> + * @pages	: An optional list of pages.
> + */
> +static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
> +					 ssize_t size, int flags,
> +					 int node, void **pages)
> +{
> +	int rc = -ENOMEM;
> +	bool has_etr_sg, has_iommu;
> +	struct etr_buf *etr_buf;
> +
> +	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
> +	has_iommu = iommu_get_domain_for_dev(drvdata->dev);
> +
> +	etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL);
> +	if (!etr_buf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	etr_buf->size = size;
> +
> +	/*
> +	 * If we have to use an existing list of pages, we cannot reliably
> +	 * use a contiguous DMA memory (even if we have an IOMMU). Otherwise,
> +	 * we use the contiguous DMA memory if at least one of the following
> +	 * conditions is true:
> +	 *  a) The ETR cannot use Scatter-Gather.
> +	 *  b) we have a backing IOMMU
> +	 *  c) The requested memory size is smaller (< 1M).
> +	 *
> +	 * Fallback to available mechanisms.
> +	 *
> +	 */
> +	if (!pages &&
> +	    (!has_etr_sg || has_iommu || size < SZ_1M))
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc && has_etr_sg)
> +		rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata,
> +					    etr_buf, node, pages);
> +	if (rc) {
> +		kfree(etr_buf);
> +		return ERR_PTR(rc);
> +	}
> +
> +	return etr_buf;
> +}
> +
> +static void tmc_free_etr_buf(struct etr_buf *etr_buf)
> +{
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->free);
> +	etr_buf->ops->free(etr_buf);
> +	kfree(etr_buf);
> +}
> +
> +/*
> + * tmc_etr_buf_get_data: Get the pointer the trace data at @offset
> + * with a maximum of @len bytes.
> + * Returns: The size of the linear data available @pos, with *bufpp
> + * updated to point to the buffer.
> + */
> +static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf,
> +				    u64 offset, size_t len, char **bufpp)
> +{
> +	/* Adjust the length to limit this transaction to end of buffer */
> +	len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset;
> +
> +	return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp);
> +}
> +
> +static inline s64
> +tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset)
> +{
> +	ssize_t len;
> +	char *bufp;
> +
> +	len = tmc_etr_buf_get_data(etr_buf, offset,
> +				   CORESIGHT_BARRIER_PKT_SIZE, &bufp);
> +	if (WARN_ON(len <= CORESIGHT_BARRIER_PKT_SIZE))
> +		return -EINVAL;
> +	coresight_insert_barrier_packet(bufp);
> +	return offset + CORESIGHT_BARRIER_PKT_SIZE;
> +}
> +
> +/*
> + * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata.
> + * Makes sure the trace data is synced to the memory for consumption.
> + * @etr_buf->offset will hold the offset to the beginning of the trace data
> + * within the buffer, with @etr_buf->len bytes to consume.
> + */
> +static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
> +{
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
> +	u64 rrp, rwp;
> +	u32 status;
> +
> +	rrp = tmc_read_rrp(drvdata);
> +	rwp = tmc_read_rwp(drvdata);
> +	status = readl_relaxed(drvdata->base + TMC_STS);
> +	etr_buf->full = status & TMC_STS_FULL;
> +
> +	WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
> +
> +	etr_buf->ops->sync(etr_buf, rrp, rwp);
> +
> +	/* Insert barrier packets at the beginning, if there was an overflow */
> +	if (etr_buf->full)
> +		tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
> +}
> +
>  static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  {
>  	u32 axictl, sts;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
>  	CS_UNLOCK(drvdata->base);
>  
>  	/* Wait for TMCSReady bit to be set */
>  	tmc_wait_for_tmcready(drvdata);
>  
> -	writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ);
> +	writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ);
>  	writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
>  
>  	axictl = readl_relaxed(drvdata->base + TMC_AXICTL);
> @@ -595,16 +894,22 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  		axictl |= TMC_AXICTL_ARCACHE_OS;
>  	}
>  
> +	if (etr_buf->mode == ETR_MODE_ETR_SG) {
> +		if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG)))
> +			return;
> +		axictl |= TMC_AXICTL_SCT_GAT_MODE;
> +	}
> +
>  	writel_relaxed(axictl, drvdata->base + TMC_AXICTL);
> -	tmc_write_dba(drvdata, drvdata->paddr);
> +	tmc_write_dba(drvdata, etr_buf->hwaddr);
>  	/*
>  	 * If the TMC pointers must be programmed before the session,
>  	 * we have to set it properly (i.e, RRP/RWP to base address and
>  	 * STS to "not full").
>  	 */
>  	if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) {
> -		tmc_write_rrp(drvdata, drvdata->paddr);
> -		tmc_write_rwp(drvdata, drvdata->paddr);
> +		tmc_write_rrp(drvdata, etr_buf->hwaddr);
> +		tmc_write_rwp(drvdata, etr_buf->hwaddr);
>  		sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL;
>  		writel_relaxed(sts, drvdata->base + TMC_STS);
>  	}
> @@ -620,63 +925,53 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
>  }
>  
>  /*
> - * Return the available trace data in the buffer @pos, with a maximum
> - * limit of @len, also updating the @bufpp on where to find it.
> + * Return the available trace data in the buffer (starts at etr_buf->offset,
> + * limited by etr_buf->len) from @pos, with a maximum limit of @len,
> + * also updating @bufpp on where to find it. Since the trace data can
> + * start anywhere in the buffer, depending on the RRP, we adjust the
> + * @len returned to handle the buffer wrapping around.
>   */
>  ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
>  				loff_t pos, size_t len, char **bufpp)
>  {
> +	s64 offset;
>  	ssize_t actual = len;
> -	char *bufp = drvdata->buf + pos;
> -	char *bufend = (char *)(drvdata->vaddr + drvdata->size);
> -
> -	/* Adjust the len to available size @pos */
> -	if (pos + actual > drvdata->len)
> -		actual = drvdata->len - pos;
> +	struct etr_buf *etr_buf = drvdata->etr_buf;
>  
> +	if (pos + actual > etr_buf->len)
> +		actual = etr_buf->len - pos;
>  	if (actual <= 0)
>  		return actual;
>  
> -	/*
> -	 * Since we use a circular buffer, with trace data starting
> -	 * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr,
> -	 * wrap the current @pos to within the buffer.
> -	 */
> -	if (bufp >= bufend)
> -		bufp -= drvdata->size;
> -	/*
> -	 * For simplicity, avoid copying over a wrapped around buffer.
> -	 */
> -	if ((bufp + actual) > bufend)
> -		actual = bufend - bufp;
> -	*bufpp = bufp;
> -	return actual;
> +	/* Compute the offset from which we read the data */
> +	offset = etr_buf->offset + pos;
> +	if (offset >= etr_buf->size)
> +		offset -= etr_buf->size;
> +	return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp);
>  }
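The wrap-around arithmetic in tmc_etr_get_sysfs_trace() above is terse; a standalone sketch of the same idea (simplified, with illustrative names, not the driver's API) may make it easier to follow:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative model of the offset math in tmc_etr_get_sysfs_trace():
 * trace data occupies data_len bytes of a circular buffer of buf_size
 * bytes, starting at data_offset. A logical read position pos is mapped
 * to a physical offset, and *rdlen is clipped so that one read never
 * crosses the end of the buffer.
 */
static long circ_get_data(size_t buf_size, size_t data_offset,
			  size_t data_len, size_t pos, size_t *rdlen)
{
	size_t off;

	if (pos >= data_len)
		return -1;			/* no data left at pos */

	/* Clip the request to the available trace data. */
	if (pos + *rdlen > data_len)
		*rdlen = data_len - pos;

	/* Wrap the logical position into the physical buffer. */
	off = data_offset + pos;
	if (off >= buf_size)
		off -= buf_size;

	/* Limit this transaction to the end of the buffer. */
	if (*rdlen > buf_size - off)
		*rdlen = buf_size - off;

	return (long)off;
}
```

A caller simply repeats the call, advancing pos by the returned length, until the available data is exhausted; the second call after a wrap naturally lands at the start of the physical buffer.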
>  
> -static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
> +static struct etr_buf *
> +tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata)
>  {
> -	u32 val;
> -	u64 rwp;
> +	return tmc_alloc_etr_buf(drvdata, drvdata->size,
> +				 0, cpu_to_node(0), NULL);
> +}
>  
> -	rwp = tmc_read_rwp(drvdata);
> -	val = readl_relaxed(drvdata->base + TMC_STS);
> +static void
> +tmc_etr_free_sysfs_buf(struct etr_buf *buf)
> +{
> +	if (buf)
> +		tmc_free_etr_buf(buf);
> +}
>  
> -	/*
> -	 * Adjust the buffer to point to the beginning of the trace data
> -	 * and update the available trace data.
> -	 */
> -	if (val & TMC_STS_FULL) {
> -		drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr;
> -		drvdata->len = drvdata->size;
> -		coresight_insert_barrier_packet(drvdata->buf);
> -	} else {
> -		drvdata->buf = drvdata->vaddr;
> -		drvdata->len = rwp - drvdata->paddr;
> -	}
> +static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
> +{
> +	tmc_sync_etr_buf(drvdata);
>  }
>  
>  static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  {
> +
>  	CS_UNLOCK(drvdata->base);
>  
>  	tmc_flush_and_stop(drvdata);
> @@ -685,7 +980,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  	 * read before the TMC is disabled.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
> -		tmc_etr_dump_hw(drvdata);
> +		tmc_etr_sync_sysfs_buf(drvdata);
> +
>  	tmc_disable_hw(drvdata);
>  
>  	CS_LOCK(drvdata->base);
> @@ -696,34 +992,31 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	int ret = 0;
>  	bool used = false;
>  	unsigned long flags;
> -	void __iomem *vaddr = NULL;
> -	dma_addr_t paddr;
>  	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etr_buf *new_buf = NULL, *free_buf = NULL;
>  
>  
>  	/*
> -	 * If we don't have a buffer release the lock and allocate memory.
> -	 * Otherwise keep the lock and move along.
> +	 * If we are enabling the ETR from disabled state, we need to make
> +	 * sure we have a buffer with the right size. The etr_buf is not reset
> +	 * immediately after we stop the tracing in SYSFS mode as we wait for
> +	 * the user to collect the data. We may be able to reuse the existing
> +	 * buffer, provided the size matches. Any allocation has to be done
> +	 * with the lock released.
>  	 */
>  	spin_lock_irqsave(&drvdata->spinlock, flags);
> -	if (!drvdata->vaddr) {
> +	if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) {
>  		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -
> -		/*
> -		 * Contiguous  memory can't be allocated while a spinlock is
> -		 * held.  As such allocate memory here and free it if a buffer
> -		 * has already been allocated (from a previous session).
> -		 */
> -		vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size,
> -					   &paddr, GFP_KERNEL);
> -		if (!vaddr)
> -			return -ENOMEM;
> +		/* Allocate memory with the spinlock released */
> +		free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata);
> +		if (IS_ERR(new_buf))
> +			return PTR_ERR(new_buf);
>  
>  		/* Let's try again */
>  		spin_lock_irqsave(&drvdata->spinlock, flags);
>  	}
>  
> -	if (drvdata->reading) {
> +	if (drvdata->reading || drvdata->mode == CS_MODE_PERF) {
>  		ret = -EBUSY;
>  		goto out;
>  	}
> @@ -731,21 +1024,20 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	/*
>  	 * In sysFS mode we can have multiple writers per sink.  Since this
>  	 * sink is already enabled no memory is needed and the HW need not be
> -	 * touched.
> +	 * touched, even if the buffer size has changed.
>  	 */
>  	if (drvdata->mode == CS_MODE_SYSFS)
>  		goto out;
>  
>  	/*
> -	 * If drvdata::buf == NULL, use the memory allocated above.
> -	 * Otherwise a buffer still exists from a previous session, so
> -	 * simply use that.
> +	 * If we don't have a buffer or it doesn't match the requested size,
> +	 * use the memory allocated above. Otherwise reuse it.
>  	 */
> -	if (drvdata->buf == NULL) {
> +	if (!drvdata->etr_buf ||
> +	    (new_buf && drvdata->etr_buf->size != new_buf->size)) {
>  		used = true;

With the introduction of the variable free_buf, this is no longer needed.

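The allocate-outside-the-lock pattern under discussion can be sketched in a single-threaded model (all names here are illustrative stand-ins, not the kernel's API): allocate a candidate buffer with the lock dropped, decide under the lock whether to swap it in, and free exactly one buffer after the lock is released, so a separate "used" flag becomes unnecessary.

```c
#include <assert.h>
#include <stdlib.h>

/* Single-threaded stand-ins for the driver's spinlock (illustration only). */
static int lock_held;
static void lock(void)   { assert(!lock_held); lock_held = 1; }
static void unlock(void) { assert(lock_held);  lock_held = 0; }

struct etr_buf_sketch {
	size_t size;
};

static struct etr_buf_sketch *cur_buf;

static struct etr_buf_sketch *alloc_buf(size_t size)
{
	struct etr_buf_sketch *buf = malloc(sizeof(*buf));

	if (buf)
		buf->size = size;
	return buf;
}

/*
 * Mirrors the shape of tmc_enable_etr_sink_sysfs(): free_me ends up
 * pointing either at the displaced old buffer or at an unused new one,
 * and is freed only after the lock is released.
 */
static int enable_with_size(size_t size)
{
	struct etr_buf_sketch *new_buf = NULL, *free_me = NULL;

	lock();
	if (!cur_buf || cur_buf->size != size) {
		/* Allocation must not happen with the lock held. */
		unlock();
		free_me = new_buf = alloc_buf(size);
		if (!new_buf)
			return -1;
		lock();
	}
	if (!cur_buf || (new_buf && cur_buf->size != new_buf->size)) {
		free_me = cur_buf;	/* may be NULL on first enable */
		cur_buf = new_buf;
	}
	unlock();

	free(free_me);			/* free(NULL) is a no-op */
	return 0;
}
```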
> -		drvdata->vaddr = vaddr;
> -		drvdata->paddr = paddr;
> -		drvdata->buf = drvdata->vaddr;
> +		free_buf = drvdata->etr_buf;
> +		drvdata->etr_buf = new_buf;
>  	}
>  
>  	drvdata->mode = CS_MODE_SYSFS;
> @@ -754,8 +1046,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free memory outside the spinlock if need be */
> -	if (!used && vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (free_buf)
> +		tmc_etr_free_sysfs_buf(free_buf);
>  
>  	if (!ret)
>  		dev_info(drvdata->dev, "TMC-ETR enabled\n");
> @@ -834,8 +1126,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  		goto out;
>  	}
>  
> -	/* If drvdata::buf is NULL the trace data has been read already */
> -	if (drvdata->buf == NULL) {
> +	/* If drvdata::etr_buf is NULL the trace data has been read already */
> +	if (drvdata->etr_buf == NULL) {

As I pointed out during my last review, this hunk doesn't apply on my next
branch and as such I can't review this part.

>  		ret = -EINVAL;
>  		goto out;
>  	}
> @@ -854,8 +1146,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
>  int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  {
>  	unsigned long flags;
> -	dma_addr_t paddr;
> -	void __iomem *vaddr = NULL;
> +	struct etr_buf *etr_buf = NULL;
>  
>  	/* config types are set at boot time and never change */
>  	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
> @@ -876,17 +1167,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
>  		 * The ETR is not tracing and the buffer was just read.
>  		 * As such prepare to free the trace buffer.
>  		 */
> -		vaddr = drvdata->vaddr;
> -		paddr = drvdata->paddr;
> -		drvdata->buf = drvdata->vaddr = NULL;
> +		etr_buf = drvdata->etr_buf;
> +		drvdata->etr_buf = NULL;
>  	}
>  
>  	drvdata->reading = false;
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
>  
>  	/* Free allocated memory outside of the spinlock */
> -	if (vaddr)
> -		dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr);
> +	if (etr_buf)
> +		tmc_free_etr_buf(etr_buf);
>  
>  	return 0;
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index 19a765c..c00643c 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -55,6 +55,7 @@
>  #define TMC_STS_TMCREADY_BIT	2
>  #define TMC_STS_FULL		BIT(0)
>  #define TMC_STS_TRIGGERED	BIT(1)
> +
>  /*
>   * TMC_AXICTL - 0x110
>   *
> @@ -134,6 +135,35 @@ enum tmc_mem_intf_width {
>  #define CORESIGHT_SOC_600_ETR_CAPS	\
>  	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
>  
> +enum etr_mode {
> +	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
> +	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
> +};
> +
> +struct etr_buf_operations;
> +
> +/**
> + * struct etr_buf - Details of the buffer used by ETR
> + * @mode	: Mode of the ETR buffer, contiguous, Scatter Gather etc.
> + * @full	: Trace data overflow
> + * @size	: Size of the buffer.
> + * @hwaddr	: Address to be programmed in the TMC:DBA{LO,HI}
> + * @offset	: Offset of the trace data in the buffer for consumption.
> + * @len		: Available trace data @buf (may wrap around to the beginning).
> + * @ops		: ETR buffer operations for the mode.
> + * @private	: Backend specific information for the buf
> + */
> +struct etr_buf {
> +	enum etr_mode			mode;
> +	bool				full;
> +	ssize_t				size;
> +	dma_addr_t			hwaddr;
> +	unsigned long			offset;
> +	s64				len;
> +	const struct etr_buf_operations	*ops;
> +	void				*private;
> +};
> +
>  /**
>   * struct tmc_drvdata - specifics associated to an TMC component
>   * @base:	memory mapped base address for this component.
> @@ -141,11 +171,10 @@ enum tmc_mem_intf_width {
>   * @csdev:	component vitals needed by the framework.
>   * @miscdev:	specifics to handle "/dev/xyz.tmc" entry.
>   * @spinlock:	only one at a time pls.
> - * @buf:	area of memory where trace data get sent.
> - * @paddr:	DMA start location in RAM.
> - * @vaddr:	virtual representation of @paddr.
> - * @size:	trace buffer size.
> - * @len:	size of the available trace.
> + * @buf:	Snapshot of the trace data for ETF/ETB.
> + * @etr_buf:	details of buffer used in TMC-ETR
> + * @len:	size of the available trace for ETF/ETB.
> + * @size:	trace buffer size for this TMC (common for all modes).
>   * @mode:	how this TMC is being used.
>   * @config_type: TMC variant, must be of type @tmc_config_type.
>   * @memwidth:	width of the memory interface databus, in bytes.
> @@ -160,11 +189,12 @@ struct tmc_drvdata {
>  	struct miscdevice	miscdev;
>  	spinlock_t		spinlock;
>  	bool			reading;
> -	char			*buf;
> -	dma_addr_t		paddr;
> -	void __iomem		*vaddr;
> -	u32			size;
> +	union {
> +		char		*buf;		/* TMC ETB */
> +		struct etr_buf	*etr_buf;	/* TMC ETR */
> +	};
>  	u32			len;
> +	u32			size;
>  	u32			mode;
>  	enum tmc_config_type	config_type;
>  	enum tmc_mem_intf_width	memwidth;
> @@ -172,6 +202,15 @@ struct tmc_drvdata {
>  	u32			etr_caps;
>  };
>  
> +struct etr_buf_operations {
> +	int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf,
> +			int node, void **pages);
> +	void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp);
> +	ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len,
> +				char **bufpp);
> +	void (*free)(struct etr_buf *etr_buf);
> +};
> +
>  /**
>   * struct tmc_pages - Collection of pages used for SG.
>   * @nr_pages:		Number of pages in the list.
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/11] coresight: Add generic TMC sg table framework
  2018-05-23 20:25     ` Mathieu Poirier
@ 2018-05-25 16:07       ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-25 16:07 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland

On 23/05/18 21:25, Mathieu Poirier wrote:
> On Fri, May 18, 2018 at 05:39:24PM +0100, Suzuki K Poulose wrote:
>> This patch introduces a generic sg table data structure and
>> associated operations. An SG table can be used to map a set
>> of Data pages where the trace data could be stored by the TMC
>> ETR. The information about the data pages could be stored in
>> different formats, depending on the type of the underlying
>> SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic
>> structure provides bookkeeping of the pages used for the data
>> as well as the table contents. The table should be filled by
>> the user of the infrastructure.
>>
>> A table can be created by specifying the number of data pages
>> as well as the number of table pages required to hold the
>> pointers, where the latter could be different for different
>> types of tables. The pages are mapped in the appropriate dma
>> data direction mode (i.e, DMA_TO_DEVICE for table pages
>> and DMA_FROM_DEVICE for data pages).  The framework can optionally
>> accept a set of allocated data pages (e.g, perf ring buffer) and
>> map them accordingly. The table and data pages are vmap'ed to allow
>> easier access by the drivers. The framework also provides helpers to
>> sync the data written to the pages with appropriate directions.
>>
>> This will be later used by the TMC ETR SG unit and CATU.
>>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v1:
>>   - Address code style issues, more comments
>> ---
>>   drivers/hwtracing/coresight/coresight-tmc-etr.c | 290 ++++++++++++++++++++++++
>>   drivers/hwtracing/coresight/coresight-tmc.h     |  50 ++++
>>   2 files changed, 340 insertions(+)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> index 9780798..1e844f8 100644
>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>> @@ -17,9 +17,299 @@


>> +static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
>> +{
>> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
>> +		return 0;
>> +	return sg_table->table_daddr;
>> +}
>> +
>> +static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
>> +{
>> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
>> +		return NULL;
>> +	return sg_table->table_vaddr;
>> +}
> 
> The above two functions deal with DMA'able and virtual addresses for the table
> page buffer.  Yet the test in the WARN_ON is done on the data page array.
> Shouldn't this be sg_table->table_pages.pages[0] instead?

The table is as good as empty if there are no data pages associated with
the table. Hence the data_pages check.

> 
> If not, please add a comment justifying your position so that someone else
> looking at the code doesn't end up thinking the same in a year from now.

I will add a comment to reflect the above.
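One possible wording for such a comment (an editorial suggestion only, not the author's final text), shown on a simplified stand-in for the helper:

```c
#include <assert.h>

/* Simplified stand-in for struct tmc_sg_table, for illustration only. */
struct sg_table_sketch {
	void *data_page0;		/* stands in for data_pages.pages[0] */
	unsigned long table_daddr;
};

/*
 * An SG table without any data pages mapped is as good as empty: its
 * entries point nowhere useful. So validate the data pages, rather than
 * the table pages, before handing out the table's base address.
 */
static unsigned long sg_table_base_paddr(struct sg_table_sketch *sg_table)
{
	if (!sg_table->data_page0)	/* WARN_ON() in the real driver */
		return 0;
	return sg_table->table_daddr;
}
```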

> 
>> +
>> +static inline void *
>> +tmc_sg_table_data_vaddr(struct tmc_sg_table *sg_table)
>> +{
>> +	if (WARN_ON(!sg_table->data_pages.nr_pages))
>> +		return 0;
>> +	return sg_table->data_vaddr;
>> +}
> 
> I see that tmc_sg_table_base_vaddr() and tmc_sg_table_data_vaddr() are both
> returning the virtual address of the contiguous buffer for table and data
> respectively.  Yet there is a discrepancy in the naming convention.  I would
> have expected tmc_sg_table_base_vaddr() and tmc_sg_data_base_vaddr() so that
> there is a little symmetry between them.

Agree. I will fix it.

Suzuki


* Re: [PATCH 08/11] coresight: Add generic TMC sg table framework
  2018-05-25 16:07       ` Suzuki K Poulose
@ 2018-05-25 16:43         ` Mathieu Poirier
  -1 siblings, 0 replies; 38+ messages in thread
From: Mathieu Poirier @ 2018-05-25 16:43 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland

On Fri, May 25, 2018 at 05:07:07PM +0100, Suzuki K Poulose wrote:
> On 23/05/18 21:25, Mathieu Poirier wrote:
> >On Fri, May 18, 2018 at 05:39:24PM +0100, Suzuki K Poulose wrote:
> >>This patch introduces a generic sg table data structure and
> >>associated operations. An SG table can be used to map a set
> >>of Data pages where the trace data could be stored by the TMC
> >>ETR. The information about the data pages could be stored in
> >>different formats, depending on the type of the underlying
> >>SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic
> >>structure provides bookkeeping of the pages used for the data
> >>as well as the table contents. The table should be filled by
> >>the user of the infrastructure.
> >>
> >>A table can be created by specifying the number of data pages
> >>as well as the number of table pages required to hold the
> >>pointers, where the latter could be different for different
> >>types of tables. The pages are mapped in the appropriate dma
> >>data direction mode (i.e, DMA_TO_DEVICE for table pages
> >>and DMA_FROM_DEVICE for data pages).  The framework can optionally
> >>accept a set of allocated data pages (e.g, perf ring buffer) and
> >>map them accordingly. The table and data pages are vmap'ed to allow
> >>easier access by the drivers. The framework also provides helpers to
> >>sync the data written to the pages with appropriate directions.
> >>
> >>This will be later used by the TMC ETR SG unit and CATU.
> >>
> >>Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>---
> >>Changes since v1:
> >>  - Address code style issues, more comments
> >>---
> >>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 290 ++++++++++++++++++++++++
> >>  drivers/hwtracing/coresight/coresight-tmc.h     |  50 ++++
> >>  2 files changed, 340 insertions(+)
> >>
> >>diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>index 9780798..1e844f8 100644
> >>--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> >>@@ -17,9 +17,299 @@
> 
> 
> >>+static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
> >>+{
> >>+	if (WARN_ON(!sg_table->data_pages.pages[0]))
> >>+		return 0;
> >>+	return sg_table->table_daddr;
> >>+}
> >>+
> >>+static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
> >>+{
> >>+	if (WARN_ON(!sg_table->data_pages.pages[0]))
> >>+		return NULL;
> >>+	return sg_table->table_vaddr;
> >>+}
> >
> >The above two functions deal with DMA'able and virtual addresses for the table
> >page buffer.  Yet the test in the WARN_ON is done on the data page array.
> >Shouldn't this be sg_table->table_pages.pages[0] instead?
> 
> The table is as good as empty if there are no data pages associated with
> the table. Hence the data_pages check.

That is correct.  On the flip side you can't have data_pages without table_pages
and vice versa, hence my comment.

> 
> >
> >If not please add a comment justifying your position so that someone else
> >looking at the code doesn't end up thinking the same in a year from now.
> 
> I will add a comment to reflect the above.
> 
> >
> >>+
> >>+static inline void *
> >>+tmc_sg_table_data_vaddr(struct tmc_sg_table *sg_table)
> >>+{
> >>+	if (WARN_ON(!sg_table->data_pages.nr_pages))
> >>+		return 0;
> >>+	return sg_table->data_vaddr;
> >>+}
> >
> >I see that tmc_sg_table_base_vaddr() and tmc_sg_table_data_vaddr() are both
> >returning the virtual address of the contiguous buffer for table and data
> >respectively.  Yet there is a discrepancy in the naming convention.  I would
> >have expected tmc_sg_table_base_vaddr() and tmc_sg_data_base_vaddr() so that
> >there is a little symmetry between them.
> 
> Agree. I will fix it.
> 
> Suzuki


* Re: [PATCH 08/11] coresight: Add generic TMC sg table framework
  2018-05-25 16:43         ` Mathieu Poirier
@ 2018-05-25 16:54           ` Suzuki K Poulose
  -1 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-25 16:54 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, robh, sudeep.holla, frowand.list,
	coresight, mark.rutland

On 25/05/18 17:43, Mathieu Poirier wrote:
> On Fri, May 25, 2018 at 05:07:07PM +0100, Suzuki K Poulose wrote:
>> On 23/05/18 21:25, Mathieu Poirier wrote:
>>> On Fri, May 18, 2018 at 05:39:24PM +0100, Suzuki K Poulose wrote:
>>>> This patch introduces a generic sg table data structure and
>>>> associated operations. An SG table can be used to map a set
>>>> of Data pages where the trace data could be stored by the TMC
>>>> ETR. The information about the data pages could be stored in
>>>> different formats, depending on the type of the underlying
>>>> SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic
>>>> structure provides bookkeeping of the pages used for the data
>>>> as well as the table contents. The table should be filled by
>>>> the user of the infrastructure.
>>>>
>>>> A table can be created by specifying the number of data pages
>>>> as well as the number of table pages required to hold the
>>>> pointers, where the latter could be different for different
>>>> types of tables. The pages are mapped in the appropriate dma
>>>> data direction mode (i.e, DMA_TO_DEVICE for table pages
>>>> and DMA_FROM_DEVICE for data pages).  The framework can optionally
>>>> accept a set of allocated data pages (e.g, perf ring buffer) and
>>>> map them accordingly. The table and data pages are vmap'ed to allow
>>>> easier access by the drivers. The framework also provides helpers to
>>>> sync the data written to the pages with appropriate directions.
>>>>
>>>> This will be later used by the TMC ETR SG unit and CATU.
>>>>
>>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> ---
>>>> Changes since v1:
>>>>   - Address code style issues, more comments
>>>> ---
>>>>   drivers/hwtracing/coresight/coresight-tmc-etr.c | 290 ++++++++++++++++++++++++
>>>>   drivers/hwtracing/coresight/coresight-tmc.h     |  50 ++++
>>>>   2 files changed, 340 insertions(+)
>>>>
>>>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>> index 9780798..1e844f8 100644
>>>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>> @@ -17,9 +17,299 @@
>>
>>
>>>> +static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
>>>> +{
>>>> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
>>>> +		return 0;
>>>> +	return sg_table->table_daddr;
>>>> +}
>>>> +
>>>> +static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
>>>> +{
>>>> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
>>>> +		return NULL;
>>>> +	return sg_table->table_vaddr;
>>>> +}
>>>
>>> The above two functions deal with DMA'able and virtual addresses for the table
>>> page buffer.  Yet the test in the WARN_ON is done on the data page array.
>>> Shouldn't this be sg_table->table_pages.pages[0] instead?
>>
>> The table is as good as empty if there are no data pages associated with
>> the table. Hence the data_pages check.
> 
> That is correct.  On the flip side you can't have data_pages without table_pages
> and vice versa, hence my comment.

Agree. On a second thought, those helpers are not used anywhere now. Also,
we only use the base addresses just after creation of the table and
we have necessary guards to make sure the table is actually created.
I suspect this was a left over from my original code. I am tempted to
rather remove them.

Suzuki

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 08/11] coresight: Add generic TMC sg table framework
@ 2018-05-25 16:54           ` Suzuki K Poulose
  0 siblings, 0 replies; 38+ messages in thread
From: Suzuki K Poulose @ 2018-05-25 16:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 25/05/18 17:43, Mathieu Poirier wrote:
> On Fri, May 25, 2018 at 05:07:07PM +0100, Suzuki K Poulose wrote:
>> On 23/05/18 21:25, Mathieu Poirier wrote:
>>> On Fri, May 18, 2018 at 05:39:24PM +0100, Suzuki K Poulose wrote:
>>>> This patch introduces a generic sg table data structure and
>>>> associated operations. An SG table can be used to map a set
>>>> of Data pages where the trace data could be stored by the TMC
>>>> ETR. The information about the data pages could be stored in
>>>> different formats, depending on the type of the underlying
>>>> SG mechanism (e.g., TMC ETR SG vs Coresight CATU). The generic
>>>> structure provides bookkeeping of the pages used for the data
>>>> as well as the table contents. The table should be filled in by
>>>> the user of the infrastructure.
>>>>
>>>> A table can be created by specifying the number of data pages
>>>> as well as the number of table pages required to hold the
>>>> pointers, where the latter could be different for different
>>>> types of tables. The pages are mapped in the appropriate DMA
>>>> data direction (i.e., DMA_TO_DEVICE for table pages
>>>> and DMA_FROM_DEVICE for data pages). The framework can optionally
>>>> accept a set of pre-allocated data pages (e.g., a perf ring buffer) and
>>>> map them accordingly. The table and data pages are vmap'ed to allow
>>>> easier access by the drivers. The framework also provides helpers to
>>>> sync the data written to the pages in the appropriate directions.
>>>>
>>>> This will later be used by the TMC ETR SG unit and CATU.
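
[The page bookkeeping and DMA directions described above can be sketched with a simplified user-space model. The names mirror the patch but the types are hypothetical stand-ins, not the kernel implementation:

```c
/* Hypothetical, simplified model of the bookkeeping described above. */
enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE };

struct tmc_pages {
	int nr_pages;
	enum dma_dir dir;
};

struct tmc_sg_table {
	struct tmc_pages table_pages;	/* hold pointers to the data pages */
	struct tmc_pages data_pages;	/* hold the trace data itself */
};

/*
 * Table pages are read by the device to locate the data pages, hence
 * DMA_TO_DEVICE; data pages are written by the device with trace data,
 * hence DMA_FROM_DEVICE.
 */
static void tmc_sg_table_init(struct tmc_sg_table *sgt,
			      int nr_tpages, int nr_dpages)
{
	sgt->table_pages.nr_pages = nr_tpages;
	sgt->table_pages.dir = DMA_TO_DEVICE;
	sgt->data_pages.nr_pages = nr_dpages;
	sgt->data_pages.dir = DMA_FROM_DEVICE;
}
```

The point of the sketch is only the direction split: the table is device-readable, the data buffer device-writable, and the framework maps each set of pages accordingly. -- ed.]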
>>>>
>>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> ---
>>>> Changes since v1:
>>>>   - Address code style issues, more comments
>>>> ---
>>>>   drivers/hwtracing/coresight/coresight-tmc-etr.c | 290 ++++++++++++++++++++++++
>>>>   drivers/hwtracing/coresight/coresight-tmc.h     |  50 ++++
>>>>   2 files changed, 340 insertions(+)
>>>>
>>>> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>> index 9780798..1e844f8 100644
>>>> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
>>>> @@ -17,9 +17,299 @@
>>
>>
>>>> +static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
>>>> +{
>>>> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
>>>> +		return 0;
>>>> +	return sg_table->table_daddr;
>>>> +}
>>>> +
>>>> +static inline void *tmc_sg_table_base_vaddr(struct tmc_sg_table *sg_table)
>>>> +{
>>>> +	if (WARN_ON(!sg_table->data_pages.pages[0]))
>>>> +		return NULL;
>>>> +	return sg_table->table_vaddr;
>>>> +}
>>>
>>> The above two functions deal with DMA'able and virtual addresses for the table
>>> page buffer.  Yet the test in the WARN_ON is done on the data page array.
>>> Shouldn't this be sg_table->table_pages.pages[0] instead?
>>
>> The table is as good as empty if there are no data pages associated with
>> the table. Hence the data_pages check.
> 
> That is correct.  On the flip side you can't have data_pages without table_pages
> and vice versa, hence my comment.

Agreed. On second thought, those helpers are not used anywhere now. Also,
we only use the base addresses just after creating the table, and we
have the necessary guards to make sure the table has actually been created.
I suspect this was a leftover from my original code. I am rather tempted
to remove them.
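
[For illustration, the alternative guard discussed here could look like the
following. This is a hypothetical user-space sketch with simplified
stand-in types, not the actual patch:

```c
typedef unsigned long dma_addr_t;	/* simplified stand-in */

struct tmc_pages {
	void **pages;	/* page array; NULL first entry means "not set up" */
};

struct tmc_sg_table {
	struct tmc_pages table_pages;
	struct tmc_pages data_pages;
	dma_addr_t table_daddr;
	void *table_vaddr;
};

/*
 * Guard on table_pages rather than data_pages, since the returned
 * address belongs to the table buffer itself.
 */
static inline dma_addr_t tmc_sg_table_base_paddr(struct tmc_sg_table *sg_table)
{
	if (!sg_table->table_pages.pages || !sg_table->table_pages.pages[0])
		return 0;
	return sg_table->table_daddr;
}
```
-- ed.]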

Suzuki

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-05-25 16:54 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-18 16:39 [PATCH 00/11] coresight: tmc-etr Transparent buffer management Suzuki K Poulose
2018-05-18 16:39 ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 01/11] coresight: ETM: Add support for Arm Cortex-A73 and Cortex-A35 Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 02/11] coresight: tmc: Hide trace buffer handling for file read Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 03/11] coresight: tmc-etr: Do not clean trace buffer Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 04/11] coresight: tmc-etr: Disallow perf mode Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 05/11] coresight: Add helper for inserting synchronization packets Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 06/11] dts: bindings: Restrict coresight tmc-etr scatter-gather mode Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-23 18:18   ` Rob Herring
2018-05-23 18:18     ` Rob Herring
2018-05-18 16:39 ` [PATCH 07/11] dts: juno: Add scatter-gather support for all revisions Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-23 17:39   ` Mathieu Poirier
2018-05-23 17:39     ` Mathieu Poirier
2018-05-18 16:39 ` [PATCH 08/11] coresight: Add generic TMC sg table framework Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-23 20:25   ` Mathieu Poirier
2018-05-23 20:25     ` Mathieu Poirier
2018-05-25 16:07     ` Suzuki K Poulose
2018-05-25 16:07       ` Suzuki K Poulose
2018-05-25 16:43       ` Mathieu Poirier
2018-05-25 16:43         ` Mathieu Poirier
2018-05-25 16:54         ` Suzuki K Poulose
2018-05-25 16:54           ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 09/11] coresight: Add support for TMC ETR SG unit Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-18 16:39 ` [PATCH 10/11] coresight: tmc-etr: Add transparent buffer management Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
2018-05-24 19:56   ` Mathieu Poirier
2018-05-24 19:56     ` Mathieu Poirier
2018-05-18 16:39 ` [PATCH 11/11] coresight: tmc: Add configuration support for trace buffer size Suzuki K Poulose
2018-05-18 16:39   ` Suzuki K Poulose
