linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds
@ 2021-09-21 13:41 Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data Suzuki K Poulose
                   ` (18 more replies)
  0 siblings, 19 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

This series adds CPU erratum workarounds related to self-hosted
tracing. The errata handled in this series are:

 * TRBE may overwrite trace in FILL mode
   - Arm Neoverse-N2	#2139208
   - Cortex-A710	#2119858

 * A TSB instruction may not flush the trace completely when executed
   in a trace-prohibited region.

   - Arm Neoverse-N2	#2067961
   - Cortex-A710	#2054223

 * TRBE may write to out-of-range address
   - Arm Neoverse-N2	#2253138
   - Cortex-A710	#2224489

The series applies on top of the self-hosted trace/TRBE fixes posted
at [0]. A tree containing both series is available at [1].

 [0] https://lkml.kernel.org/r/20210914102641.1852544-1-suzuki.poulose@arm.com
 [1] git@git.gitlab.arm.com:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v2

Changes since v1:
 https://lkml.kernel.org/r/20210728135217.591173-1-suzuki.poulose@arm.com
 - Added a fix to the TRBE driver handling of sink_specific data
 - Added more description and ASCII art for the overwrite in FILL mode
   workaround
 - Added another TRBE erratum to the list,
   "TRBE may write to out-of-range address" (patches 12-17)
 - Added a comment listing the expectations around the TSB erratum workaround.


Suzuki K Poulose (17):
  coresight: trbe: Fix incorrect access of the sink specific data
  coresight: trbe: Add infrastructure for Errata handling
  coresight: trbe: Add a helper to calculate the trace generated
  coresight: trbe: Add a helper to pad a given buffer area
  coresight: trbe: Decouple buffer base from the hardware base
  coresight: trbe: Allow driver to choose a different alignment
  arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
  arm64: Add erratum detection for TRBE overwrite in FILL mode
  coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  arm64: Enable workaround for TRBE overwrite in FILL mode
  arm64: errata: Add workaround for TSB flush failures
  coresight: trbe: Add a helper to fetch cpudata from perf handle
  coresight: trbe: Add a helper to determine the minimum buffer size
  coresight: trbe: Make sure we have enough space
  arm64: Add erratum detection for TRBE write to out-of-range
  coresight: trbe: Work around write to out of range
  arm64: Advertise TRBE erratum workaround for write to out-of-range address

 Documentation/arm64/silicon-errata.rst       |  12 +
 arch/arm64/Kconfig                           | 109 ++++++
 arch/arm64/include/asm/barrier.h             |  16 +-
 arch/arm64/include/asm/cputype.h             |   4 +
 arch/arm64/kernel/cpu_errata.c               |  64 ++++
 arch/arm64/tools/cpucaps                     |   3 +
 drivers/hwtracing/coresight/coresight-trbe.c | 339 +++++++++++++++++--
 7 files changed, 510 insertions(+), 37 deletions(-)

-- 
2.24.1



* [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  5:41   ` Anshuman Khandual
  2021-09-30 17:57   ` Mathieu Poirier
  2021-09-21 13:41 ` [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling Suzuki K Poulose
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

The TRBE driver wrongly treats the AUX private data as the TRBE
driver-specific buffer for a given perf handle, while it is actually
the ETM PMU's event-specific data. Fix this by using the appropriate
helper, etm_perf_sink_config(), to fetch the TRBE buffer.

Fixes: 3fbf7f011f242 ("coresight: sink: Add TRBE driver")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index d4c57aed05e5..e3d73751d568 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -363,7 +363,7 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
 
 static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
 {
-	struct trbe_buf *buf = perf_get_aux(handle);
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
 	u64 limit = __trbe_normal_offset(handle);
 	u64 head = PERF_IDX2OFF(handle->head, buf);
 
-- 
2.24.1



* [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  6:47   ` Anshuman Khandual
  2021-10-05 16:46   ` Mathieu Poirier
  2021-09-21 13:41 ` [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated Suzuki K Poulose
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Add a minimal infrastructure to keep track of the errata
affecting a given TRBE instance. Given that systems may have
heterogeneous CPUs, we have to manage the list per TRBE
instance to be able to apply the workaround as needed.

We rely on the arm64 errata framework for the actual
description and the discovery of a given erratum, to
keep the erratum workaround in a central place and
benefit from the code and the advertisement from the
kernel. We use a local mapping of the errata to
avoid bloating up the individual TRBE structures,
i.e., each arm64 TRBE erratum cpucap is assigned a new
number within the driver to track. Each TRBE instance
updates the list of errata affecting it at probe time on
the CPU. This makes sure that we can easily access the
list of errata on a given TRBE instance without much
overhead.
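
A minimal user-space sketch of the cpucap-to-local-index mapping
described above, assuming a system with fewer than 64 cpucaps. The
cpucap numbers, CAP_EXAMPLE_*, TRBE_ERR_*, cpu_has_cap() and the
*_sketch names are hypothetical stand-ins, not the kernel's values
or APIs; cpu_has_cap() only mimics this_cpu_has_cap().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical system-wide cpucap numbers (kernel: arch/arm64/tools/cpucaps). */
enum { CAP_EXAMPLE_A = 37, CAP_EXAMPLE_B = 52, CAP_EXAMPLE_C = 60 };

/* Driver-local erratum indices: dense and starting at 0. */
enum { TRBE_ERR_A, TRBE_ERR_B, TRBE_ERRATA_MAX_SKETCH };

/* Local index -> system-wide cpucap, like trbe_errata_cpucaps[]. */
static const unsigned int errata_cpucaps[TRBE_ERRATA_MAX_SKETCH] = {
	[TRBE_ERR_A] = CAP_EXAMPLE_A,
	[TRBE_ERR_B] = CAP_EXAMPLE_B,
};

struct cpudata_sketch {
	unsigned long errata;	/* one bit per local erratum index */
};

/* Stand-in for this_cpu_has_cap(): tests a 64-bit mask of cpucaps. */
static bool cpu_has_cap(unsigned long long cpu_caps, unsigned int cap)
{
	return (cpu_caps >> cap) & 1ULL;
}

/* Probe-time scan: record only the errata the driver cares about. */
static void check_errata(struct cpudata_sketch *cd, unsigned long long cpu_caps)
{
	cd->errata = 0;
	for (int i = 0; i < TRBE_ERRATA_MAX_SKETCH; i++) {
		assert(errata_cpucaps[i] < 64);	/* fits the stand-in mask */
		if (cpu_has_cap(cpu_caps, errata_cpucaps[i]))
			cd->errata |= 1UL << i;
	}
}

/* Bounds-checked lookup, like trbe_has_erratum(). */
static bool has_erratum(const struct cpudata_sketch *cd, int i)
{
	return i >= 0 && i < TRBE_ERRATA_MAX_SKETCH && ((cd->errata >> i) & 1UL);
}
```

Unrelated cpucaps (CAP_EXAMPLE_C here) never reach the per-instance
bitmap, which is what keeps the per-TRBE tracking small.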

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
  - Flip the order of args for trbe_has_erratum()
  - Move erratum detection further down in the sequence
---
 drivers/hwtracing/coresight/coresight-trbe.c | 49 ++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index e3d73751d568..63f7edd5fd1f 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -16,6 +16,8 @@
 #define pr_fmt(fmt) DRVNAME ": " fmt
 
 #include <asm/barrier.h>
+#include <asm/cputype.h>
+
 #include "coresight-self-hosted-trace.h"
 #include "coresight-trbe.h"
 
@@ -65,6 +67,35 @@ struct trbe_buf {
 	struct trbe_cpudata *cpudata;
 };
 
+/*
+ * TRBE erratum list
+ *
+ * We rely on the corresponding cpucaps to be defined for a given
+ * TRBE erratum. We map the given cpucap into a TRBE internal number
+ * to make the tracking of the errata lean.
+ *
+ * This helps in :
+ *   - Not duplicating the detection logic
+ *   - Streamlined detection of erratum across the system
+ *
+ * Since the erratum work arounds could be applied individually
+ * per TRBE instance, we keep track of the list of errata that
+ * affects the given instance of the TRBE.
+ */
+#define TRBE_ERRATA_MAX			0
+
+static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
+};
+
+/*
+ * struct trbe_cpudata: TRBE instance specific data
+ * @trbe_flag		- TRBE dirty/access flag support
+ * @tbre_align		- Actual TRBE alignment required for TRBPTR_EL1.
+ * @cpu			- CPU this TRBE belongs to.
+ * @mode		- Mode of current operation. (perf/disabled)
+ * @drvdata		- TRBE specific drvdata
+ * @errata		- Bit map for the errata on this TRBE.
+ */
 struct trbe_cpudata {
 	bool trbe_flag;
 	u64 trbe_align;
@@ -72,6 +103,7 @@ struct trbe_cpudata {
 	enum cs_mode mode;
 	struct trbe_buf *buf;
 	struct trbe_drvdata *drvdata;
+	DECLARE_BITMAP(errata, TRBE_ERRATA_MAX);
 };
 
 struct trbe_drvdata {
@@ -84,6 +116,21 @@ struct trbe_drvdata {
 	struct platform_device *pdev;
 };
 
+static void trbe_check_errata(struct trbe_cpudata *cpudata)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(trbe_errata_cpucaps); i++) {
+		if (this_cpu_has_cap(trbe_errata_cpucaps[i]))
+			set_bit(i, cpudata->errata);
+	}
+}
+
+static inline bool trbe_has_erratum(struct trbe_cpudata *cpudata, int i)
+{
+	return (i < TRBE_ERRATA_MAX) && test_bit(i, cpudata->errata);
+}
+
 static int trbe_alloc_node(struct perf_event *event)
 {
 	if (event->cpu == -1)
@@ -926,6 +973,8 @@ static void arm_trbe_probe_cpu(void *info)
 		pr_err("Unsupported alignment on cpu %d\n", cpu);
 		goto cpu_clear;
 	}
+
+	trbe_check_errata(cpudata);
 	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
 	cpudata->cpu = cpu;
 	cpudata->drvdata = drvdata;
-- 
2.24.1



* [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-30 17:54   ` Mathieu Poirier
  2021-09-21 13:41 ` [PATCH v2 04/17] coresight: trbe: Add a helper to pad a given buffer area Suzuki K Poulose
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

We collect the trace from the TRBE on a FILL event from IRQ context,
and via update_buffer() when the event is stopped. Consolidate the
calculation of the trace generated into a helper.
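
A simplified model of the size calculation, assuming plain 64-bit
addresses in place of the TRBPTR_EL1/TRBLIMITR_EL1 reads and
PERF_IDX2OFF(). The struct and function names are hypothetical; on
a wrap the write pointer is back at the base, so the limit is used
as the end of the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct trbe_snapshot {
	uint64_t base;		/* buffer base (trbe_base) */
	uint64_t write;		/* current write pointer */
	uint64_t limit;		/* limit pointer */
	uint64_t head_off;	/* perf head, as an offset from base */
};

/* Bytes of trace produced since head_off; 0 on an inconsistent state. */
static uint64_t trace_size(const struct trbe_snapshot *s, bool wrap)
{
	uint64_t end = wrap ? s->limit : s->write;
	uint64_t end_off;

	assert(s->base <= end);
	end_off = end - s->base;
	if (end_off < s->head_off)
		return 0;	/* the driver WARNs once here */
	return end_off - s->head_off;
}
```

With base 0x1000, write 0x1800, limit 0x3000 and a head offset of
0x200, this gives 0x600 bytes without a wrap and 0x1e00 with one.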

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 63f7edd5fd1f..063c4505a203 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
 	return TRBE_FAULT_ACT_SPURIOUS;
 }
 
+static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
+					 struct trbe_buf *buf,
+					 bool wrap)
+{
+	u64 write;
+	u64 start_off, end_off;
+
+	/*
+	 * If the TRBE has wrapped around the write pointer has
+	 * wrapped and should be treated as limit.
+	 */
+	if (wrap)
+		write = get_trbe_limit_pointer();
+	else
+		write = get_trbe_write_pointer();
+
+	end_off = write - buf->trbe_base;
+	start_off = PERF_IDX2OFF(handle->head, buf);
+
+	if (WARN_ON_ONCE(end_off < start_off))
+		return 0;
+	return (end_off - start_off);
+}
+
 static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
 				   struct perf_event *event, void **pages,
 				   int nr_pages, bool snapshot)
@@ -588,9 +612,9 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
 	struct trbe_buf *buf = config;
 	enum trbe_fault_action act;
-	unsigned long size, offset;
-	unsigned long write, base, status;
+	unsigned long size, status;
 	unsigned long flags;
+	bool wrap = false;
 
 	WARN_ON(buf->cpudata != cpudata);
 	WARN_ON(cpudata->cpu != smp_processor_id());
@@ -630,8 +654,6 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 	 * handle gets freed in etm_event_stop().
 	 */
 	trbe_drain_and_disable_local();
-	write = get_trbe_write_pointer();
-	base = get_trbe_base_pointer();
 
 	/* Check if there is a pending interrupt and handle it here */
 	status = read_sysreg_s(SYS_TRBSR_EL1);
@@ -655,20 +677,11 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 			goto done;
 		}
 
-		/*
-		 * Otherwise, the buffer is full and the write pointer
-		 * has reached base. Adjust this back to the Limit pointer
-		 * for correct size. Also, mark the buffer truncated.
-		 */
-		write = get_trbe_limit_pointer();
 		perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION);
+		wrap = true;
 	}
 
-	offset = write - base;
-	if (WARN_ON_ONCE(offset < PERF_IDX2OFF(handle->head, buf)))
-		size = 0;
-	else
-		size = offset - PERF_IDX2OFF(handle->head, buf);
+	size = trbe_get_trace_size(handle, buf, wrap);
 
 done:
 	local_irq_restore(flags);
@@ -749,11 +762,10 @@ static int trbe_handle_overflow(struct perf_output_handle *handle)
 {
 	struct perf_event *event = handle->event;
 	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	unsigned long offset, size;
+	unsigned long size;
 	struct etm_event_data *event_data;
 
-	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
-	size = offset - PERF_IDX2OFF(handle->head, buf);
+	size = trbe_get_trace_size(handle, buf, true);
 	if (buf->snapshot)
 		handle->head += size;
 
-- 
2.24.1



* [PATCH v2 04/17] coresight: trbe: Add a helper to pad a given buffer area
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 05/17] coresight: trbe: Decouple buffer base from the hardware base Suzuki K Poulose
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Refactor the helper that pads a given AUX buffer area, so that it can
"fill" ignore packets without moving any handle pointers. This will be
useful for working around errata, where we may have to fill the buffer
after a session.
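
A user-space sketch of the split between padding an arbitrary area
and padding at the head. The ignore-packet byte value 0x70 is an
assumption standing in for ETE_IGNORE_PACKET, and buf_sketch,
pad_area() and pad_at_head() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value of the ETE ignore packet byte (ETE_IGNORE_PACKET). */
#define IGNORE_PACKET_SKETCH 0x70

struct buf_sketch {
	uint8_t *base;	/* start of the AUX buffer mapping */
	size_t size;
	size_t head;	/* perf head offset */
};

/* Like __trbe_pad_buf(): fill [offset, offset + len), head untouched. */
static void pad_area(struct buf_sketch *b, size_t offset, size_t len)
{
	assert(offset <= b->size && len <= b->size - offset);
	memset(b->base + offset, IGNORE_PACKET_SKETCH, len);
}

/* Like trbe_pad_buf(): pad at head, then skip it unless in snapshot mode. */
static void pad_at_head(struct buf_sketch *b, size_t len, int snapshot)
{
	pad_area(b, b->head, len);
	if (!snapshot)
		b->head += len;	/* mirrors perf_aux_output_skip() */
}
```

Keeping the head untouched in pad_area() is what lets a workaround
pad a region behind the head once a session ends.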

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 063c4505a203..a32ef083aa36 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -227,12 +227,18 @@ static void trbe_stop_and_truncate_event(struct perf_output_handle *handle)
  * consumed from the user space. The enabled TRBE buffer area is a moving subset of
  * the allocated perf auxiliary buffer.
  */
+
+static void __trbe_pad_buf(struct trbe_buf *buf, u64 offset, int len)
+{
+	memset((void *)buf->trbe_base + offset, ETE_IGNORE_PACKET, len);
+}
+
 static void trbe_pad_buf(struct perf_output_handle *handle, int len)
 {
 	struct trbe_buf *buf = etm_perf_sink_config(handle);
 	u64 head = PERF_IDX2OFF(handle->head, buf);
 
-	memset((void *)buf->trbe_base + head, ETE_IGNORE_PACKET, len);
+	__trbe_pad_buf(buf, head, len);
 	if (!buf->snapshot)
 		perf_aux_output_skip(handle, len);
 }
-- 
2.24.1



* [PATCH v2 05/17] coresight: trbe: Decouple buffer base from the hardware base
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 04/17] coresight: trbe: Add a helper to pad a given buffer area Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 06/17] coresight: trbe: Allow driver to choose a different alignment Suzuki K Poulose
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

We always set TRBBASER_EL1 to the base of the virtual ring
buffer. We are about to change this to work around an erratum.
In preparation for that, allow the driver to choose a different
base for TRBBASER_EL1 (which must be within the buffer range).
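
The ordering that trbe_enable_hw() now WARNs about can be written as
a single predicate. This is a sketch with hypothetical names
(trbe_ptrs, trbe_ptrs_valid(), program_trbe_sketch()); the register
writes are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct trbe_ptrs {
	uint64_t base;		/* start of the perf AUX ring buffer */
	uint64_t hw_base;	/* value for TRBBASER_EL1 */
	uint64_t write;		/* value for TRBPTR_EL1 */
	uint64_t limit;		/* value for TRBLIMITR_EL1 */
};

/* base <= hw_base <= write < limit */
static bool trbe_ptrs_valid(const struct trbe_ptrs *p)
{
	return p->base <= p->hw_base &&
	       p->hw_base <= p->write &&
	       p->write < p->limit;
}

/* A real driver would program the system registers after this check. */
static void program_trbe_sketch(const struct trbe_ptrs *p)
{
	assert(trbe_ptrs_valid(p));
}
```

Until a workaround needs otherwise, hw_base simply equals base, so
the extra check is a no-op for the existing path.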

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index a32ef083aa36..27616eac24ba 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -59,6 +59,8 @@ struct trbe_buf {
 	 * trbe_limit sibling pointers.
 	 */
 	unsigned long trbe_base;
+	/* The base programmed into the TRBE */
+	unsigned long trbe_hw_base;
 	unsigned long trbe_limit;
 	unsigned long trbe_write;
 	int nr_pages;
@@ -498,12 +500,13 @@ static void set_trbe_limit_pointer_enabled(unsigned long addr)
 
 static void trbe_enable_hw(struct trbe_buf *buf)
 {
-	WARN_ON(buf->trbe_write < buf->trbe_base);
+	WARN_ON(buf->trbe_hw_base < buf->trbe_base);
+	WARN_ON(buf->trbe_write < buf->trbe_hw_base);
 	WARN_ON(buf->trbe_write >= buf->trbe_limit);
 	set_trbe_disabled();
 	isb();
 	clr_trbe_status();
-	set_trbe_base_pointer(buf->trbe_base);
+	set_trbe_base_pointer(buf->trbe_hw_base);
 	set_trbe_write_pointer(buf->trbe_write);
 
 	/*
@@ -707,6 +710,8 @@ static int __arm_trbe_enable(struct trbe_buf *buf,
 		trbe_stop_and_truncate_event(handle);
 		return -ENOSPC;
 	}
+	/* Set the base of the TRBE to the buffer base */
+	buf->trbe_hw_base = buf->trbe_base;
 	*this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
 	trbe_enable_hw(buf);
 	return 0;
@@ -804,7 +809,7 @@ static bool is_perf_trbe(struct perf_output_handle *handle)
 	struct trbe_drvdata *drvdata = cpudata->drvdata;
 	int cpu = smp_processor_id();
 
-	WARN_ON(buf->trbe_base != get_trbe_base_pointer());
+	WARN_ON(buf->trbe_hw_base != get_trbe_base_pointer());
 	WARN_ON(buf->trbe_limit != get_trbe_limit_pointer());
 
 	if (cpudata->mode != CS_MODE_PERF)
-- 
2.24.1



* [PATCH v2 06/17] coresight: trbe: Allow driver to choose a different alignment
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 05/17] coresight: trbe: Decouple buffer base from the hardware base Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 07/17] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition Suzuki K Poulose
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

The TRBE hardware mandates a minimum alignment for TRBPTR_EL1,
advertised via TRBIDR_EL1. The driver uses this to align the
buffer write head. This patch allows the driver to choose a
different alignment from that of the hardware, by decoupling
the alignment tracking. This will be useful for working around
errata.
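
A sketch of keeping a hardware and a software alignment side by
side, assuming 4K pages and that an erratum workaround may want the
write head PAGE_SIZE aligned (as a later patch in this series does).
The needs_page_align flag and all *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL	/* assumed 4K pages */

struct align_sketch {
	uint64_t hw_align;	/* 1 << TRBIDR_EL1.Align */
	uint64_t sw_align;	/* alignment the driver actually applies */
};

static uint64_t round_up_pow2(uint64_t x, uint64_t a)
{
	assert(a && (a & (a - 1)) == 0);	/* power of two */
	return (x + a - 1) & ~(a - 1);
}

static void init_align(struct align_sketch *s, unsigned int trbidr_align,
		       int needs_page_align)
{
	s->hw_align = 1ULL << trbidr_align;
	/* Default to the hardware alignment; an erratum may need more. */
	s->sw_align = needs_page_align ? PAGE_SIZE_SKETCH : s->hw_align;
	/* A stricter software alignment must still satisfy the hardware. */
	assert(s->sw_align % s->hw_align == 0);
}
```

The sysfs "align" attribute keeps reporting the hardware value, while
the write head is rounded to the software one.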

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 27616eac24ba..f569010c672b 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -92,7 +92,8 @@ static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
 /*
  * struct trbe_cpudata: TRBE instance specific data
  * @trbe_flag		- TRBE dirty/access flag support
- * @tbre_align		- Actual TRBE alignment required for TRBPTR_EL1.
+ * @trbe_hw_align	- Actual TRBE alignment required for TRBPTR_EL1.
+ * @trbe_align		- Software alignment used for the TRBPTR_EL1,
  * @cpu			- CPU this TRBE belongs to.
  * @mode		- Mode of current operation. (perf/disabled)
  * @drvdata		- TRBE specific drvdata
@@ -100,6 +101,7 @@ static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
  */
 struct trbe_cpudata {
 	bool trbe_flag;
+	u64 trbe_hw_align;
 	u64 trbe_align;
 	int cpu;
 	enum cs_mode mode;
@@ -903,7 +905,7 @@ static ssize_t align_show(struct device *dev, struct device_attribute *attr, cha
 {
 	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);
 
-	return sprintf(buf, "%llx\n", cpudata->trbe_align);
+	return sprintf(buf, "%llx\n", cpudata->trbe_hw_align);
 }
 static DEVICE_ATTR_RO(align);
 
@@ -991,13 +993,14 @@ static void arm_trbe_probe_cpu(void *info)
 		goto cpu_clear;
 	}
 
-	cpudata->trbe_align = 1ULL << get_trbe_address_align(trbidr);
-	if (cpudata->trbe_align > SZ_2K) {
+	cpudata->trbe_hw_align = 1ULL << get_trbe_address_align(trbidr);
+	if (cpudata->trbe_hw_align > SZ_2K) {
 		pr_err("Unsupported alignment on cpu %d\n", cpu);
 		goto cpu_clear;
 	}
 
 	trbe_check_errata(cpudata);
+	cpudata->trbe_align = cpudata->trbe_hw_align;
 	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
 	cpudata->cpu = cpu;
 	cpudata->drvdata = drvdata;
-- 
2.24.1



* [PATCH v2 07/17] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (5 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 06/17] coresight: trbe: Allow driver to choose a different alignment Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  6:57   ` Anshuman Khandual
  2021-09-21 13:41 ` [PATCH v2 08/17] arm64: Add erratum detection for TRBE overwrite in FILL mode Suzuki K Poulose
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Add the CPU part numbers for the new Arm designs, Neoverse-N2 and
Cortex-A710.
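
A user-space sketch of how a part number lands in a MIDR_EL1 model
value, following the field layout used by MIDR_CPU_MODEL()
(Implementer [31:24], Architecture [19:16], PartNum [15:4]). The
helper function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MIDR_IMPLEMENTOR_SHIFT	24
#define MIDR_ARCHITECTURE_SHIFT	16
#define MIDR_PARTNUM_SHIFT	4

#define ARM_CPU_IMP_ARM			0x41
#define ARM_CPU_PART_CORTEX_A710	0xD47
#define ARM_CPU_PART_NEOVERSE_N2	0xD49

/* Same layout as MIDR_CPU_MODEL(); variant and revision stay zero. */
static uint32_t midr_cpu_model(uint32_t imp, uint32_t partnum)
{
	return (imp << MIDR_IMPLEMENTOR_SHIFT) |
	       (0xfU << MIDR_ARCHITECTURE_SHIFT) |
	       (partnum << MIDR_PARTNUM_SHIFT);
}

/* PartNum is bits [15:4] of MIDR_EL1. */
static uint32_t midr_partnum(uint32_t midr)
{
	return (midr >> MIDR_PARTNUM_SHIFT) & 0xfffU;
}
```

For example, MIDR_NEOVERSE_N2 works out to 0x410FD490 and
MIDR_CORTEX_A710 to 0x410FD470.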

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cputype.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 6231e1f0abe7..19b8441aa8f2 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -73,6 +73,8 @@
 #define ARM_CPU_PART_CORTEX_A76		0xD0B
 #define ARM_CPU_PART_NEOVERSE_N1	0xD0C
 #define ARM_CPU_PART_CORTEX_A77		0xD0D
+#define ARM_CPU_PART_CORTEX_A710	0xD47
+#define ARM_CPU_PART_NEOVERSE_N2	0xD49
 
 #define APM_CPU_PART_POTENZA		0x000
 
@@ -113,6 +115,8 @@
 #define MIDR_CORTEX_A76	MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A76)
 #define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N1)
 #define MIDR_CORTEX_A77	MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77)
+#define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710)
+#define MIDR_NEOVERSE_N2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N2)
 #define MIDR_THUNDERX	MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
-- 
2.24.1



* [PATCH v2 08/17] arm64: Add erratum detection for TRBE overwrite in FILL mode
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (6 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 07/17] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-21 13:41 ` [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata " Suzuki K Poulose
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Arm Neoverse-N2 and Cortex-A710 cores are affected
by a CPU erratum where the TRBE will overwrite the trace buffer
in FILL mode. The TRBE doesn't stop (as expected in FILL mode)
when it reaches the limit; instead it wraps to the base and
continues writing up to 3 cache lines. This will overwrite any
trace that was written previously.

Add the Neoverse-N2 erratum (#2139208) and Cortex-A710 erratum
(#2119858) to the detection logic.

This will be used by the TRBE driver in later patches to work
around the issue. The detection has been kept with the core
arm64 errata framework list to make sure that:
  - We don't duplicate the framework in the TRBE driver
  - The errata detection is advertised like the rest
    of the CPU errata.

Note that the Kconfig entries will be added after we have added
the workaround in the TRBE driver, which depends on the cpucap
from here.
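
A simplified model of what an "all versions" MIDR list match checks:
mask off Variant [23:20] and Revision [3:0], then compare against
each model value. This is a sketch, not the kernel's midr_range
code; the list, mask name and midr_in_list() are hypothetical, and
it ignores the per-erratum Kconfig guards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keep Implementer, Architecture and PartNum; drop Variant and Revision. */
#define MIDR_MODEL_MASK_SKETCH	0xff0ffff0u

static const uint32_t fill_mode_affected[] = {
	0x410FD490u,	/* Neoverse-N2, erratum 2139208 */
	0x410FD470u,	/* Cortex-A710, erratum 2119858 */
	0,		/* terminator, like the {} entry */
};

static bool midr_in_list(uint32_t midr, const uint32_t *list)
{
	for (; *list; list++)
		if ((midr & MIDR_MODEL_MASK_SKETCH) == *list)
			return true;
	return false;
}
```

Any revision of an affected core (for instance a Neoverse-N2 r2p1,
MIDR 0x412FD491) matches, while a Neoverse-N1 does not.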

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/cpu_errata.c | 25 +++++++++++++++++++++++++
 arch/arm64/tools/cpucaps       |  1 +
 2 files changed, 26 insertions(+)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index e2c20c036442..ccd757373f36 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -340,6 +340,18 @@ static const struct midr_range erratum_1463225[] = {
 };
 #endif
 
+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2139208
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2119858
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+	{},
+};
+#endif	/* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
+
 const struct arm64_cpu_capabilities arm64_errata[] = {
 #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
 	{
@@ -533,6 +545,19 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.capability = ARM64_WORKAROUND_NVIDIA_CARMEL_CNP,
 		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
 	},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	{
+		/*
+		 * The erratum work around is handled within the TRBE
+		 * driver and can be applied per-cpu. So, we can allow
+		 * a late CPU to come online with this erratum.
+		 */
+		.desc = "ARM erratum 2119858 or 2139208",
+		.capability = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
+		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+		CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 49305c2e6dfd..1ccb92165bd8 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -53,6 +53,7 @@ WORKAROUND_1418040
 WORKAROUND_1463225
 WORKAROUND_1508412
 WORKAROUND_1542419
+WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 WORKAROUND_CAVIUM_23154
 WORKAROUND_CAVIUM_27456
 WORKAROUND_CAVIUM_30115
-- 
2.24.1



* [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (7 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 08/17] arm64: Add erratum detection for TRBE overwrite in FILL mode Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-23  6:13   ` Anshuman Khandual
  2021-10-01 17:15   ` Mathieu Poirier
  2021-09-21 13:41 ` [PATCH v2 10/17] arm64: Enable workaround for TRBE " Suzuki K Poulose
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Arm Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from an
erratum which, when triggered, might cause the TRBE to overwrite trace
data already collected in FILL mode, in the event of a WRAP. That is,
the TRBE doesn't stop writing data; instead it wraps to the base and
could write up to 3 cache lines' worth of trace. This could corrupt
the trace at the "BASE" pointer.

The workaround is to program the write pointer 256 bytes from the
base, so that if the erratum is triggered, it doesn't overwrite
the trace data that was captured. This skipped region can be
padded with ignore packets at the end of the session, so that
the decoder sees a continuous buffer with some padding at the
beginning. The trace data written at the base is considered
lost, as the limit could have been in the middle of the perf
ring buffer, and jumping to the "base" is not acceptable.
The driver already sets the flags to indicate that some amount
of trace was lost during the FILL event IRQ, so this is fine.

One important change with the workaround is that we program
TRBBASER_EL1 to the current page where we are allowed to write.
Otherwise, the TRBE could overwrite a region that may still be
consumed by perf. To this end, we always make sure that
"handle->head", and thus trbe_write, is PAGE_SIZE aligned,
so that we can set the BASE to the page base and move
TRBPTR to the 256-byte offset.
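
As a rough illustration of the arithmetic described above, the following
userspace C sketch models the enable and collection steps. It is not the
driver code: struct sketch_trbe_regs, sketch_enable() and sketch_collect()
are hypothetical names, a 4K page size is assumed, and the ignore-packet
byte is supplied by the caller rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096UL /* assumption: 4K pages */
#define SKETCH_SKIP_BYTES 256UL  /* mirrors TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES */

/* Hypothetical model of the two registers the workaround adjusts. */
struct sketch_trbe_regs {
	uint64_t base; /* value that would go into TRBBASER_EL1 */
	uint64_t ptr;  /* value that would go into TRBPTR_EL1 */
};

/*
 * Enable side: the write offset must be page aligned. BASE is moved up to
 * it and PTR starts SKETCH_SKIP_BYTES later, so an erratum overwrite at
 * BASE can only land in bytes that hold no valid trace for this session.
 */
static bool sketch_enable(uint64_t write, struct sketch_trbe_regs *regs)
{
	if (write % SKETCH_PAGE_SIZE)
		return false; /* the patch WARNs and returns -EINVAL here */
	regs->base = write;
	regs->ptr = write + SKETCH_SKIP_BYTES;
	return true;
}

/*
 * Collection side: report the whole [start, end) range, but fill the
 * skipped prefix with the caller's "ignore" byte so a decoder skips it.
 */
static uint64_t sketch_collect(uint8_t *ring, uint64_t start, uint64_t end,
			       uint8_t ignore_byte)
{
	uint64_t size;

	assert(end >= start);
	size = end - start;
	if (size >= SKETCH_SKIP_BYTES)
		memset(ring + start, ignore_byte, SKETCH_SKIP_BYTES);
	return size;
}
```

Keeping BASE on the page-aligned write offset means a stray overwrite can
only hit the 256 skipped bytes of the current session, which are padded
anyway.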

Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Change since v1:
 - Updated comment with ASCII art
 - Add _BYTES suffix for the space to skip for the work around.
---
 drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
 1 file changed, 132 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index f569010c672b..983dd5039e52 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -16,6 +16,7 @@
 #define pr_fmt(fmt) DRVNAME ": " fmt
 
 #include <asm/barrier.h>
+#include <asm/cpufeature.h>
 #include <asm/cputype.h>
 
 #include "coresight-self-hosted-trace.h"
@@ -84,9 +85,17 @@ struct trbe_buf {
  * per TRBE instance, we keep track of the list of errata that
  * affects the given instance of the TRBE.
  */
-#define TRBE_ERRATA_MAX			0
+#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
+#define TRBE_ERRATA_MAX				1
+
+/*
+ * Safe limit for the number of bytes that may be overwritten
+ * when the erratum is triggered.
+ */
+#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES	256
 
 static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
+	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
 };
 
 /*
@@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
 	set_trbe_limit_pointer_enabled(buf->trbe_limit);
 }
 
-static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
+static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
+						 u64 trbsr)
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = buf->cpudata;
 
 	WARN_ON(is_trbe_running(trbsr));
 	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
@@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
 	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
 		return TRBE_FAULT_ACT_FATAL;
 
-	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
-		if (get_trbe_write_pointer() == get_trbe_base_pointer())
-			return TRBE_FAULT_ACT_WRAP;
-	}
+	/*
+	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
+	 * it might write data after a WRAP event in the fill mode.
+	 * Thus the check TRBPTR == TRBBASER will not be honored.
+	 */
+	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
+	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
+	     get_trbe_write_pointer() == get_trbe_base_pointer()))
+		return TRBE_FAULT_ACT_WRAP;
+
 	return TRBE_FAULT_ACT_SPURIOUS;
 }
 
@@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
 {
 	u64 write;
 	u64 start_off, end_off;
+	u64 size;
+	u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
 
 	/*
 	 * If the TRBE has wrapped around the write pointer has
@@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
 
 	if (WARN_ON_ONCE(end_off < start_off))
 		return 0;
-	return (end_off - start_off);
+
+	size = end_off - start_off;
+	/*
+	 * If the TRBE is affected by the following erratum, we must fill
+	 * the space we skipped with IGNORE packets. And we are always
+	 * guaranteed to have at least a PAGE_SIZE space in the buffer.
+	 */
+	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
+	    !WARN_ON(size < overwrite_skip))
+		__trbe_pad_buf(buf, start_off, overwrite_skip);
+
+	return size;
 }
 
 static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
@@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 		clr_trbe_irq();
 		isb();
 
-		act = trbe_get_fault_act(status);
+		act = trbe_get_fault_act(handle, status);
 		/*
 		 * If this was not due to a WRAP event, we have some
 		 * errors and as such buffer is empty.
@@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 	return size;
 }
 
+
+static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
+{
+	/*
+	 * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
+	 * line size from the "TRBBASER_EL1" in the event of a "FILL".
+	 * Thus, we could lose some amount of the trace at the base.
+	 *
+	 * Before Fix:
+	 *
+	 *  normal-BASE     head  normal-PTR              tail normal-LIMIT
+	 *  |                   \/                       /
+	 *   -------------------------------------------------------------
+	 *  |         |          |xyzdefghij..|...  tuvw|                |
+	 *   -------------------------------------------------------------
+	 *                      /    |                   \
+	 * After Fix->  TRBBASER     TRBPTR              TRBLIMITR.LIMIT
+	 *
+	 * In the normal course of action, we would set the TRBBASER to the
+	 * beginning of the ring-buffer (normal-BASE). But with the erratum,
+	 * the TRBE could overwrite the contents at the "normal-BASE", after
+	 * hitting the "normal-LIMIT", since it doesn't stop as expected. And
+	 * this is wrong. So we must always make sure that the TRBBASER is
+	 * within the region [head, head+size].
+	 *
+	 * Also, we would set the TRBPTR to head (after adjusting for
+	 * alignment) at normal-PTR. This would mean that the last few bytes
+	 * of the trace (say, "xyz") might overwrite the first few bytes of
+	 * trace written ("abc"). More importantly they will appear in what
+	 * userspace sees as the beginning of the trace, which is wrong. We may
+	 * not always have space to move the latest trace "xyz" to the correct
+	 * order as it must appear beyond the LIMIT (i.e, [head..head+size]).
+	 * Thus it is easier to ignore those bytes than to complicate the
+	 * driver to move it, assuming that the erratum was triggered and doing
+	 * additional checks to see if there is indeed allowed space at
+	 * TRBLIMITR.LIMIT.
+	 *
+	 * To summarize, with the work around:
+	 *
+	 *  - We always align the offset for the next session to PAGE_SIZE
+	 *    (This is to ensure we can program the TRBBASER to this offset
+	 *    within the region [head...head+size]).
+	 *
+	 *  - At TRBE enable:
+	 *     - Set the TRBBASER to the page aligned offset of the current
+	 *       proposed write offset. (which is guaranteed to be aligned
+	 *       as above)
+	 *     - Move the TRBPTR to skip first 256bytes (that might be
+	 *       overwritten with the erratum). This ensures that the trace
+	 *       generated in the session is not re-written.
+	 *
+	 *  - At trace collection:
+	 *     - Pad the 256bytes skipped above again with IGNORE packets.
+	 */
+	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
+		if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
+			return -EINVAL;
+		buf->trbe_hw_base = buf->trbe_write;
+		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
+	}
+
+	return 0;
+}
+
 static int __arm_trbe_enable(struct trbe_buf *buf,
 			     struct perf_output_handle *handle)
 {
+	int ret = 0;
+
 	perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
 	buf->trbe_limit = compute_trbe_buffer_limit(handle);
 	buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
 	if (buf->trbe_limit == buf->trbe_base) {
-		trbe_stop_and_truncate_event(handle);
-		return -ENOSPC;
+		ret = -ENOSPC;
+		goto err;
 	}
 	/* Set the base of the TRBE to the buffer base */
 	buf->trbe_hw_base = buf->trbe_base;
+
+	ret = trbe_apply_work_around_before_enable(buf);
+	if (ret)
+		goto err;
+
 	*this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
 	trbe_enable_hw(buf);
 	return 0;
+err:
+	trbe_stop_and_truncate_event(handle);
+	return ret;
 }
 
 static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
@@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
 	if (!is_perf_trbe(handle))
 		return IRQ_NONE;
 
-	act = trbe_get_fault_act(status);
+	act = trbe_get_fault_act(handle, status);
 	switch (act) {
 	case TRBE_FAULT_ACT_WRAP:
 		truncated = !!trbe_handle_overflow(handle);
@@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
 	}
 
 	trbe_check_errata(cpudata);
-	cpudata->trbe_align = cpudata->trbe_hw_align;
+	/*
+	 * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
+	 * we must always program the TRBPTR_EL1, 256bytes from a page
+	 * boundary, with TRBBASER_EL1 set to the page, to prevent
+	 * TRBE over-writing 256bytes at TRBBASER_EL1 on FILL event.
+	 *
+	 * Thus make sure we always align our write pointer to a PAGE_SIZE,
+	 * which also guarantees that we have at least a PAGE_SIZE space in
+	 * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
+	 * the required bytes at the base.
+	 */
+	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
+		cpudata->trbe_align = PAGE_SIZE;
+	else
+		cpudata->trbe_align = cpudata->trbe_hw_align;
+
 	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
 	cpudata->cpu = cpu;
 	cpudata->drvdata = drvdata;
-- 
2.24.1



* [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (8 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata " Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  7:23   ` Anshuman Khandual
  2021-10-07 16:09   ` Catalin Marinas
  2021-09-21 13:41 ` [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures Suzuki K Poulose
                   ` (8 subsequent siblings)
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Now that we have the workaround implemented in the TRBE
driver, add the Kconfig entries and document the errata.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 Documentation/arm64/silicon-errata.rst |  4 +++
 arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index d410a47ffa57..2f99229d993c 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -92,12 +92,16 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1349291        | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1542419        | ARM64_ERRATUM_1542419       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 077f2ec4eeb2..eac4030322df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
 
 	  If unsure, say Y.
 
+config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	bool
+
+config ARM64_ERRATUM_2119858
+	bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
+	default y
+	depends on CORESIGHT_TRBE
+	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	help
+	  This option adds the workaround for ARM Cortex-A710 erratum 2119858.
+
+	  Affected Cortex-A710 cores could overwrite up to 3 cache lines of trace
+	  data at the base of the buffer (pointed to by TRBBASER_EL1) in FILL mode
+	  in the event of a WRAP.
+
+	  Work around the issue by always making sure we move the TRBPTR_EL1 by
+	  256 bytes before enabling the buffer and filling the first 256 bytes of
+	  the buffer with ETM ignore packets upon disabling.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_2139208
+	bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
+	default y
+	depends on CORESIGHT_TRBE
+	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	help
+	  This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
+
+	  Affected Neoverse-N2 cores could overwrite up to 3 cache lines of trace
+	  data at the base of the buffer (pointed to by TRBBASER_EL1) in FILL mode
+	  in the event of a WRAP.
+
+	  Work around the issue by always making sure we move the TRBPTR_EL1 by
+	  256 bytes before enabling the buffer and filling the first 256 bytes of
+	  the buffer with ETM ignore packets upon disabling.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
-- 
2.24.1



* [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (9 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 10/17] arm64: Enable workaround for TRBE " Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  7:39   ` Anshuman Khandual
  2021-10-07 16:10   ` Catalin Marinas
  2021-09-21 13:41 ` [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle Suzuki K Poulose
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer
from an erratum where a TSB (trace synchronization barrier)
fails to flush the trace data completely when executed from
a trace prohibited region. In Linux we always execute it
after we have moved the PE to a trace prohibited region, so
we can apply the workaround every time a TSB is executed.

The workaround is to issue two TSB instructions consecutively.

NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying
that a late CPU could be blocked from booting if it is the
first CPU that requires the workaround. This is because we
do not allow setting a cpu_hwcap after the SMP boot. The
other alternative is to use "this_cpu_has_cap()" instead
of the faster system-wide check, which may be a bit of an
overhead, given we may have to do this in the nVHE KVM host
before a guest entry.

Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
 - Switch to cpus_have_final_cap()
 - Document the requirements on TSB.
---
 Documentation/arm64/silicon-errata.rst |  4 ++++
 arch/arm64/Kconfig                     | 31 ++++++++++++++++++++++++++
 arch/arm64/include/asm/barrier.h       | 16 ++++++++++++-
 arch/arm64/kernel/cpu_errata.c         | 19 ++++++++++++++++
 arch/arm64/tools/cpucaps               |  1 +
 5 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 2f99229d993c..569a92411dcd 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -94,6 +94,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1349291        | N/A                         |
@@ -102,6 +104,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N2     | #2067961        | ARM64_ERRATUM_2067961       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eac4030322df..0764774e12bb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
 
 	  If unsure, say Y.
 
+config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+	bool
+
+config ARM64_ERRATUM_2054223
+	bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
+	default y
+	help
+	  Enable workaround for ARM Cortex-A710 erratum 2054223
+
+	  Affected cores may fail to flush the trace data on a TSB instruction when
+	  the PE is in a trace prohibited state. This can cause a few bytes of
+	  cached trace to be lost.
+
+	  The workaround is to issue two TSBs consecutively on affected cores.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_2067961
+	bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
+	default y
+	help
+	  Enable workaround for ARM Neoverse-N2 erratum 2067961
+
+	  Affected cores may fail to flush the trace data on a TSB instruction when
+	  the PE is in a trace prohibited state. This can cause a few bytes of
+	  cached trace to be lost.
+
+	  The workaround is to issue two TSBs consecutively on affected cores.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 451e11e5fd23..1c5a00598458 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -23,7 +23,7 @@
 #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
 
 #define psb_csync()	asm volatile("hint #17" : : : "memory")
-#define tsb_csync()	asm volatile("hint #18" : : : "memory")
+#define __tsb_csync()	asm volatile("hint #18" : : : "memory")
 #define csdb()		asm volatile("hint #20" : : : "memory")
 
 #ifdef CONFIG_ARM64_PSEUDO_NMI
@@ -46,6 +46,20 @@
 #define dma_rmb()	dmb(oshld)
 #define dma_wmb()	dmb(oshst)
 
+
+#define tsb_csync()								\
+	do {									\
+		/*								\
+		 * CPUs affected by Arm Erratum 2054223 or 2067961 need	\
+		 * another TSB to ensure the trace is flushed. The barriers	\
+		 * don't have to be strictly back to back, as long as the	\
+		 * CPU is in trace prohibited state.				\
+		 */								\
+		if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE))	\
+			__tsb_csync();						\
+		__tsb_csync();							\
+	} while (0)
+
 /*
  * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
  * and 0 otherwise.
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ccd757373f36..bdbeac75ead6 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
 };
 #endif	/* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
 
+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+static const struct midr_range tsb_flush_fail_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2067961
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2054223
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+	{},
+};
+#endif	/* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
+
 const struct arm64_cpu_capabilities arm64_errata[] = {
 #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
 	{
@@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
 		CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
 	},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+	{
+		.desc = "ARM erratum 2067961 or 2054223",
+		.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
+		ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1ccb92165bd8..2102e15af43d 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -54,6 +54,7 @@ WORKAROUND_1463225
 WORKAROUND_1508412
 WORKAROUND_1542419
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+WORKAROUND_TSB_FLUSH_FAILURE
 WORKAROUND_CAVIUM_23154
 WORKAROUND_CAVIUM_27456
 WORKAROUND_CAVIUM_30115
-- 
2.24.1



* [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (10 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  7:59   ` Anshuman Khandual
  2021-10-04 17:42   ` Mathieu Poirier
  2021-09-21 13:41 ` [PATCH v2 13/17] coresight: trbe: Add a helper to determine the minimum buffer size Suzuki K Poulose
                   ` (6 subsequent siblings)
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Add a helper to get the CPU-specific data for a TRBE instance from
a given perf handle. This also adds extra checks to make sure that
the event associated with the handle is "bound" to the CPU and is
active on the TRBE.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 983dd5039e52..797d978f9fa7 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
 	return buf->nr_pages * PAGE_SIZE;
 }
 
+static inline struct trbe_cpudata *
+trbe_handle_to_cpudata(struct perf_output_handle *handle)
+{
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+
+	BUG_ON(!buf || !buf->cpudata);
+	return buf->cpudata;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
-	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	struct trbe_cpudata *cpudata = buf->cpudata;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
 
 	WARN_ON(is_trbe_running(trbsr));
 	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
-- 
2.24.1



* [PATCH v2 13/17] coresight: trbe: Add a helper to determine the minimum buffer size
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (11 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  9:51   ` Anshuman Khandual
  2021-09-21 13:41 ` [PATCH v2 14/17] coresight: trbe: Make sure we have enough space Suzuki K Poulose
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

For the TRBE to operate, we need a minimum amount of space to collect
a meaningful trace session. This is currently a few bytes, but we may
need to extend it to work around errata. So, abstract this into a
helper function.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 797d978f9fa7..3373f4e2183b 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -277,6 +277,11 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
 	return buf->cpudata;
 }
 
+static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
+{
+	return TRBE_TRACE_MIN_BUF_SIZE;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -447,7 +452,7 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
 	 * have space for a meaningful run, we rather pad it
 	 * and start fresh.
 	 */
-	if (limit && (limit - head < TRBE_TRACE_MIN_BUF_SIZE)) {
+	if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
 		trbe_pad_buf(handle, limit - head);
 		limit = __trbe_normal_offset(handle);
 	}
-- 
2.24.1



* [PATCH v2 14/17] coresight: trbe: Make sure we have enough space
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (12 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 13/17] coresight: trbe: Add a helper to determine the minimum buffer size Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22  9:58   ` Anshuman Khandual
  2021-09-21 13:41 ` [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range Suzuki K Poulose
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

The TRBE driver makes sure that there is enough space for a meaningful
run; otherwise it pads the given space and restarts the offset calculation
once. But a single retry does not guarantee that we either find enough
space or hit "no space". Make sure that we repeat the step until either:
  - We have the minimum space
   OR
  - There is NO space at all.
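
A minimal userspace C sketch of the retry loop follows. It assumes a
hypothetical limit callback in place of __trbe_normal_offset():
sketch_step_limit() places limit boundaries every "step" bytes up to
"end". The names and the limit policy are illustrative only, not the
driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __trbe_normal_offset(): next limit, or 0 if no space. */
typedef uint64_t (*sketch_limit_fn)(uint64_t head, void *ctx);

/* Hypothetical ring model: limit boundaries every `step` bytes, no space at `end`. */
struct sketch_ring {
	uint64_t step;
	uint64_t end;
};

static uint64_t sketch_step_limit(uint64_t head, void *ctx)
{
	const struct sketch_ring *r = ctx;
	uint64_t limit;

	if (head >= r->end)
		return 0;                               /* no space at all */
	limit = (head / r->step + 1) * r->step;         /* next boundary above head */
	return limit < r->end ? limit : r->end;
}

/*
 * Pad [head, limit) and recompute until there are at least min_size bytes
 * or no space is left. *head is updated, like re-reading handle->head
 * after trbe_pad_buf(); *padded accumulates the padded byte count.
 */
static uint64_t sketch_normal_offset(uint64_t *head, uint64_t min_size,
				     sketch_limit_fn next_limit, void *ctx,
				     uint64_t *padded)
{
	uint64_t limit = next_limit(*head, ctx);

	while (limit && (limit - *head) < min_size) {
		assert(limit > *head);          /* each pass must make progress */
		*padded += limit - *head;       /* models trbe_pad_buf() */
		*head = limit;                  /* padding advances the head */
		limit = next_limit(*head, ctx);
	}
	return limit;
}
```

With step 64 and min_size 100, a single "if" would stop after one pad with
only 64 bytes available; the "while" keeps padding until it either finds
room or runs out of space. The loop only terminates because every pass
advances head, which the assert checks.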

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 3373f4e2183b..02f9e00e2091 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
 	 * If the head is too close to the limit and we don't
 	 * have space for a meaningful run, we rather pad it
 	 * and start fresh.
+	 *
+	 * We might have to do this more than once to make sure
+	 * we have enough required space.
 	 */
-	if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
+	while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
 		trbe_pad_buf(handle, limit - head);
 		limit = __trbe_normal_offset(handle);
+		head = PERF_IDX2OFF(handle->head, buf);
 	}
 	return limit;
 }
-- 
2.24.1



* [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (13 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 14/17] coresight: trbe: Make sure we have enough space Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22 10:59   ` Anshuman Khandual
  2021-10-07 16:10   ` Catalin Marinas
  2021-09-21 13:41 ` [PATCH v2 16/17] coresight: trbe: Work around write to out of range Suzuki K Poulose
                   ` (3 subsequent siblings)
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the
TRBE, under some circumstances, might write up to 64 bytes to an address after
the limit programmed in TRBLIMITR_EL1.LIMIT. This might:

  - Corrupt a page in the ring buffer, which may hold trace data from a
    previous session that userspace may still consume.
  - Hit the guard page at the end of the vmalloc area and raise a fault.

To keep the handling simple, we always leave the last page of the range
that the TRBE is allowed to write unused. This is achieved by ensuring
that there is always more than a PAGE worth of space in the range when
calculating the LIMIT for the TRBE. The LIMIT pointer is then adjusted
to leave that PAGE out of the TRBE range (TRBLIMITR.LIMIT -= PAGE_SIZE)
when enabling it. This makes sure that the TRBE only writes to an area
within its allowed limit (i.e., [head, head + size]) and we do not have to handle
address faults within the driver.
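
Here is a minimal sketch of the LIMIT arithmetic. It assumes 4 KiB pages and a maximum overrun of 64 bytes, taken from the erratum description above. `toy_adjust_limit()` is a hypothetical helper, not the driver's function, and plain offsets stand in for virtual addresses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096ULL	/* assumed page size */
#define TOY_MAX_OVERRUN	64ULL	/* bytes the erratum may write past LIMIT */

/*
 * Pull LIMIT down by one page so that an overrun of up to
 * TOY_MAX_OVERRUN bytes still lands inside [write, original limit).
 * Fails if there is not more than a page of space, or if the limit
 * is not page aligned.
 */
static int toy_adjust_limit(uint64_t write, uint64_t *limit)
{
	if (*limit <= write || *limit - write <= TOY_PAGE_SIZE ||
	    (*limit % TOY_PAGE_SIZE) != 0)
		return -EINVAL;
	*limit -= TOY_PAGE_SIZE;
	return 0;
}
```

With a three-page range, the TRBE gets a two-page limit. An overrun of 64 bytes past that limit still stays below the original end of the range.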

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/cpu_errata.c | 20 ++++++++++++++++++++
 arch/arm64/tools/cpucaps       |  1 +
 2 files changed, 21 insertions(+)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index bdbeac75ead6..e2978b89d4b8 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -364,6 +364,18 @@ static const struct midr_range tsb_flush_fail_cpus[] = {
 };
 #endif	/* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
 
+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+static struct midr_range trbe_write_out_of_range_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2253138
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2224489
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+	{},
+};
+#endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */
+
 const struct arm64_cpu_capabilities arm64_errata[] = {
 #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
 	{
@@ -577,6 +589,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
 		ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
 	},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	{
+		.desc = "ARM erratum 2253138 or 2224489",
+		.capability = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
+		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+		CAP_MIDR_RANGE_LIST(trbe_write_out_of_range_cpus),
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 2102e15af43d..90628638e0f9 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -55,6 +55,7 @@ WORKAROUND_1508412
 WORKAROUND_1542419
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 WORKAROUND_TSB_FLUSH_FAILURE
+WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
 WORKAROUND_CAVIUM_23154
 WORKAROUND_CAVIUM_27456
 WORKAROUND_CAVIUM_30115
-- 
2.24.1
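
Each `MIDR_ALL_VERSIONS()` entry above matches a core by implementer and part number, whatever its variant and revision. Below is a minimal sketch of that matching. It assumes the architectural MIDR_EL1 layout: Implementer[31:24], Variant[23:20], Architecture[19:16], PartNum[15:4], Revision[3:0]. It also uses the part numbers from patch 07 of this series: 0xD49 for Neoverse-N2 and 0xD47 for Cortex-A710. `toy_midr_model()` and `toy_matches_all_versions()` are hypothetical helpers, not the kernel's `is_midr_in_range()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IMP_ARM		0x41u
#define TOY_PART_CORTEX_A710	0xD47u
#define TOY_PART_NEOVERSE_N2	0xD49u

/* Model value: implementer, architecture field = 0xf, part number. */
static uint32_t toy_midr_model(uint32_t imp, uint32_t part)
{
	return (imp << 24) | (0xfu << 16) | (part << 4);
}

/* Compare only implementer, architecture and part number fields. */
static bool toy_matches_all_versions(uint32_t midr, uint32_t model)
{
	const uint32_t mask = 0xff000000u | 0x000f0000u | 0x0000fff0u;

	return (midr & mask) == model;
}
```

For example, a Neoverse-N2 r1p2 (MIDR 0x411FD492) matches the N2 model value 0x410FD490, while a Neoverse-N1 does not.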


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v2 16/17] coresight: trbe: Work around write to out of range
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (14 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-23  3:15   ` Anshuman Khandual
  2021-09-21 13:41 ` [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address Suzuki K Poulose
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

TRBE implementations affected by Arm erratum 2253138 or 2224489 could
write to the next address after TRBLIMITR.LIMIT, instead of wrapping
to TRBBASER. This implies that the TRBE could potentially corrupt:

  - A page used by the rest of the kernel/user (if LIMIT is the end of
    the perf ring buffer)
  - A page within the ring buffer, but outside the driver's range
    [head, head + size]. This may contain trace data that may be
    consumed by userspace.

We work around this erratum by:
  - Making sure that there is always at least one extra PAGE of space
    in the TRBE's range compared to what we normally assign. This is in
    addition to other restrictions (e.g., the TRBE alignment for working
    around TRBE_WORKAROUND_OVERWRITE_FILL_MODE, which imposes a minimum
    of PAGE_SIZE; thus we would need 2 * PAGE_SIZE).

  - Adjusting the LIMIT to leave the last PAGE_SIZE out of the TRBE's
    allowed range (i.e., TRBBASER...TRBLIMITR.LIMIT), by:

        TRBLIMITR.LIMIT -= PAGE_SIZE
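
A sketch of the extra-page accounting follows. The flag names and `TOY_MIN_BUF_SIZE` are hypothetical. The driver itself uses `TRBE_TRACE_MIN_BUF_SIZE` and a per-CPU errata bitmap. The point is that the minimum grows by exactly one page when the out-of-range erratum is present, so the usual minimum is still available after `LIMIT -= PAGE_SIZE`.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE		4096ULL	/* assumed page size */
#define TOY_MIN_BUF_SIZE	64ULL	/* hypothetical base minimum */

/* Hypothetical per-TRBE erratum flags, one bit per workaround. */
enum {
	TOY_ERRATUM_OVERWRITE_FILL_MODE = 1u << 0,
	TOY_ERRATUM_WRITE_OUT_OF_RANGE  = 1u << 1,
};

static uint64_t toy_min_trace_buf_size(unsigned int errata)
{
	uint64_t size = TOY_MIN_BUF_SIZE;

	/* Reserve one page beyond LIMIT for the out-of-range write. */
	if (errata & TOY_ERRATUM_WRITE_OUT_OF_RANGE)
		size += TOY_PAGE_SIZE;
	return size;
}

/*
 * Bytes usable by the TRBE once LIMIT has been pulled down.
 * Caller guarantees space >= toy_min_trace_buf_size(errata).
 */
static uint64_t toy_usable(uint64_t space, unsigned int errata)
{
	if (errata & TOY_ERRATUM_WRITE_OUT_OF_RANGE)
		return space - TOY_PAGE_SIZE;
	return space;
}
```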

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 02f9e00e2091..ea907345354c 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -86,7 +86,8 @@ struct trbe_buf {
  * affects the given instance of the TRBE.
  */
 #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
-#define TRBE_ERRATA_MAX				1
+#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE	1
+#define TRBE_ERRATA_MAX				2
 
 /*
  * Safe limit for the number of bytes that may be overwritten
@@ -96,6 +97,7 @@ struct trbe_buf {
 
 static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
 	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
+	[TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
 };
 
 /*
@@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
 
 static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
 {
-	return TRBE_TRACE_MIN_BUF_SIZE;
+	u64 size = TRBE_TRACE_MIN_BUF_SIZE;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
+
+	/*
+	 * When the TRBE is affected by an erratum that could make it
+	 * write to the next "virtually addressed" page beyond the LIMIT,
+	 * we need to make sure there is always a PAGE after the LIMIT,
+	 * within the buffer. Thus we ensure there is at least one extra
+	 * page compared to normal. With this, we can then adjust the
+	 * LIMIT pointer down by a PAGE later.
+	 */
+	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE))
+		size += PAGE_SIZE;
+	return size;
 }
 
 /*
@@ -585,6 +600,17 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
 	/*
 	 * If the TRBE has wrapped around the write pointer has
 	 * wrapped and should be treated as limit.
+	 *
+	 * When the TRBE is affected by TRBE_WORKAROUND_WRITE_OUT_OF_RANGE,
+	 * it may write up to 64 bytes beyond the "LIMIT". The driver already
+	 * keeps a valid page next to the LIMIT and we could potentially
+	 * consume the trace data that may have been collected there. But we
+	 * cannot be really sure it is available, and the TRBPTR may not
+	 * indicate the same. Also, the affected cores are subject to another
+	 * erratum which forces PAGE_SIZE alignment on the TRBPTR, and thus we
+	 * could end up padding an entire PAGE_SIZE - 64 bytes to get those
+	 * 64 bytes. Thus we ignore the potential triggering of the erratum
+	 * on WRAP and limit the data to LIMIT.
 	 */
 	if (wrap)
 		write = get_trbe_limit_pointer();
@@ -811,6 +837,35 @@ static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
 		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
 	}
 
+	/*
+	 * TRBE_WORKAROUND_WRITE_OUT_OF_RANGE could cause the TRBE to write to
+	 * the next page after the TRBLIMITR.LIMIT. For perf, the "next page"
+	 * may be:
+	 * 	- The page beyond the ring buffer. This means the TRBE could
+	 * 	  corrupt another entity (kernel / user).
+	 * 	- A portion of the "ring buffer" consumed by the userspace,
+	 * 	  i.e., a page outside [head, head + size].
+	 *
+	 * We work around this by:
+	 * 	- Making sure that we have at least one extra PAGE of space left
+	 * 	in the ring buffer [head, head + size], compared to what we
+	 * 	normally need without the erratum. See trbe_min_trace_buf_size().
+	 *
+	 * 	- Adjusting the TRBLIMITR.LIMIT to leave the extra PAGE outside
+	 * 	the TRBE's range (i.e., [TRBBASER, TRBLIMITR.LIMIT]).
+	 */
+	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE)) {
+		s64 space = buf->trbe_limit - buf->trbe_write;
+		/*
+		 * We must have more than PAGE_SIZE worth of space in the proposed
+		 * range for the TRBE.
+		 */
+		if (WARN_ON(space <= PAGE_SIZE ||
+			    !IS_ALIGNED(buf->trbe_limit, PAGE_SIZE)))
+			return -EINVAL;
+		buf->trbe_limit -= PAGE_SIZE;
+	}
+
 	return 0;
 }
 
-- 
2.24.1
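
The wrap rule described in the trbe_get_trace_size() comment above can be modelled as follows. This is a simplified sketch, not the driver's function. `toy_trace_size()` is hypothetical: it takes plain offsets instead of reading TRBPTR_EL1/TRBLIMITR_EL1, and it assumes the trace started at `start`. Any bytes the erratum may have written past LIMIT are deliberately left out of the count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * On wrap, treat LIMIT as the effective write pointer and ignore
 * anything (up to 64 bytes on affected cores) written beyond it.
 * Otherwise, the hardware write pointer marks the end of the data.
 */
static uint64_t toy_trace_size(uint64_t start, uint64_t write_ptr,
			       uint64_t limit, bool wrap)
{
	uint64_t end = wrap ? limit : write_ptr;

	return end - start;
}
```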


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (15 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 16/17] coresight: trbe: Work around write to out of range Suzuki K Poulose
@ 2021-09-21 13:41 ` Suzuki K Poulose
  2021-09-22 11:03   ` Anshuman Khandual
  2021-10-07 16:11   ` Catalin Marinas
  2021-10-05 17:04 ` [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Mathieu Poirier
  2021-10-08  7:32 ` Will Deacon
  18 siblings, 2 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-21 13:41 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight, Suzuki K Poulose

Add Kconfig entries for the errata workarounds for TRBE writing
to an out-of-range address.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 Documentation/arm64/silicon-errata.rst |  4 +++
 arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 569a92411dcd..5342e895fb60 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -96,6 +96,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A710     | #2224489        | ARM64_ERRATUM_2224489       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1349291        | N/A                         |
@@ -106,6 +108,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N2     | #2067961        | ARM64_ERRATUM_2067961       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N2     | #2253138        | ARM64_ERRATUM_2253138       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0764774e12bb..611ae02aabbd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -736,6 +736,45 @@ config ARM64_ERRATUM_2067961
 
 	  If unsure, say Y.
 
+config ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	bool
+
+config ARM64_ERRATUM_2253138
+	bool "Neoverse-N2: 2253138: workaround TRBE writing to address out-of-range"
+	depends on CORESIGHT_TRBE
+	default y
+	select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	help
+	  This option adds the workaround for ARM Neoverse-N2 erratum 2253138.
+
+	  Affected Neoverse-N2 cores might write to an out-of-range address, not reserved
+	  for TRBE. Under some conditions, the TRBE might generate a write to the next
+	  virtually addressed page following the last page of the TRBE address space
+	  (i.e., the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
+
+	  We work around this in the driver by always making sure that there is a
+	  page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_2224489
+	bool "Cortex-A710: 2224489: workaround TRBE writing to address out-of-range"
+	depends on CORESIGHT_TRBE
+	default y
+	select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	help
+	  This option adds the workaround for ARM Cortex-A710 erratum 2224489.
+
+	  Affected Cortex-A710 cores might write to an out-of-range address, not reserved
+	  for TRBE. Under some conditions, the TRBE might generate a write to the next
+	  virtually addressed page following the last page of the TRBE address space
+	  (i.e., the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
+
+	  We work around this in the driver by always making sure that there is a
+	  page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data
  2021-09-21 13:41 ` [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data Suzuki K Poulose
@ 2021-09-22  5:41   ` Anshuman Khandual
  2021-09-30 17:57   ` Mathieu Poirier
  1 sibling, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  5:41 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> The TRBE driver wrongly treats the aux private data as the TRBE driver
> specific buffer for a given perf handle, while it is the ETM PMU's
> event specific data. Fix this by correcting the instance to use
> appropriate helper.
> 
> Fixes: 3fbf7f011f242 ("coresight: sink: Add TRBE driver")
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index d4c57aed05e5..e3d73751d568 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -363,7 +363,7 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
>  
>  static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>  {
> -	struct trbe_buf *buf = perf_get_aux(handle);
> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
>  	u64 limit = __trbe_normal_offset(handle);
>  	u64 head = PERF_IDX2OFF(handle->head, buf);
>  
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling
  2021-09-21 13:41 ` [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling Suzuki K Poulose
@ 2021-09-22  6:47   ` Anshuman Khandual
  2021-10-05 16:46   ` Mathieu Poirier
  1 sibling, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  6:47 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Add a minimal infrastructure to keep track of the errata
> affecting the given TRBE instance. Given that we have
> heterogeneous CPUs, we have to manage the list per-TRBE
> instance to be able to apply the work around as needed.
> 
> We rely on the arm64 errata framework for the actual
> description and the discovery of a given erratum, to
> keep the Erratum work around at a central place and
> benefit from the code and the advertisement from the
> kernel. We use a local mapping of the erratum to
> avoid bloating up the individual TRBE structures.
> i.e, each arm64 TRBE erratum bit is assigned a new number
> within the driver to track. Each trbe instance updates
> the list of affected erratum at probe time on the CPU.
> This makes sure that we can easily access the list of
> errata on a given TRBE instance without much overhead.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v1:
>   - Flip the order of args for trbe_has_erratum()
>   - Move erratum detection further down in the sequence
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 49 ++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index e3d73751d568..63f7edd5fd1f 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -16,6 +16,8 @@
>  #define pr_fmt(fmt) DRVNAME ": " fmt
>  
>  #include <asm/barrier.h>
> +#include <asm/cputype.h>
> +
>  #include "coresight-self-hosted-trace.h"
>  #include "coresight-trbe.h"
>  
> @@ -65,6 +67,35 @@ struct trbe_buf {
>  	struct trbe_cpudata *cpudata;
>  };
>  
> +/*
> + * TRBE erratum list
> + *
> + * We rely on the corresponding cpucaps to be defined for a given
> + * TRBE erratum. We map the given cpucap into a TRBE internal number
> + * to make the tracking of the errata lean.
> + *
> + * This helps in :
> + *   - Not duplicating the detection logic
> + *   - Streamlined detection of erratum across the system
> + *
> + * Since the erratum work arounds could be applied individually
> + * per TRBE instance, we keep track of the list of errata that
> + * affects the given instance of the TRBE.
> + */
> +#define TRBE_ERRATA_MAX			0
> +
> +static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
> +};

So TRBE_ERRATA_MAX needs to be updated manually here when new
TRBE-specific errata get added to the cpucap list. Hence, let's
add a comment indicating that TRBE_ERRATA_MAX needs explicit
syncing with changes to the cpucap list.

> +
> +/*
> + * struct trbe_cpudata: TRBE instance specific data
> + * @trbe_flag		- TRBE dirty/access flag support
> + * @tbre_align		- Actual TRBE alignment required for TRBPTR_EL1.
> + * @cpu			- CPU this TRBE belongs to.
> + * @mode		- Mode of current operation. (perf/disabled)
> + * @drvdata		- TRBE specific drvdata
> + * @errata		- Bit map for the errata on this TRBE.
> + */
>  struct trbe_cpudata {
>  	bool trbe_flag;
>  	u64 trbe_align;
> @@ -72,6 +103,7 @@ struct trbe_cpudata {
>  	enum cs_mode mode;
>  	struct trbe_buf *buf;
>  	struct trbe_drvdata *drvdata;
> +	DECLARE_BITMAP(errata, TRBE_ERRATA_MAX);
>  };
>  
>  struct trbe_drvdata {
> @@ -84,6 +116,21 @@ struct trbe_drvdata {
>  	struct platform_device *pdev;
>  };
>  
> +static void trbe_check_errata(struct trbe_cpudata *cpudata)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(trbe_errata_cpucaps); i++) {
> +		if (this_cpu_has_cap(trbe_errata_cpucaps[i]))
> +			set_bit(i, cpudata->errata);
> +	}
> +}
> +
> +static inline bool trbe_has_erratum(struct trbe_cpudata *cpudata, int i)
> +{
> +	return (i < TRBE_ERRATA_MAX) && test_bit(i, cpudata->errata);
> +}
> +
>  static int trbe_alloc_node(struct perf_event *event)
>  {
>  	if (event->cpu == -1)
> @@ -926,6 +973,8 @@ static void arm_trbe_probe_cpu(void *info)
>  		pr_err("Unsupported alignment on cpu %d\n", cpu);
>  		goto cpu_clear;
>  	}
> +
> +	trbe_check_errata(cpudata);

This could be moved further down, just before the 'return' statement.
Let's not interrupt the cpudata init sequence; rather, run all the errata
detection right at the end.

>  	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
>  	cpudata->cpu = cpu;
>  	cpudata->drvdata = drvdata;
> 
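
The approach in this patch maps each arm64 cpucap to a driver-local erratum number and records the matches in a per-instance bitmap at probe time. It can be sketched in plain C. `toy_this_cpu_has_cap()` and the cpucap numbers are hypothetical stand-ins for `this_cpu_has_cap()` and the generated `ARM64_WORKAROUND_*` constants. Note that in C, a designated initializer naming an index beyond the declared array bound is rejected at compile time. That only partly covers the manual syncing of the maximum raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Driver-local erratum numbers (hypothetical names). */
#define TOY_WORKAROUND_OVERWRITE_FILL_MODE	0
#define TOY_WORKAROUND_WRITE_OUT_OF_RANGE	1
#define TOY_ERRATA_MAX				2

/* Hypothetical global cpucap numbers, standing in for ARM64_WORKAROUND_*. */
static const unsigned int toy_errata_cpucaps[TOY_ERRATA_MAX] = {
	[TOY_WORKAROUND_OVERWRITE_FILL_MODE] = 5,
	[TOY_WORKAROUND_WRITE_OUT_OF_RANGE]  = 7,
};

struct toy_cpudata {
	uint64_t errata;	/* bit i set => driver-local erratum i applies */
	uint64_t cpu_caps;	/* hypothetical: bit n set => this CPU has cpucap n */
};

static bool toy_this_cpu_has_cap(const struct toy_cpudata *c, unsigned int cap)
{
	return (c->cpu_caps >> cap) & 1;
}

/* Probe-time scan, in the spirit of trbe_check_errata(). */
static void toy_check_errata(struct toy_cpudata *c)
{
	for (unsigned int i = 0; i < TOY_ERRATA_MAX; i++)
		if (toy_this_cpu_has_cap(c, toy_errata_cpucaps[i]))
			c->errata |= 1ULL << i;
}

static bool toy_has_erratum(const struct toy_cpudata *c, int i)
{
	return i >= 0 && i < TOY_ERRATA_MAX && ((c->errata >> i) & 1);
}
```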

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 07/17] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
  2021-09-21 13:41 ` [PATCH v2 07/17] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition Suzuki K Poulose
@ 2021-09-22  6:57   ` Anshuman Khandual
  0 siblings, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  6:57 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Add the CPU Partnumbers for the new Arm designs.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  arch/arm64/include/asm/cputype.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
> index 6231e1f0abe7..19b8441aa8f2 100644
> --- a/arch/arm64/include/asm/cputype.h
> +++ b/arch/arm64/include/asm/cputype.h
> @@ -73,6 +73,8 @@
>  #define ARM_CPU_PART_CORTEX_A76		0xD0B
>  #define ARM_CPU_PART_NEOVERSE_N1	0xD0C
>  #define ARM_CPU_PART_CORTEX_A77		0xD0D
> +#define ARM_CPU_PART_CORTEX_A710	0xD47
> +#define ARM_CPU_PART_NEOVERSE_N2	0xD49
>  
>  #define APM_CPU_PART_POTENZA		0x000
>  
> @@ -113,6 +115,8 @@
>  #define MIDR_CORTEX_A76	MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A76)
>  #define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N1)
>  #define MIDR_CORTEX_A77	MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77)
> +#define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710)
> +#define MIDR_NEOVERSE_N2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N2)
>  #define MIDR_THUNDERX	MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
>  #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
>  #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode
  2021-09-21 13:41 ` [PATCH v2 10/17] arm64: Enable workaround for TRBE " Suzuki K Poulose
@ 2021-09-22  7:23   ` Anshuman Khandual
  2021-09-22  8:11     ` Suzuki K Poulose
  2021-10-07 16:09   ` Catalin Marinas
  1 sibling, 1 reply; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  7:23 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Now that we have the work around implmented in the TRBE
> driver, add the Kconfig entries and document the errata.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  Documentation/arm64/silicon-errata.rst |  4 +++
>  arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index d410a47ffa57..2f99229d993c 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -92,12 +92,16 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N1     | #1349291        | N/A                         |
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N1     | #1542419        | ARM64_ERRATUM_1542419       |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | MMU-500         | #841119,826419  | N/A                         |
>  +----------------+-----------------+-----------------+-----------------------------+
>  +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 077f2ec4eeb2..eac4030322df 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
>  
>  	  If unsure, say Y.
>  
> +config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> +	bool
> +
> +config ARM64_ERRATUM_2119858
> +	bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
> +	default y
> +	depends on CORESIGHT_TRBE
> +	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> +	help
> +	  This option adds the workaround for ARM Cortex-A710 erratum 2119858.
> +
> +	  Affected Cortex-A710 cores could overwrite upto 3 cache lines of trace
> +	  data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
> +	  the event of a WRAP event.
> +
> +	  Work around the issue by always making sure we move the TRBPTR_EL1 by
> +	  256bytes before enabling the buffer and filling the first 256bytes of
> +	  the buffer with ETM ignore packets upon disabling.
> +
> +	  If unsure, say Y.
> +
> +config ARM64_ERRATUM_2139208
> +	bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
> +	default y
> +	depends on CORESIGHT_TRBE
> +	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> +	help
> +	  This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
> +
> +	  Affected Neoverse-N2 cores could overwrite upto 3 cache lines of trace
> +	  data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in

s/ponited/pointed

> +	  the event of a WRAP event.
> +
> +	  Work around the issue by always making sure we move the TRBPTR_EL1 by
> +	  256bytes before enabling the buffer and filling the first 256bytes of
> +	  the buffer with ETM ignore packets upon disabling.
> +
> +	  If unsure, say Y.
> +
>  config CAVIUM_ERRATUM_22375
>  	bool "Cavium erratum 22375, 24313"
>  	default y
> 

The actual problem descriptions for both of these errata are exactly
the same. A more generalized description, which abstracts out the
problem and the corresponding solution implemented in the driver,
should rather be included for ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE.
This should also help us reduce the current redundancy.

----------------------------------------------------------------------
Affected cores could overwrite up to 3 cache lines of trace data at the
base of the buffer (pointed to by TRBBASER_EL1) in FILL mode when a
WRAP event occurs.

Work around the issue by always making sure we move the TRBPTR_EL1 by
256 bytes before enabling the buffer and filling the first 256 bytes of
the buffer with ETM ignore packets upon disabling.
----------------------------------------------------------------------
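
The workaround described in the proposed text can be illustrated with a userspace sketch. The 256-byte skip and the overwrite of up to 3 cache lines come from the description above. The 64-byte cache line and the ignore-packet byte value (0x70) are assumptions made for illustration. The `toy_*` names are hypothetical, not the driver's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SKIP_BYTES		256	/* skip applied to TRBPTR_EL1 on enable */
#define TOY_CACHE_LINE		64	/* assumed cache line size */
#define TOY_MAX_OVERWRITE	(3 * TOY_CACHE_LINE)
#define TOY_IGNORE_PACKET	0x70	/* assumed ETE ignore packet header */

/* On enable: start writing TOY_SKIP_BYTES past the base of the buffer. */
static size_t toy_enable_offset(size_t base_off)
{
	return base_off + TOY_SKIP_BYTES;
}

/*
 * On disable: overwrite the skipped area with ignore packets, so the
 * decoder skips it whether or not the erratum corrupted it.
 */
static void toy_pad_skipped(uint8_t *base)
{
	memset(base, TOY_IGNORE_PACKET, TOY_SKIP_BYTES);
}
```

The skip only works because 256 bytes covers the worst case of 3 x 64 = 192 bytes that may be overwritten at the base.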

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures
  2021-09-21 13:41 ` [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures Suzuki K Poulose
@ 2021-09-22  7:39   ` Anshuman Khandual
  2021-09-22 12:03     ` Suzuki K Poulose
  2021-10-07 16:10   ` Catalin Marinas
  1 sibling, 1 reply; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  7:39 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffers
> from errata, where a TSB (trace synchronization barrier)
> fails to flush the trace data completely, when executed from
> a trace prohibited region. In Linux we always execute it
> after we have moved the PE to trace prohibited region. So,
> we can apply the workaround everytime a TSB is executed.

s/everytime/every time

> 
> The work around is to issue two TSB consecutively.
> 
> NOTE: This errata is defined as LOCAL_CPU_ERRATUM, implying
> that a late CPU could be blocked from booting if it is the
> first CPU that requires the workaround. This is because we
> do not allow setting a cpu_hwcaps after the SMP boot. The
> other alternative is to use "this_cpu_has_cap()" instead
> of the faster system wide check, which may be a bit of an
> overhead, given we may have to do this in nvhe KVM host
> before a guest entry.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v1:
>  - Switch to cpus_have_final_cap()
>  - Document the requirements on TSB.
> ---
>  Documentation/arm64/silicon-errata.rst |  4 ++++
>  arch/arm64/Kconfig                     | 31 ++++++++++++++++++++++++++
>  arch/arm64/include/asm/barrier.h       | 16 ++++++++++++-
>  arch/arm64/kernel/cpu_errata.c         | 19 ++++++++++++++++
>  arch/arm64/tools/cpucaps               |  1 +
>  5 files changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index 2f99229d993c..569a92411dcd 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -94,6 +94,8 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N1     | #1349291        | N/A                         |
> @@ -102,6 +104,8 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | Neoverse-N2     | #2067961        | ARM64_ERRATUM_2067961       |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | MMU-500         | #841119,826419  | N/A                         |
>  +----------------+-----------------+-----------------+-----------------------------+
>  +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index eac4030322df..0764774e12bb 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
>  
>  	  If unsure, say Y.
>  
> +config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
> +	bool
> +
> +config ARM64_ERRATUM_2054223
> +	bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
> +	default y
> +	help
> +	  Enable workaround for ARM Cortex-A710 erratum 2054223
> +
> +	  Affected cores may fail to flush the trace data on a TSB instruction, when
> +	  the PE is in trace prohibited state. This will cause losing a few bytes
> +	  of the trace cached.
> +
> +	  Workaround is to issue two TSB consecutively on affected cores.
> +
> +	  If unsure, say Y.
> +
> +config ARM64_ERRATUM_2067961
> +	bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
> +	default y
> +	help
> +	  Enable workaround for ARM Neoverse-N2 erratum 2067961
> +
> +	  Affected cores may fail to flush the trace data on a TSB instruction, when
> +	  the PE is in trace prohibited state. This will cause losing a few bytes
> +	  of the trace cached.
> +
> +	  Workaround is to issue two TSB consecutively on affected cores.

As I mentioned in the previous patch, these descriptions could just be
factored out into ARM64_WORKAROUND_TSB_FLUSH_FAILURE instead.

> +
> +	  If unsure, say Y.
> +
>  config CAVIUM_ERRATUM_22375
>  	bool "Cavium erratum 22375, 24313"
>  	default y
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index 451e11e5fd23..1c5a00598458 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -23,7 +23,7 @@
>  #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
>  
>  #define psb_csync()	asm volatile("hint #17" : : : "memory")
> -#define tsb_csync()	asm volatile("hint #18" : : : "memory")
> +#define __tsb_csync()	asm volatile("hint #18" : : : "memory")
>  #define csdb()		asm volatile("hint #20" : : : "memory")
>  
>  #ifdef CONFIG_ARM64_PSEUDO_NMI
> @@ -46,6 +46,20 @@
>  #define dma_rmb()	dmb(oshld)
>  #define dma_wmb()	dmb(oshst)
>  
> +
> +#define tsb_csync()								\
> +	do {									\
> +		/*								\
> +		 * CPUs affected by Arm Erratum 2054223 or 2067961 needs	\
> +		 * another TSB to ensure the trace is flushed. The barriers	\
> +		 * don't have to be strictly back to back, as long as the	\
> +		 * CPU is in trace prohibited state.				\
> +		 */								\
> +		if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE))	\
> +			__tsb_csync();						\
> +		__tsb_csync();							\
> +	} while (0)
> +
>  /*
>   * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
>   * and 0 otherwise.
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index ccd757373f36..bdbeac75ead6 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
>  };
>  #endif	/* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
>  
> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
> +static const struct midr_range tsb_flush_fail_cpus[] = {
> +#ifdef CONFIG_ARM64_ERRATUM_2067961
> +	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
> +#endif
> +#ifdef CONFIG_ARM64_ERRATUM_2054223
> +	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
> +#endif
> +	{},
> +};
> +#endif	/* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
> +
>  const struct arm64_cpu_capabilities arm64_errata[] = {
>  #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
>  	{
> @@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>  		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
>  		CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
>  	},
> +#endif
> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILRE
> +	{
> +		.desc = "ARM erratum 2067961 or 2054223",
> +		.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
> +		ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
> +	},
>  #endif
>  	{
>  	}
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 1ccb92165bd8..2102e15af43d 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -54,6 +54,7 @@ WORKAROUND_1463225
>  WORKAROUND_1508412
>  WORKAROUND_1542419
>  WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> +WORKAROUND_TSB_FLUSH_FAILURE
>  WORKAROUND_CAVIUM_23154
>  WORKAROUND_CAVIUM_27456
>  WORKAROUND_CAVIUM_30115
> 

This adds all the required bits of these errata in a single patch,
whereas the previous workaround had split all the required pieces
into multiple patches. Could we instead follow the same approach in
both places?
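
For illustration, here is a minimal user-space model of the tsb_csync() logic in the barrier.h hunk above. It is a sketch, not the kernel code. The flag tsb_flush_failure_cap is a hypothetical stand-in for cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE). Instead of emitting "hint #18", the model only counts how many TSBs would be issued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ARM64_WORKAROUND_TSB_FLUSH_FAILURE cpucap. */
static bool tsb_flush_failure_cap;
/* Number of TSB instructions the model has "executed". */
static unsigned int tsb_issued;

static void __tsb_csync(void)
{
	/* Real code: asm volatile("hint #18" : : : "memory"); */
	tsb_issued++;
}

static void tsb_csync(void)
{
	/* Affected CPUs need a second TSB to flush the trace completely. */
	if (tsb_flush_failure_cap)
		__tsb_csync();
	__tsb_csync();
}
```

An unaffected CPU issues one barrier. With the capability set, the model issues two, which matches the "two TSBs consecutively" workaround described in the commit message.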


* Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle
  2021-09-21 13:41 ` [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle Suzuki K Poulose
@ 2021-09-22  7:59   ` Anshuman Khandual
  2021-10-04 17:42   ` Mathieu Poirier
  1 sibling, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  7:59 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Add a helper to get the CPU specific data for TRBE instance, from
> a given perf handle. This also adds extra checks to make sure that
> the event associated with the handle is "bound" to the CPU and is
> active on the TRBE.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 983dd5039e52..797d978f9fa7 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>  	return buf->nr_pages * PAGE_SIZE;
>  }
>  
> +static inline struct trbe_cpudata *
> +trbe_handle_to_cpudata(struct perf_output_handle *handle)

This is actually a perf handle, not a TRBE handle. Hence it
should be renamed to 'perf_handle_to_cpudata' instead.

> +{
> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
> +
> +	BUG_ON(!buf || !buf->cpudata);
> +	return buf->cpudata;
> +}
> +
>  /*
>   * TRBE Limit Calculation
>   *
> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
>  {
>  	int ec = get_trbe_ec(trbsr);
>  	int bsc = get_trbe_bsc(trbsr);
> -	struct trbe_buf *buf = etm_perf_sink_config(handle);
> -	struct trbe_cpudata *cpudata = buf->cpudata;
> +	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>  
>  	WARN_ON(is_trbe_running(trbsr));
>  	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
> 
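
The following is a minimal sketch of the helper under the name suggested above, for illustration only. The structures are simplified, hypothetical stand-ins for the driver's types. sketch_sink_config() is a stub that stands in for etm_perf_sink_config() and simply returns a pointer stored in the handle. assert() takes the place of BUG_ON().

```c
#include <assert.h>

/* Simplified, hypothetical stand-ins for the driver structures. */
struct trbe_cpudata {
	int cpu;
};

struct trbe_buf {
	struct trbe_cpudata *cpudata;
};

struct perf_output_handle {
	void *sink_config;	/* what etm_perf_sink_config() would return */
};

/* Stub standing in for etm_perf_sink_config(). */
static void *sketch_sink_config(struct perf_output_handle *handle)
{
	return handle->sink_config;
}

static struct trbe_cpudata *perf_handle_to_cpudata(struct perf_output_handle *handle)
{
	struct trbe_buf *buf = sketch_sink_config(handle);

	/* The patch uses BUG_ON(); assert() plays the same role here. */
	assert(buf && buf->cpudata);
	return buf->cpudata;
}
```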


* Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode
  2021-09-22  7:23   ` Anshuman Khandual
@ 2021-09-22  8:11     ` Suzuki K Poulose
  2021-10-01  4:35       ` Anshuman Khandual
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-22  8:11 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight

On 22/09/2021 08:23, Anshuman Khandual wrote:
> 
> 
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> Now that we have the work around implmented in the TRBE
>> driver, add the Kconfig entries and document the errata.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   Documentation/arm64/silicon-errata.rst |  4 +++
>>   arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
>>   2 files changed, 43 insertions(+)
>>
>> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
>> index d410a47ffa57..2f99229d993c 100644
>> --- a/Documentation/arm64/silicon-errata.rst
>> +++ b/Documentation/arm64/silicon-errata.rst
>> @@ -92,12 +92,16 @@ stable kernels.
>>   +----------------+-----------------+-----------------+-----------------------------+
>>   | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
>>   +----------------+-----------------+-----------------+-----------------------------+
>> +| ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
>> ++----------------+-----------------+-----------------+-----------------------------+
>>   | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
>>   +----------------+-----------------+-----------------+-----------------------------+
>>   | ARM            | Neoverse-N1     | #1349291        | N/A                         |
>>   +----------------+-----------------+-----------------+-----------------------------+
>>   | ARM            | Neoverse-N1     | #1542419        | ARM64_ERRATUM_1542419       |
>>   +----------------+-----------------+-----------------+-----------------------------+
>> +| ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
>> ++----------------+-----------------+-----------------+-----------------------------+
>>   | ARM            | MMU-500         | #841119,826419  | N/A                         |
>>   +----------------+-----------------+-----------------+-----------------------------+
>>   +----------------+-----------------+-----------------+-----------------------------+
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 077f2ec4eeb2..eac4030322df 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
>>   
>>   	  If unsure, say Y.
>>   
>> +config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> +	bool
>> +
>> +config ARM64_ERRATUM_2119858
>> +	bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
>> +	default y
>> +	depends on CORESIGHT_TRBE
>> +	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> +	help
>> +	  This option adds the workaround for ARM Cortex-A710 erratum 2119858.
>> +
>> +	  Affected Cortex-A710 cores could overwrite upto 3 cache lines of trace
>> +	  data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>> +	  the event of a WRAP event.
>> +
>> +	  Work around the issue by always making sure we move the TRBPTR_EL1 by
>> +	  256bytes before enabling the buffer and filling the first 256bytes of
>> +	  the buffer with ETM ignore packets upon disabling.
>> +
>> +	  If unsure, say Y.
>> +
>> +config ARM64_ERRATUM_2139208
>> +	bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
>> +	default y
>> +	depends on CORESIGHT_TRBE
>> +	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> +	help
>> +	  This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
>> +
>> +	  Affected Neoverse-N2 cores could overwrite upto 3 cache lines of trace
>> +	  data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
> 
> s/ponited/pointed
> 
>> +	  the event of a WRAP event.
>> +
>> +	  Work around the issue by always making sure we move the TRBPTR_EL1 by
>> +	  256bytes before enabling the buffer and filling the first 256bytes of
>> +	  the buffer with ETM ignore packets upon disabling.
>> +
>> +	  If unsure, say Y.
>> +
>>   config CAVIUM_ERRATUM_22375
>>   	bool "Cavium erratum 22375, 24313"
>>   	default y
>>
> 
> The real errata problem description for both these erratums are exactly
> the same. Rather a more generalized description should be included for
> the ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, which abstracts out the
> problem and a corresponding solution that is implemented in the driver.
> This should also help us reduce current redundancy.
> 

The issue is what a user wants to see. A user who wants to configure the
kernel specifically for a given CPU (think embedded systems) would want
to hand-pick the errata for that particular CPU, so moving the help text
to an implicitly selected Kconfig symbol would not help them. I would
rather keep this as it is, to keep it user friendly. This doesn't affect
the code size anyway.

The other option is to remove all the CPU specific Kconfig symbols and
update the "title" to reflect both the CPU/erratum numbers.

Kind regards
Suzuki


* Re: [PATCH v2 13/17] coresight: trbe: Add a helper to determine the minimum buffer size
  2021-09-21 13:41 ` [PATCH v2 13/17] coresight: trbe: Add a helper to determine the minimum buffer size Suzuki K Poulose
@ 2021-09-22  9:51   ` Anshuman Khandual
  0 siblings, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  9:51 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> For the TRBE to operate, we need a minimum space available to collect
> meaningful trace session. This is currently a few bytes, but we may need
> to extend this for working around errata. So, abstract this into a helper
> function.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 797d978f9fa7..3373f4e2183b 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -277,6 +277,11 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>  	return buf->cpudata;
>  }
>  
> +static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
> +{
> +	return TRBE_TRACE_MIN_BUF_SIZE;
> +}

This assumes that struct perf_output_handle can provide everything
required to support a variable minimum trace buffer length.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> +
>  /*
>   * TRBE Limit Calculation
>   *
> @@ -447,7 +452,7 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>  	 * have space for a meaningful run, we rather pad it
>  	 * and start fresh.
>  	 */
> -	if (limit && (limit - head < TRBE_TRACE_MIN_BUF_SIZE)) {
> +	if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>  		trbe_pad_buf(handle, limit - head);
>  		limit = __trbe_normal_offset(handle);
>  	}
> 
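
The commit message says the minimum may need to grow for errata workarounds. Below is a minimal sketch of what a variable minimum could look like. The 64-byte base value (standing in for TRBE_TRACE_MIN_BUF_SIZE) and the extra page added when a workaround must keep a page out of the TRBE range are assumptions for illustration, not the driver's definitions. needs_padding() mirrors the condition in the trbe_normal_offset() hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumed 4K pages */
#define SKETCH_MIN_BUF_SIZE	64UL	/* assumed value of TRBE_TRACE_MIN_BUF_SIZE */

/*
 * Hypothetical: the minimum grows by one page when an erratum
 * workaround must keep a page out of the TRBE's writable range.
 */
static uint64_t min_trace_buf_size(bool spare_page_wa)
{
	uint64_t size = SKETCH_MIN_BUF_SIZE;

	if (spare_page_wa)
		size += SKETCH_PAGE_SIZE;
	return size;
}

/* Mirrors the check in trbe_normal_offset(): a limit of 0 means "no space". */
static bool needs_padding(uint64_t limit, uint64_t head, bool spare_page_wa)
{
	return limit && (limit - head) < min_trace_buf_size(spare_page_wa);
}
```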


* Re: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space
  2021-09-21 13:41 ` [PATCH v2 14/17] coresight: trbe: Make sure we have enough space Suzuki K Poulose
@ 2021-09-22  9:58   ` Anshuman Khandual
  2021-09-22 10:16     ` Suzuki K Poulose
  0 siblings, 1 reply; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22  9:58 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> The TRBE driver makes sure that there is enough space for a meaningful
> run, otherwise pads the given space and restarts the offset calculation
> once. But there is no guarantee that we may find space or hit "no space".

So what happens currently when it neither finds the required minimum buffer
space for a meaningful run nor does it hit the "no space" scenario ?

> Make sure that we repeat the step until, either :
>   - We have the minimum space
>    OR
>   - There is NO space at all.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 3373f4e2183b..02f9e00e2091 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>  	 * If the head is too close to the limit and we don't
>  	 * have space for a meaningful run, we rather pad it
>  	 * and start fresh.
> +	 *
> +	 * We might have to do this more than once to make sure
> +	 * we have enough required space.

OR no space at all, as explained in the commit message.
Hence this comment needs an update.

>  	 */
> -	if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
> +	while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>  		trbe_pad_buf(handle, limit - head);
>  		limit = __trbe_normal_offset(handle);
> +		head = PERF_IDX2OFF(handle->head, buf);

Should the loop be bound with a retry limit as well ?

>  	}
>  	return limit;
>  }
> 


* Re: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space
  2021-09-22  9:58   ` Anshuman Khandual
@ 2021-09-22 10:16     ` Suzuki K Poulose
  2021-10-01  4:40       ` Anshuman Khandual
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-22 10:16 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight

On 22/09/2021 10:58, Anshuman Khandual wrote:
> 
> 
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> The TRBE driver makes sure that there is enough space for a meaningful
>> run, otherwise pads the given space and restarts the offset calculation
>> once. But there is no guarantee that we may find space or hit "no space".
> 
> So what happens currently when it neither finds the required minimum buffer
> space for a meaningful run nor does it hit the "no space" scenario ?

It tries once today and assumes that it will hit either:

  - No space
    OR
  - Enough space

which is reasonable, given that the minimum space needed is a few bytes.
But this may no longer be true with other erratum workarounds.

> 
>> Make sure that we repeat the step until, either :
>>    - We have the minimum space
>>     OR
>>    - There is NO space at all.
>>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 3373f4e2183b..02f9e00e2091 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>>   	 * If the head is too close to the limit and we don't
>>   	 * have space for a meaningful run, we rather pad it
>>   	 * and start fresh.
>> +	 *
>> +	 * We might have to do this more than once to make sure
>> +	 * we have enough required space.
> 
> OR no space at all, as explained in the commit message.
> Hence this comment needs an update.
> 
>>   	 */
>> -	if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>> +	while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>>   		trbe_pad_buf(handle, limit - head);
>>   		limit = __trbe_normal_offset(handle);
>> +		head = PERF_IDX2OFF(handle->head, buf);
> 
> Should the loop be bound with a retry limit as well ?

No. We will eventually hit "no space", as we keep padding
the buffer.

Suzuki
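
Here is a small user-space model of the loop that illustrates the termination argument. It assumes, hypothetically, that __trbe_normal_offset() returns the next page boundary after the head, capped at the end of the available space, or 0 once no space is left. Every pass pads limit - head bytes, which is always non-zero. The head therefore strictly advances, and the loop ends either with enough space or with "no space".

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical model of the AUX space available to the TRBE. */
struct model_handle {
	uint64_t head;		/* current write offset */
	uint64_t end;		/* offset where the available space ends */
	unsigned int pads;	/* number of padding rounds performed */
};

/* Stand-in for __trbe_normal_offset(): next page boundary, or 0 if full. */
static uint64_t model_normal_offset(const struct model_handle *h)
{
	uint64_t next;

	if (h->head >= h->end)
		return 0;	/* no space at all */
	next = (h->head / SKETCH_PAGE_SIZE + 1) * SKETCH_PAGE_SIZE;
	return next < h->end ? next : h->end;
}

/* Stand-in for trbe_pad_buf(): consume len bytes with padding. */
static void model_pad(struct model_handle *h, uint64_t len)
{
	h->head += len;
	h->pads++;
}

static uint64_t model_trbe_normal_offset(struct model_handle *h, uint64_t min_size)
{
	uint64_t limit = model_normal_offset(h);

	/* limit > head whenever limit != 0, so each pass makes progress. */
	while (limit && (limit - h->head) < min_size) {
		model_pad(h, limit - h->head);
		limit = model_normal_offset(h);
	}
	return limit;
}
```

With a 64-byte minimum, a head that is 6 bytes short of a page boundary is padded once. With a minimum larger than a page, the model keeps padding until it runs out of space and returns 0. A single if would have stopped after the first pad, even though the next region was still smaller than the minimum.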


* Re: [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range
  2021-09-21 13:41 ` [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range Suzuki K Poulose
@ 2021-09-22 10:59   ` Anshuman Khandual
  2021-10-07 16:10   ` Catalin Marinas
  1 sibling, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22 10:59 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the
> trbe, under some circumstances, might write upto 64bytes to an address after
> the Limit as programmed by the TRBLIMITR_EL1.LIMIT. This might -
> 
>   - Corrupt a page in the ring buffer, which may corrupt trace from a
>     previous session, consumed by userspace.
>   - Hit the guard page at the end of the vmalloc area and raise a fault.
> 
> To keep the handling simpler, we always leave the last page from the
> range, which TRBE is allowed to write. This can be achieved by ensuring
> that we always have more than a PAGE worth space in the range, while
> calculating the LIMIT for TRBE. And then the LIMIT pointer can be adjusted
> to leave the PAGE (TRBLIMITR.LIMIT -= PAGE_SIZE), out of the TRBE range
> while enabling it. This makes sure that the TRBE will only write to an area
> within its allowed limit (i.e, [head-head+size]) and we do not have to handle
> address faults within the driver.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  arch/arm64/kernel/cpu_errata.c | 20 ++++++++++++++++++++
>  arch/arm64/tools/cpucaps       |  1 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index bdbeac75ead6..e2978b89d4b8 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -364,6 +364,18 @@ static const struct midr_range tsb_flush_fail_cpus[] = {
>  };
>  #endif	/* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
>  
> +#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> +static struct midr_range trbe_write_out_of_range_cpus[] = {
> +#ifdef CONFIG_ARM64_ERRATUM_2253138
> +	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
> +#endif
> +#ifdef CONFIG_ARM64_ERRATUM_2224489
> +	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
> +#endif
> +	{},
> +};
> +#endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */
> +
>  const struct arm64_cpu_capabilities arm64_errata[] = {
>  #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
>  	{
> @@ -577,6 +589,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>  		.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
>  		ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
>  	},
> +#endif
> +#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> +	{
> +		.desc = "ARM erratum 2253138 or 2224489",
> +		.capability = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
> +		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
> +		CAP_MIDR_RANGE_LIST(trbe_write_out_of_range_cpus),
> +	},
>  #endif
>  	{
>  	}
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 2102e15af43d..90628638e0f9 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -55,6 +55,7 @@ WORKAROUND_1508412
>  WORKAROUND_1542419
>  WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>  WORKAROUND_TSB_FLUSH_FAILURE
> +WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
>  WORKAROUND_CAVIUM_23154
>  WORKAROUND_CAVIUM_27456
>  WORKAROUND_CAVIUM_30115
> 
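
The following is a minimal sketch of the limit adjustment described in the commit message, assuming 4K pages. limit_with_spare_page() is a hypothetical helper, not a function from the driver. It accepts only a range larger than one page and programs the LIMIT one page lower. A stray write of up to 64 bytes past the programmed LIMIT then still lands inside [head, limit).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed 4K pages */

/*
 * Hypothetical helper: given the [head, limit) range the TRBE may write to,
 * return the LIMIT to program, keeping the last page out of the TRBE range.
 * Returns 0 when the range cannot spare a page (treated as "no space").
 */
static uint64_t limit_with_spare_page(uint64_t head, uint64_t limit)
{
	if (limit <= head || limit - head <= SKETCH_PAGE_SIZE)
		return 0;
	return limit - SKETCH_PAGE_SIZE;
}
```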


* Re: [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address
  2021-09-21 13:41 ` [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address Suzuki K Poulose
@ 2021-09-22 11:03   ` Anshuman Khandual
  2021-10-07 16:11   ` Catalin Marinas
  1 sibling, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-22 11:03 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Add Kconfig entries for the errata workarounds for TRBE writing
> to an out-of-range address.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  Documentation/arm64/silicon-errata.rst |  4 +++
>  arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index 569a92411dcd..5342e895fb60 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -96,6 +96,8 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | Cortex-A710     | #2224489        | ARM64_ERRATUM_2224489       |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N1     | #1349291        | N/A                         |
> @@ -106,6 +108,8 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | Neoverse-N2     | #2067961        | ARM64_ERRATUM_2067961       |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| ARM            | Neoverse-N2     | #2253138        | ARM64_ERRATUM_2253138       |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | ARM            | MMU-500         | #841119,826419  | N/A                         |
>  +----------------+-----------------+-----------------+-----------------------------+
>  +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0764774e12bb..611ae02aabbd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -736,6 +736,45 @@ config ARM64_ERRATUM_2067961
>  
>  	  If unsure, say Y.
>  
> +config ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> +	bool
> +
> +config ARM64_ERRATUM_2253138
> +	bool "Neoverse-N2: 2253138: workaround TRBE writing to address out-of-range"
> +	depends on CORESIGHT_TRBE
> +	default y
> +	select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> +	help
> +	  This option adds the workaround for ARM Neoverse-N2 erratum 2253138.
> +
> +	  Affected Neoverse-N2 cores might write to an out-of-range address, not reserved
> +	  for TRBE. Under some conditions, the TRBE might generate a write to the next
> +	  virtually addressed page following the last page of the TRBE address space
> +	  (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
> +
> +	  We work around this in the driver by, always making sure that there is a
> +	  page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
> +
> +	  If unsure, say Y.
> +
> +config ARM64_ERRATUM_2224489
> +	bool "Cortex-A710: 2224489: workaround TRBE writing to address out-of-range"
> +	depends on CORESIGHT_TRBE
> +	default y
> +	select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> +	help
> +	  This option adds the workaround for ARM Cortex-A710 erratum 2224489.
> +
> +	  Affected Cortex-A710 cores might write to an out-of-range address, not reserved
> +	  for TRBE. Under some conditions, the TRBE might generate a write to the next
> +	  virtually addressed page following the last page of the TRBE address space
> +	  (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
> +
> +	  We work around this in the driver by, always making sure that there is a
> +	  page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
> +
> +	  If unsure, say Y.
> +
>  config CAVIUM_ERRATUM_22375
>  	bool "Cavium erratum 22375, 24313"
>  	default y
> 
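
Through the #ifdef blocks in patch 15, the Kconfig symbols above decide which entries are compiled into trbe_write_out_of_range_cpus[]. The sketch below is a simplified version of how a MIDR_ALL_VERSIONS() style match behaves. It compares only the implementer and the part number, so every variant and revision of a listed core matches. The struct and helpers are hypothetical. The values 0xD49 (Neoverse-N2), 0xD47 (Cortex-A710) and implementer 0x41 (Arm) follow the kernel's cputype definitions. The real check also compares the architecture field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IMPL_ARM		0x41U
#define SKETCH_PART_CORTEX_A710	0xD47U
#define SKETCH_PART_NEOVERSE_N2	0xD49U

/* One list entry; a zero part number terminates the list. */
struct sketch_midr_model {
	uint32_t implementer;
	uint32_t partnum;
};

/*
 * Build a MIDR_EL1 value: Implementer [31:24], Variant [23:20],
 * Architecture [19:16] (0xF), PartNum [15:4], Revision [3:0].
 */
static uint32_t sketch_make_midr(uint32_t impl, uint32_t variant,
				 uint32_t part, uint32_t rev)
{
	return (impl << 24) | (variant << 20) | (0xFU << 16) | (part << 4) | rev;
}

/* "All versions" match: variant and revision are ignored. */
static bool sketch_midr_in_list(uint32_t midr, const struct sketch_midr_model *list)
{
	for (; list->partnum; list++) {
		if (((midr >> 24) & 0xFFU) == list->implementer &&
		    ((midr >> 4) & 0xFFFU) == list->partnum)
			return true;
	}
	return false;
}

/* Equivalent of trbe_write_out_of_range_cpus[] with both errata enabled. */
static const struct sketch_midr_model sketch_out_of_range_cpus[] = {
	{ SKETCH_IMPL_ARM, SKETCH_PART_NEOVERSE_N2 },
	{ SKETCH_IMPL_ARM, SKETCH_PART_CORTEX_A710 },
	{ 0, 0 },
};
```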


* Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures
  2021-09-22  7:39   ` Anshuman Khandual
@ 2021-09-22 12:03     ` Suzuki K Poulose
  2021-10-01  4:38       ` Anshuman Khandual
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-22 12:03 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight

Hi Anshuman

On 22/09/2021 08:39, Anshuman Khandual wrote:
> 
> 
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer
>> from errata, where a TSB (trace synchronization barrier)
>> fails to flush the trace data completely, when executed from
>> a trace prohibited region. In Linux we always execute it
>> after we have moved the PE to trace prohibited region. So,
>> we can apply the workaround everytime a TSB is executed.
> 
> s/everytime/every time

Ack

> 
>>
>> The workaround is to issue two TSBs consecutively.
>>
>> NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying
>> that a late CPU could be blocked from booting if it is the
>> first CPU that requires the workaround. This is because we
>> do not allow setting a cpu_hwcaps bit after the SMP boot. The
>> other alternative is to use "this_cpu_has_cap()" instead
>> of the faster system-wide check, which may be a bit of an
>> overhead, given we may have to do this in the nVHE KVM host
>> before a guest entry.
>>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---

...

>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index eac4030322df..0764774e12bb 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
>>   
>>   	  If unsure, say Y.
>>   
>> +config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>> +	bool
>> +
>> +config ARM64_ERRATUM_2054223
>> +	bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
>> +	default y
>> +	help
>> +	  Enable workaround for ARM Cortex-A710 erratum 2054223
>> +
>> +	  Affected cores may fail to flush the trace data on a TSB instruction
>> +	  when the PE is in trace prohibited state. This will cause the loss of
>> +	  a few bytes of cached trace data.
>> +
>> +	  The workaround is to issue two TSBs consecutively on affected cores.
>> +
>> +	  If unsure, say Y.
>> +
>> +config ARM64_ERRATUM_2067961
>> +	bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
>> +	default y
>> +	help
>> +	  Enable workaround for ARM Neoverse-N2 erratum 2067961
>> +
>> +	  Affected cores may fail to flush the trace data on a TSB instruction
>> +	  when the PE is in trace prohibited state. This will cause the loss of
>> +	  a few bytes of cached trace data.
>> +
>> +	  The workaround is to issue two TSBs consecutively on affected cores.
> 
> Like I had mentioned in the previous patch, these descriptions here could
> be just factored out inside ARM64_WORKAROUND_TSB_FLUSH_FAILURE instead.

Please see my response there.

> 
>> +
>> +	  If unsure, say Y.
>> +
>>   config CAVIUM_ERRATUM_22375
>>   	bool "Cavium erratum 22375, 24313"
>>   	default y
>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> index 451e11e5fd23..1c5a00598458 100644
>> --- a/arch/arm64/include/asm/barrier.h
>> +++ b/arch/arm64/include/asm/barrier.h
>> @@ -23,7 +23,7 @@
>>   #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
>>   
>>   #define psb_csync()	asm volatile("hint #17" : : : "memory")
>> -#define tsb_csync()	asm volatile("hint #18" : : : "memory")
>> +#define __tsb_csync()	asm volatile("hint #18" : : : "memory")
>>   #define csdb()		asm volatile("hint #20" : : : "memory")
>>   
>>   #ifdef CONFIG_ARM64_PSEUDO_NMI
>> @@ -46,6 +46,20 @@
>>   #define dma_rmb()	dmb(oshld)
>>   #define dma_wmb()	dmb(oshst)
>>   
>> +
>> +#define tsb_csync()								\
>> +	do {									\
>> +		/*								\
>> +		 * CPUs affected by Arm Erratum 2054223 or 2067961 needs	\
>> +		 * another TSB to ensure the trace is flushed. The barriers	\
>> +		 * don't have to be strictly back to back, as long as the	\
>> +		 * CPU is in trace prohibited state.				\
>> +		 */								\
>> +		if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE))	\
>> +			__tsb_csync();						\
>> +		__tsb_csync();							\
>> +	} while (0)
>> +
>>   /*
>>    * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
>>    * and 0 otherwise.
>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>> index ccd757373f36..bdbeac75ead6 100644
>> --- a/arch/arm64/kernel/cpu_errata.c
>> +++ b/arch/arm64/kernel/cpu_errata.c
>> @@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
>>   };
>>   #endif	/* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
>>   
>> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>> +static const struct midr_range tsb_flush_fail_cpus[] = {
>> +#ifdef CONFIG_ARM64_ERRATUM_2067961
>> +	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
>> +#endif
>> +#ifdef CONFIG_ARM64_ERRATUM_2054223
>> +	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
>> +#endif
>> +	{},
>> +};
>> +#endif	/* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
>> +
>>   const struct arm64_cpu_capabilities arm64_errata[] = {
>>   #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
>>   	{
>> @@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>>   		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
>>   		CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
>>   	},
>> +#endif
>> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>> +	{
>> +		.desc = "ARM erratum 2067961 or 2054223",
>> +		.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
>> +		ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
>> +	},
>>   #endif
>>   	{
>>   	}
>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
>> index 1ccb92165bd8..2102e15af43d 100644
>> --- a/arch/arm64/tools/cpucaps
>> +++ b/arch/arm64/tools/cpucaps
>> @@ -54,6 +54,7 @@ WORKAROUND_1463225
>>   WORKAROUND_1508412
>>   WORKAROUND_1542419
>>   WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> +WORKAROUND_TSB_FLUSH_FAILURE
>>   WORKAROUND_CAVIUM_23154
>>   WORKAROUND_CAVIUM_27456
>>   WORKAROUND_CAVIUM_30115
>>
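For readers unfamiliar with the MIDR list in the cpu_errata.c hunk: MIDR_ALL_VERSIONS() matches a CPU model regardless of its variant and revision. The following is a user-space sketch of that matching, not the kernel's is_midr_in_range_list(). It assumes the MIDR_EL1 field layout (Implementer [31:24], PartNum [15:4]) and the Arm part numbers 0xD49 (Neoverse-N2) and 0xD47 (Cortex-A710); all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields used for matching: Implementer [31:24], PartNum [15:4]. */
#define MODEL_MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MODEL_MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

#define MODEL_IMPL_ARM		0x41u
#define MODEL_PART_CORTEX_A710	0xd47u
#define MODEL_PART_NEOVERSE_N2	0xd49u

struct model_cpu_model {
	uint32_t implementer;
	uint32_t partnum;
};

/* Counterpart of tsb_flush_fail_cpus[] with both errata options enabled. */
static const struct model_cpu_model model_tsb_flush_fail_cpus[] = {
	{ MODEL_IMPL_ARM, MODEL_PART_NEOVERSE_N2 },	/* erratum 2067961 */
	{ MODEL_IMPL_ARM, MODEL_PART_CORTEX_A710 },	/* erratum 2054223 */
};

/*
 * Only Implementer and PartNum are compared, so every variant/revision
 * matches. (The kernel's model mask also covers the Architecture field,
 * which this sketch skips.)
 */
static bool model_midr_affected(uint32_t midr)
{
	size_t n = sizeof(model_tsb_flush_fail_cpus) /
		   sizeof(model_tsb_flush_fail_cpus[0]);

	for (size_t i = 0; i < n; i++) {
		if (MODEL_MIDR_IMPLEMENTER(midr) == model_tsb_flush_fail_cpus[i].implementer &&
		    MODEL_MIDR_PARTNUM(midr) == model_tsb_flush_fail_cpus[i].partnum)
			return true;
	}
	return false;
}
```

For example, a Neoverse-N2 r0p0 (MIDR 0x410FD490) and a Cortex-A710 r2p1 (MIDR 0x412FD471) both match, while a part number outside the list does not.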
> 
> This adds all the required bits of these errata in a single patch,
> whereas the previous workaround had split all the required pieces
> into multiple patches. Could we instead follow the same standard in
> both places?

We could do this for this particular erratum as the workaround is
within the arm64 kernel code, unlike the other ones, where the TRBE
driver needs a change.

So, there is a kind of dependency for the other two, which we don't
have in this particular case.

i.e., the TRBE driver needs a cpucap number to implement the workaround ->
the arm64 kernel must define one, which we can't advertise until
we have a TRBE workaround.

Thus, they follow a 3-step model:

  - Define CPUCAP erratum
  - TRBE driver work around
  - Finally advertise to the user.

I don't think this one needs that.

Suzuki
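A minimal user-space model of the new tsb_csync() from the barrier.h hunk above may help show why this erratum needs no driver changes. It assumes a plain bool stands in for cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE) and a counter stands in for the "hint #18" instruction; the model_* names are hypothetical. It only demonstrates that affected systems issue two TSBs and unaffected ones issue one.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_tsb_flush_failure_cap;	/* stand-in for the system-wide cpucap */
static int model_tsb_count;			/* number of TSBs "issued" */

/* Stand-in for __tsb_csync(), i.e. asm volatile("hint #18"). */
static void model_tsb_raw(void)
{
	model_tsb_count++;
}

/*
 * Stand-in for the patched tsb_csync(): affected CPUs issue an extra TSB.
 * As the patch comment notes, the two need not be back to back as long as
 * the CPU stays in a trace prohibited state.
 */
static void model_tsb_csync(void)
{
	if (model_tsb_flush_failure_cap)
		model_tsb_raw();
	model_tsb_raw();
}
```

Because every caller already goes through tsb_csync(), changing the macro is enough, which is why this erratum does not need the three-step define/workaround/advertise split.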


> 



* Re: [PATCH v2 16/17] coresight: trbe: Work around write to out of range
  2021-09-21 13:41 ` [PATCH v2 16/17] coresight: trbe: Work around write to out of range Suzuki K Poulose
@ 2021-09-23  3:15   ` Anshuman Khandual
  2021-09-28 10:32     ` Suzuki K Poulose
  0 siblings, 1 reply; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-23  3:15 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> TRBE implementations affected by Arm erratum (2253138 or 2224489) could
> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
> to the TRBBASER. This implies that the TRBE could potentially corrupt:
> 
>   - A page used by the rest of the kernel/user (if the LIMIT = end of
>     perf ring buffer)
>   - A page within the ring buffer, but outside the driver's range
>     [head, head + size]. This may contain trace data that may be
>     consumed by userspace.
> 
> We work around this erratum by:
>   - Making sure that there is at least one extra PAGE of space left in
>     the TRBE's range beyond what we normally assign. This is in addition
>     to other restrictions (e.g., the TRBE alignment for working around
>     TRBE_WORKAROUND_OVERWRITE_FILL_MODE, where there is a minimum of
>     PAGE_SIZE. Thus we would have 2 * PAGE_SIZE).
> 
>   - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>     range (i.e., TRBBASER...TRBLIMITR.LIMIT), by:
> 
>         TRBLIMITR.LIMIT -= PAGE_SIZE
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>  1 file changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 02f9e00e2091..ea907345354c 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -86,7 +86,8 @@ struct trbe_buf {
>   * affects the given instance of the TRBE.
>   */
>  #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
> -#define TRBE_ERRATA_MAX				1
> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE	1
> +#define TRBE_ERRATA_MAX				2
>  
>  /*
>   * Safe limit for the number of bytes that may be overwritten
> @@ -96,6 +97,7 @@ struct trbe_buf {
>  
>  static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>  	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
> +	[TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>  };
>  
>  /*
> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>  
>  static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>  {
> -	return TRBE_TRACE_MIN_BUF_SIZE;
> +	u64 size = TRBE_TRACE_MIN_BUF_SIZE;
> +	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
> +
> +	/*
> +	 * The TRBE may be affected by an erratum that could make it
> +	 * write to the next "virtually addressed" page beyond the LIMIT.

What if the next "virtually addressed" page is just blocked from future
usage in the kernel and never really gets mapped into a physical page?
In that case it would be guaranteed that a next "virtually addressed"
page would not even exist after the LIMIT pointer and hence the erratum
would not be triggered. Something like there is a virtual mapping cliff
right after the LIMIT pointer from the MMU perspective.

It might be a bit tricky, though. Currently the entire ring buffer gets
mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
the above solution, each computation of the LIMIT pointer needs to be
followed by a temporary unmapping of next virtual page from existing
vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
always contains all the physical pages from the perf ring buffer.

> +	 * We need to make sure there is always a PAGE after the LIMIT,
> +	 * within the buffer. Thus we ensure there is at least an extra
> +	 * page than normal. With this we could then adjust the LIMIT
> +	 * pointer down by a PAGE later.
> +	 */
> +	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE))
> +		size += PAGE_SIZE;
> +	return size;
>  }
>  
>  /*
> @@ -585,6 +600,17 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>  	/*
>  	 * If the TRBE has wrapped around the write pointer has
>  	 * wrapped and should be treated as limit.
> +	 *
> +	 * When the TRBE is affected by TRBE_WORKAROUND_WRITE_OUT_OF_RANGE,
> +	 * it may write up to 64 bytes beyond the "LIMIT". The driver already
> +	 * keeps a valid page next to the LIMIT and we could potentially
> +	 * consume the trace data that may have been collected there. But we
> +	 * cannot be really sure it is available, and the TRBPTR may not
> +	 * indicate the same. Also, affected cores are affected by another
> +	 * erratum which forces the PAGE_SIZE alignment on the TRBPTR, and thus
> +	 * we could end up padding an entire PAGE_SIZE - 64 bytes to get those
> +	 * 64 bytes. Thus we ignore the potential triggering of the erratum
> +	 * on WRAP and limit the data to LIMIT.
>  	 */
>  	if (wrap)
>  		write = get_trbe_limit_pointer();
> @@ -811,6 +837,35 @@ static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
>  		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>  	}
>  
> +	/*
> +	 * TRBE_WORKAROUND_WRITE_OUT_OF_RANGE could cause the TRBE to write to
> +	 * the next page after the TRBLIMITR.LIMIT. For perf, the "next page"
> +	 * may be:
> +	 * 	- The page beyond the ring buffer. This could mean the TRBE could
> +	 * 	  corrupt another entity (kernel / user).
> +	 * 	- A portion of the "ring buffer" consumed by the userspace,
> +	 * 	  i.e., a page outside [head, head + size].
> +	 *
> +	 * We work around this by:
> +	 * 	- Making sure that we have at least one extra PAGE of space left
> +	 * 	in the ring buffer [head, head + size], beyond what we normally
> +	 * 	need without the erratum. See trbe_min_trace_buf_size().
> +	 *
> +	 * 	- Adjust the TRBLIMITR.LIMIT to leave the extra PAGE outside
> +	 * 	the TRBE's range (i.e., [TRBBASER, TRBLIMITR.LIMIT]).
> +	 */
> +	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE)) {
> +		s64 space = buf->trbe_limit - buf->trbe_write;
> +		/*
> +		 * We must have more than a PAGE_SIZE worth space in the proposed
> +		 * range for the TRBE.
> +		 */
> +		if (WARN_ON(space <= PAGE_SIZE ||
> +			    !IS_ALIGNED(buf->trbe_limit, PAGE_SIZE)))
> +			return -EINVAL;
> +		buf->trbe_limit -= PAGE_SIZE;
> +	}
> +
>  	return 0;
>  }
>  
> 
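The two halves of this out-of-range workaround (reserve an extra page, then pull LIMIT down by a page) can be sketched in user space as below. Assumptions: 4K pages, a placeholder minimum buffer size (the driver uses TRBE_TRACE_MIN_BUF_SIZE, whose value is not shown in this thread), buffer addresses treated as plain integers rather than vmap()ed pointers, and hypothetical model_* names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096ull	/* assumption: 4K pages */
#define MODEL_MIN_BUF_SIZE	64ull	/* placeholder, not the driver's TRBE_TRACE_MIN_BUF_SIZE */

/* Like trbe_min_trace_buf_size(): ask for one extra page when affected. */
static uint64_t model_min_buf_size(bool write_out_of_range_erratum)
{
	uint64_t size = MODEL_MIN_BUF_SIZE;

	if (write_out_of_range_erratum)
		size += MODEL_PAGE_SIZE;
	return size;
}

/*
 * Like the WRITE_OUT_OF_RANGE block in trbe_apply_work_around_before_enable():
 * require more than a page between write and limit and a page aligned
 * limit, then pull the limit down by one page. A stray write past the new
 * LIMIT then lands in a page the driver still owns.
 */
static int model_adjust_limit(uint64_t write, uint64_t *limit)
{
	int64_t space = (int64_t)*limit - (int64_t)write;

	if (space <= (int64_t)MODEL_PAGE_SIZE || (*limit % MODEL_PAGE_SIZE) != 0)
		return -EINVAL;	/* the driver also WARN_ON()s here */
	*limit -= MODEL_PAGE_SIZE;
	return 0;
}
```

With write = 0x0 and limit = 0x10000, the programmed LIMIT becomes 0xF000 and the page [0xF000, 0x10000) absorbs any write past LIMIT.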


* Re: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  2021-09-21 13:41 ` [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata " Suzuki K Poulose
@ 2021-09-23  6:13   ` Anshuman Khandual
  2021-09-28 10:40     ` Suzuki K Poulose
  2021-10-01 17:15   ` Mathieu Poirier
  1 sibling, 1 reply; 62+ messages in thread
From: Anshuman Khandual @ 2021-09-23  6:13 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from
> an erratum which, when triggered, might cause the TRBE to overwrite
> the trace data already collected in FILL mode, in the event of a WRAP.
> i.e., the TRBE doesn't stop writing the data, but instead wraps to the base
> and could write up to 3 cache lines worth of trace. Thus, this could
> corrupt the trace at the "BASE" pointer.
> 
> The workaround is to program the write pointer 256 bytes from the

3 cache lines = 256 bytes on all implementations which might have TRBE?
Or should these skid bytes be derived from the platform cache line size
instead?

> base, such that if the erratum is triggered, it doesn't overwrite
> the trace data that was captured. This skipped region could be
> padded with ignore packets at the end of the session, so that
> the decoder sees a continuous buffer with some padding at the
> beginning. The trace data written at the base is considered
> lost as the limit could have been in the middle of the perf
> ring buffer, and jumping to the "base" is not acceptable.
> We set the flags already to indicate that some amount of trace
> was lost during the FILL event IRQ. So this is fine.

Via PERF_AUX_FLAG_TRUNCATED? That should be specified here to be clear.

> 
> One important change with the workaround is that we program the
> TRBBASER_EL1 to the current page where we are allowed to write.
> Otherwise, it could overwrite a region that may be consumed
> by perf. Towards this, we always make sure that the
> "handle->head" and thus the trbe_write is PAGE_SIZE aligned,
> so that we can set the BASE to the PAGE base and move the
> TRBPTR to the 256-byte offset.
> 
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Change since v1:
>  - Updated comment with ASCII art
>  - Add _BYTES suffix for the space to skip for the work around.
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
>  1 file changed, 132 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index f569010c672b..983dd5039e52 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -16,6 +16,7 @@
>  #define pr_fmt(fmt) DRVNAME ": " fmt
>  
>  #include <asm/barrier.h>
> +#include <asm/cpufeature.h>
>  #include <asm/cputype.h>
>  
>  #include "coresight-self-hosted-trace.h"
> @@ -84,9 +85,17 @@ struct trbe_buf {
>   * per TRBE instance, we keep track of the list of errata that
>   * affects the given instance of the TRBE.
>   */
> -#define TRBE_ERRATA_MAX			0
> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
> +#define TRBE_ERRATA_MAX				1
> +
> +/*
> + * Safe limit for the number of bytes that may be overwritten
> + * when the erratum is triggered.
> + */
> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES	256

As mentioned earlier, does it depend on the platform cache line size?
Otherwise, if the skip bytes value is platform independent, that should
be mentioned here in a comment.

>  
>  static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
> +	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>  };
>  
>  /*
> @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
>  	set_trbe_limit_pointer_enabled(buf->trbe_limit);
>  }
>  
> -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
> +						 u64 trbsr)
>  {
>  	int ec = get_trbe_ec(trbsr);
>  	int bsc = get_trbe_bsc(trbsr);
> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = buf->cpudata;

Passing down the perf handle to derive trbe_cpudata seems to be right.

>  
>  	WARN_ON(is_trbe_running(trbsr));
>  	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
> @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>  	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>  		return TRBE_FAULT_ACT_FATAL;
>  
> -	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
> -		if (get_trbe_write_pointer() == get_trbe_base_pointer())
> -			return TRBE_FAULT_ACT_WRAP;
> -	}
> +	/*
> +	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
> +	 * it might write data after a WRAP event in the fill mode.
> +	 * Thus the check TRBPTR == TRBBASER will not be honored.
> +	 */

Needs bit formatting/alignment cleanup.

> +	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
> +	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
> +	     get_trbe_write_pointer() == get_trbe_base_pointer()))
> +		return TRBE_FAULT_ACT_WRAP;
> +

Right, a TRBE without the erratum should continue to have the write
pointer = base pointer check. Could all TRBE errata checks like
the following be shortened (without the workaround index) for
better readability? It is not very important, though.

trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)
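The WRAP decision in the patched trbe_get_fault_act() reduces to a small predicate. The sketch below collapses the TRBSR decoding (wrap bit, EC == OTHERS, BSC == FILLED) into a single bool and omits the earlier FATAL checks; the names are hypothetical and it is only meant to show which inputs the erratum makes irrelevant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_fault_action { MODEL_ACT_WRAP, MODEL_ACT_SPURIOUS };

static enum model_fault_action
model_wrap_action(bool filled_wrap, bool fill_mode_erratum,
		  uint64_t trbptr, uint64_t trbbaser)
{
	/*
	 * An affected TRBE may keep writing after the WRAP, so TRBPTR need
	 * not equal TRBBASER; the status bits alone decide. Unaffected
	 * TRBEs keep the TRBPTR == TRBBASER cross-check.
	 */
	if (filled_wrap && (fill_mode_erratum || trbptr == trbbaser))
		return MODEL_ACT_WRAP;
	return MODEL_ACT_SPURIOUS;
}
```

For instance, a FILLED wrap with TRBPTR 0xC0 bytes past TRBBASER counts as a WRAP only when the erratum is flagged.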


>  	return TRBE_FAULT_ACT_SPURIOUS;
>  }
>  
> @@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>  {
>  	u64 write;
>  	u64 start_off, end_off;
> +	u64 size;
> +	u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>  
>  	/*
>  	 * If the TRBE has wrapped around the write pointer has
> @@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>  
>  	if (WARN_ON_ONCE(end_off < start_off))
>  		return 0;
> -	return (end_off - start_off);
> +
> +	size = end_off - start_off;
> +	/*
> +	 * If the TRBE is affected by the following erratum, we must fill
> +	 * the space we skipped with IGNORE packets. And we are always
> +	 * guaranteed to have at least a PAGE_SIZE space in the buffer.
> +	 */
> +	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
> +	    !WARN_ON(size < overwrite_skip))
> +		__trbe_pad_buf(buf, start_off, overwrite_skip);
> +
> +	return size;
>  }
>  
>  static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> @@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>  		clr_trbe_irq();
>  		isb();
>  
> -		act = trbe_get_fault_act(status);
> +		act = trbe_get_fault_act(handle, status);
>  		/*
>  		 * If this was not due to a WRAP event, we have some
>  		 * errors and as such buffer is empty.
> @@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>  	return size;
>  }
>  
> +
> +static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
> +{
> +	/*
> +	 * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache

few cache lines = 3 cache lines ?

> +	 * line size from the "TRBBASER_EL1" in the event of a "FILL".
> +	 * Thus, we could lose some amount of the trace at the base.
> +	 *
> +	 * Before Fix:
> +	 *
> +	 *  normal-BASE     head  normal-PTR              tail normal-LIMIT
> +	 *  |                   \/                       /
> +	 *   -------------------------------------------------------------
> +	 *  |         |          |xyzdefghij..|...  tuvw|                |
> +	 *   -------------------------------------------------------------
> +	 *                      /    |                   \
> +	 * After Fix->  TRBBASER     TRBPTR              TRBLIMITR.LIMIT
> +	 *
> +	 * In the normal course of action, we would set the TRBBASER to the
> +	 * beginning of the ring-buffer (normal-BASE). But with the erratum,
> +	 * the TRBE could overwrite the contents at the "normal-BASE", after
> +	 * hitting the "normal-LIMIT", since it doesn't stop as expected. And
> +	 * this is wrong. So we must always make sure that the TRBBASER is
> +	 * within the region [head, head+size].
> +	 *
> +	 * Also, we would set the TRBPTR to head (after adjusting for
> +	 * alignment) at normal-PTR. This would mean that the last few bytes
> +	 * of the trace (say, "xyz") might overwrite the first few bytes of
> +	 * trace written ("abc"). More importantly they will appear in what
> +	 * userspace sees as the beginning of the trace, which is wrong. We may
> +	 * not always have space to move the latest trace "xyz" to the correct
> +	 * order as it must appear beyond the LIMIT (i.e., [head..head+size]).
> +	 * Thus it is easier to ignore those bytes than to complicate the
> +	 * driver to move it, assuming that the erratum was triggered and doing
> +	 * additional checks to see if there is indeed allowed space at
> +	 * TRBLIMITR.LIMIT.
> +	 *
> +	 * To summarize, with the work around:
> +	 *
> +	 *  - We always align the offset for the next session to PAGE_SIZE
> +	 *    (This is to ensure we can program the TRBBASER to this offset
> +	 *    within the region [head...head+size]).
> +	 *
> +	 *  - At TRBE enable:
> +	 *     - Set the TRBBASER to the page aligned offset of the current
> +	 *       proposed write offset. (which is guaranteed to be aligned
> +	 *       as above)
> +	 *     - Move the TRBPTR to skip the first 256 bytes (that might be
> +	 *       overwritten with the erratum). This ensures that the trace
> +	 *       generated in the session is not re-written.
> +	 *
> +	 *  - At trace collection:
> +	 *     - Pad the 256 bytes skipped above again with IGNORE packets.
> +	 */
> +	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
> +		if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
> +			return -EINVAL;
> +		buf->trbe_hw_base = buf->trbe_write;
> +		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
> +	}
> +
> +	return 0;
> +}
> +
>  static int __arm_trbe_enable(struct trbe_buf *buf,
>  			     struct perf_output_handle *handle)
>  {
> +	int ret = 0;
> +
>  	perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
>  	buf->trbe_limit = compute_trbe_buffer_limit(handle);
>  	buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>  	if (buf->trbe_limit == buf->trbe_base) {
> -		trbe_stop_and_truncate_event(handle);
> -		return -ENOSPC;
> +		ret = -ENOSPC;
> +		goto err;
>  	}
>  	/* Set the base of the TRBE to the buffer base */
>  	buf->trbe_hw_base = buf->trbe_base;
> +
> +	ret = trbe_apply_work_around_before_enable(buf);
> +	if (ret)
> +		goto err;
> +
>  	*this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
>  	trbe_enable_hw(buf);
>  	return 0;
> +err:
> +	trbe_stop_and_truncate_event(handle);
> +	return ret;
>  }
>  
>  static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
> @@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>  	if (!is_perf_trbe(handle))
>  		return IRQ_NONE;
>  
> -	act = trbe_get_fault_act(status);
> +	act = trbe_get_fault_act(handle, status);
>  	switch (act) {
>  	case TRBE_FAULT_ACT_WRAP:
>  		truncated = !!trbe_handle_overflow(handle);
> @@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
>  	}
>  
>  	trbe_check_errata(cpudata);
> -	cpudata->trbe_align = cpudata->trbe_hw_align;
> +	/*
> +	 * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
> +	 * we must always program the TRBPTR_EL1 256 bytes from a page
> +	 * boundary, with TRBBASER_EL1 set to the page, to prevent the
> +	 * TRBE overwriting 256 bytes at TRBBASER_EL1 on a FILL event.
> +	 *
> +	 * Thus make sure we always align our write pointer to a PAGE_SIZE,
> +	 * which also guarantees that we have at least a PAGE_SIZE space in
> +	 * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
> +	 * the required bytes at the base.
> +	 */
> +	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
> +		cpudata->trbe_align = PAGE_SIZE;
> +	else
> +		cpudata->trbe_align = cpudata->trbe_hw_align;
> +

But, like trbe_apply_work_around_before_enable(), the trbe_align assignment
should also be wrapped inside a new helper containing these comments and
the conditional block, because it makes sense to keep errata workarounds
in leaf-level helper functions rather than in the TRBE core operations.
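A sketch of the kind of leaf helper suggested above, assuming a trimmed-down stand-in for struct trbe_cpudata and 4K pages. The helper name and struct are hypothetical and do not come from the series; the body is just the conditional from arm_trbe_probe_cpu() moved out of line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4K pages */

/* Trimmed stand-in for struct trbe_cpudata: only the fields used here. */
struct model_cpudata {
	bool fill_mode_erratum;	/* trbe_has_erratum(..., TRBE_WORKAROUND_OVERWRITE_FILL_MODE) */
	uint64_t trbe_hw_align;	/* alignment advertised by the hardware */
	uint64_t trbe_align;	/* alignment the driver will actually use */
};

/* Hypothetical helper: keeps the erratum choice out of the probe path. */
static void model_trbe_set_align(struct model_cpudata *cpudata)
{
	/*
	 * A page aligned write pointer lets TRBBASER sit on the page
	 * boundary while TRBPTR starts 256 bytes into it.
	 */
	if (cpudata->fill_mode_erratum)
		cpudata->trbe_align = MODEL_PAGE_SIZE;
	else
		cpudata->trbe_align = cpudata->trbe_hw_align;
}
```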

>  	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
>  	cpudata->cpu = cpu;
>  	cpudata->drvdata = drvdata;
> 
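The FILL-mode workaround in this patch has an enable side (TRBBASER on the page, TRBPTR 256 bytes in) and a collection side (pad the skipped bytes). The sketch below models both on offsets into a plain byte array. It assumes 4K pages and uses 0x70 as the IGNORE byte, which is meant to mirror the driver's ETE_IGNORE_PACKET (an assumption; check coresight-trbe.h). The model_* names are hypothetical; the real padding goes through __trbe_pad_buf() on the mapped buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE		4096u	/* assumption: 4K pages */
#define MODEL_SKIP_BYTES	256u	/* TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES */
#define MODEL_IGNORE_BYTE	0x70	/* assumption: mirrors ETE_IGNORE_PACKET */

struct model_buf {
	uint64_t trbe_write;	/* proposed TRBPTR, as an offset into the ring */
	uint64_t trbe_hw_base;	/* what would be programmed into TRBBASER_EL1 */
};

/* Enable side: TRBBASER on the page boundary, TRBPTR 256 bytes past it. */
static int model_fill_mode_before_enable(struct model_buf *buf)
{
	if (buf->trbe_write % MODEL_PAGE_SIZE)
		return -EINVAL;		/* trbe_align == PAGE_SIZE should prevent this */
	buf->trbe_hw_base = buf->trbe_write;
	buf->trbe_write += MODEL_SKIP_BYTES;
	return 0;
}

/*
 * Collection side: the skipped bytes may hold stale data, or bytes the
 * TRBE wrote at the base after a WRAP, so replace them with IGNORE bytes
 * before the session starting at start_off is handed to userspace.
 */
static void model_pad_skipped(uint8_t *ring, uint64_t start_off)
{
	memset(ring + start_off, MODEL_IGNORE_BYTE, MODEL_SKIP_BYTES);
}
```

A session whose head is at offset 0x2000 would program TRBBASER = 0x2000 and TRBPTR = 0x2100; an unaligned head is rejected rather than silently adjusted.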


* Re: [PATCH v2 16/17] coresight: trbe: Work around write to out of range
  2021-09-23  3:15   ` Anshuman Khandual
@ 2021-09-28 10:32     ` Suzuki K Poulose
  2021-10-01  4:56       ` Anshuman Khandual
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-28 10:32 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight

On 23/09/2021 04:15, Anshuman Khandual wrote:
> 
> 
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> TRBE implementations affected by Arm erratum (2253138 or 2224489) could
>> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
>> to the TRBBASER. This implies that the TRBE could potentially corrupt:
>>
>>    - A page used by the rest of the kernel/user (if the LIMIT = end of
>>      perf ring buffer)
>>    - A page within the ring buffer, but outside the driver's range
>>      [head, head + size]. This may contain trace data that may be
>>      consumed by userspace.
>>
>> We work around this erratum by:
>>    - Making sure that there is at least one extra PAGE of space left in
>>      the TRBE's range beyond what we normally assign. This is in addition
>>      to other restrictions (e.g., the TRBE alignment for working around
>>      TRBE_WORKAROUND_OVERWRITE_FILL_MODE, where there is a minimum of
>>      PAGE_SIZE. Thus we would have 2 * PAGE_SIZE).
>>
>>    - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>>      range (i.e., TRBBASER...TRBLIMITR.LIMIT), by:
>>
>>          TRBLIMITR.LIMIT -= PAGE_SIZE
>>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>>   1 file changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 02f9e00e2091..ea907345354c 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -86,7 +86,8 @@ struct trbe_buf {
>>    * affects the given instance of the TRBE.
>>    */
>>   #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
>> -#define TRBE_ERRATA_MAX				1
>> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE	1
>> +#define TRBE_ERRATA_MAX				2
>>   
>>   /*
>>    * Safe limit for the number of bytes that may be overwritten
>> @@ -96,6 +97,7 @@ struct trbe_buf {
>>   
>>   static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>>   	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>> +	[TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>>   };
>>   
>>   /*
>> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>   
>>   static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>>   {
>> -	return TRBE_TRACE_MIN_BUF_SIZE;
>> +	u64 size = TRBE_TRACE_MIN_BUF_SIZE;
>> +	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>> +
>> +	/*
>> +	 * The TRBE may be affected by an erratum that could make it
>> +	 * write to the next "virtually addressed" page beyond the LIMIT.
> 
> What if the next "virtually addressed" page is just blocked from future
> usage in the kernel and never really gets mapped into a physical page?

That is the case today for vmap(): the end of the vm_area has a guard
page. But that implies that, when the erratum is triggered, the TRBE
encounters a fault and we need to handle that in the driver. This works
for the "end" of the ring buffer, but not when the LIMIT is in the middle
of the ring buffer.

> In that case it would be guaranteed that a next "virtually addressed"
> page would not even exist after the LIMIT pointer and hence the erratum
> would not be triggered. Something like there is a virtual mapping cliff
> right after the LIMIT pointer from the MMU perspective.
> 
> It might be a bit tricky, though. Currently the entire ring buffer gets
> mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
> the above solution, each computation of the LIMIT pointer needs to be
> followed by a temporary unmapping of next virtual page from existing
> vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
> always contains all the physical pages from the perf ring buffer.

It is much easier to leave a page aside than to do this map/unmap
dance, which might even change the VA you get and thus complicates
the TRBE driver in general. I believe this is much simpler and we can
reason about the code better. And all faults are still illegal for
the driver, which helps us to detect any other issues in the TRBE.

Suzuki


* Re: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  2021-09-23  6:13   ` Anshuman Khandual
@ 2021-09-28 10:40     ` Suzuki K Poulose
  2021-10-01  4:21       ` Anshuman Khandual
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-09-28 10:40 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight

On 23/09/2021 07:13, Anshuman Khandual wrote:
> 
> 
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from
>> an erratum which, when triggered, might cause the TRBE to overwrite
>> the trace data already collected in FILL mode, in the event of a WRAP,
>> i.e., the TRBE doesn't stop writing the data; instead, it wraps to the
>> base and could write up to 3 cache lines' worth of trace. Thus, this
>> could corrupt the trace at the "BASE" pointer.
>>
>> The workaround is to program the write pointer 256 bytes from the
> 
> 3 cache lines = 256 bytes on all implementations which might have TRBE?
> Or should these skid bytes be derived from the platform cache line size
> instead?

256 bytes is the safeguard value aligned (rounded up) to a power of 2,
not 3 cache lines. Ideally, if another CPU with a larger cache line
size is affected by the erratum, then yes, we must account for that.
But for now this is sufficient.
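The 256-byte figure can be reproduced as follows, assuming 64-byte cache lines on the affected cores: 3 x 64 = 192 bytes, and the next power of two is 256. The helper names below are hypothetical; the kernel has its own roundup_pow_of_two() in <linux/log2.h>, which user space cannot include, hence the small loop.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (v must be non-zero). */
static uint64_t sketch_roundup_pow_of_two(uint64_t v)
{
	uint64_t p = 1;

	assert(v > 0);
	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical: bytes to skip at TRBBASER for an erratum that may
 * overwrite up to `lines` cache lines of `line_size` bytes each.
 */
static uint64_t sketch_overwrite_skip_bytes(uint64_t lines, uint64_t line_size)
{
	return sketch_roundup_pow_of_two(lines * line_size);
}
```

With 128-byte lines the same rule would give 512 bytes, which is the larger-cache-line case the reply says would need revisiting.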

> 
>> base, such that if the erratum is triggered, it doesn't overwrite
>> the trace data that was captured. This skipped region could be
>> padded with ignore packets at the end of the session, so that
>> the decoder sees a continuous buffer with some padding at the
>> beginning. The trace data written at the base is considered
>> lost as the limit could have been in the middle of the perf
>> ring buffer, and jumping to the "base" is not acceptable.
>> We set the flags already to indicate that some amount of trace
>> was lost during the FILL event IRQ. So this is fine.
> 
> Via PERF_AUX_FLAG_TRUNCATED ? Should be specified here to be clear.

Please note that setting the flag is not a side effect of the
workaround, and as such I don't think it needs to be mentioned
here. E.g., we recently changed this to COLLISION for WRAP events.
It makes sense to keep the details in the driver.

> 
>>

>> One important change with the workaround is that we program
>> TRBBASER_EL1 to the current page where we are allowed to write.
>> Otherwise, the TRBE could overwrite a region that may be consumed
>> by perf. To this end, we always make sure that "handle->head",
>> and thus trbe_write, is PAGE_SIZE aligned, so that we can set the
>> BASE to the page base and move TRBPTR to the 256-byte offset.
>>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Changes since v1:
>>   - Updated comment with ASCII art
>>   - Add _BYTES suffix for the space to skip for the work around.
>> ---
>>   drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
>>   1 file changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index f569010c672b..983dd5039e52 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -16,6 +16,7 @@
>>   #define pr_fmt(fmt) DRVNAME ": " fmt
>>   
>>   #include <asm/barrier.h>
>> +#include <asm/cpufeature.h>
>>   #include <asm/cputype.h>
>>   
>>   #include "coresight-self-hosted-trace.h"
>> @@ -84,9 +85,17 @@ struct trbe_buf {
>>    * per TRBE instance, we keep track of the list of errata that
>>    * affects the given instance of the TRBE.
>>    */
>> -#define TRBE_ERRATA_MAX			0
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
>> +#define TRBE_ERRATA_MAX				1
>> +
>> +/*
>> + * Safe limit for the number of bytes that may be overwritten
>> + * when the erratum is triggered.
>> + */
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES	256
> 
> As mentioned earlier, does it depend on the platform cache line size?
> Otherwise, if the skip size is platform independent, that should
> be mentioned here in a comment.

I could add a comment.

> 
>>   
>>   static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> +	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>>   };
>>   
>>   /*
>> @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
>>   	set_trbe_limit_pointer_enabled(buf->trbe_limit);
>>   }
>>   
>> -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
>> +						 u64 trbsr)
>>   {
>>   	int ec = get_trbe_ec(trbsr);
>>   	int bsc = get_trbe_bsc(trbsr);
>> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
>> +	struct trbe_cpudata *cpudata = buf->cpudata;
> 
> Passing down the perf handle to derive trbe_cpudata seems to be right.
> 
>>   
>>   	WARN_ON(is_trbe_running(trbsr));
>>   	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
>> @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>   	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>>   		return TRBE_FAULT_ACT_FATAL;
>>   
>> -	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>> -		if (get_trbe_write_pointer() == get_trbe_base_pointer())
>> -			return TRBE_FAULT_ACT_WRAP;
>> -	}
>> +	/*
>> +	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>> +	 * it might write data after a WRAP event in the fill mode.
>> +	 * Thus the check TRBPTR == TRBBASER will not be honored.
>> +	 */
> 
> Needs a bit of formatting/alignment cleanup.
> 
>> +	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
>> +	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
>> +	     get_trbe_write_pointer() == get_trbe_base_pointer()))
>> +		return TRBE_FAULT_ACT_WRAP;
>> +
> 
> Right, a TRBE without the erratum should continue to use the write
> pointer == base pointer check. Could all TRBE erratum checks like
> the following be shortened (without the workaround index) for
> better readability? Not very important, though.
> 
> trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)

Do you mean something like :

trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ->
trbe_may_overwrite_in_fill_mode(cpudata) ?


> 
> 
>>   	return TRBE_FAULT_ACT_SPURIOUS;
>>   }
>>   
>> @@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>   {
>>   	u64 write;
>>   	u64 start_off, end_off;
>> +	u64 size;
>> +	u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>>   
>>   	/*
>>   	 * If the TRBE has wrapped around the write pointer has
>> @@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>   
>>   	if (WARN_ON_ONCE(end_off < start_off))
>>   		return 0;
>> -	return (end_off - start_off);
>> +
>> +	size = end_off - start_off;
>> +	/*
>> +	 * If the TRBE is affected by the following erratum, we must fill
>> +	 * the space we skipped with IGNORE packets. And we are always
>> +	 * guaranteed to have at least a PAGE_SIZE space in the buffer.
>> +	 */
>> +	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
>> +	    !WARN_ON(size < overwrite_skip))
>> +		__trbe_pad_buf(buf, start_off, overwrite_skip);
>> +
>> +	return size;
>>   }
>>   
>>   static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>> @@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>>   		clr_trbe_irq();
>>   		isb();
>>   
>> -		act = trbe_get_fault_act(status);
>> +		act = trbe_get_fault_act(handle, status);
>>   		/*
>>   		 * If this was not due to a WRAP event, we have some
>>   		 * errors and as such buffer is empty.
>> @@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>>   	return size;
>>   }
>>   
>> +
>> +static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
>> +{
>> +	/*
>> +	 * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
> 
> few cache lines = 3 cache lines ?

Yes, up to 3.

> 
>> +	 * line size from the "TRBBASER_EL1" in the event of a "FILL".
>> +	 * Thus, we could lose some amount of the trace at the base.
>> +	 *
>> +	 * Before Fix:
>> +	 *
>> +	 *  normal-BASE     head  normal-PTR              tail normal-LIMIT
>> +	 *  |                   \/                       /
>> +	 *   -------------------------------------------------------------
>> +	 *  |         |          |xyzdefghij..|...  tuvw|                |
>> +	 *   -------------------------------------------------------------
>> +	 *                      /    |                   \
>> +	 * After Fix->  TRBBASER     TRBPTR              TRBLIMITR.LIMIT
>> +	 *
>> +	 * In the normal course of action, we would set the TRBBASER to the
>> +	 * beginning of the ring-buffer (normal-BASE). But with the erratum,
>> +	 * the TRBE could overwrite the contents at the "normal-BASE", after
>> +	 * hitting the "normal-LIMIT", since it doesn't stop as expected. And
>> +	 * this is wrong. So we must always make sure that the TRBBASER is
>> +	 * within the region [head, head+size].
>> +	 *
>> +	 * Also, we would set the TRBPTR to head (after adjusting for
>> +	 * alignment) at normal-PTR. This would mean that the last few bytes
>> +	 * of the trace (say, "xyz") might overwrite the first few bytes of
>> +	 * trace written ("abc"). More importantly they will appear in what
>> +	 * userspace sees as the beginning of the trace, which is wrong. We may
>> +	 * not always have space to move the latest trace "xyz" to the correct
>> +	 * order as it must appear beyond the LIMIT (i.e., [head..head+size]).
>> +	 * Thus it is easier to ignore those bytes than to complicate the
>> +	 * driver to move it, assuming that the erratum was triggered and doing
>> +	 * additional checks to see if there is indeed allowed space at
>> +	 * TRBLIMITR.LIMIT.
>> +	 *
>> +	 * To summarize, with the work around:
>> +	 *
>> +	 *  - We always align the offset for the next session to PAGE_SIZE
>> +	 *    (This is to ensure we can program the TRBBASER to this offset
>> +	 *    within the region [head...head+size]).
>> +	 *
>> +	 *  - At TRBE enable:
>> +	 *     - Set the TRBBASER to the page aligned offset of the current
>> +	 *       proposed write offset. (which is guaranteed to be aligned
>> +	 *       as above)
>> +	 *     - Move the TRBPTR to skip the first 256 bytes (that might be
>> +	 *       overwritten with the erratum). This ensures that the trace
>> +	 *       generated in the session is not re-written.
>> +	 *
>> +	 *  - At trace collection:
>> +	 *     - Pad the 256 bytes skipped above with IGNORE packets.
>> +	 */
>> +	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
>> +		if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
>> +			return -EINVAL;
>> +		buf->trbe_hw_base = buf->trbe_write;
>> +		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int __arm_trbe_enable(struct trbe_buf *buf,
>>   			     struct perf_output_handle *handle)
>>   {
>> +	int ret = 0;
>> +
>>   	perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
>>   	buf->trbe_limit = compute_trbe_buffer_limit(handle);
>>   	buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>>   	if (buf->trbe_limit == buf->trbe_base) {
>> -		trbe_stop_and_truncate_event(handle);
>> -		return -ENOSPC;
>> +		ret = -ENOSPC;
>> +		goto err;
>>   	}
>>   	/* Set the base of the TRBE to the buffer base */
>>   	buf->trbe_hw_base = buf->trbe_base;
>> +
>> +	ret = trbe_apply_work_around_before_enable(buf);
>> +	if (ret)
>> +		goto err;
>> +
>>   	*this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
>>   	trbe_enable_hw(buf);
>>   	return 0;
>> +err:
>> +	trbe_stop_and_truncate_event(handle);
>> +	return ret;
>>   }
>>   
>>   static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
>> @@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>>   	if (!is_perf_trbe(handle))
>>   		return IRQ_NONE;
>>   
>> -	act = trbe_get_fault_act(status);
>> +	act = trbe_get_fault_act(handle, status);
>>   	switch (act) {
>>   	case TRBE_FAULT_ACT_WRAP:
>>   		truncated = !!trbe_handle_overflow(handle);
>> @@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
>>   	}
>>   
>>   	trbe_check_errata(cpudata);
>> -	cpudata->trbe_align = cpudata->trbe_hw_align;
>> +	/*
>> +	 * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>> +	 * we must always program the TRBPTR_EL1, 256 bytes from a page
>> +	 * boundary, with TRBBASER_EL1 set to the page, to prevent the
>> +	 * TRBE overwriting 256 bytes at TRBBASER_EL1 on a FILL event.
>> +	 *
>> +	 * Thus make sure we always align our write pointer to a PAGE_SIZE,
>> +	 * which also guarantees that we have at least a PAGE_SIZE space in
>> +	 * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
>> +	 * the required bytes at the base.
>> +	 */
>> +	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
>> +		cpudata->trbe_align = PAGE_SIZE;
>> +	else
>> +		cpudata->trbe_align = cpudata->trbe_hw_align;
>> +
> 
> But like trbe_apply_work_around_before_enable(), the trbe_align assignment
> should also be wrapped inside a new helper, which should contain these
> comments and the conditional block. It makes sense to keep errata
> workarounds in leaf-level helper functions rather than in core TRBE
> operations.

That would mean re-initializing trbe_align in the new helper after
setting the value here for all the other, unaffected TRBEs. I would rather
leave it as is until we have more workarounds that touch this area.
This is one-off code, called once per TRBE instance.
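For reference, the three pieces of this workaround (PAGE_SIZE alignment chosen at probe, the TRBBASER/TRBPTR adjustment at enable, and the padding at collection) can be modelled in user space as below. All names are hypothetical, 4 KiB pages are assumed, and the pad byte 0x70 is an assumption about the ETE IGNORE packet encoding; in the driver the padding is done by __trbe_pad_buf().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096ULL	/* assumed page size */
#define SKETCH_SKIP_BYTES 256ULL	/* mirrors ..._OVERWRITE_FILL_MODE_SKIP_BYTES */
#define SKETCH_PAD_BYTE   0x70		/* assumed ETE IGNORE packet byte */

struct sketch_buf {
	bool affected;		/* CPU has the overwrite-in-FILL-mode erratum */
	uint64_t base;		/* start of the ring buffer mapping */
	uint64_t hw_base;	/* value for TRBBASER_EL1 */
	uint64_t write;		/* value for TRBPTR_EL1 */
};

/* Probe time: alignment applied to the next session's start offset. */
static uint64_t sketch_trbe_align(bool affected, uint64_t hw_align)
{
	return affected ? SKETCH_PAGE_SIZE : hw_align;
}

/* Enable time: the idea behind trbe_apply_work_around_before_enable(). */
static int sketch_before_enable(struct sketch_buf *b)
{
	b->hw_base = b->base;		/* normal case: BASE = ring buffer start */
	if (!b->affected)
		return 0;
	if (b->write % SKETCH_PAGE_SIZE != 0)
		return -EINVAL;		/* alignment from probe time was violated */
	b->hw_base = b->write;		/* BASE = the page we may write to */
	b->write += SKETCH_SKIP_BYTES;	/* leave room for a possible overwrite */
	return 0;
}

/* Collection time: fill the skipped bytes so a decoder ignores them. */
static void sketch_pad_skipped(const struct sketch_buf *b, uint8_t *ring,
			       uint64_t start_off, uint64_t size)
{
	if (b->affected && size >= SKETCH_SKIP_BYTES)
		memset(ring + start_off, SKETCH_PAD_BYTE, SKETCH_SKIP_BYTES);
}
```

Keeping the alignment choice at probe time, as argued above, is what guarantees the -EINVAL path in the enable step is never taken in practice.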

Suzuki


* Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated
  2021-09-21 13:41 ` [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated Suzuki K Poulose
@ 2021-09-30 17:54   ` Mathieu Poirier
  2021-10-01  8:36     ` Suzuki K Poulose
  0 siblings, 1 reply; 62+ messages in thread
From: Mathieu Poirier @ 2021-09-30 17:54 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

Hi Suzuki,

On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
> We collect the trace from the TRBE on FILL event from IRQ context
> and when via update_buffer(), when the event is stopped. Let us

s/"and when via"/"and via"

> consolidate how we calculate the trace generated into a helper.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
>  1 file changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 63f7edd5fd1f..063c4505a203 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>  	return TRBE_FAULT_ACT_SPURIOUS;
>  }
>  
> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
> +					 struct trbe_buf *buf,
> +					 bool wrap)

Stacking

> +{
> +	u64 write;
> +	u64 start_off, end_off;
> +
> +	/*
> +	 * If the TRBE has wrapped around the write pointer has
> +	 * wrapped and should be treated as limit.
> +	 */
> +	if (wrap)
> +		write = get_trbe_limit_pointer();
> +	else
> +		write = get_trbe_write_pointer();
> +
> +	end_off = write - buf->trbe_base;

In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
acquired using get_trbe_base_pointer() but here it is referenced directly - any
reason for that?  It certainly makes reviewing this simple patch quite
difficult because I keep wondering if I am missing something subtle...  
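The arithmetic of the new helper can be checked in isolation with the user-space sketch below, where the register reads and buffer fields are passed in as plain integers (the parameter names are hypothetical). One observation, offered only as a plausible answer to the question above: once the FILL-mode workaround in patch 09 programs TRBBASER_EL1 to buf->trbe_hw_base instead of buf->trbe_base, a value read back with get_trbe_base_pointer() would no longer be the start of the ring buffer, whereas PERF_IDX2OFF() offsets are relative to buf->trbe_base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the size computation in trbe_get_trace_size().
 *   ring_base : start of the vmap()ed ring buffer (buf->trbe_base)
 *   head_off  : PERF_IDX2OFF(handle->head, buf)
 *   trbptr    : stands in for get_trbe_write_pointer()
 *   limit     : stands in for get_trbe_limit_pointer()
 */
static uint64_t sketch_trace_size(uint64_t ring_base, uint64_t head_off,
				  uint64_t trbptr, uint64_t limit, bool wrap)
{
	/* On a wrap the write pointer went back to BASE; use LIMIT instead. */
	uint64_t write = wrap ? limit : trbptr;
	uint64_t end_off;

	assert(write >= ring_base);
	end_off = write - ring_base;
	if (end_off < head_off)	/* the driver WARNs once and reports 0 */
		return 0;
	return end_off - head_off;
}
```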

> +	start_off = PERF_IDX2OFF(handle->head, buf);
> +
> +	if (WARN_ON_ONCE(end_off < start_off))
> +		return 0;
> +	return (end_off - start_off);
> +}
> +
>  static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>  				   struct perf_event *event, void **pages,
>  				   int nr_pages, bool snapshot)
> @@ -588,9 +612,9 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>  	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
>  	struct trbe_buf *buf = config;
>  	enum trbe_fault_action act;
> -	unsigned long size, offset;
> -	unsigned long write, base, status;
> +	unsigned long size, status;
>  	unsigned long flags;
> +	bool wrap = false;
>  
>  	WARN_ON(buf->cpudata != cpudata);
>  	WARN_ON(cpudata->cpu != smp_processor_id());
> @@ -630,8 +654,6 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>  	 * handle gets freed in etm_event_stop().
>  	 */
>  	trbe_drain_and_disable_local();
> -	write = get_trbe_write_pointer();
> -	base = get_trbe_base_pointer();
>  
>  	/* Check if there is a pending interrupt and handle it here */
>  	status = read_sysreg_s(SYS_TRBSR_EL1);
> @@ -655,20 +677,11 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>  			goto done;
>  		}
>  
> -		/*
> -		 * Otherwise, the buffer is full and the write pointer
> -		 * has reached base. Adjust this back to the Limit pointer
> -		 * for correct size. Also, mark the buffer truncated.
> -		 */
> -		write = get_trbe_limit_pointer();
>  		perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION);
> +		wrap = true;
>  	}
>  
> -	offset = write - base;
> -	if (WARN_ON_ONCE(offset < PERF_IDX2OFF(handle->head, buf)))
> -		size = 0;
> -	else
> -		size = offset - PERF_IDX2OFF(handle->head, buf);
> +	size = trbe_get_trace_size(handle, buf, wrap);
>  
>  done:
>  	local_irq_restore(flags);
> @@ -749,11 +762,10 @@ static int trbe_handle_overflow(struct perf_output_handle *handle)
>  {
>  	struct perf_event *event = handle->event;
>  	struct trbe_buf *buf = etm_perf_sink_config(handle);
> -	unsigned long offset, size;
> +	unsigned long size;
>  	struct etm_event_data *event_data;
>  
> -	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
> -	size = offset - PERF_IDX2OFF(handle->head, buf);
> +	size = trbe_get_trace_size(handle, buf, true);
>  	if (buf->snapshot)
>  		handle->head += size;
>  
> -- 
> 2.24.1
> 


* Re: [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data
  2021-09-21 13:41 ` [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data Suzuki K Poulose
  2021-09-22  5:41   ` Anshuman Khandual
@ 2021-09-30 17:57   ` Mathieu Poirier
  1 sibling, 0 replies; 62+ messages in thread
From: Mathieu Poirier @ 2021-09-30 17:57 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:05PM +0100, Suzuki K Poulose wrote:
> The TRBE driver wrongly treats the aux private data as the TRBE driver
> specific buffer for a given perf handle, while it is the ETM PMU's
> event specific data. Fix this by correcting the instance to use
> appropriate helper.
> 
> Fixes: 3fbf7f011f242 ("coresight: sink: Add TRBE driver")
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index d4c57aed05e5..e3d73751d568 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -363,7 +363,7 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
>  
>  static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>  {
> -	struct trbe_buf *buf = perf_get_aux(handle);
> +	struct trbe_buf *buf = etm_perf_sink_config(handle);

I really wonder how things got to work before...
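One way to see why the old code was wrong: perf_get_aux() returns the ETM PMU's per-event data, and the TRBE's own struct trbe_buf hangs off that data, which is what etm_perf_sink_config() returns. The stand-in types and functions below are hypothetical and stripped down to show only that extra level of indirection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct sketch_trbe_buf { unsigned long nr_pages; };
struct sketch_etm_event_data { void *snk_config; /* the sink's own buffer */ };
struct sketch_handle { void *aux_priv; /* what perf_get_aux() returns */ };

/* Like perf_get_aux(): yields the ETM PMU's per-event data. */
static void *sketch_perf_get_aux(struct sketch_handle *h)
{
	assert(h != NULL);
	return h->aux_priv;
}

/* Like etm_perf_sink_config(): one more hop to reach the sink buffer. */
static void *sketch_sink_config(struct sketch_handle *h)
{
	struct sketch_etm_event_data *data = sketch_perf_get_aux(h);

	return data ? data->snk_config : NULL;
}
```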

I have fixed the 13-character SHA in the "Fixes" tag and added this patch to my
local tree.  More comments tomorrow.

Thanks,
Mathieu

>  	u64 limit = __trbe_normal_offset(handle);
>  	u64 head = PERF_IDX2OFF(handle->head, buf);
>  
> -- 
> 2.24.1
> 


* Re: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  2021-09-28 10:40     ` Suzuki K Poulose
@ 2021-10-01  4:21       ` Anshuman Khandual
  0 siblings, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-10-01  4:21 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/28/21 4:10 PM, Suzuki K Poulose wrote:
> On 23/09/2021 07:13, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from
>>> an erratum which, when triggered, might cause the TRBE to overwrite
>>> the trace data already collected in FILL mode, in the event of a WRAP,
>>> i.e., the TRBE doesn't stop writing the data; instead, it wraps to the
>>> base and could write up to 3 cache lines' worth of trace. Thus, this
>>> could corrupt the trace at the "BASE" pointer.
>>>
>>> The workaround is to program the write pointer 256 bytes from the
>>
>> 3 cache lines = 256 bytes on all implementations which might have TRBE?
>> Or should these skid bytes be derived from the platform cache line size
>> instead?
> 
> 256 bytes is the safeguard value aligned (rounded up) to a power of 2,
> not 3 cache lines. Ideally, if another CPU with a larger cache line
> size is affected by the erratum, then yes, we must account for that.
> But for now this is sufficient.

Okay.

> 
>>
>>> base, such that if the erratum is triggered, it doesn't overwrite
>>> the trace data that was captured. This skipped region could be
>>> padded with ignore packets at the end of the session, so that
>>> the decoder sees a continuous buffer with some padding at the
>>> beginning. The trace data written at the base is considered
>>> lost as the limit could have been in the middle of the perf
>>> ring buffer, and jumping to the "base" is not acceptable.
>>> We set the flags already to indicate that some amount of trace
>>> was lost during the FILL event IRQ. So this is fine.
>>
>> Via PERF_AUX_FLAG_TRUNCATED ? Should be specified here to be clear.
> 
> Please note that setting the flag is not a side effect of the
> workaround, and as such I don't think it needs to be mentioned
> here. E.g., we recently changed this to COLLISION for WRAP events.
> It makes sense to keep the details in the driver.

Okay.

> 
>>
>>>
> 
>>> One important change with the workaround is that we program
>>> TRBBASER_EL1 to the current page where we are allowed to write.
>>> Otherwise, the TRBE could overwrite a region that may be consumed
>>> by perf. To this end, we always make sure that "handle->head",
>>> and thus trbe_write, is PAGE_SIZE aligned, so that we can set the
>>> BASE to the page base and move TRBPTR to the 256-byte offset.
>>>
>>> Cc: Mike Leach <mike.leach@linaro.org>
>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Leo Yan <leo.yan@linaro.org>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>> Changes since v1:
>>>   - Updated comment with ASCII art
>>>   - Add _BYTES suffix for the space to skip for the work around.
>>> ---
>>>   drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
>>>   1 file changed, 132 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>> index f569010c672b..983dd5039e52 100644
>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>> @@ -16,6 +16,7 @@
>>>   #define pr_fmt(fmt) DRVNAME ": " fmt
>>>   
>>>   #include <asm/barrier.h>
>>> +#include <asm/cpufeature.h>
>>>   #include <asm/cputype.h>
>>>   
>>>   #include "coresight-self-hosted-trace.h"
>>> @@ -84,9 +85,17 @@ struct trbe_buf {
>>>    * per TRBE instance, we keep track of the list of errata that
>>>    * affects the given instance of the TRBE.
>>>    */
>>> -#define TRBE_ERRATA_MAX            0
>>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE    0
>>> +#define TRBE_ERRATA_MAX                1
>>> +
>>> +/*
>>> + * Safe limit for the number of bytes that may be overwritten
>>> + * when the erratum is triggered.
>>> + */
>>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES    256
>>
>> As mentioned earlier, does it depend on the platform cache line size?
>> Otherwise, if the skip size is platform independent, that should
>> be mentioned here in a comment.
> 
> I could add a comment.
> 
>>
>>>   
>>>   static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>>> +    [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>>>   };
>>>   
>>>   /*
>>> @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
>>>       set_trbe_limit_pointer_enabled(buf->trbe_limit);
>>>   }
>>>   
>>> -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
>>> +                         u64 trbsr)
>>>   {
>>>       int ec = get_trbe_ec(trbsr);
>>>       int bsc = get_trbe_bsc(trbsr);
>>> +    struct trbe_buf *buf = etm_perf_sink_config(handle);
>>> +    struct trbe_cpudata *cpudata = buf->cpudata;
>>
>> Passing down the perf handle to derive trbe_cpudata seems to be right.
>>
>>>   
>>>       WARN_ON(is_trbe_running(trbsr));
>>>       if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
>>> @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>>       if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>>>           return TRBE_FAULT_ACT_FATAL;
>>>   
>>> -    if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>>> -        if (get_trbe_write_pointer() == get_trbe_base_pointer())
>>> -            return TRBE_FAULT_ACT_WRAP;
>>> -    }
>>> +    /*
>>> +     * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>>> +     * it might write data after a WRAP event in the fill mode.
>>> +     * Thus the check TRBPTR == TRBBASER will not be honored.
>>> +     */
>>
>> Needs a bit of formatting/alignment cleanup.
>>
>>> +    if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
>>> +        (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
>>> +         get_trbe_write_pointer() == get_trbe_base_pointer()))
>>> +        return TRBE_FAULT_ACT_WRAP;
>>> +
>>
>> Right, a TRBE without the erratum should continue to use the write
>> pointer == base pointer check. Could all TRBE erratum checks like
>> the following be shortened (without the workaround index) for
>> better readability? Not very important, though.
>>
>> trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)
> 
> Do you mean something like :
> 
> trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ->
> trbe_may_overwrite_in_fill_mode(cpudata) ?

Right, something similar that absorbs the workaround index into
its name.
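A sketch of the named-wrapper idea, combined with the WRAP classification it would be used in. The bitmap layout, type names and the `filled_wrap` flag are hypothetical simplifications; the driver's trbe_get_fault_act() also returns TRBE_FAULT_ACT_FATAL for the trigger, abort and stage 1/2 fault cases, which the sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical index, mirroring TRBE_WORKAROUND_OVERWRITE_FILL_MODE. */
#define SKETCH_WA_OVERWRITE_FILL_MODE 0

struct sketch_cpudata {
	unsigned long errata;	/* bit N set => workaround N applies */
};

static bool sketch_has_erratum(const struct sketch_cpudata *c, int i)
{
	assert(i >= 0 && i < (int)(8 * sizeof(c->errata)));
	return c->errata & (1UL << i);
}

/* The named wrapper: the workaround index is absorbed into the name. */
static inline bool sketch_may_overwrite_in_fill_mode(const struct sketch_cpudata *c)
{
	return sketch_has_erratum(c, SKETCH_WA_OVERWRITE_FILL_MODE);
}

enum sketch_act { SKETCH_ACT_WRAP, SKETCH_ACT_SPURIOUS };

/*
 * `filled_wrap` stands for is_trbe_wrap() && EC == OTHERS && BSC == FILLED.
 * An affected TRBE may keep writing after the wrap, so TRBPTR == TRBBASER
 * cannot be relied on to confirm the WRAP.
 */
static enum sketch_act sketch_classify(const struct sketch_cpudata *c,
				       bool filled_wrap, uint64_t ptr, uint64_t base)
{
	if (filled_wrap &&
	    (sketch_may_overwrite_in_fill_mode(c) || ptr == base))
		return SKETCH_ACT_WRAP;
	return SKETCH_ACT_SPURIOUS;
}
```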

> 
> 
>>
>>
>>>       return TRBE_FAULT_ACT_SPURIOUS;
>>>   }
>>>   
>>> @@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>>   {
>>>       u64 write;
>>>       u64 start_off, end_off;
>>> +    u64 size;
>>> +    u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>>>   
>>>       /*
>>>        * If the TRBE has wrapped around the write pointer has
>>> @@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>>   
>>>       if (WARN_ON_ONCE(end_off < start_off))
>>>           return 0;
>>> -    return (end_off - start_off);
>>> +
>>> +    size = end_off - start_off;
>>> +    /*
>>> +     * If the TRBE is affected by the following erratum, we must fill
>>> +     * the space we skipped with IGNORE packets. And we are always
>>> +     * guaranteed to have at least a PAGE_SIZE space in the buffer.
>>> +     */
>>> +    if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
>>> +        !WARN_ON(size < overwrite_skip))
>>> +        __trbe_pad_buf(buf, start_off, overwrite_skip);
>>> +
>>> +    return size;
>>>   }
>>>   
>>>   static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>>> @@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>>>           clr_trbe_irq();
>>>           isb();
>>>   
>>> -        act = trbe_get_fault_act(status);
>>> +        act = trbe_get_fault_act(handle, status);
>>>           /*
>>>            * If this was not due to a WRAP event, we have some
>>>            * errors and as such buffer is empty.
>>> @@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>>>       return size;
>>>   }
>>>   
>>> +
>>> +static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
>>> +{
>>> +    /*
>>> +     * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
>>
>> few cache lines = 3 cache lines ?
> 
> Yes, up to 3.
> 
>>
>>> +     * line size from the "TRBBASER_EL1" in the event of a "FILL".
>>> +     * Thus, we could lose some amount of the trace at the base.
>>> +     *
>>> +     * Before Fix:
>>> +     *
>>> +     *  normal-BASE     head  normal-PTR              tail normal-LIMIT
>>> +     *  |                   \/                       /
>>> +     *   -------------------------------------------------------------
>>> +     *  |         |          |xyzdefghij..|...  tuvw|                |
>>> +     *   -------------------------------------------------------------
>>> +     *                      /    |                   \
>>> +     * After Fix->  TRBBASER     TRBPTR              TRBLIMITR.LIMIT
>>> +     *
>>> +     * In the normal course of action, we would set the TRBBASER to the
>>> +     * beginning of the ring-buffer (normal-BASE). But with the erratum,
>>> +     * the TRBE could overwrite the contents at the "normal-BASE", after
>>> +     * hitting the "normal-LIMIT", since it doesn't stop as expected. And
>>> +     * this is wrong. So we must always make sure that the TRBBASER is
>>> +     * within the region [head, head+size].
>>> +     *
>>> +     * Also, we would set the TRBPTR to head (after adjusting for
>>> +     * alignment) at normal-PTR. This would mean that the last few bytes
>>> +     * of the trace (say, "xyz") might overwrite the first few bytes of
>>> +     * trace written ("abc"). More importantly they will appear in what
>>> +     * userspace sees as the beginning of the trace, which is wrong. We may
>>> +     * not always have space to move the latest trace "xyz" to the correct
>>> +     * order as it must appear beyond the LIMIT (i.e., [head..head+size]).
>>> +     * Thus it is easier to ignore those bytes than to complicate the
>>> +     * driver to move it, assuming that the erratum was triggered and doing
>>> +     * additional checks to see if there is indeed allowed space at
>>> +     * TRBLIMITR.LIMIT.
>>> +     *
>>> +     * To summarize, with the work around:
>>> +     *
>>> +     *  - We always align the offset for the next session to PAGE_SIZE
>>> +     *    (This is to ensure we can program the TRBBASER to this offset
>>> +     *    within the region [head...head+size]).
>>> +     *
>>> +     *  - At TRBE enable:
>>> +     *     - Set the TRBBASER to the page aligned offset of the current
>>> +     *       proposed write offset (which is guaranteed to be aligned
>>> +     *       as above).
>>> +     *     - Move the TRBPTR to skip the first 256 bytes (that might be
>>> +     *       overwritten with the erratum). This ensures that the trace
>>> +     *       generated in the session is not re-written.
>>> +     *
>>> +     *  - At trace collection:
>>> +     *     - Pad the 256 bytes skipped above with IGNORE packets.
>>> +     */
>>> +    if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
>>> +        if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
>>> +            return -EINVAL;
>>> +        buf->trbe_hw_base = buf->trbe_write;
>>> +        buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   static int __arm_trbe_enable(struct trbe_buf *buf,
>>>                    struct perf_output_handle *handle)
>>>   {
>>> +    int ret = 0;
>>> +
>>>       perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
>>>       buf->trbe_limit = compute_trbe_buffer_limit(handle);
>>>       buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>>>       if (buf->trbe_limit == buf->trbe_base) {
>>> -        trbe_stop_and_truncate_event(handle);
>>> -        return -ENOSPC;
>>> +        ret = -ENOSPC;
>>> +        goto err;
>>>       }
>>>       /* Set the base of the TRBE to the buffer base */
>>>       buf->trbe_hw_base = buf->trbe_base;
>>> +
>>> +    ret = trbe_apply_work_around_before_enable(buf);
>>> +    if (ret)
>>> +        goto err;
>>> +
>>>       *this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
>>>       trbe_enable_hw(buf);
>>>       return 0;
>>> +err:
>>> +    trbe_stop_and_truncate_event(handle);
>>> +    return ret;
>>>   }
>>>
>>>   static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
>>> @@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>>>       if (!is_perf_trbe(handle))
>>>           return IRQ_NONE;
>>>
>>> -    act = trbe_get_fault_act(status);
>>> +    act = trbe_get_fault_act(handle, status);
>>>       switch (act) {
>>>       case TRBE_FAULT_ACT_WRAP:
>>>           truncated = !!trbe_handle_overflow(handle);
>>> @@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
>>>       }
>>>
>>>       trbe_check_errata(cpudata);
>>> -    cpudata->trbe_align = cpudata->trbe_hw_align;
>>> +    /*
>>> +     * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>>> +     * we must always program TRBPTR_EL1 256 bytes past a page
>>> +     * boundary, with TRBBASER_EL1 set to that page, to prevent the
>>> +     * TRBE from overwriting 256 bytes at TRBBASER_EL1 on a FILL event.
>>> +     *
>>> +     * Thus make sure we always align our write pointer to a PAGE_SIZE,
>>> +     * which also guarantees that we have at least a PAGE_SIZE space in
>>> +     * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
>>> +     * the required bytes at the base.
>>> +     */
>>> +    if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
>>> +        cpudata->trbe_align = PAGE_SIZE;
>>> +    else
>>> +        cpudata->trbe_align = cpudata->trbe_hw_align;
>>> +
>>
>> But like trbe_apply_work_around_before_enable(), the trbe_align assignment
>> should also be wrapped inside a new helper containing these comments and
>> the conditional block, because it makes sense to keep errata workarounds
>> in leaf-level helper functions rather than in the core TRBE operations.
> 
> That would imply we re-initialize the trbe_align in the new helper after
> setting the value here for all other unaffected TRBEs. I would rather
> leave it as it is, until we have more workarounds that touch this area.
> This is one-time code, called once per TRBE instance at probe.

Okay.
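As a rough illustration of the FILL-mode workaround discussed above, the following user-space sketch models how the write offset and the TRBBASER value change at enable time, and how trbe_align is chosen at probe. It is not the driver code: struct sketch_buf and the sketch_* helpers are hypothetical, and a 4K PAGE_SIZE plus the 256-byte skip mentioned in the patch comments are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4K pages and the 256-byte skip described in the patch. */
#define SKETCH_PAGE_SIZE	4096ULL
#define SKETCH_SKIP_BYTES	256ULL

/* Hypothetical stand-in for the relevant fields of struct trbe_buf. */
struct sketch_buf {
	uint64_t base;		/* ring buffer start, like buf->trbe_base */
	uint64_t write;		/* next write pointer, like buf->trbe_write */
	uint64_t hw_base;	/* value for TRBBASER_EL1, like buf->trbe_hw_base */
};

/* Alignment chosen at probe: a full page when the erratum is present. */
static uint64_t sketch_trbe_align(bool affected, uint64_t hw_align)
{
	return affected ? SKETCH_PAGE_SIZE : hw_align;
}

/*
 * Models the enable path: TRBBASER normally points at the buffer start;
 * with the erratum it moves to the page-aligned write offset and the
 * write pointer skips the bytes that a FILL event may overwrite.
 * Returns false where the driver would return -EINVAL.
 */
static bool sketch_enable(struct sketch_buf *b, bool affected)
{
	assert(b->write >= b->base);
	b->hw_base = b->base;
	if (!affected)
		return true;
	if (b->write % SKETCH_PAGE_SIZE != 0)
		return false;
	b->hw_base = b->write;
	b->write += SKETCH_SKIP_BYTES;
	return true;
}
```

With base 0x10000 and a page-aligned write at 0x12000, an affected TRBE would get TRBBASER = 0x12000 and TRBPTR = 0x12100. The 256 skipped bytes are then padded with IGNORE packets at collection time, as the patch comment describes.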

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode
  2021-09-22  8:11     ` Suzuki K Poulose
@ 2021-10-01  4:35       ` Anshuman Khandual
  0 siblings, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-10-01  4:35 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/22/21 1:41 PM, Suzuki K Poulose wrote:
> On 22/09/2021 08:23, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> Now that we have the workaround implemented in the TRBE
>>> driver, add the Kconfig entries and document the errata.
>>>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>> Cc: Mike Leach <mike.leach@linaro.org>
>>> Cc: Leo Yan <leo.yan@linaro.org>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>   Documentation/arm64/silicon-errata.rst |  4 +++
>>>   arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
>>>   2 files changed, 43 insertions(+)
>>>
>>> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
>>> index d410a47ffa57..2f99229d993c 100644
>>> --- a/Documentation/arm64/silicon-errata.rst
>>> +++ b/Documentation/arm64/silicon-errata.rst
>>> @@ -92,12 +92,16 @@ stable kernels.
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>> +| ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
>>> ++----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Neoverse-N1     | #1349291        | N/A                         |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Neoverse-N1     | #1542419        | ARM64_ERRATUM_1542419       |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>> +| ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
>>> ++----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | MMU-500         | #841119,826419  | N/A                         |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 077f2ec4eeb2..eac4030322df 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
>>>           If unsure, say Y.
>>>
>>> +config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +    bool
>>> +
>>> +config ARM64_ERRATUM_2119858
>>> +    bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
>>> +    default y
>>> +    depends on CORESIGHT_TRBE
>>> +    select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +    help
>>> +      This option adds the workaround for ARM Cortex-A710 erratum 2119858.
>>> +
>>> +      Affected Cortex-A710 cores could overwrite up to 3 cache lines of trace
>>> +      data at the base of the buffer (pointed to by TRBBASER_EL1) in FILL mode
>>> +      in the event of a WRAP event.
>>> +
>>> +      Work around the issue by always making sure we move the TRBPTR_EL1 by
>>> +      256 bytes before enabling the buffer and filling the first 256 bytes of
>>> +      the buffer with ETM ignore packets upon disabling.
>>> +
>>> +      If unsure, say Y.
>>> +
>>> +config ARM64_ERRATUM_2139208
>>> +    bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
>>> +    default y
>>> +    depends on CORESIGHT_TRBE
>>> +    select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +    help
>>> +      This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
>>> +
>>> +      Affected Neoverse-N2 cores could overwrite up to 3 cache lines of trace
>>> +      data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>>
>> s/ponited/pointed
>>
>>> +      the event of a WRAP event.
>>> +
>>> +      Work around the issue by always making sure we move the TRBPTR_EL1 by
>>> +      256 bytes before enabling the buffer and filling the first 256 bytes of
>>> +      the buffer with ETM ignore packets upon disabling.
>>> +
>>> +      If unsure, say Y.
>>> +
>>>   config CAVIUM_ERRATUM_22375
>>>       bool "Cavium erratum 22375, 24313"
>>>       default y
>>>
>>
>> The erratum descriptions for both of these are exactly the same.
>> Rather, a more generalized description should be included for
>> ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, which abstracts out the
>> problem and the corresponding solution implemented in the driver.
>> This should also help reduce the current redundancy.
>>
> 
> The issue is what a user wants to see. A user who configures the
> kernel specifically for a given CPU (think embedded systems) would
> want to hand pick the errata for that particular CPU. Moving the help
> text to an implicitly selected Kconfig symbol would hide it from them.
> I would rather keep this as it is to keep it user friendly. This doesn't
> affect the code size anyway.

Understood.

> 
> The other option is to remove all the CPU specific Kconfig symbols and
> update the "title" to reflect both the CPU/erratum numbers.

Hmm, but I guess the current proposal is better instead.


* Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures
  2021-09-22 12:03     ` Suzuki K Poulose
@ 2021-10-01  4:38       ` Anshuman Khandual
  0 siblings, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-10-01  4:38 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/22/21 5:33 PM, Suzuki K Poulose wrote:
> Hi Anshuman
> 
> On 22/09/2021 08:39, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer
>>> from an erratum, where a TSB (trace synchronization barrier)
>>> fails to flush the trace data completely, when executed from
>>> a trace prohibited region. In Linux we always execute it
>>> after we have moved the PE to trace prohibited region. So,
>>> we can apply the workaround everytime a TSB is executed.
>>
>> s/everytime/every time
> 
> Ack
> 
>>
>>>
>>> The workaround is to issue two TSBs consecutively.
>>>
>>> NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying
>>> that a late CPU could be blocked from booting if it is the
>>> first CPU that requires the workaround. This is because we
>>> do not allow setting a cpu_hwcaps after the SMP boot. The
>>> other alternative is to use "this_cpu_has_cap()" instead
>>> of the faster system wide check, which may be a bit of an
>>> overhead, given we may have to do this in nvhe KVM host
>>> before a guest entry.
>>>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>> Cc: Mike Leach <mike.leach@linaro.org>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
> 
> ...
> 
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index eac4030322df..0764774e12bb 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
>>>           If unsure, say Y.
>>>
>>> +config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>>> +    bool
>>> +
>>> +config ARM64_ERRATUM_2054223
>>> +    bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
>>> +    default y
>>> +    help
>>> +      Enable workaround for ARM Cortex-A710 erratum 2054223
>>> +
>>> +      Affected cores may fail to flush the trace data on a TSB instruction when
>>> +      the PE is in a trace prohibited state. This causes the loss of a few
>>> +      bytes of cached trace data.
>>> +
>>> +      The workaround is to issue two TSBs consecutively on affected cores.
>>> +
>>> +      If unsure, say Y.
>>> +
>>> +config ARM64_ERRATUM_2067961
>>> +    bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
>>> +    default y
>>> +    help
>>> +      Enable workaround for ARM Neoverse-N2 erratum 2067961
>>> +
>>> +      Affected cores may fail to flush the trace data on a TSB instruction when
>>> +      the PE is in a trace prohibited state. This causes the loss of a few
>>> +      bytes of cached trace data.
>>> +
>>> +      The workaround is to issue two TSBs consecutively on affected cores.
>>
>> Like I had mentioned in the previous patch, these descriptions here could
>> be just factored out inside ARM64_WORKAROUND_TSB_FLUSH_FAILURE instead.
> 
> Please see my response there.
> 
>>
>>> +
>>> +      If unsure, say Y.
>>> +
>>>   config CAVIUM_ERRATUM_22375
>>>       bool "Cavium erratum 22375, 24313"
>>>       default y
>>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>>> index 451e11e5fd23..1c5a00598458 100644
>>> --- a/arch/arm64/include/asm/barrier.h
>>> +++ b/arch/arm64/include/asm/barrier.h
>>> @@ -23,7 +23,7 @@
>>>   #define dsb(opt)    asm volatile("dsb " #opt : : : "memory")
>>>
>>>   #define psb_csync()    asm volatile("hint #17" : : : "memory")
>>> -#define tsb_csync()    asm volatile("hint #18" : : : "memory")
>>> +#define __tsb_csync()    asm volatile("hint #18" : : : "memory")
>>>   #define csdb()        asm volatile("hint #20" : : : "memory")
>>>
>>>   #ifdef CONFIG_ARM64_PSEUDO_NMI
>>> @@ -46,6 +46,20 @@
>>>   #define dma_rmb()    dmb(oshld)
>>>   #define dma_wmb()    dmb(oshst)
>>>
>>> +
>>> +#define tsb_csync()                                \
>>> +    do {                                    \
>>> +        /*                                \
>>> +         * CPUs affected by Arm Erratum 2054223 or 2067961 need     \
>>> +         * another TSB to ensure the trace is flushed. The barriers    \
>>> +         * don't have to be strictly back to back, as long as the    \
>>> +         * CPU is in trace prohibited state.                \
>>> +         */                                \
>>> +        if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE))    \
>>> +            __tsb_csync();                        \
>>> +        __tsb_csync();                            \
>>> +    } while (0)
>>> +
>>>   /*
>>>    * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
>>>    * and 0 otherwise.
>>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>>> index ccd757373f36..bdbeac75ead6 100644
>>> --- a/arch/arm64/kernel/cpu_errata.c
>>> +++ b/arch/arm64/kernel/cpu_errata.c
>>> @@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
>>>   };
>>>   #endif    /* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
>>>
>>> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>>> +static const struct midr_range tsb_flush_fail_cpus[] = {
>>> +#ifdef CONFIG_ARM64_ERRATUM_2067961
>>> +    MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
>>> +#endif
>>> +#ifdef CONFIG_ARM64_ERRATUM_2054223
>>> +    MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
>>> +#endif
>>> +    {},
>>> +};
>>> +#endif    /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
>>> +
>>>   const struct arm64_cpu_capabilities arm64_errata[] = {
>>>   #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
>>>       {
>>> @@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>>>           .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
>>>           CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
>>>       },
>>> +#endif
>>> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>>> +    {
>>> +        .desc = "ARM erratum 2067961 or 2054223",
>>> +        .capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
>>> +        ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
>>> +    },
>>>   #endif
>>>       {
>>>       }
>>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
>>> index 1ccb92165bd8..2102e15af43d 100644
>>> --- a/arch/arm64/tools/cpucaps
>>> +++ b/arch/arm64/tools/cpucaps
>>> @@ -54,6 +54,7 @@ WORKAROUND_1463225
>>>   WORKAROUND_1508412
>>>   WORKAROUND_1542419
>>>   WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +WORKAROUND_TSB_FLUSH_FAILURE
>>>   WORKAROUND_CAVIUM_23154
>>>   WORKAROUND_CAVIUM_27456
>>>   WORKAROUND_CAVIUM_30115
>>>
>>
>> This adds all the required bits of these errata in a single patch,
>> whereas the previous workaround had split all the required pieces
>> into multiple patches. Could we instead follow the same convention in
>> both places?
> 
> We could do this for this particular erratum as the workaround is
> within the arm64 kernel code, unlike the other ones, where the TRBE
> driver needs a change.
> 
> So, there is a kind of dependency for the other two, which we don't
> have in this particular case.
> 
> i.e., the TRBE driver needs a cpucap number to implement the workaround ->
> The arm64 kernel must define one, which we can't advertise yet until
> we have a TRBE workaround.
> 
> Thus, they follow a 3 step model.
> 
>  - Define CPUCAP erratum
>  - TRBE driver work around
>  - Finally advertise to the user.
> 
> I don't think this one needs that.

Okay, understood.
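To see what the reworked tsb_csync() does on affected CPUs, here is a user-space model of the macro from the barrier.h hunk. The real __tsb_csync() emits "hint #18" and the real check is cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE); in this sketch both are replaced by hypothetical flags and a counter so the number of barriers issued can be observed. The assertion is an addition that encodes the patch's stated assumption that the PE is already in a trace prohibited state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for CPU state and the capability check. */
static bool sketch_trace_prohibited;	/* PE is in a trace prohibited region */
static bool sketch_tsb_erratum;		/* models the TSB flush-failure cpucap */
static int sketch_tsb_issued;		/* counts emitted TSB instructions */

/* Models __tsb_csync(): one TSB CSYNC ("hint #18") instruction. */
static void sketch_raw_tsb(void)
{
	sketch_tsb_issued++;
}

/*
 * Models tsb_csync(): affected CPUs get an extra TSB. The two barriers
 * need not be back to back as long as tracing stays prohibited.
 * do { } while (0) keeps the macro safe inside an unbraced if/else.
 */
#define sketch_tsb_csync()				\
	do {						\
		assert(sketch_trace_prohibited);	\
		if (sketch_tsb_erratum)			\
			sketch_raw_tsb();		\
		sketch_raw_tsb();			\
	} while (0)
```

On an unaffected CPU one call issues a single barrier; once the capability is set, each call issues two.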


* Re: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space
  2021-09-22 10:16     ` Suzuki K Poulose
@ 2021-10-01  4:40       ` Anshuman Khandual
  0 siblings, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-10-01  4:40 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/22/21 3:46 PM, Suzuki K Poulose wrote:
> On 22/09/2021 10:58, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> The TRBE driver makes sure that there is enough space for a meaningful
>>> run, otherwise it pads the given space and restarts the offset calculation
>>> once. But there is no guarantee that, after one retry, we will either find
>>> enough space or hit "no space".
>>
>> So what happens currently when it neither finds the required minimum buffer
>> space for a meaningful run nor does it hit the "no space" scenario ?
> 
> It tries once today and assumes that it will either hit :
> 
>  - No space
>    OR
>  - Enough space
> 
> which is reasonable, given the minimum space needed is a few bytes.
> But this may no longer be true with the other erratum workarounds.

Okay.

> 
>>
>>> Make sure that we repeat the step until, either :
>>>    - We have the minimum space
>>>     OR
>>>    - There is NO space at all.
>>>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>> Cc: Mike Leach <mike.leach@linaro.org>
>>> Cc: Leo Yan <leo.yan@linaro.org>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>   drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>> index 3373f4e2183b..02f9e00e2091 100644
>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>> @@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>>>        * If the head is too close to the limit and we don't
>>>        * have space for a meaningful run, we rather pad it
>>>        * and start fresh.
>>> +     *
>>> +     * We might have to do this more than once to make sure
>>> +     * we have enough required space.
>>
>> OR no space at all, as explained in the commit message.
>> Hence this comment needs an update.
>>
>>>        */
>>> -    if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>>> +    while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>>>           trbe_pad_buf(handle, limit - head);
>>>           limit = __trbe_normal_offset(handle);
>>> +        head = PERF_IDX2OFF(handle->head, buf);
>>
>> Should the loop be bounded by a retry limit as well ?
> 
> No. We will eventually hit No-space as we keep on padding
> the buffer.

Got it.
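A small model of the retry loop may help show why no explicit retry limit is needed: every pass pads up to the current limit, so the head moves strictly forward and the loop ends either with enough room or with no room at all. The ring layout below (a fixed boundary granule and an end offset) and all sketch_* names are hypothetical simplifications, not how __trbe_normal_offset() actually computes the limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the space handed to the driver. */
struct sketch_ring {
	uint64_t head;		/* current write offset */
	uint64_t end;		/* end of the space available to the driver */
	uint64_t boundary;	/* granule a limit may not cross (assumed) */
	uint64_t padded;	/* bytes filled with padding so far */
};

/* Next usable limit, or 0 for "no space left". */
static uint64_t sketch_limit(const struct sketch_ring *r)
{
	uint64_t next;

	assert(r->boundary != 0);
	if (r->head >= r->end)
		return 0;
	next = (r->head / r->boundary + 1) * r->boundary;
	return next < r->end ? next : r->end;
}

static uint64_t sketch_normal_offset(struct sketch_ring *r, uint64_t min_size)
{
	uint64_t limit = sketch_limit(r);

	/* Pad and retry until there is enough room or no room at all. */
	while (limit && (limit - r->head) < min_size) {
		r->padded += limit - r->head;	/* models trbe_pad_buf() */
		r->head = limit;		/* head only ever moves forward */
		limit = sketch_limit(r);
	}
	return limit;
}
```

Starting at head 3900 with 4K boundaries and an end of 16384, a 256-byte minimum pads 196 bytes and returns 8192, whereas an 8K minimum (2 * PAGE_SIZE, as with the out-of-range workaround) pads the remaining 12484 bytes and returns 0, i.e. "no space".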


* Re: [PATCH v2 16/17] coresight: trbe: Work around write to out of range
  2021-09-28 10:32     ` Suzuki K Poulose
@ 2021-10-01  4:56       ` Anshuman Khandual
  0 siblings, 0 replies; 62+ messages in thread
From: Anshuman Khandual @ 2021-10-01  4:56 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, maz, catalin.marinas, mark.rutland, james.morse,
	leo.yan, mike.leach, mathieu.poirier, will, lcherian, coresight



On 9/28/21 4:02 PM, Suzuki K Poulose wrote:
> On 23/09/2021 04:15, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> TRBE implementations affected by Arm erratum (2253138 or 2224489) could
>>> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
>>> to the TRBBASER. This implies that the TRBE could potentially corrupt:
>>>
>>>    - A page used by the rest of the kernel/user (if the LIMIT = end of
>>>      perf ring buffer)
>>>    - A page within the ring buffer, but outside the driver's range
>>>      [head, head + size], which may contain trace data that may be
>>>      consumed by userspace.
>>>
>>> We work around this erratum by:
>>>    - Making sure that there is at least an extra PAGE space left in the
>>>      TRBE's range than we normally assign. This will be additional to other
>>>      restrictions (e.g., the TRBE alignment for working around
>>>      TRBE_WORKAROUND_OVERWRITE_FILL_MODE, where there is a minimum of PAGE_SIZE.
>>>      Thus we would have 2 * PAGE_SIZE)
>>>
>>>    - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>>>      range (i.e., TRBBASER...TRBLIMITR.LIMIT), by:
>>>
>>>          TRBLIMITR.LIMIT -= PAGE_SIZE
>>>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>> Cc: Mike Leach <mike.leach@linaro.org>
>>> Cc: Leo Yan <leo.yan@linaro.org>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>   drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>>>   1 file changed, 57 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>> index 02f9e00e2091..ea907345354c 100644
>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>> @@ -86,7 +86,8 @@ struct trbe_buf {
>>>    * affects the given instance of the TRBE.
>>>    */
>>>   #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE    0
>>> -#define TRBE_ERRATA_MAX                1
>>> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE    1
>>> +#define TRBE_ERRATA_MAX                2
>>>
>>>   /*
>>>    * Safe limit for the number of bytes that may be overwritten
>>> @@ -96,6 +97,7 @@ struct trbe_buf {
>>>
>>>   static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>>>       [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>>> +    [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>>>   };
>>>
>>>   /*
>>> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>>
>>>   static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>>>   {
>>> -    return TRBE_TRACE_MIN_BUF_SIZE;
>>> +    u64 size = TRBE_TRACE_MIN_BUF_SIZE;
>>> +    struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>>> +
>>> +    /*
>>> +     * When the TRBE is affected by an erratum that could make it
>>> +     * write to the next "virtually addressed" page beyond the LIMIT.
>>
>> What if the next "virtually addressed" page is just blocked from future
>> usage in the kernel and never really gets mapped into a physical page ?
> 
> That is the case today for vmap(): the end of the vm_area has a guard
> page. But that implies that when the erratum is triggered, the TRBE
> encounters a fault and we need to handle that in the driver. This works
> for "end" of the ring buffer. But not when the LIMIT is in the middle
> of the ring buffer.
> 
>> In that case, it would be guaranteed that the next "virtually addressed"
>> page would not even exist after the LIMIT pointer and hence the erratum
>> would not be triggered. It would be like a virtual mapping cliff
>> right after the LIMIT pointer from the MMU perspective.
>>
>> It might be a bit tricky, though. Currently the entire ring buffer gets
>> mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
>> the above solution, each computation of the LIMIT pointer needs to be
>> followed by a temporary unmapping of the next virtual page from the existing
>> vmap() buffer. Subsequently it could be mapped back, as trbe_buf->pages
>> always contains all the physical pages from the perf ring buffer.
> 
> It is much easier to leave a page aside than to do this map/unmap
> dance, which might even change the VA you get and thus
> complicates the TRBE driver in general. I believe this is much
> simpler and we can reason about the code better. And all faults
> are still illegal for the driver, which helps us to detect any
> other issues in the TRBE.

Agreed, as I had mentioned earlier, this would have been a bit
complicated anyway. Keeping the virtual address of the entire buffer
unchanged and treating every fault inside the driver as illegal makes
the current implementation much simpler and easier to reason about. So
discarding those properties might not be a good idea after all.
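For the out-of-range erratum, the two rules from the commit message (reserve one extra page on top of the usual minimum, and program TRBLIMITR.LIMIT one page early) can be expressed as a short sketch. The base minimum is passed in as a parameter rather than tied to the real TRBE_TRACE_MIN_BUF_SIZE, taking the maximum with the alignment is a simplification of "additional to other restrictions", 4K pages are assumed, and the sketch_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096ULL	/* assumed 4K pages */

/*
 * Minimum space for a session: the larger of the base minimum and the
 * alignment restriction, plus one spare page if the TRBE may write past
 * LIMIT.
 */
static uint64_t sketch_min_trace_buf_size(uint64_t base_min, uint64_t align,
					  bool out_of_range)
{
	uint64_t size = base_min > align ? base_min : align;

	if (out_of_range)
		size += SKETCH_PAGE_SIZE;
	return size;
}

/* LIMIT programmed into TRBLIMITR_EL1: one page early on affected TRBEs. */
static uint64_t sketch_hw_limit(uint64_t limit, bool out_of_range)
{
	if (!out_of_range)
		return limit;
	assert(limit >= SKETCH_PAGE_SIZE);	/* avoid wrapping below zero */
	return limit - SKETCH_PAGE_SIZE;
}

/* True if a stray write of 'overrun' bytes past the programmed LIMIT
 * still lands at or below the driver's real limit. */
static bool sketch_overrun_contained(uint64_t limit, uint64_t overrun)
{
	return sketch_hw_limit(limit, true) + overrun <= limit;
}
```

With the PAGE_SIZE alignment required by the FILL-mode workaround, sketch_min_trace_buf_size(64, 4096, true) gives 8192, matching the 2 * PAGE_SIZE mentioned in the commit message (64 here is just an arbitrary base minimum).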


* Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated
  2021-09-30 17:54   ` Mathieu Poirier
@ 2021-10-01  8:36     ` Suzuki K Poulose
  2021-10-01 15:15       ` Mathieu Poirier
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-10-01  8:36 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On 30/09/2021 18:54, Mathieu Poirier wrote:
> Hi Suzuki,
> 
> On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
>> We collect the trace from the TRBE on FILL event from IRQ context
>> and when via update_buffer(), when the event is stopped. Let us
> 
> s/"and when via"/"and via"
> 
>> consolidate how we calculate the trace generated into a helper.
>>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
>>   1 file changed, 30 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 63f7edd5fd1f..063c4505a203 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>   	return TRBE_FAULT_ACT_SPURIOUS;
>>   }
>>   
>> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>> +					 struct trbe_buf *buf,
>> +					 bool wrap)
> 
> Stacking
> 

Ack

>> +{
>> +	u64 write;
>> +	u64 start_off, end_off;
>> +
>> +	/*
>> +	 * If the TRBE has wrapped around, the write pointer has
>> +	 * wrapped and should be treated as the limit.
>> +	 */
>> +	if (wrap)
>> +		write = get_trbe_limit_pointer();
>> +	else
>> +		write = get_trbe_write_pointer();
>> +
>> +	end_off = write - buf->trbe_base;
> 
> In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
> acquired using get_trbe_base_pointer() but here it is referenced directly - any
> reason for that?  It certainly makes reviewing this simple patch quite
> difficult because I keep wondering if I am missing something subtle...

Very good observation. So far, we have always programmed TRBBASER with
the VA(ring_buffer[0]). Thus reading TRBBASER and using
buf->trbe_base are equivalent.

But going forward, we are going to use different values for TRBBASER
to work around errata. Thus, for computing the "offsets" within the
ring buffer, it is always correct to use this field. I could move this
to the patch where the workaround is introduced, and put in a comment
there.

Thanks for the review

Suzuki
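The point about buf->trbe_base versus TRBBASER can be illustrated with a small model: once the FILL-mode workaround programs TRBBASER to the session's page, subtracting the hardware base would give an offset from that page rather than from ring_buffer[0]. The struct and helpers below are hypothetical, and deriving the size as end_off - start_off is an assumption, since the remainder of trbe_get_trace_size() is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of driver state and TRBE registers. */
struct sketch_trbe_state {
	uint64_t trbe_base;	/* VA of ring_buffer[0] (buf->trbe_base) */
	uint64_t hw_base;	/* value programmed into TRBBASER_EL1 */
	uint64_t write_ptr;	/* TRBPTR_EL1 */
	uint64_t limit_ptr;	/* TRBLIMITR_EL1.LIMIT */
	uint64_t start_off;	/* session start offset within the ring buffer */
};

/* End offset within the ring buffer; on a wrap the TRBE reached LIMIT. */
static uint64_t sketch_end_off(const struct sketch_trbe_state *s, bool wrap)
{
	uint64_t write = wrap ? s->limit_ptr : s->write_ptr;

	return write - s->trbe_base;	/* deliberately not write - hw_base */
}

/* Assumed: trace generated in this session is end_off - start_off. */
static uint64_t sketch_trace_size(const struct sketch_trbe_state *s, bool wrap)
{
	uint64_t end_off = sketch_end_off(s, wrap);

	assert(end_off >= s->start_off);
	return end_off - s->start_off;
}
```

With trbe_base = 0x100000, TRBBASER = 0x102000 and TRBPTR = 0x102800, the ring-buffer offset is 0x2800, whereas subtracting TRBBASER would yield 0x800.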



* Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated
  2021-10-01  8:36     ` Suzuki K Poulose
@ 2021-10-01 15:15       ` Mathieu Poirier
  2021-10-01 15:22         ` Suzuki K Poulose
  0 siblings, 1 reply; 62+ messages in thread
From: Mathieu Poirier @ 2021-10-01 15:15 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On Fri, Oct 01, 2021 at 09:36:24AM +0100, Suzuki K Poulose wrote:
> On 30/09/2021 18:54, Mathieu Poirier wrote:
> > Hi Suzuki,
> > 
> > On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
> > > We collect the trace from the TRBE on FILL event from IRQ context
> > > and when via update_buffer(), when the event is stopped. Let us
> > 
> > s/"and when via"/"and via"
> > 
> > > consolidate how we calculate the trace generated into a helper.
> > > 
> > > Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> > > Cc: Mike Leach <mike.leach@linaro.org>
> > > Cc: Leo Yan <leo.yan@linaro.org>
> > > Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > ---
> > >   drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
> > >   1 file changed, 30 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> > > index 63f7edd5fd1f..063c4505a203 100644
> > > --- a/drivers/hwtracing/coresight/coresight-trbe.c
> > > +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> > > @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
> > >   	return TRBE_FAULT_ACT_SPURIOUS;
> > >   }
> > > +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
> > > +					 struct trbe_buf *buf,
> > > +					 bool wrap)
> > 
> > Stacking
> > 
> 
> Ack
> 
> > > +{
> > > +	u64 write;
> > > +	u64 start_off, end_off;
> > > +
> > > +	/*
> > > +	 * If the TRBE has wrapped around the write pointer has
> > > +	 * wrapped and should be treated as limit.
> > > +	 */
> > > +	if (wrap)
> > > +		write = get_trbe_limit_pointer();
> > > +	else
> > > +		write = get_trbe_write_pointer();
> > > +
> > > +	end_off = write - buf->trbe_base;
> > 
> > In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
> > acquired using get_trbe_base_pointer() but here it is referenced directly - any
> > reason for that?  It certainly makes reviewing this simple patch quite
> > difficult because I keep wondering if I am missing something subtle...
> 
> Very good observation. So far, we have always programmed the TRBBASER with
> the VA(ring_buffer[0]), so reading the BASER and using
> buf->trbe_base give the same result.
> 
> But going forward, we are going to use different values for the TRBBASER
> to work around an erratum. Thus, to compute the "offsets"
> within the ring buffer, it is always correct to use this field. I could
> move this to the patch where the work around is introduced, and put in
> a comment there.

That will be greatly appreciated.

> 
> Thanks for the review
> 
> Suzuki
> 


* Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated
  2021-10-01 15:15       ` Mathieu Poirier
@ 2021-10-01 15:22         ` Suzuki K Poulose
  0 siblings, 0 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-10-01 15:22 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On 01/10/2021 16:15, Mathieu Poirier wrote:
> On Fri, Oct 01, 2021 at 09:36:24AM +0100, Suzuki K Poulose wrote:
>> On 30/09/2021 18:54, Mathieu Poirier wrote:
>>> Hi Suzuki,
>>>
>>> On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
>>>> We collect the trace from the TRBE on FILL event from IRQ context
>>>> and when via update_buffer(), when the event is stopped. Let us
>>>
>>> s/"and when via"/"and via"
>>>
>>>> consolidate how we calculate the trace generated into a helper.
>>>>
>>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>>> Cc: Mike Leach <mike.leach@linaro.org>
>>>> Cc: Leo Yan <leo.yan@linaro.org>
>>>> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> ---
>>>>    drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
>>>>    1 file changed, 30 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> index 63f7edd5fd1f..063c4505a203 100644
>>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>>>    	return TRBE_FAULT_ACT_SPURIOUS;
>>>>    }
>>>> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>>> +					 struct trbe_buf *buf,
>>>> +					 bool wrap)
>>>
>>> Stacking
>>>
>>
>> Ack
>>
>>>> +{
>>>> +	u64 write;
>>>> +	u64 start_off, end_off;
>>>> +
>>>> +	/*
>>>> +	 * If the TRBE has wrapped around the write pointer has
>>>> +	 * wrapped and should be treated as limit.
>>>> +	 */
>>>> +	if (wrap)
>>>> +		write = get_trbe_limit_pointer();
>>>> +	else
>>>> +		write = get_trbe_write_pointer();
>>>> +
>>>> +	end_off = write - buf->trbe_base;
>>>
>>> In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
>>> acquired using get_trbe_base_pointer() but here it is referenced directly - any
>>> reason for that?  It certainly makes reviewing this simple patch quite
>>> difficult because I keep wondering if I am missing something subtle...
>>
>> Very good observation. So far, we have always programmed the TRBBASER with
>> the VA(ring_buffer[0]), so reading the BASER and using
>> buf->trbe_base give the same result.
>>
>> But going forward, we are going to use different values for the TRBBASER
>> to work around an erratum. Thus, to compute the "offsets"
>> within the ring buffer, it is always correct to use this field. I could
>> move this to the patch where the work around is introduced, and put in
>> a comment there.
> 
> That will be greatly appreciated.

I have moved this to the patch that introduces the concept of the TRBE
using a different BASE address than the beginning of the ring buffer.

Thanks
Suzuki


* Re: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  2021-09-21 13:41 ` [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata " Suzuki K Poulose
  2021-09-23  6:13   ` Anshuman Khandual
@ 2021-10-01 17:15   ` Mathieu Poirier
  2021-10-04  8:46     ` Suzuki K Poulose
  1 sibling, 1 reply; 62+ messages in thread
From: Mathieu Poirier @ 2021-10-01 17:15 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:13PM +0100, Suzuki K Poulose wrote:
> ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from
> an erratum which, when triggered, might cause the TRBE to overwrite
> the trace data already collected in FILL mode, in the event of a WRAP,
> i.e., the TRBE doesn't stop writing the data; instead it wraps to the base
> and could write up to 3 cache lines' worth of trace. Thus, this could
> corrupt the trace at the "BASE" pointer.
> 
> The workaround is to program the write pointer 256 bytes from the
> base, such that if the erratum is triggered, it doesn't overwrite
> the trace data that was captured. This skipped region can be
> padded with ignore packets at the end of the session, so that
> the decoder sees a continuous buffer with some padding at the
> beginning. The trace data written at the base is considered
> lost, as the limit could have been in the middle of the perf
> ring buffer, and jumping to the "base" is not acceptable.
> We already set the flags to indicate that some amount of trace
> was lost during the FILL event IRQ, so this is fine.
> 
> One important change with the workaround is that we program
> TRBBASER_EL1 to the current page where we are allowed to write.
> Otherwise, the TRBE could overwrite a region that may be consumed
> by perf. To this end, we always make sure that
> "handle->head", and thus trbe_write, is PAGE_SIZE aligned,
> so that we can set the BASE to the page base and move the
> TRBPTR to the 256-byte offset.
> 
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Change since v1:
>  - Updated comment with ASCII art
>  - Add _BYTES suffix for the space to skip for the work around.
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
>  1 file changed, 132 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index f569010c672b..983dd5039e52 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -16,6 +16,7 @@
>  #define pr_fmt(fmt) DRVNAME ": " fmt
>  
>  #include <asm/barrier.h>
> +#include <asm/cpufeature.h>
>  #include <asm/cputype.h>
>  
>  #include "coresight-self-hosted-trace.h"
> @@ -84,9 +85,17 @@ struct trbe_buf {
>   * per TRBE instance, we keep track of the list of errata that
>   * affects the given instance of the TRBE.
>   */
> -#define TRBE_ERRATA_MAX			0
> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
> +#define TRBE_ERRATA_MAX				1
> +
> +/*
> + * Safe limit for the number of bytes that may be overwritten
> + * when the erratum is triggered.
> + */
> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES	256
>  
>  static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
> +	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>  };
>  
>  /*
> @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
>  	set_trbe_limit_pointer_enabled(buf->trbe_limit);
>  }
>  
> -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
> +						 u64 trbsr)
>  {
>  	int ec = get_trbe_ec(trbsr);
>  	int bsc = get_trbe_bsc(trbsr);
> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
> +	struct trbe_cpudata *cpudata = buf->cpudata;
>  
>  	WARN_ON(is_trbe_running(trbsr));
>  	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
> @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>  	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>  		return TRBE_FAULT_ACT_FATAL;
>  
> -	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
> -		if (get_trbe_write_pointer() == get_trbe_base_pointer())
> -			return TRBE_FAULT_ACT_WRAP;
> -	}
> +	/*
> +	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
> +	 * it might write data after a WRAP event in the fill mode.
> +	 * Thus the check TRBPTR == TRBBASER will not be honored.
> +	 */
> +	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
> +	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
> +	     get_trbe_write_pointer() == get_trbe_base_pointer()))
> +		return TRBE_FAULT_ACT_WRAP;
> +

I'm very perplexed by the trbe_has_erratum() infrastructure... Since this is a
TRBE, the code will always run on the CPU it is associated with, and if
I'm correct here we could call this_cpu_has_cap() directly with the same
outcome.  I doubt that all drivers using the cpucaps subsystem carry a shadow
structure to keep the same information.

I have to stop here for today.  Although small in size, this patchset demands a
fair amount of involvement; I will continue next week but I may not go through
the whole thing for this revision.

Thanks,
Mathieu

>  	return TRBE_FAULT_ACT_SPURIOUS;
>  }
>  
> @@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>  {
>  	u64 write;
>  	u64 start_off, end_off;
> +	u64 size;
> +	u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>  
>  	/*
>  	 * If the TRBE has wrapped around the write pointer has
> @@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>  
>  	if (WARN_ON_ONCE(end_off < start_off))
>  		return 0;
> -	return (end_off - start_off);
> +
> +	size = end_off - start_off;
> +	/*
> +	 * If the TRBE is affected by the following erratum, we must fill
> +	 * the space we skipped with IGNORE packets. And we are always
> +	 * guaranteed to have at least a PAGE_SIZE space in the buffer.
> +	 */
> +	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
> +	    !WARN_ON(size < overwrite_skip))
> +		__trbe_pad_buf(buf, start_off, overwrite_skip);
> +
> +	return size;
>  }
>  
>  static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> @@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>  		clr_trbe_irq();
>  		isb();
>  
> -		act = trbe_get_fault_act(status);
> +		act = trbe_get_fault_act(handle, status);
>  		/*
>  		 * If this was not due to a WRAP event, we have some
>  		 * errors and as such buffer is empty.
> @@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>  	return size;
>  }
>  
> +
> +static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
> +{
> +	/*
> +	 * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
> +	 * lines' worth of data from "TRBBASER_EL1" in the event of a "FILL".
> +	 * Thus, we could lose some amount of the trace at the base.
> +	 *
> +	 * Before Fix:
> +	 *
> +	 *  normal-BASE     head  normal-PTR              tail normal-LIMIT
> +	 *  |                   \/                       /
> +	 *   -------------------------------------------------------------
> +	 *  |         |          |xyzdefghij..|...  tuvw|                |
> +	 *   -------------------------------------------------------------
> +	 *                      /    |                   \
> +	 * After Fix->  TRBBASER     TRBPTR              TRBLIMITR.LIMIT
> +	 *
> +	 * In the normal course of action, we would set the TRBBASER to the
> +	 * beginning of the ring-buffer (normal-BASE). But with the erratum,
> +	 * the TRBE could overwrite the contents at the "normal-BASE", after
> +	 * hitting the "normal-LIMIT", since it doesn't stop as expected. And
> +	 * this is wrong. So we must always make sure that the TRBBASER is
> +	 * within the region [head, head+size].
> +	 *
> +	 * Also, we would set the TRBPTR to head (after adjusting for
> +	 * alignment) at normal-PTR. This would mean that the last few bytes
> +	 * of the trace (say, "xyz") might overwrite the first few bytes of
> +	 * trace written ("abc"). More importantly, they will appear in what
> +	 * userspace sees as the beginning of the trace, which is wrong. We may
> +	 * not always have space to move the latest trace "xyz" to the correct
> +	 * order, as it must appear beyond the LIMIT (i.e., [head..head+size]).
> +	 * Thus it is easier to ignore those bytes than to complicate the
> +	 * driver to move it, assuming that the erratum was triggered and doing
> +	 * additional checks to see if there is indeed allowed space at
> +	 * TRBLIMITR.LIMIT.
> +	 *
> +	 * To summarize, with the work around:
> +	 *
> +	 *  - We always align the offset for the next session to PAGE_SIZE
> +	 *    (This is to ensure we can program the TRBBASER to this offset
> +	 *    within the region [head...head+size]).
> +	 *
> +	 *  - At TRBE enable:
> +	 *     - Set the TRBBASER to the page aligned offset of the current
> +	 *       proposed write offset. (which is guaranteed to be aligned
> +	 *       as above)
> +	 *     - Move the TRBPTR to skip the first 256 bytes (that might be
> +	 *       overwritten with the erratum). This ensures that the trace
> +	 *       generated in the session is not re-written.
> +	 *
> +	 *  - At trace collection:
> +	 *     - Pad the 256 bytes skipped above with IGNORE packets.
> +	 */
> +	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
> +		if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
> +			return -EINVAL;
> +		buf->trbe_hw_base = buf->trbe_write;
> +		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
> +	}
> +
> +	return 0;
> +}
> +
>  static int __arm_trbe_enable(struct trbe_buf *buf,
>  			     struct perf_output_handle *handle)
>  {
> +	int ret = 0;
> +
>  	perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
>  	buf->trbe_limit = compute_trbe_buffer_limit(handle);
>  	buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>  	if (buf->trbe_limit == buf->trbe_base) {
> -		trbe_stop_and_truncate_event(handle);
> -		return -ENOSPC;
> +		ret = -ENOSPC;
> +		goto err;
>  	}
>  	/* Set the base of the TRBE to the buffer base */
>  	buf->trbe_hw_base = buf->trbe_base;
> +
> +	ret = trbe_apply_work_around_before_enable(buf);
> +	if (ret)
> +		goto err;
> +
>  	*this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
>  	trbe_enable_hw(buf);
>  	return 0;
> +err:
> +	trbe_stop_and_truncate_event(handle);
> +	return ret;
>  }
>  
>  static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
> @@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>  	if (!is_perf_trbe(handle))
>  		return IRQ_NONE;
>  
> -	act = trbe_get_fault_act(status);
> +	act = trbe_get_fault_act(handle, status);
>  	switch (act) {
>  	case TRBE_FAULT_ACT_WRAP:
>  		truncated = !!trbe_handle_overflow(handle);
> @@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
>  	}
>  
>  	trbe_check_errata(cpudata);
> -	cpudata->trbe_align = cpudata->trbe_hw_align;
> +	/*
> +	 * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
> +	 * we must always program TRBPTR_EL1 256 bytes from a page
> +	 * boundary, with TRBBASER_EL1 set to the page, to prevent the
> +	 * TRBE from overwriting 256 bytes at TRBBASER_EL1 on a FILL event.
> +	 *
> +	 * Thus make sure we always align our write pointer to a PAGE_SIZE,
> +	 * which also guarantees that we have at least a PAGE_SIZE space in
> +	 * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
> +	 * the required bytes at the base.
> +	 */
> +	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
> +		cpudata->trbe_align = PAGE_SIZE;
> +	else
> +		cpudata->trbe_align = cpudata->trbe_hw_align;
> +
>  	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
>  	cpudata->cpu = cpu;
>  	cpudata->drvdata = drvdata;
> -- 
> 2.24.1
> 
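
The enable/collect sequence described in the commit message and the new comment can be modelled with a small userspace sketch. This is a hypothetical illustration, not the driver: the ring buffer is a plain byte array indexed by offset, SKETCH_PAGE_SIZE is assumed to be 4 KiB, and 0x70 is assumed as the IGNORE padding byte (the driver pads via __trbe_pad_buf()). It models a TRBE that is affected by the erratum, so the skip and the padding are applied unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE   4096u /* assumed PAGE_SIZE */
#define SKETCH_SKIP_BYTES  256u  /* mirrors ..._OVERWRITE_FILL_MODE_SKIP_BYTES */
#define SKETCH_IGNORE_BYTE 0x70u /* assumed IGNORE packet byte for padding */

/* Offsets into the ring buffer for one trace session (hypothetical). */
struct sketch_session {
	uint64_t start;   /* where this session's data begins */
	uint64_t hw_base; /* value programmed into TRBBASER */
	uint64_t write;   /* value programmed into TRBPTR */
};

/* With trbe_align = PAGE_SIZE, the next session always starts on a page. */
static uint64_t sketch_align_up(uint64_t off, uint64_t align)
{
	return (off + align - 1) / align * align;
}

/* Enable: TRBBASER = current page, TRBPTR = page + 256 bytes. */
static int sketch_enable(struct sketch_session *s, uint64_t write_off)
{
	if (write_off % SKETCH_PAGE_SIZE)
		return -1; /* the driver WARNs and returns -EINVAL here */
	s->start = write_off;
	s->hw_base = write_off;
	s->write = write_off + SKETCH_SKIP_BYTES;
	return 0;
}

/* Erratum: after the FILL event, up to SKIP_BYTES land at TRBBASER. */
static void sketch_erratum_overwrite(uint8_t *ring,
				     const struct sketch_session *s,
				     uint8_t junk, size_t nbytes)
{
	assert(nbytes <= SKETCH_SKIP_BYTES);
	memset(ring + s->hw_base, junk, nbytes);
}

/* Collect: report the session size and pad the skipped bytes. */
static uint64_t sketch_collect(uint8_t *ring, const struct sketch_session *s,
			       uint64_t end_off)
{
	uint64_t size = end_off - s->start;

	/* The driver WARNs if size < SKIP_BYTES; the sketch just skips padding. */
	if (size >= SKETCH_SKIP_BYTES)
		memset(ring + s->start, SKETCH_IGNORE_BYTE, SKETCH_SKIP_BYTES);
	return size;
}
```

On an affected CPU, any stray bytes the erratum writes at TRBBASER fall inside the skipped 256 bytes and are replaced by padding at collection time, so the decoder sees IGNORE packets followed by the session's trace.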


* Re: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  2021-10-01 17:15   ` Mathieu Poirier
@ 2021-10-04  8:46     ` Suzuki K Poulose
  2021-10-04 16:47       ` Mathieu Poirier
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-10-04  8:46 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

Hi Mathieu

On 01/10/2021 18:15, Mathieu Poirier wrote:
> On Tue, Sep 21, 2021 at 02:41:13PM +0100, Suzuki K Poulose wrote:
>> ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from
>> an erratum which, when triggered, might cause the TRBE to overwrite
>> the trace data already collected in FILL mode, in the event of a WRAP,
>> i.e., the TRBE doesn't stop writing the data; instead it wraps to the base
>> and could write up to 3 cache lines' worth of trace. Thus, this could
>> corrupt the trace at the "BASE" pointer.
>>
>> The workaround is to program the write pointer 256 bytes from the
>> base, such that if the erratum is triggered, it doesn't overwrite
>> the trace data that was captured. This skipped region can be
>> padded with ignore packets at the end of the session, so that
>> the decoder sees a continuous buffer with some padding at the
>> beginning. The trace data written at the base is considered
>> lost, as the limit could have been in the middle of the perf
>> ring buffer, and jumping to the "base" is not acceptable.
>> We already set the flags to indicate that some amount of trace
>> was lost during the FILL event IRQ, so this is fine.
>>
>> One important change with the workaround is that we program
>> TRBBASER_EL1 to the current page where we are allowed to write.
>> Otherwise, the TRBE could overwrite a region that may be consumed
>> by perf. To this end, we always make sure that
>> "handle->head", and thus trbe_write, is PAGE_SIZE aligned,
>> so that we can set the BASE to the page base and move the
>> TRBPTR to the 256-byte offset.
>>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>> Change since v1:
>>   - Updated comment with ASCII art
>>   - Add _BYTES suffix for the space to skip for the work around.
>> ---
>>   drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
>>   1 file changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index f569010c672b..983dd5039e52 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -16,6 +16,7 @@
>>   #define pr_fmt(fmt) DRVNAME ": " fmt
>>   
>>   #include <asm/barrier.h>
>> +#include <asm/cpufeature.h>
>>   #include <asm/cputype.h>
>>   
>>   #include "coresight-self-hosted-trace.h"
>> @@ -84,9 +85,17 @@ struct trbe_buf {
>>    * per TRBE instance, we keep track of the list of errata that
>>    * affects the given instance of the TRBE.
>>    */
>> -#define TRBE_ERRATA_MAX			0
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
>> +#define TRBE_ERRATA_MAX				1
>> +
>> +/*
>> + * Safe limit for the number of bytes that may be overwritten
>> + * when the erratum is triggered.
>> + */
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES	256
>>   
>>   static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> +	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>>   };
>>   
>>   /*
>> @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
>>   	set_trbe_limit_pointer_enabled(buf->trbe_limit);
>>   }
>>   
>> -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
>> +						 u64 trbsr)
>>   {
>>   	int ec = get_trbe_ec(trbsr);
>>   	int bsc = get_trbe_bsc(trbsr);
>> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
>> +	struct trbe_cpudata *cpudata = buf->cpudata;
>>   
>>   	WARN_ON(is_trbe_running(trbsr));
>>   	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
>> @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>   	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>>   		return TRBE_FAULT_ACT_FATAL;
>>   
>> -	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>> -		if (get_trbe_write_pointer() == get_trbe_base_pointer())
>> -			return TRBE_FAULT_ACT_WRAP;
>> -	}
>> +	/*
>> +	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>> +	 * it might write data after a WRAP event in the fill mode.
>> +	 * Thus the check TRBPTR == TRBBASER will not be honored.
>> +	 */
>> +	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
>> +	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
>> +	     get_trbe_write_pointer() == get_trbe_base_pointer()))
>> +		return TRBE_FAULT_ACT_WRAP;
>> +
> 
> I'm very perplexed by the trbe_has_erratum() infrastructure... Since this is a
> TRBE, the code will always run on the CPU it is associated with, and if
> I'm correct here we could call this_cpu_has_cap() directly with the same
> outcome.  I doubt that all drivers using the cpucaps subsystem carry a shadow
> structure to keep the same information.

Very valid question. Of course, we could use the this_cpu_has_cap()
helper. Unlike cpus_have_*_cap(), which gives you the system-wide
status of the erratum, the cpucap framework doesn't keep a cache of which
CPUs are affected by a given erratum. Thus this_cpu_has_cap() would
involve running the detection on the current CPU every time we call it,
i.e., matching the MIDR of the CPU against the list of affected MIDRs
for the given erratum. This is a bit of overhead.

Given that we already have CPU-specific information in trbe_cpudata, I
chose to cache the affected errata locally. This gives us quick access
to the errata for individual TRBE instances. The list is initialised
at TRBE probe time, which avoids the costly check each time we need
it. I could make this clear in the patch which introduces the
framework.


Thanks for the review

Suzuki

> Thanks,
> Mathieu


* Re: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  2021-10-04  8:46     ` Suzuki K Poulose
@ 2021-10-04 16:47       ` Mathieu Poirier
  0 siblings, 0 replies; 62+ messages in thread
From: Mathieu Poirier @ 2021-10-04 16:47 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

Good morning,

On Mon, Oct 04, 2021 at 09:46:07AM +0100, Suzuki K Poulose wrote:
> Hi Mathieu
> 
> On 01/10/2021 18:15, Mathieu Poirier wrote:
> > On Tue, Sep 21, 2021 at 02:41:13PM +0100, Suzuki K Poulose wrote:
> > > ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from
> > > an erratum which, when triggered, might cause the TRBE to overwrite
> > > the trace data already collected in FILL mode, in the event of a WRAP,
> > > i.e., the TRBE doesn't stop writing the data; instead it wraps to the base
> > > and could write up to 3 cache lines' worth of trace. Thus, this could
> > > corrupt the trace at the "BASE" pointer.
> > > 
> > > The workaround is to program the write pointer 256 bytes from the
> > > base, such that if the erratum is triggered, it doesn't overwrite
> > > the trace data that was captured. This skipped region can be
> > > padded with ignore packets at the end of the session, so that
> > > the decoder sees a continuous buffer with some padding at the
> > > beginning. The trace data written at the base is considered
> > > lost, as the limit could have been in the middle of the perf
> > > ring buffer, and jumping to the "base" is not acceptable.
> > > We already set the flags to indicate that some amount of trace
> > > was lost during the FILL event IRQ, so this is fine.
> > > 
> > > One important change with the workaround is that we program
> > > TRBBASER_EL1 to the current page where we are allowed to write.
> > > Otherwise, the TRBE could overwrite a region that may be consumed
> > > by perf. To this end, we always make sure that
> > > "handle->head", and thus trbe_write, is PAGE_SIZE aligned,
> > > so that we can set the BASE to the page base and move the
> > > TRBPTR to the 256-byte offset.
> > > 
> > > Cc: Mike Leach <mike.leach@linaro.org>
> > > Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> > > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > > Cc: Leo Yan <leo.yan@linaro.org>
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > ---
> > > Change since v1:
> > >   - Updated comment with ASCII art
> > >   - Add _BYTES suffix for the space to skip for the work around.
> > > ---
> > >   drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
> > >   1 file changed, 132 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> > > index f569010c672b..983dd5039e52 100644
> > > --- a/drivers/hwtracing/coresight/coresight-trbe.c
> > > +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> > > @@ -16,6 +16,7 @@
> > >   #define pr_fmt(fmt) DRVNAME ": " fmt
> > >   #include <asm/barrier.h>
> > > +#include <asm/cpufeature.h>
> > >   #include <asm/cputype.h>
> > >   #include "coresight-self-hosted-trace.h"
> > > @@ -84,9 +85,17 @@ struct trbe_buf {
> > >    * per TRBE instance, we keep track of the list of errata that
> > >    * affects the given instance of the TRBE.
> > >    */
> > > -#define TRBE_ERRATA_MAX			0
> > > +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
> > > +#define TRBE_ERRATA_MAX				1
> > > +
> > > +/*
> > > + * Safe limit for the number of bytes that may be overwritten
> > > + * when the erratum is triggered.
> > > + */
> > > +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES	256
> > >   static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
> > > +	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
> > >   };
> > >   /*
> > > @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
> > >   	set_trbe_limit_pointer_enabled(buf->trbe_limit);
> > >   }
> > > -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
> > > +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
> > > +						 u64 trbsr)
> > >   {
> > >   	int ec = get_trbe_ec(trbsr);
> > >   	int bsc = get_trbe_bsc(trbsr);
> > > +	struct trbe_buf *buf = etm_perf_sink_config(handle);
> > > +	struct trbe_cpudata *cpudata = buf->cpudata;
> > >   	WARN_ON(is_trbe_running(trbsr));
> > >   	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
> > > @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
> > >   	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
> > >   		return TRBE_FAULT_ACT_FATAL;
> > > -	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
> > > -		if (get_trbe_write_pointer() == get_trbe_base_pointer())
> > > -			return TRBE_FAULT_ACT_WRAP;
> > > -	}
> > > +	/*
> > > +	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
> > > +	 * it might write data after a WRAP event in the fill mode.
> > > +	 * Thus the check TRBPTR == TRBBASER will not be honored.
> > > +	 */
> > > +	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
> > > +	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
> > > +	     get_trbe_write_pointer() == get_trbe_base_pointer()))
> > > +		return TRBE_FAULT_ACT_WRAP;
> > > +
> > 
> > I'm very perplexed by the trbe_has_erratum() infrastructure... Since this is a
> > TRBE, the code will always run on the CPU it is associated with, and if
> > I'm correct here we could call this_cpu_has_cap() directly with the same
> > outcome.  I doubt that all drivers using the cpucaps subsystem carry a shadow
> > structure to keep the same information.
> 
> Very valid question. Of course, we can use the this_cpu_has_cap()
> helper. Unlike the cpus_have_*_cap() - which gives you the system
> wide status of the erratum - the cpucap doesn't keep a cache of which
> CPUs are affected by a given erratum. Thus this_cpu_has_cap() would
> involve running the detection on the current CPU every time we call it,
> i.e., matching the MIDR of the CPU against the list of affected MIDRs
> for the given erratum. This is a bit of overhead.

I've looked around in the kernel for other places where this_cpu_has_cap() is
used.  In most instances it is part of some initialisation code where actions are
taken based on the return value of the function.  In our case we need to call this
regularly, so yes, I agree with your design.

> 
> Given that we already have CPU specific information in trbe_cpudata, I
> chose to cache the affected errata locally. This gives us quick access
> to the erratum for individual TRBE instances. Of course this list is
> initialised at TRBE probe and thus avoids us having to do the costly
> check, each time we need it. I could make this clear in the patch
> which introduces the framework.

Yes please.

Thanks,
Mathieu

> 
> 
> Thanks for the review
> 
> Suzuki
> 
> > Thanks,
> > Mathieu


* Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle
  2021-09-21 13:41 ` [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle Suzuki K Poulose
  2021-09-22  7:59   ` Anshuman Khandual
@ 2021-10-04 17:42   ` Mathieu Poirier
  2021-10-05 22:35     ` Suzuki K Poulose
  1 sibling, 1 reply; 62+ messages in thread
From: Mathieu Poirier @ 2021-10-04 17:42 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
> Add a helper to get the CPU-specific data for a TRBE instance from
> a given perf handle. This also adds extra checks to make sure that
> the event associated with the handle is "bound" to the CPU and is
> active on the TRBE.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 983dd5039e52..797d978f9fa7 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>  	return buf->nr_pages * PAGE_SIZE;
>  }
>  
> +static inline struct trbe_cpudata *
> +trbe_handle_to_cpudata(struct perf_output_handle *handle)
> +{
> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
> +
> +	BUG_ON(!buf || !buf->cpudata);
> +	return buf->cpudata;
> +}
> +
>  /*
>   * TRBE Limit Calculation
>   *
> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
>  {
>  	int ec = get_trbe_ec(trbsr);
>  	int bsc = get_trbe_bsc(trbsr);
> -	struct trbe_buf *buf = etm_perf_sink_config(handle);
> -	struct trbe_cpudata *cpudata = buf->cpudata;
> +	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);

There are two other places where this pattern is present: is_perf_trbe() and
__trbe_normal_offset().

I have to stop here for today.  More comments tomorrow.

Thanks,
Mathieu

>  
>  	WARN_ON(is_trbe_running(trbsr));
>  	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
> -- 
> 2.24.1
> 


* Re: [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling
  2021-09-21 13:41 ` [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling Suzuki K Poulose
  2021-09-22  6:47   ` Anshuman Khandual
@ 2021-10-05 16:46   ` Mathieu Poirier
  1 sibling, 0 replies; 62+ messages in thread
From: Mathieu Poirier @ 2021-10-05 16:46 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:06PM +0100, Suzuki K Poulose wrote:
> Add a minimal infrastructure to keep track of the errata
> affecting the given TRBE instance. Given that we have
> heterogeneous CPUs, we have to manage the list per-TRBE
> instance to be able to apply the workaround as needed.
> 
> We rely on the arm64 errata framework for the actual
> description and the discovery of a given erratum, to
> keep the erratum workaround in a central place and
> benefit from the code and the advertisement from the
> kernel. We use a local mapping of the errata to
> avoid bloating up the individual TRBE structures,
> i.e., each arm64 TRBE erratum bit is assigned a new number
> within the driver to track. Each TRBE instance updates
> the list of affected errata at probe time on the CPU.
> This makes sure that we can easily access the list of
> errata on a given TRBE instance without much overhead.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v1:
>   - Flip the order of args for trbe_has_erratum()
>   - Move erratum detection further down in the sequence
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 49 ++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index e3d73751d568..63f7edd5fd1f 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -16,6 +16,8 @@
>  #define pr_fmt(fmt) DRVNAME ": " fmt
>  
>  #include <asm/barrier.h>
> +#include <asm/cputype.h>
> +
>  #include "coresight-self-hosted-trace.h"
>  #include "coresight-trbe.h"
>  
> @@ -65,6 +67,35 @@ struct trbe_buf {
>  	struct trbe_cpudata *cpudata;
>  };
>  
> +/*
> + * TRBE erratum list
> + *
> + * We rely on the corresponding cpucaps to be defined for a given
> + * TRBE erratum. We map the given cpucap into a TRBE internal number
> + * to make the tracking of the errata lean.
> + *
> + * This helps in :
> + *   - Not duplicating the detection logic
> + *   - Streamlined detection of erratum across the system
> + *
> + * Since the erratum work arounds could be applied individually
> + * per TRBE instance, we keep track of the list of errata that
> + * affects the given instance of the TRBE.
> + */
> +#define TRBE_ERRATA_MAX			0
> +
> +static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
> +};
> +
> +/*
> + * struct trbe_cpudata: TRBE instance specific data
> + * @trbe_flag		- TRBE dirty/access flag support
> + * @trbe_align		- Actual TRBE alignment required for TRBPTR_EL1.
> + * @cpu			- CPU this TRBE belongs to.
> + * @mode		- Mode of current operation. (perf/disabled)
> + * @drvdata		- TRBE specific drvdata
> + * @errata		- Bit map for the errata on this TRBE.
> + */
>  struct trbe_cpudata {
>  	bool trbe_flag;
>  	u64 trbe_align;
> @@ -72,6 +103,7 @@ struct trbe_cpudata {
>  	enum cs_mode mode;
>  	struct trbe_buf *buf;
>  	struct trbe_drvdata *drvdata;
> +	DECLARE_BITMAP(errata, TRBE_ERRATA_MAX);
>  };
>  
>  struct trbe_drvdata {
> @@ -84,6 +116,21 @@ struct trbe_drvdata {
>  	struct platform_device *pdev;
>  };
>  
> +static void trbe_check_errata(struct trbe_cpudata *cpudata)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(trbe_errata_cpucaps); i++) {
> +		if (this_cpu_has_cap(trbe_errata_cpucaps[i]))
> +			set_bit(i, cpudata->errata);
> +	}
> +}
> +
> +static inline bool trbe_has_erratum(struct trbe_cpudata *cpudata, int i)
> +{
> +	return (i < TRBE_ERRATA_MAX) && test_bit(i, cpudata->errata);
> +}
> +
>  static int trbe_alloc_node(struct perf_event *event)
>  {
>  	if (event->cpu == -1)
> @@ -926,6 +973,8 @@ static void arm_trbe_probe_cpu(void *info)
>  		pr_err("Unsupported alignment on cpu %d\n", cpu);
>  		goto cpu_clear;
>  	}
> +
> +	trbe_check_errata(cpudata);
>  	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
>  	cpudata->cpu = cpu;
>  	cpudata->drvdata = drvdata;
> -- 
> 2.24.1
> 
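A minimal sketch of the mapping described in this patch, using hypothetical cpucap numbers and a callback standing in for this_cpu_has_cap(). Each erratum gets a small, dense driver-local index, the result is recorded at probe time, and every lookup is bounds-checked the way trbe_has_erratum() does it. That bounds check is what keeps lookups safe while TRBE_ERRATA_MAX is still 0 in this patch. The sketch uses two entries because a zero-length array is not standard C.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical global cpucap numbers: sparse and potentially large. */
enum { CAP_ERRATUM_A = 40, CAP_ERRATUM_B = 57 };

/* Driver-local numbering: small and dense, so one word is enough. */
enum { LOCAL_ERRATUM_A, LOCAL_ERRATUM_B, LOCAL_ERRATA_MAX };

_Static_assert(LOCAL_ERRATA_MAX <= sizeof(unsigned long) * CHAR_BIT,
	       "local errata must fit in one unsigned long");

/* Local number -> cpucap, in the spirit of trbe_errata_cpucaps[]. */
static const unsigned long local_to_cpucap[LOCAL_ERRATA_MAX] = {
	[LOCAL_ERRATUM_A] = CAP_ERRATUM_A,
	[LOCAL_ERRATUM_B] = CAP_ERRATUM_B,
};

struct trace_instance {
	unsigned long errata;	/* bit i set => local erratum i applies */
};

/* Stand-in for this_cpu_has_cap(), run on the CPU being probed. */
typedef bool (*has_cap_fn)(unsigned long cap);

static void instance_check_errata(struct trace_instance *t, has_cap_fn has_cap)
{
	t->errata = 0;
	for (int i = 0; i < LOCAL_ERRATA_MAX; i++)
		if (has_cap(local_to_cpucap[i]))
			t->errata |= 1UL << i;
}

/* Bounds-checked lookup: out-of-range numbers report "not affected". */
static bool instance_has_erratum(const struct trace_instance *t, int i)
{
	return i >= 0 && i < LOCAL_ERRATA_MAX && ((t->errata >> i) & 1UL);
}
```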


* Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (16 preceding siblings ...)
  2021-09-21 13:41 ` [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address Suzuki K Poulose
@ 2021-10-05 17:04 ` Mathieu Poirier
  2021-10-08  7:32 ` Will Deacon
  18 siblings, 0 replies; 62+ messages in thread
From: Mathieu Poirier @ 2021-10-05 17:04 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
> This series adds CPU erratum workarounds related to the self-hosted
> tracing. The affected errata handled in this series are:
> 
>  * TRBE may overwrite trace in FILL mode
>    - Arm Neoverse-N2	#2139208
>    - Cortex-A710	#211985
> 
>  * A TSB instruction may not flush the trace completely when executed
>    in trace prohibited region.
> 
>    - Arm Neoverse-N2	#2067961
>    - Cortex-A710	#2054223
> 
>  * TRBE may write to out-of-range address
>    - Arm Neoverse-N2	#2253138
>    - Cortex-A710	#2224489
> 
> The series applies on the self-hosted/trbe fixes posted here [0].
> A tree containing both series is available here [1]
> 
>  [0] https://lkml.kernel.org/r/20210914102641.1852544-1-suzuki.poulose@arm.com
>  [1] git@git.gitlab.arm.com:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v2
> 
> Changes since v1:
>  https://lkml.kernel.org/r/20210728135217.591173-1-suzuki.poulose@arm.com
>  - Added a fix to the TRBE driver handling of sink_specific data
>  - Added more description and ASCII art for overwrite in FILL mode
>    workaround
>  - Added another TRBE erratum to the list.
>   "TRBE may write to out-of-range address"
>   Patches 12-17
>  - Added comment to list the expectations around TSB erratum workaround.
> 
> 
> Suzuki K Poulose (17):
>   coresight: trbe: Fix incorrect access of the sink specific data
>   coresight: trbe: Add infrastructure for Errata handling
>   coresight: trbe: Add a helper to calculate the trace generated
>   coresight: trbe: Add a helper to pad a given buffer area
>   coresight: trbe: Decouple buffer base from the hardware base
>   coresight: trbe: Allow driver to choose a different alignment
>   arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
>   arm64: Add erratum detection for TRBE overwrite in FILL mode
>   coresight: trbe: Workaround TRBE errata overwrite in FILL mode
>   arm64: Enable workaround for TRBE overwrite in FILL mode
>   arm64: errata: Add workaround for TSB flush failures
>   coresight: trbe: Add a helper to fetch cpudata from perf handle
>   coresight: trbe: Add a helper to determine the minimum buffer size
>   coresight: trbe: Make sure we have enough space
>   arm64: Add erratum detection for TRBE write to out-of-range
>   coresight: trbe: Work around write to out of range
>   arm64: Advertise TRBE erratum workaround for write to out-of-range address
> 
>  Documentation/arm64/silicon-errata.rst       |  12 +
>  arch/arm64/Kconfig                           | 109 ++++++
>  arch/arm64/include/asm/barrier.h             |  16 +-
>  arch/arm64/include/asm/cputype.h             |   4 +
>  arch/arm64/kernel/cpu_errata.c               |  64 ++++
>  arch/arm64/tools/cpucaps                     |   3 +
>  drivers/hwtracing/coresight/coresight-trbe.c | 339 +++++++++++++++++--
>  7 files changed, 510 insertions(+), 37 deletions(-)

Patches 04 to 11 and 13 to 17:

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

I am done reviewing this set.

Thanks,
Mathieu

> 
> -- 
> 2.24.1
> 


* Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle
  2021-10-04 17:42   ` Mathieu Poirier
@ 2021-10-05 22:35     ` Suzuki K Poulose
  2021-10-06 17:15       ` Mathieu Poirier
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-10-05 22:35 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

Hi Mathieu

On 04/10/2021 18:42, Mathieu Poirier wrote:
> On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
>> Add a helper to get the CPU specific data for TRBE instance, from
>> a given perf handle. This also adds extra checks to make sure that
>> the event associated with the handle is "bound" to the CPU and is
>> active on the TRBE.
>>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mike Leach <mike.leach@linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Leo Yan <leo.yan@linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 983dd5039e52..797d978f9fa7 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>>   	return buf->nr_pages * PAGE_SIZE;
>>   }
>>   
>> +static inline struct trbe_cpudata *
>> +trbe_handle_to_cpudata(struct perf_output_handle *handle)
>> +{
>> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
>> +
>> +	BUG_ON(!buf || !buf->cpudata);
>> +	return buf->cpudata;
>> +}
>> +
>>   /*
>>    * TRBE Limit Calculation
>>    *
>> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
>>   {
>>   	int ec = get_trbe_ec(trbsr);
>>   	int bsc = get_trbe_bsc(trbsr);
>> -	struct trbe_buf *buf = etm_perf_sink_config(handle);
>> -	struct trbe_cpudata *cpudata = buf->cpudata;
>> +	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
> 
> There are two other places where this pattern is present: is_perf_trbe() and
> __trbe_normal_offset().

I skipped them, as they have to get access to the "trbe_buf" anyway,
so the step-by-step approach made sense. But I could replace them too,
to make it transparent.

What do you think ?

Suzuki




* Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle
  2021-10-05 22:35     ` Suzuki K Poulose
@ 2021-10-06 17:15       ` Mathieu Poirier
  2021-10-07  9:18         ` Suzuki K Poulose
  0 siblings, 1 reply; 62+ messages in thread
From: Mathieu Poirier @ 2021-10-06 17:15 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On Tue, Oct 05, 2021 at 11:35:13PM +0100, Suzuki K Poulose wrote:
> Hi Mathieu
> 
> On 04/10/2021 18:42, Mathieu Poirier wrote:
> > On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
> > > Add a helper to get the CPU specific data for TRBE instance, from
> > > a given perf handle. This also adds extra checks to make sure that
> > > the event associated with the handle is "bound" to the CPU and is
> > > active on the TRBE.
> > > 
> > > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > > Cc: Mike Leach <mike.leach@linaro.org>
> > > Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> > > Cc: Leo Yan <leo.yan@linaro.org>
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > ---
> > >   drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
> > >   1 file changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> > > index 983dd5039e52..797d978f9fa7 100644
> > > --- a/drivers/hwtracing/coresight/coresight-trbe.c
> > > +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> > > @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> > >   	return buf->nr_pages * PAGE_SIZE;
> > >   }
> > > +static inline struct trbe_cpudata *
> > > +trbe_handle_to_cpudata(struct perf_output_handle *handle)
> > > +{
> > > +	struct trbe_buf *buf = etm_perf_sink_config(handle);
> > > +
> > > +	BUG_ON(!buf || !buf->cpudata);
> > > +	return buf->cpudata;
> > > +}
> > > +
> > >   /*
> > >    * TRBE Limit Calculation
> > >    *
> > > @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
> > >   {
> > >   	int ec = get_trbe_ec(trbsr);
> > >   	int bsc = get_trbe_bsc(trbsr);
> > > -	struct trbe_buf *buf = etm_perf_sink_config(handle);
> > > -	struct trbe_cpudata *cpudata = buf->cpudata;
> > > +	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
> > 
> > There are two other places where this pattern is present: is_perf_trbe() and
> > __trbe_normal_offset().
> 
> I skipped them, as they have to get access to the "trbe_buf" anyway,
> so the step-by-step approach made sense. But I could replace them too,
> to make it transparent.
> 
> What do you think ?

Humm...  I don't think there is a right way or a wrong way here.  If we move
forward with this patchset we have two ways of getting to buf->cpudata.  One
using trbe_handle_to_cpudata() and another one as laid out in is_perf_trbe() and
__trbe_normal_offset(), each with an equal number of occurrences (2 for each).

I am usually not fond of small functions like trbe_handle_to_cpudata() and to me
keeping the current heuristic in trbe_get_fault_act() would have been just fine.
I agree with the argument that trbe_handle_to_cpudata() provides more checks but
is it really worth it if they aren't done everywhere?

In short I would get rid of trbe_handle_to_cpudata() entirely and live without
the extra checks... But I'm not strongly opinionated on this either.

> 
> Suzuki
> 
> 


* Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle
  2021-10-06 17:15       ` Mathieu Poirier
@ 2021-10-07  9:18         ` Suzuki K Poulose
  0 siblings, 0 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-10-07  9:18 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, will, lcherian, coresight

On 06/10/2021 18:15, Mathieu Poirier wrote:
> On Tue, Oct 05, 2021 at 11:35:13PM +0100, Suzuki K Poulose wrote:
>> Hi Mathieu
>>
>> On 04/10/2021 18:42, Mathieu Poirier wrote:
>>> On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
>>>> Add a helper to get the CPU specific data for TRBE instance, from
>>>> a given perf handle. This also adds extra checks to make sure that
>>>> the event associated with the handle is "bound" to the CPU and is
>>>> active on the TRBE.
>>>>
>>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> Cc: Mike Leach <mike.leach@linaro.org>
>>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>>>> Cc: Leo Yan <leo.yan@linaro.org>
>>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> ---
>>>>    drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
>>>>    1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> index 983dd5039e52..797d978f9fa7 100644
>>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>>>>    	return buf->nr_pages * PAGE_SIZE;
>>>>    }
>>>> +static inline struct trbe_cpudata *
>>>> +trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>>> +{
>>>> +	struct trbe_buf *buf = etm_perf_sink_config(handle);
>>>> +
>>>> +	BUG_ON(!buf || !buf->cpudata);
>>>> +	return buf->cpudata;
>>>> +}
>>>> +
>>>>    /*
>>>>     * TRBE Limit Calculation
>>>>     *
>>>> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
>>>>    {
>>>>    	int ec = get_trbe_ec(trbsr);
>>>>    	int bsc = get_trbe_bsc(trbsr);
>>>> -	struct trbe_buf *buf = etm_perf_sink_config(handle);
>>>> -	struct trbe_cpudata *cpudata = buf->cpudata;
>>>> +	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>>>
>>> There are two other places where this pattern is present: is_perf_trbe() and
>>> __trbe_normal_offset().
>>
>> I skipped them, as they have to get access to the "trbe_buf" anyway,
>> so the step-by-step approach made sense. But I could replace them too,
>> to make it transparent.
>>
>> What do you think ?
> 
> Humm...  I don't think there is a right way or a wrong way here.  If we move
> forward with this patchset we have two ways of getting to buf->cpudata.  One
> using trbe_handle_to_cpudata() and another one as laid out in is_perf_trbe() and
> __trbe_normal_offset(), each with an equal number of occurrences (2 for each).
> 
> I am usually not fond of small functions like trbe_handle_to_cpudata() and to me
> keeping the current heuristic in trbe_get_fault_act() would have been just fine.

There is another user introduced in the workaround patch. But yes, I
agree: we could open-code it rather than have it inconsistent across
the driver.

> I agree with the argument that trbe_handle_to_cpudata() provides more checks but
> is it really worth it if they aren't done everywhere?
> 
> In short I would get rid of trbe_handle_to_cpudata() entirely and live without
> the extra checks... But I'm not strongly opinionated on this either.

Ok, I will remove this then. Thanks for the feedback.

Suzuki


* Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode
  2021-09-21 13:41 ` [PATCH v2 10/17] arm64: Enable workaround for TRBE " Suzuki K Poulose
  2021-09-22  7:23   ` Anshuman Khandual
@ 2021-10-07 16:09   ` Catalin Marinas
  1 sibling, 0 replies; 62+ messages in thread
From: Catalin Marinas @ 2021-10-07 16:09 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:14PM +0100, Suzuki K Poulose wrote:
> Now that we have the workaround implemented in the TRBE
> driver, add the Kconfig entries and document the errata.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>


* Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures
  2021-09-21 13:41 ` [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures Suzuki K Poulose
  2021-09-22  7:39   ` Anshuman Khandual
@ 2021-10-07 16:10   ` Catalin Marinas
  1 sibling, 0 replies; 62+ messages in thread
From: Catalin Marinas @ 2021-10-07 16:10 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:15PM +0100, Suzuki K Poulose wrote:
> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer
> from an erratum where a TSB (trace synchronization barrier)
> fails to flush the trace data completely when executed from
> a trace prohibited region. In Linux we always execute it
> after we have moved the PE to a trace prohibited region. So,
> we can apply the workaround every time a TSB is executed.
> 
> The workaround is to issue two TSBs consecutively.
> 
> NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying
> that a late CPU could be blocked from booting if it is the
> first CPU that requires the workaround. This is because we
> do not allow setting a cpu_hwcaps after the SMP boot. The
> other alternative is to use "this_cpu_has_cap()" instead
> of the faster system-wide check, which may be a bit of an
> overhead, given we may have to do this in the nVHE KVM host
> before a guest entry.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
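The workaround described in this patch amounts to conditionally issuing a second barrier. The sketch below models it in portable C. A counter stands in for the TSB CSYNC instruction (encoded as HINT #18 on arm64), and a boolean stands in for the system-wide capability check mentioned in the NOTE. All names are hypothetical, and this is not the patch's actual macro.

```c
#include <stdbool.h>

/* Counts barriers "issued"; a test hook, not part of any real API. */
static unsigned int tsb_count;

/*
 * Models one TSB CSYNC (HINT #18 on arm64). A real arm64 build would
 * emit the instruction via inline asm instead of bumping a counter.
 */
static void model_tsb_insn(void)
{
	tsb_count++;
}

/*
 * Affected CPUs need a second TSB to flush the trace completely. The
 * caller is assumed to have already moved the PE to a trace prohibited
 * region, which is the condition the commit message relies on.
 */
static void model_tsb_csync(bool has_tsb_flush_erratum)
{
	if (has_tsb_flush_erratum)
		model_tsb_insn();
	model_tsb_insn();
}
```

Keeping the condition a system-wide flag rather than a per-CPU check is what the NOTE trades against blocking a late, affected CPU from booting.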


* Re: [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range
  2021-09-21 13:41 ` [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range Suzuki K Poulose
  2021-09-22 10:59   ` Anshuman Khandual
@ 2021-10-07 16:10   ` Catalin Marinas
  1 sibling, 0 replies; 62+ messages in thread
From: Catalin Marinas @ 2021-10-07 16:10 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:19PM +0100, Suzuki K Poulose wrote:
> Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the
> TRBE, under some circumstances, might write up to 64 bytes to an address after
> the Limit as programmed by the TRBLIMITR_EL1.LIMIT. This might:
> 
>   - Corrupt a page in the ring buffer, which may corrupt trace from a
>     previous session, consumed by userspace.
>   - Hit the guard page at the end of the vmalloc area and raise a fault.
> 
> To keep the handling simpler, we always leave out the last page from the
> range that TRBE is allowed to write. This can be achieved by ensuring
> that we always have more than a PAGE worth of space in the range while
> calculating the LIMIT for TRBE. The LIMIT pointer can then be adjusted
> to leave that PAGE out of the TRBE range (TRBLIMITR.LIMIT -= PAGE_SIZE)
> while enabling it. This makes sure that the TRBE will only write to an area
> within its allowed limit (i.e., [head, head + size]) and we do not have to
> handle address faults within the driver.
> 
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
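A minimal sketch of the LIMIT adjustment described in this patch, assuming 4K pages and a window from head up to limit (exclusive) that the driver has already computed. The helper name and signature are hypothetical. It refuses windows that are not larger than one page, which is a signal that this scheme cannot be used as-is. Otherwise it pulls LIMIT back by PAGE_SIZE, so a worst-case 64-byte overrun still lands inside the original window.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096ULL	/* assumption: 4K pages */
#define MODEL_MAX_OVERRUN	64ULL	/* erratum: up to 64 bytes past LIMIT */

_Static_assert(MODEL_MAX_OVERRUN <= MODEL_PAGE_SIZE,
	       "the reserved page must absorb the worst-case overrun");

/*
 * Returns the LIMIT to program in *prog_limit. For affected CPUs, returns
 * false when the window is not larger than one page, meaning the window
 * is too small for this scheme.
 */
static bool model_trbe_limit(uint64_t head, uint64_t limit, bool affected,
			     uint64_t *prog_limit)
{
	if (!affected) {
		*prog_limit = limit;
		return true;
	}
	if (limit <= head || limit - head <= MODEL_PAGE_SIZE)
		return false;
	*prog_limit = limit - MODEL_PAGE_SIZE;	/* TRBLIMITR.LIMIT -= PAGE_SIZE */
	return true;
}
```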


* Re: [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address
  2021-09-21 13:41 ` [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address Suzuki K Poulose
  2021-09-22 11:03   ` Anshuman Khandual
@ 2021-10-07 16:11   ` Catalin Marinas
  1 sibling, 0 replies; 62+ messages in thread
From: Catalin Marinas @ 2021-10-07 16:11 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, mark.rutland, james.morse,
	anshuman.khandual, leo.yan, mike.leach, mathieu.poirier, will,
	lcherian, coresight

On Tue, Sep 21, 2021 at 02:41:21PM +0100, Suzuki K Poulose wrote:
> Add Kconfig entries for the errata workarounds for TRBE writing
> to an out-of-range address.
> 
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>


* Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds
  2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
                   ` (17 preceding siblings ...)
  2021-10-05 17:04 ` [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Mathieu Poirier
@ 2021-10-08  7:32 ` Will Deacon
  2021-10-08  9:25   ` Suzuki K Poulose
  18 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-10-08  7:32 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, mathieu.poirier, lcherian, coresight

Hi Suzuki,

On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
> This series adds CPU erratum workarounds related to the self-hosted
> tracing. The affected errata handled in this series are:
> 
>  * TRBE may overwrite trace in FILL mode
>    - Arm Neoverse-N2	#2139208
>    - Cortex-A710	#211985
> 
>  * A TSB instruction may not flush the trace completely when executed
>    in trace prohibited region.
> 
>    - Arm Neoverse-N2	#2067961
>    - Cortex-A710	#2054223
> 
>  * TRBE may write to out-of-range address
>    - Arm Neoverse-N2	#2253138
>    - Cortex-A710	#2224489
> 
> The series applies on the self-hosted/trbe fixes posted here [0].
> A tree containing both series is available here [1]

Any chance you could put the arch/arm64/ bits at the start of the series,
please? That way, I can queue them on their own branch which can be shared
with the coresight tree.

Thanks,

Will


* Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds
  2021-10-08  7:32 ` Will Deacon
@ 2021-10-08  9:25   ` Suzuki K Poulose
  2021-10-08  9:52     ` Will Deacon
  0 siblings, 1 reply; 62+ messages in thread
From: Suzuki K Poulose @ 2021-10-08  9:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, mathieu.poirier, lcherian, coresight

Hi Will

On 08/10/2021 08:32, Will Deacon wrote:
> Hi Suzuki,
> 
> On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
>> This series adds CPU erratum workarounds related to the self-hosted
>> tracing. The affected errata handled in this series are:
>>
>>   * TRBE may overwrite trace in FILL mode
>>     - Arm Neoverse-N2	#2139208
>>     - Cortex-A710	#211985
>>
>>   * A TSB instruction may not flush the trace completely when executed
>>     in trace prohibited region.
>>
>>     - Arm Neoverse-N2	#2067961
>>     - Cortex-A710	#2054223
>>
>>   * TRBE may write to out-of-range address
>>     - Arm Neoverse-N2	#2253138
>>     - Cortex-A710	#2224489
>>
>> The series applies on the self-hosted/trbe fixes posted here [0].
>> A tree containing both series is available here [1]
> 
> Any chance you could put the arch/arm64/ bits at the start of the series,
> please? That way, I can queue them on their own branch which can be shared
> with the coresight tree.

I could move the bits around. I have a question though.

Will, Catalin, Mathieu,

The workarounds for at least two of these errata are
in the TRBE driver patches. Are we happy with enabling the Kconfig
entries in the kernel without the CoreSight patches that implement the
workarounds?

Suzuki


* Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds
  2021-10-08  9:25   ` Suzuki K Poulose
@ 2021-10-08  9:52     ` Will Deacon
  2021-10-08  9:57       ` Suzuki K Poulose
  0 siblings, 1 reply; 62+ messages in thread
From: Will Deacon @ 2021-10-08  9:52 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, mathieu.poirier, lcherian, coresight

On Fri, Oct 08, 2021 at 10:25:03AM +0100, Suzuki K Poulose wrote:
> Hi Will
> 
> On 08/10/2021 08:32, Will Deacon wrote:
> > Hi Suzuki,
> > 
> > On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
> > > This series adds CPU erratum workarounds related to the self-hosted
> > > tracing. The affected errata handled in this series are:
> > > 
> > >   * TRBE may overwrite trace in FILL mode
> > >     - Arm Neoverse-N2	#2139208
> > >     - Cortex-A710	#211985
> > > 
> > >   * A TSB instruction may not flush the trace completely when executed
> > >     in trace prohibited region.
> > > 
> > >     - Arm Neoverse-N2	#2067961
> > >     - Cortex-A710	#2054223
> > > 
> > >   * TRBE may write to out-of-range address
> > >     - Arm Neoverse-N2	#2253138
> > >     - Cortex-A710	#2224489
> > > 
> > > The series applies on the self-hosted/trbe fixes posted here [0].
> > > A tree containing both series is available here [1]
> > 
> > Any chance you could put the arch/arm64/ bits at the start of the series,
> > please? That way, I can queue them on their own branch which can be shared
> > with the coresight tree.
> 
> I could move the bits around. I have a question though.
> 
> Will, Catalin, Mathieu,
> 
> The workarounds for at least two of these errata are
> in the TRBE driver patches. Are we happy with enabling the Kconfig
> entries in the kernel without the CoreSight patches that implement the
> workarounds?

I suppose you could move all the Kconfig changes into their own patch and
stick it right at the end in the Coresight tree.

Will

* Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds
  2021-10-08  9:52     ` Will Deacon
@ 2021-10-08  9:57       ` Suzuki K Poulose
  0 siblings, 0 replies; 62+ messages in thread
From: Suzuki K Poulose @ 2021-10-08  9:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, maz, catalin.marinas,
	mark.rutland, james.morse, anshuman.khandual, leo.yan,
	mike.leach, mathieu.poirier, lcherian, coresight

On 08/10/2021 10:52, Will Deacon wrote:
> On Fri, Oct 08, 2021 at 10:25:03AM +0100, Suzuki K Poulose wrote:
>> Hi Will
>>
>> On 08/10/2021 08:32, Will Deacon wrote:
>>> Hi Suzuki,
>>>
>>> On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
>>>> This series adds CPU erratum workarounds related to the self-hosted
>>>> tracing. The errata handled in this series are:
>>>>
>>>>    * TRBE may overwrite trace in FILL mode
>>>>      - Arm Neoverse-N2	#2139208
>>>>      - Cortex-A710	#211985
>>>>
>>>>    * A TSB instruction may not flush the trace completely when executed
>>>>      in a trace prohibited region.
>>>>
>>>>      - Arm Neoverse-N2	#2067961
>>>>      - Cortex-A710	#2054223
>>>>
>>>>    * TRBE may write to out-of-range address
>>>>      - Arm Neoverse-N2	#2253138
>>>>      - Cortex-A710	#2224489
>>>>
>>>> The series applies on top of the self-hosted/trbe fixes posted here [0].
>>>> A tree containing both series is available here [1].
>>>
>>> Any chance you could put the arch/arm64/ bits at the start of the series,
>>> please? That way, I can queue them on their own branch which can be shared
>>> with the coresight tree.
>>
>> I could move the bits around. I have a question though.
>>
>> Will, Catalin, Mathieu,
>>
>> The workarounds for at least two of these errata are
>> in the TRBE driver patches. Are we happy with enabling the Kconfig
>> entries in the kernel without the CoreSight patches that implement the
>> workarounds?
> 
> I suppose you could move all the Kconfig changes into their own patch and
> stick it right at the end in the Coresight tree.

Cool, I will do that then. Thanks. I will send the updated series.

Suzuki

end of thread

Thread overview: 62+ messages
2021-09-21 13:41 [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Suzuki K Poulose
2021-09-21 13:41 ` [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data Suzuki K Poulose
2021-09-22  5:41   ` Anshuman Khandual
2021-09-30 17:57   ` Mathieu Poirier
2021-09-21 13:41 ` [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling Suzuki K Poulose
2021-09-22  6:47   ` Anshuman Khandual
2021-10-05 16:46   ` Mathieu Poirier
2021-09-21 13:41 ` [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated Suzuki K Poulose
2021-09-30 17:54   ` Mathieu Poirier
2021-10-01  8:36     ` Suzuki K Poulose
2021-10-01 15:15       ` Mathieu Poirier
2021-10-01 15:22         ` Suzuki K Poulose
2021-09-21 13:41 ` [PATCH v2 04/17] coresight: trbe: Add a helper to pad a given buffer area Suzuki K Poulose
2021-09-21 13:41 ` [PATCH v2 05/17] coresight: trbe: Decouple buffer base from the hardware base Suzuki K Poulose
2021-09-21 13:41 ` [PATCH v2 06/17] coresight: trbe: Allow driver to choose a different alignment Suzuki K Poulose
2021-09-21 13:41 ` [PATCH v2 07/17] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition Suzuki K Poulose
2021-09-22  6:57   ` Anshuman Khandual
2021-09-21 13:41 ` [PATCH v2 08/17] arm64: Add erratum detection for TRBE overwrite in FILL mode Suzuki K Poulose
2021-09-21 13:41 ` [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata " Suzuki K Poulose
2021-09-23  6:13   ` Anshuman Khandual
2021-09-28 10:40     ` Suzuki K Poulose
2021-10-01  4:21       ` Anshuman Khandual
2021-10-01 17:15   ` Mathieu Poirier
2021-10-04  8:46     ` Suzuki K Poulose
2021-10-04 16:47       ` Mathieu Poirier
2021-09-21 13:41 ` [PATCH v2 10/17] arm64: Enable workaround for TRBE " Suzuki K Poulose
2021-09-22  7:23   ` Anshuman Khandual
2021-09-22  8:11     ` Suzuki K Poulose
2021-10-01  4:35       ` Anshuman Khandual
2021-10-07 16:09   ` Catalin Marinas
2021-09-21 13:41 ` [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures Suzuki K Poulose
2021-09-22  7:39   ` Anshuman Khandual
2021-09-22 12:03     ` Suzuki K Poulose
2021-10-01  4:38       ` Anshuman Khandual
2021-10-07 16:10   ` Catalin Marinas
2021-09-21 13:41 ` [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle Suzuki K Poulose
2021-09-22  7:59   ` Anshuman Khandual
2021-10-04 17:42   ` Mathieu Poirier
2021-10-05 22:35     ` Suzuki K Poulose
2021-10-06 17:15       ` Mathieu Poirier
2021-10-07  9:18         ` Suzuki K Poulose
2021-09-21 13:41 ` [PATCH v2 13/17] coresight: trbe: Add a helper to determine the minimum buffer size Suzuki K Poulose
2021-09-22  9:51   ` Anshuman Khandual
2021-09-21 13:41 ` [PATCH v2 14/17] coresight: trbe: Make sure we have enough space Suzuki K Poulose
2021-09-22  9:58   ` Anshuman Khandual
2021-09-22 10:16     ` Suzuki K Poulose
2021-10-01  4:40       ` Anshuman Khandual
2021-09-21 13:41 ` [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range Suzuki K Poulose
2021-09-22 10:59   ` Anshuman Khandual
2021-10-07 16:10   ` Catalin Marinas
2021-09-21 13:41 ` [PATCH v2 16/17] coresight: trbe: Work around write to out of range Suzuki K Poulose
2021-09-23  3:15   ` Anshuman Khandual
2021-09-28 10:32     ` Suzuki K Poulose
2021-10-01  4:56       ` Anshuman Khandual
2021-09-21 13:41 ` [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address Suzuki K Poulose
2021-09-22 11:03   ` Anshuman Khandual
2021-10-07 16:11   ` Catalin Marinas
2021-10-05 17:04 ` [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds Mathieu Poirier
2021-10-08  7:32 ` Will Deacon
2021-10-08  9:25   ` Suzuki K Poulose
2021-10-08  9:52     ` Will Deacon
2021-10-08  9:57       ` Suzuki K Poulose