[PATCH] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue

* [PATCH] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue
@ 2023-04-12 12:55 Ivan T. Ivanov
  2023-04-13 11:19 ` Mark Rutland
  2023-04-17  3:07 ` Peng Fan
  0 siblings, 2 replies; 9+ messages in thread
From: Ivan T. Ivanov @ 2023-04-12 12:55 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Mark Brown, Mark Rutland, Shawn Guo, Dong Aisheng,
	linux-arm-kernel, linux-imx, Ivan T. Ivanov

According to NXP errata document[1] i.MX8QuadMax SoC suffers from
serious cache coherence issue. It was also mentioned in initial
support[2] for imx8qm mek machine.

I chose to use an ALTERNATIVE() framework, instead downstream solution[3],
for this issue with the hope to reduce effect of this fix on unaffected
platforms.

Unfortunately I was unable to find a way to identify SoC ID using
registers. Boot CPU MIDR_EL1 is equal to 0x410fd034. So I fallback to
using devicetree compatible strings for this.

I know this fix is a suboptimal solution for affected machines, but I
haven't been able to come up with a less intrusive fix.  And I hope once
TLB caches are invalidated any immediate attempt to invalidate them again
will be close to NOP operation (flush_tlb_kernel_range())

I have run few simple benchmarks and perf tests on affected and unaffected
machines and I was not able see any obvious issues. iMX8QM "performance"
was nearly doubled with 2 A72 bringed online.

Following is excerpt from NXP IMX8_1N94W "Mask Set Errata" document
Rev. 5, 3/2023. Just in case it gets lost somehow.

---
"ERR050104: Arm/A53: Cache coherency issue"

Description

Some maintenance operations exchanged between the A53 and A72
core clusters, involving some Translation Look-aside Buffer
Invalidate (TLBI) and Instruction Cache (IC) instructions can
be corrupted. The upper bits, above bit-35, of ARADDR and ACADDR
buses within in Arm A53 sub-system have been incorrectly connected.
Therefore ARADDR and ACADDR address bits above bit-35 should not
be used.

Workaround

The following software instructions are required to be downgraded
to TLBI VMALLE1IS:  TLBI ASIDE1, TLBI ASIDE1IS, TLBI VAAE1,
TLBI VAAE1IS, TLBI VAALE1, TLBI VAALE1IS, TLBI VAE1, TLBI VAE1IS,
TLBI VALE1, TLBI VALE1IS

The following software instructions are required to be downgraded
to TLBI VMALLS12E1IS: TLBI IPAS2E1IS, TLBI IPAS2LE1IS

The following software instructions are required to be downgraded
to TLBI ALLE2IS: TLBI VAE2IS, TLBI VALE2IS.

The following software instructions are required to be downgraded
to TLBI ALLE3IS: TLBI VAE3IS, TLBI VALE3IS.

The following software instructions are required to be downgraded
to TLBI VMALLE1IS when the Force Broadcast (FB) bit [9] of the
Hypervisor Configuration Register (HCR_EL2) is set:
TLBI ASIDE1, TLBI VAAE1, TLBI VAALE1, TLBI VAE1, TLBI VALE1

The following software instruction is required to be downgraded
to IC IALLUIS: IC IVAU, Xt

Specifically for the IC IVAU, Xt downgrade, setting SCTLR_EL1.UCI
to 0 will disable EL0 access to this instruction. Any attempt to
execute from EL0 will generate an EL1 trap, where the downgrade to
IC ALLUIS can be implemented.
--

[1] https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf
[2] 307fd14d4b14 ("arm64: dts: imx: add imx8qm mek support")
[3] https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/arch/arm64/include/asm/tlbflush.h#L19

Signed-off-by: Ivan T. Ivanov <iivanov@suse.de>
---
 Documentation/arm64/silicon-errata.rst |  2 ++
 arch/arm64/Kconfig                     | 10 ++++++++++
 arch/arm64/include/asm/cpufeature.h    |  3 ++-
 arch/arm64/include/asm/tlbflush.h      |  6 +++++-
 arch/arm64/kernel/cpu_errata.c         | 18 ++++++++++++++++++
 arch/arm64/kernel/traps.c              | 22 +++++++++++++++++++++-
 arch/arm64/tools/cpucaps               |  1 +
 7 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index ec5f889d7681..fce231797184 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -175,6 +175,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
 +----------------+-----------------+-----------------+-----------------------------+
+| Freescale/NXP  | i.MX 8QuadMax   | ERR050104       | NXP_IMX8QM_ERRATUM_ERR050104|
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Hisilicon      | Hip0{5,6,7}     | #161010101      | HISILICON_ERRATUM_161010101 |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..437cb53f8753 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1159,6 +1159,16 @@ config SOCIONEXT_SYNQUACER_PREITS
 
 	  If unsure, say Y.
 
+config NXP_IMX8QM_ERRATUM_ERR050104
+	bool "NXP iMX8QM: Workaround for Arm/A53 Cache coherency issue"
+	default n
+	help
+	  Some maintenance operations exchanged between the A53 and A72 core
+	  clusters, involving some Translation Look-aside Buffer Invalidate
+	  (TLBI) and Instruction Cache (IC) instructions can be corrupted.
+
+	  If unsure, say N.
+
 endmenu # "ARM errata workarounds via the alternatives framework"
 
 choice
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 6bf013fb110d..1ed648f7f29a 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -835,7 +835,8 @@ static inline bool system_supports_bti(void)
 static inline bool system_supports_tlb_range(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_TLB_RANGE) &&
-		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
+		cpus_have_const_cap(ARM64_HAS_TLB_RANGE) &&
+		!cpus_have_const_cap(ARM64_WORKAROUND_NXP_ERR050104);
 }
 
 int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 412a3b9a3c25..12055b859ce3 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -37,7 +37,11 @@
 			    : : )
 
 #define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE			       \
-			       "tlbi " #op ", %0\n"			       \
+		   ALTERNATIVE("nop\n	nop\n  tlbi " #op ", %0",              \
+			       "tlbi vmalle1is\n  dsb ish\n  isb",	       \
+			       ARM64_WORKAROUND_NXP_ERR050104)		       \
+			    : : "r" (arg));				       \
+			  asm (ARM64_ASM_PREAMBLE			       \
 		   ALTERNATIVE("nop\n			nop",		       \
 			       "dsb ish\n		tlbi " #op ", %0",     \
 			       ARM64_WORKAROUND_REPEAT_TLBI,		       \
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 307faa2b4395..7b702a79bf60 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -8,6 +8,7 @@
 #include <linux/arm-smccc.h>
 #include <linux/types.h>
 #include <linux/cpu.h>
+#include <linux/of.h>
 #include <asm/cpu.h>
 #include <asm/cputype.h>
 #include <asm/cpufeature.h>
@@ -55,6 +56,14 @@ is_kryo_midr(const struct arm64_cpu_capabilities *entry, int scope)
 	return model == entry->midr_range.model;
 }
 
+static bool __maybe_unused
+is_imx8qm_soc(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	WARN_ON(preemptible());
+
+	return of_machine_is_compatible("fsl,imx8qm");
+}
+
 static bool
 has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
 			  int scope)
@@ -729,6 +738,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		MIDR_FIXED(MIDR_CPU_VAR_REV(1,1), BIT(25)),
 		.cpu_enable = cpu_clear_bf16_from_user_emulation,
 	},
+#endif
+#ifdef CONFIG_NXP_IMX8QM_ERRATUM_ERR050104
+	{
+		.desc = "NXP A53 cache coherency issue",
+		.capability = ARM64_WORKAROUND_NXP_ERR050104,
+		.type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
+		.matches = is_imx8qm_soc,
+		.cpu_enable = cpu_enable_cache_maint_trap,
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 4a79ba100799..4858f8c86fd5 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -529,6 +529,26 @@ void do_el1_fpac(struct pt_regs *regs, unsigned long esr)
 		uaccess_ttbr0_disable();			\
 	}
 
+#define __user_instruction_cache_maint(address, res)		\
+do {								\
+	if (address >= TASK_SIZE_MAX) {				\
+		res = -EFAULT;					\
+	} else {						\
+		uaccess_ttbr0_enable();				\
+		asm volatile (					\
+			"1:\n"					\
+	    ALTERNATIVE("	ic	ivau, %1\n",		\
+			"	ic	ialluis\n",		\
+			ARM64_WORKAROUND_NXP_ERR050104)		\
+			"	mov	%w0, #0\n"		\
+			"2:\n"					\
+			_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0)	\
+			: "=r" (res)				\
+			: "r" (address));			\
+		uaccess_ttbr0_disable();			\
+	}							\
+} while (0)
+
 static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
 {
 	unsigned long tagged_address, address;
@@ -556,7 +576,7 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
 		__user_cache_maint("dc civac", address, ret);
 		break;
 	case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:	/* IC IVAU */
-		__user_cache_maint("ic ivau", address, ret);
+		__user_instruction_cache_maint(address, ret);
 		break;
 	default:
 		force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0);
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 37b1340e9646..e225f1cd1005 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -90,3 +90,4 @@ WORKAROUND_NVIDIA_CARMEL_CNP
 WORKAROUND_QCOM_FALKOR_E1003
 WORKAROUND_REPEAT_TLBI
 WORKAROUND_SPECULATIVE_AT
+WORKAROUND_NXP_ERR050104
-- 
2.35.3


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread