linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/5] arm64: Define Falkor v1 CPU
@ 2017-01-11 14:41 Christopher Covington
  2017-01-11 14:41 ` [PATCH v3 2/5] arm64: Work around Falkor erratum 1003 Christopher Covington
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Christopher Covington @ 2017-01-11 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, Will Deacon,
	kvm, linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur
  Cc: Mark Langsdorf, Mark Salter, Jon Masters, Christopher Covington

From: Shanker Donthineni <shankerd@codeaurora.org>

Define the MIDR implementer and part number field values for the Qualcomm
Datacenter Technologies Falkor processor version 1 in the usual manner.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Christopher Covington <cov@codeaurora.org>
---
 arch/arm64/include/asm/cputype.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 26a68dd..ee60561 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -71,6 +71,7 @@
 #define ARM_CPU_IMP_APM			0x50
 #define ARM_CPU_IMP_CAVIUM		0x43
 #define ARM_CPU_IMP_BRCM		0x42
+#define ARM_CPU_IMP_QCOM		0x51
 
 #define ARM_CPU_PART_AEM_V8		0xD0F
 #define ARM_CPU_PART_FOUNDATION		0xD00
@@ -84,10 +85,13 @@
 
 #define BRCM_CPU_PART_VULCAN		0x516
 
+#define QCOM_CPU_PART_FALKOR_V1		0x800
+
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
 #define MIDR_THUNDERX	MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
+#define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
 
 #ifndef __ASSEMBLY__
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 14:41 [PATCH v3 1/5] arm64: Define Falkor v1 CPU Christopher Covington
@ 2017-01-11 14:41 ` Christopher Covington
  2017-01-11 18:06   ` Catalin Marinas
  2017-01-11 14:41 ` [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros Christopher Covington
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Christopher Covington @ 2017-01-11 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, Will Deacon,
	kvm, linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur,
	Jonathan Corbet, linux-doc
  Cc: Mark Langsdorf, Mark Salter, Jon Masters, Christopher Covington

From: Shanker Donthineni <shankerd@codeaurora.org>

On the Qualcomm Datacenter Technologies Falkor v1 CPU, memory accesses may
allocate TLB entries using an incorrect ASID when TTBRx_EL1 is being
updated. Changing the TTBRx_EL1[ASID] and TTBRx_EL1[BADDR] fields
separately using a reserved ASID will ensure that there are no TLB entries
with incorrect ASID after changing the the ASID.

Pseudo code:
  write TTBRx_EL1[ASID] to a reserved value
  ISB
  write TTBRx_EL1[BADDR] to a desired value
  ISB
  write TTBRx_EL1[ASID] to a desired value
  ISB

EL2 and EL3 code changing the EL1&0 ASID is not subject to this erratum
because hardware is prohibited from performing translations from an
out-of-context translation regime.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Christopher Covington <cov@codeaurora.org>
---
 Documentation/arm64/silicon-errata.txt | 43 +++++++++++++++++-----------------
 arch/arm64/Kconfig                     | 11 +++++++++
 arch/arm64/include/asm/cpucaps.h       |  3 ++-
 arch/arm64/include/asm/mmu_context.h   |  8 ++++++-
 arch/arm64/kernel/cpu_errata.c         |  7 ++++++
 arch/arm64/mm/context.c                | 10 ++++++++
 arch/arm64/mm/proc.S                   | 13 ++++++++++
 7 files changed, 72 insertions(+), 23 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 405da11..7151aed 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -42,24 +42,25 @@ file acts as a registry of software workarounds in the Linux Kernel and
 will be updated when new workarounds are committed and backported to
 stable kernels.
 
-| Implementor    | Component       | Erratum ID      | Kconfig                 |
-+----------------+-----------------+-----------------+-------------------------+
-| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319    |
-| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319    |
-| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069    |
-| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472    |
-| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719    |
-| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419    |
-| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075    |
-| ARM            | Cortex-A57      | #852523         | N/A                     |
-| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220    |
-| ARM            | Cortex-A72      | #853709         | N/A                     |
-| ARM            | MMU-500         | #841119,#826419 | N/A                     |
-|                |                 |                 |                         |
-| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
-| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144    |
-| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
-| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
-| Cavium         | ThunderX SMMUv2 | #27704          | N/A		       |
-|                |                 |                 |                         |
-| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
+| Implementor   | Component       | Erratum ID      | Kconfig                  |
++---------------+-----------------+-----------------+--------------------------+
+| ARM           | Cortex-A53      | #826319         | ARM64_ERRATUM_826319     |
+| ARM           | Cortex-A53      | #827319         | ARM64_ERRATUM_827319     |
+| ARM           | Cortex-A53      | #824069         | ARM64_ERRATUM_824069     |
+| ARM           | Cortex-A53      | #819472         | ARM64_ERRATUM_819472     |
+| ARM           | Cortex-A53      | #845719         | ARM64_ERRATUM_845719     |
+| ARM           | Cortex-A53      | #843419         | ARM64_ERRATUM_843419     |
+| ARM           | Cortex-A57      | #832075         | ARM64_ERRATUM_832075     |
+| ARM           | Cortex-A57      | #852523         | N/A                      |
+| ARM           | Cortex-A57      | #834220         | ARM64_ERRATUM_834220     |
+| ARM           | Cortex-A72      | #853709         | N/A                      |
+| ARM           | MMU-500         | #841119,#826419 | N/A                      |
+|               |                 |                 |                          |
+| Cavium        | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375     |
+| Cavium        | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144     |
+| Cavium        | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154     |
+| Cavium        | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456     |
+| Cavium        | ThunderX SMMUv2 | #27704          | N/A                      |
+|               |                 |                 |                          |
+| Freescale/NXP | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585      |
+| Qualcomm      | Falkor v1       | E1003           | QCOM_FALKOR_ERRATUM_1003 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1117421..2a80ac9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -479,6 +479,17 @@ config CAVIUM_ERRATUM_27456
 
 	  If unsure, say Y.
 
+config QCOM_FALKOR_ERRATUM_1003
+	bool "Falkor E1003: Incorrect translation due to ASID change"
+	default y
+	help
+	  An incorrect translation TLBI entry may be created while changing the
+	  ASID and translation table address together for TTBR0_EL1. The
+	  workaround for this issue is to use a reserved ASID in
+	  cpu_do_switch_mm() before switching to the target ASID.
+
+	  If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 4174f09..5aaf7ee 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -35,7 +35,8 @@
 #define ARM64_HYP_OFFSET_LOW			14
 #define ARM64_MISMATCHED_CACHE_LINE_SIZE	15
 #define ARM64_HAS_NO_FPSIMD			16
+#define ARM64_WORKAROUND_QCOM_FALKOR_E1003	17
 
-#define ARM64_NCAPS				17
+#define ARM64_NCAPS				18
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 0363fe8..9632b05 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -19,6 +19,10 @@
 #ifndef __ASM_MMU_CONTEXT_H
 #define __ASM_MMU_CONTEXT_H
 
+#define FALKOR_RESERVED_ASID	1
+
+#ifndef __ASSEMBLY__
+
 #include <linux/compiler.h>
 #include <linux/sched.h>
 
@@ -220,4 +224,6 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 void verify_cpu_asid_bits(void);
 
-#endif
+#endif /* !__ASSEMBLY__ */
+
+#endif /* !__ASM_MMU_CONTEXT_H */
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index b75e917..787b542 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -130,6 +130,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.def_scope = SCOPE_LOCAL_CPU,
 		.enable = cpu_enable_trap_ctr_access,
 	},
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
+	{
+		.desc = "Qualcomm Falkor erratum 1003",
+		.capability = ARM64_WORKAROUND_QCOM_FALKOR_E1003,
+		MIDR_RANGE(MIDR_QCOM_FALKOR_V1, 0x00, 0x00),
+	},
+#endif
 	{
 	}
 };
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 4c63cb1..5a0a82a 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -87,6 +87,11 @@ static void flush_context(unsigned int cpu)
 	/* Update the list of reserved ASIDs and the ASID bitmap. */
 	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
 
+	/* Reserve ASID for Falkor erratum 1003 */
+	if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) &&
+	    cpus_have_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003))
+		__set_bit(FALKOR_RESERVED_ASID, asid_map);
+
 	/*
 	 * Ensure the generation bump is observed before we xchg the
 	 * active_asids.
@@ -244,6 +249,11 @@ static int asids_init(void)
 		panic("Failed to allocate bitmap for %lu ASIDs\n",
 		      NUM_USER_ASIDS);
 
+	/* Reserve ASID for Falkor erratum 1003 */
+	if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) &&
+	    cpus_have_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003))
+		__set_bit(FALKOR_RESERVED_ASID, asid_map);
+
 	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
 	return 0;
 }
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 32682be..9ee46df 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -23,6 +23,7 @@
 #include <asm/assembler.h>
 #include <asm/asm-offsets.h>
 #include <asm/hwcap.h>
+#include <asm/mmu_context.h>
 #include <asm/pgtable.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/cpufeature.h>
@@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
 ENTRY(cpu_do_switch_mm)
 	mmid	x1, x1				// get mm->context.id
 	bfi	x0, x1, #48, #16		// set the ASID
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
+alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
+	mrs     x2, ttbr0_el1
+	mov     x3, #FALKOR_RESERVED_ASID
+	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
+	msr     ttbr0_el1, x2
+	isb
+	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
+	msr     ttbr0_el1, x2
+	isb
+alternative_else_nop_endif
+#endif
 	msr	ttbr0_el1, x0			// set TTBR0
 	isb
 	post_ttbr0_update_workaround
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros
  2017-01-11 14:41 [PATCH v3 1/5] arm64: Define Falkor v1 CPU Christopher Covington
  2017-01-11 14:41 ` [PATCH v3 2/5] arm64: Work around Falkor erratum 1003 Christopher Covington
@ 2017-01-11 14:41 ` Christopher Covington
  2017-01-12 16:58   ` Will Deacon
  2017-01-11 14:41 ` [PATCH v3 4/5] arm64: Use __tlbi_dsb() macros in KVM code Christopher Covington
  2017-01-11 14:41 ` [PATCH v3 5/5] arm64: Work around Falkor erratum 1009 Christopher Covington
  3 siblings, 1 reply; 26+ messages in thread
From: Christopher Covington @ 2017-01-11 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, Will Deacon,
	kvm, linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur
  Cc: Mark Langsdorf, Mark Salter, Jon Masters, Christopher Covington

This refactoring will allow an errata workaround that repeats tlbi dsb
sequences to only change one location. This is not intended to change the
generated assembly and comparison of before and after preprocessor output
of arch/arm64/mm/mmu.c and vmlinux objdump shows no functional changes.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
---
 arch/arm64/include/asm/tlbflush.h | 104 +++++++++++++++++++++++++-------------
 1 file changed, 69 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index deab523..f28813c 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -25,22 +25,69 @@
 #include <asm/cputype.h>
 
 /*
- * Raw TLBI operations.
+ * Raw TLBI, DSB operations
  *
- * Where necessary, use the __tlbi() macro to avoid asm()
- * boilerplate. Drivers and most kernel code should use the TLB
- * management routines in preference to the macro below.
+ * Where necessary, use __tlbi_*dsb() macros to avoid asm() boilerplate.
+ * Drivers and most kernel code should use the TLB management routines in
+ * preference to the macros below.
  *
- * The macro can be used as __tlbi(op) or __tlbi(op, arg), depending
- * on whether a particular TLBI operation takes an argument or
- * not. The macros handles invoking the asm with or without the
- * register argument as appropriate.
+ * The __tlbi_dsb() macro handles invoking the asm without any register
+ * argument, with a single register argument, and with start (included)
+ * and end (excluded) range of register arguments. For example:
+ *
+ * __tlbi_dsb(op, attr)
+ *
+ * 	tlbi op
+ *	dsb attr
+ *
+ * __tlbi_dsb(op, attr, addr)
+ *
+ *	mov %[addr], =addr
+ *	tlbi op, %[addr]
+ *	dsb attr
+ *
+ * __tlbi_range_dsb(op, attr, start, end)
+ *
+ * 	mov %[arg], =start
+ *	mov %[end], =end
+ * for:
+ * 	tlbi op, %[addr]
+ * 	add %[addr], %[addr], #(1 << (PAGE_SHIFT - 12))
+ * 	cmp %[addr], %[end]
+ * 	b.ne for
+ * 	dsb attr
  */
-#define __TLBI_0(op, arg)		asm ("tlbi " #op)
-#define __TLBI_1(op, arg)		asm ("tlbi " #op ", %0" : : "r" (arg))
-#define __TLBI_N(op, arg, n, ...)	__TLBI_##n(op, arg)
 
-#define __tlbi(op, ...)		__TLBI_N(op, ##__VA_ARGS__, 1, 0)
+#define __TLBI_FOR_0(ig0, ig1, ig2)
+#define __TLBI_INSTR_0(op, ig1, ig2)	"tlbi " #op
+#define __TLBI_IO_0(ig0, ig1, ig2)	: :
+
+#define __TLBI_FOR_1(ig0, ig1, ig2)
+#define __TLBI_INSTR_1(op, ig0, ig1)	"tlbi " #op ", %0"
+#define __TLBI_IO_1(ig0, arg, ig1)	: : "r" (arg)
+
+#define __TLBI_FOR_2(ig0, start, ig1)	unsigned long addr;		       \
+					for (addr = start; addr < end;	       \
+						addr += 1 << (PAGE_SHIFT - 12))
+#define __TLBI_INSTR_2(op, ig0, ig1)	"tlbi " #op ", %0"
+#define __TLBI_IO_2(ig0, ig1, ig2)	: : "r" (addr)
+
+#define __TLBI_FOR_N(op, a1, a2, n, ...)	__TLBI_FOR_##n(op, a1, a2)
+#define __TLBI_INSTR_N(op, a1, a2, n, ...)	__TLBI_INSTR_##n(op, a1, a2)
+#define __TLBI_IO_N(op, a1, a2, n, ...)	__TLBI_IO_##n(op, a1, a2)
+
+#define __TLBI_FOR(op, ...)		__TLBI_FOR_N(op, ##__VA_ARGS__, 2, 1, 0)
+#define __TLBI_INSTR(op, ...)		__TLBI_INSTR_N(op, ##__VA_ARGS__, 2, 1, 0)
+#define __TLBI_IO(op, ...)		__TLBI_IO_N(op, ##__VA_ARGS__, 2, 1, 0)
+
+#define __tlbi_asm_dsb(as, op, attr, ...) do {				       \
+		__TLBI_FOR(op, ##__VA_ARGS__)				       \
+			asm (__TLBI_INSTR(op, ##__VA_ARGS__)		       \
+			__TLBI_IO(op, ##__VA_ARGS__));			       \
+		asm volatile (	     as			"\ndsb " #attr "\n"    \
+		: : : "memory"); } while (0)
+
+#define __tlbi_dsb(...)	__tlbi_asm_dsb("", ##__VA_ARGS__)
 
 /*
  *	TLB Management
@@ -84,16 +131,14 @@
 static inline void local_flush_tlb_all(void)
 {
 	dsb(nshst);
-	__tlbi(vmalle1);
-	dsb(nsh);
+	__tlbi_dsb(vmalle1, nsh);
 	isb();
 }
 
 static inline void flush_tlb_all(void)
 {
 	dsb(ishst);
-	__tlbi(vmalle1is);
-	dsb(ish);
+	__tlbi_dsb(vmalle1is, ish);
 	isb();
 }
 
@@ -102,8 +147,7 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
 	unsigned long asid = ASID(mm) << 48;
 
 	dsb(ishst);
-	__tlbi(aside1is, asid);
-	dsb(ish);
+	__tlbi_dsb(aside1is, ish, asid);
 }
 
 static inline void flush_tlb_page(struct vm_area_struct *vma,
@@ -112,8 +156,7 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 	unsigned long addr = uaddr >> 12 | (ASID(vma->vm_mm) << 48);
 
 	dsb(ishst);
-	__tlbi(vale1is, addr);
-	dsb(ish);
+	__tlbi_dsb(vale1is, ish, addr);
 }
 
 /*
@@ -127,7 +170,6 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     bool last_level)
 {
 	unsigned long asid = ASID(vma->vm_mm) << 48;
-	unsigned long addr;
 
 	if ((end - start) > MAX_TLB_RANGE) {
 		flush_tlb_mm(vma->vm_mm);
@@ -138,13 +180,10 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	end = asid | (end >> 12);
 
 	dsb(ishst);
-	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12)) {
-		if (last_level)
-			__tlbi(vale1is, addr);
-		else
-			__tlbi(vae1is, addr);
-	}
-	dsb(ish);
+	if (last_level)
+		__tlbi_dsb(vale1is, ish, start, end);
+	else
+		__tlbi_dsb(vae1is, ish, start, end);
 }
 
 static inline void flush_tlb_range(struct vm_area_struct *vma,
@@ -155,8 +194,6 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 
 static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	unsigned long addr;
-
 	if ((end - start) > MAX_TLB_RANGE) {
 		flush_tlb_all();
 		return;
@@ -166,9 +203,7 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
 	end >>= 12;
 
 	dsb(ishst);
-	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
-		__tlbi(vaae1is, addr);
-	dsb(ish);
+	__tlbi_dsb(vaae1is, ish, start, end);
 	isb();
 }
 
@@ -181,8 +216,7 @@ static inline void __flush_tlb_pgtable(struct mm_struct *mm,
 {
 	unsigned long addr = uaddr >> 12 | (ASID(mm) << 48);
 
-	__tlbi(vae1is, addr);
-	dsb(ish);
+	__tlbi_dsb(vae1is, ish, addr);
 }
 
 #endif
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 4/5] arm64: Use __tlbi_dsb() macros in KVM code
  2017-01-11 14:41 [PATCH v3 1/5] arm64: Define Falkor v1 CPU Christopher Covington
  2017-01-11 14:41 ` [PATCH v3 2/5] arm64: Work around Falkor erratum 1003 Christopher Covington
  2017-01-11 14:41 ` [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros Christopher Covington
@ 2017-01-11 14:41 ` Christopher Covington
  2017-01-11 14:41 ` [PATCH v3 5/5] arm64: Work around Falkor erratum 1009 Christopher Covington
  3 siblings, 0 replies; 26+ messages in thread
From: Christopher Covington @ 2017-01-11 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, Will Deacon,
	kvm, linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur
  Cc: Mark Langsdorf, Mark Salter, Jon Masters, Christopher Covington

Refactor the KVM code to use the newly introduced __tlbi_dsb macros, which
will allow an errata workaround that repeats tlbi dsb sequences to only
change one location. This is not intended to change the generated assembly
and comparing before and after vmlinux objdump shows no functional changes.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
---
 arch/arm64/kvm/hyp/tlb.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 88e2f2b..9669e4b 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -16,6 +16,7 @@
  */
 
 #include <asm/kvm_hyp.h>
+#include <asm/tlbflush.h>
 
 void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 {
@@ -30,19 +31,15 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	 * We could do so much better if we had the VA as well.
 	 * Instead, we invalidate Stage-2 for this IPA, and the
 	 * whole of Stage-1. Weep...
+	 *
+	 * We have to ensure completion of the invalidation at Stage-2 with a
+	 * DSB, since a table walk on another CPU could refill a TLB with a
+	 * complete (S1 + S2) walk based on the old Stage-2 mapping if the
+	 * Stage-1 invalidation happened first.
 	 */
 	ipa >>= 12;
-	asm volatile("tlbi ipas2e1is, %0" : : "r" (ipa));
-
-	/*
-	 * We have to ensure completion of the invalidation at Stage-2,
-	 * since a table walk on another CPU could refill a TLB with a
-	 * complete (S1 + S2) walk based on the old Stage-2 mapping if
-	 * the Stage-1 invalidation happened first.
-	 */
-	dsb(ish);
-	asm volatile("tlbi vmalle1is" : : );
-	dsb(ish);
+	__tlbi_dsb(ipas2e1is, ish, ipa);
+	__tlbi_dsb(vmalle1is, ish);
 	isb();
 
 	write_sysreg(0, vttbr_el2);
@@ -57,8 +54,7 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
 	write_sysreg(kvm->arch.vttbr, vttbr_el2);
 	isb();
 
-	asm volatile("tlbi vmalls12e1is" : : );
-	dsb(ish);
+	__tlbi_dsb(vmalls12e1is, ish);
 	isb();
 
 	write_sysreg(0, vttbr_el2);
@@ -72,8 +68,7 @@ void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
 	write_sysreg(kvm->arch.vttbr, vttbr_el2);
 	isb();
 
-	asm volatile("tlbi vmalle1" : : );
-	dsb(nsh);
+	__tlbi_dsb(vmalle1, nsh);
 	isb();
 
 	write_sysreg(0, vttbr_el2);
@@ -82,7 +77,5 @@ void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
 void __hyp_text __kvm_flush_vm_context(void)
 {
 	dsb(ishst);
-	asm volatile("tlbi alle1is	\n"
-		     "ic ialluis	  ": : );
-	dsb(ish);
+	__tlbi_asm_dsb("ic ialluis", alle1is, ish);
 }
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 5/5] arm64: Work around Falkor erratum 1009
  2017-01-11 14:41 [PATCH v3 1/5] arm64: Define Falkor v1 CPU Christopher Covington
                   ` (2 preceding siblings ...)
  2017-01-11 14:41 ` [PATCH v3 4/5] arm64: Use __tlbi_dsb() macros in KVM code Christopher Covington
@ 2017-01-11 14:41 ` Christopher Covington
  3 siblings, 0 replies; 26+ messages in thread
From: Christopher Covington @ 2017-01-11 14:41 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, Will Deacon,
	kvm, linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur,
	Jonathan Corbet, linux-doc
  Cc: Mark Langsdorf, Mark Salter, Jon Masters, Christopher Covington

During a TLB invalidate sequence targeting the inner shareable
domain, Falkor may prematurely complete the DSB before all loads
and stores using the old translation are observed; instruction
fetches are not subject to the conditions of this erratum.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
---
 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig                     | 10 ++++++++++
 arch/arm64/include/asm/cpucaps.h       |  3 ++-
 arch/arm64/include/asm/tlbflush.h      |  5 ++++-
 arch/arm64/kernel/cpu_errata.c         |  7 +++++++
 5 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 7151aed..98bef2a 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -64,3 +64,4 @@ stable kernels.
 |               |                 |                 |                          |
 | Freescale/NXP | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585      |
 | Qualcomm      | Falkor v1       | E1003           | QCOM_FALKOR_ERRATUM_1003 |
+| Qualcomm      | Falkor v1       | E1009           | QCOM_FALKOR_ERRATUM_1009 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 2a80ac9..d13e903 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -490,6 +490,16 @@ config QCOM_FALKOR_ERRATUM_1003
 
 	  If unsure, say Y.
 
+config QCOM_FALKOR_ERRATUM_1009
+	bool "Falkor E1009: Prematurely complete a DSB after a TLBI"
+	default y
+	help
+	  Falkor CPU may prematurely complete a DSB following a TLBI xxIS
+	  invalidate maintenance operations. Repeat the TLBI operation one
+	  more time to fix the issue.
+
+	  If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 5aaf7ee..55bcd02 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -36,7 +36,8 @@
 #define ARM64_MISMATCHED_CACHE_LINE_SIZE	15
 #define ARM64_HAS_NO_FPSIMD			16
 #define ARM64_WORKAROUND_QCOM_FALKOR_E1003	17
+#define ARM64_WORKAROUND_REPEAT_TLBI		18
 
-#define ARM64_NCAPS				18
+#define ARM64_NCAPS				19
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index f28813c..7313cd3 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -85,7 +85,10 @@
 			asm (__TLBI_INSTR(op, ##__VA_ARGS__)		       \
 			__TLBI_IO(op, ##__VA_ARGS__));			       \
 		asm volatile (	     as			"\ndsb " #attr "\n"    \
-		: : : "memory"); } while (0)
+			ALTERNATIVE("nop"		"\nnop"	       "\n",   \
+			__TLBI_INSTR(op, ##__VA_ARGS__)	"\ndsb " #attr "\n",   \
+			ARM64_WORKAROUND_REPEAT_TLBI)			       \
+		__TLBI_IO(op, ##__VA_ARGS__) : "memory"); } while (0)
 
 #define __tlbi_dsb(...)	__tlbi_asm_dsb("", ##__VA_ARGS__)
 
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 787b542..e644364 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -137,6 +137,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		MIDR_RANGE(MIDR_QCOM_FALKOR_V1, 0x00, 0x00),
 	},
 #endif
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1009
+	{
+		.desc = "Qualcomm Falkor erratum 1009",
+		.capability = ARM64_WORKAROUND_REPEAT_TLBI,
+		MIDR_RANGE(MIDR_QCOM_FALKOR_V1, 0x00, 0x00),
+	},
+#endif
 	{
 	}
 };
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 14:41 ` [PATCH v3 2/5] arm64: Work around Falkor erratum 1003 Christopher Covington
@ 2017-01-11 18:06   ` Catalin Marinas
  2017-01-11 18:22     ` Marc Zyngier
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Catalin Marinas @ 2017-01-11 18:06 UTC (permalink / raw)
  To: Christopher Covington
  Cc: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Will Deacon, kvm,
	linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur,
	Jonathan Corbet, linux-doc, Jon Masters, Mark Langsdorf,
	Mark Salter

Some minor comments below, nothing fundamental (as long as you say the
new sequence doesn't have the speculative TLB load problem I mentioned
on a previous version).

On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote:
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index 405da11..7151aed 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -42,24 +42,25 @@ file acts as a registry of software workarounds in the Linux Kernel and
>  will be updated when new workarounds are committed and backported to
>  stable kernels.
>  
> -| Implementor    | Component       | Erratum ID      | Kconfig                 |
> -+----------------+-----------------+-----------------+-------------------------+
> -| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319    |
> -| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319    |
> -| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069    |
> -| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472    |
> -| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719    |
> -| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419    |
> -| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075    |
> -| ARM            | Cortex-A57      | #852523         | N/A                     |
> -| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220    |
> -| ARM            | Cortex-A72      | #853709         | N/A                     |
> -| ARM            | MMU-500         | #841119,#826419 | N/A                     |
> -|                |                 |                 |                         |
> -| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
> -| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144    |
> -| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
> -| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
> -| Cavium         | ThunderX SMMUv2 | #27704          | N/A		       |
> -|                |                 |                 |                         |
> -| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
> +| Implementor   | Component       | Erratum ID      | Kconfig                  |
> ++---------------+-----------------+-----------------+--------------------------+
> +| ARM           | Cortex-A53      | #826319         | ARM64_ERRATUM_826319     |
> +| ARM           | Cortex-A53      | #827319         | ARM64_ERRATUM_827319     |
> +| ARM           | Cortex-A53      | #824069         | ARM64_ERRATUM_824069     |
> +| ARM           | Cortex-A53      | #819472         | ARM64_ERRATUM_819472     |
> +| ARM           | Cortex-A53      | #845719         | ARM64_ERRATUM_845719     |
> +| ARM           | Cortex-A53      | #843419         | ARM64_ERRATUM_843419     |
> +| ARM           | Cortex-A57      | #832075         | ARM64_ERRATUM_832075     |
> +| ARM           | Cortex-A57      | #852523         | N/A                      |
> +| ARM           | Cortex-A57      | #834220         | ARM64_ERRATUM_834220     |
> +| ARM           | Cortex-A72      | #853709         | N/A                      |
> +| ARM           | MMU-500         | #841119,#826419 | N/A                      |
> +|               |                 |                 |                          |
> +| Cavium        | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375     |
> +| Cavium        | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144     |
> +| Cavium        | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154     |
> +| Cavium        | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456     |
> +| Cavium        | ThunderX SMMUv2 | #27704          | N/A                      |
> +|               |                 |                 |                          |
> +| Freescale/NXP | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585      |
> +| Qualcomm      | Falkor v1       | E1003           | QCOM_FALKOR_ERRATUM_1003 |

Please don't change the "Implementor" column width, there is no point
and it makes the patch harder to read (i.e. this hunk should only have
one line).

> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index 4c63cb1..5a0a82a 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -87,6 +87,11 @@ static void flush_context(unsigned int cpu)
>  	/* Update the list of reserved ASIDs and the ASID bitmap. */
>  	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
>  
> +	/* Reserve ASID for Falkor erratum 1003 */
> +	if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) &&
> +	    cpus_have_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003))
> +		__set_bit(FALKOR_RESERVED_ASID, asid_map);
> +
>  	/*
>  	 * Ensure the generation bump is observed before we xchg the
>  	 * active_asids.
> @@ -244,6 +249,11 @@ static int asids_init(void)
>  		panic("Failed to allocate bitmap for %lu ASIDs\n",
>  		      NUM_USER_ASIDS);
>  
> +	/* Reserve ASID for Falkor erratum 1003 */
> +	if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) &&
> +	    cpus_have_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003))
> +		__set_bit(FALKOR_RESERVED_ASID, asid_map);
> +
>  	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
>  	return 0;
>  }

You could as well write a small static function in this file and call it
twice.

> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 32682be..9ee46df 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -23,6 +23,7 @@
>  #include <asm/assembler.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/hwcap.h>
> +#include <asm/mmu_context.h>
>  #include <asm/pgtable.h>
>  #include <asm/pgtable-hwdef.h>
>  #include <asm/cpufeature.h>
> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
>  ENTRY(cpu_do_switch_mm)
>  	mmid	x1, x1				// get mm->context.id
>  	bfi	x0, x1, #48, #16		// set the ASID
> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
> +	mrs     x2, ttbr0_el1
> +	mov     x3, #FALKOR_RESERVED_ASID
> +	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
> +	msr     ttbr0_el1, x2
> +	isb
> +	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
> +	msr     ttbr0_el1, x2
> +	isb
> +alternative_else_nop_endif
> +#endif
>  	msr	ttbr0_el1, x0			// set TTBR0
>  	isb
>  	post_ttbr0_update_workaround

Please move the above hunk to a pre_ttbr0_update_workaround macro for
consistency with post_ttbr0_update_workaround.

-- 
Catalin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:06   ` Catalin Marinas
@ 2017-01-11 18:22     ` Marc Zyngier
  2017-01-11 18:40       ` Mark Rutland
  2017-01-12 15:55       ` Catalin Marinas
  2017-01-11 18:33     ` Mark Rutland
  2017-01-24 14:54     ` Christopher Covington
  2 siblings, 2 replies; 26+ messages in thread
From: Marc Zyngier @ 2017-01-11 18:22 UTC (permalink / raw)
  To: Catalin Marinas, Christopher Covington
  Cc: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Will Deacon, kvm, linux-arm-kernel, kvmarm,
	linux-kernel, shankerd, timur, Jonathan Corbet, linux-doc,
	Jon Masters, Mark Langsdorf, Mark Salter

On 11/01/17 18:06, Catalin Marinas wrote:
> Some minor comments below, nothing fundamental (as long as you say the
> new sequence doesn't have the speculative TLB load problem I mentioned
> on a previous version).
> 
> On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote:
>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>> index 405da11..7151aed 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -42,24 +42,25 @@ file acts as a registry of software workarounds in the Linux Kernel and
>>  will be updated when new workarounds are committed and backported to
>>  stable kernels.
>>  
>> -| Implementor    | Component       | Erratum ID      | Kconfig                 |
>> -+----------------+-----------------+-----------------+-------------------------+
>> -| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319    |
>> -| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319    |
>> -| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069    |
>> -| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472    |
>> -| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719    |
>> -| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419    |
>> -| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075    |
>> -| ARM            | Cortex-A57      | #852523         | N/A                     |
>> -| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220    |
>> -| ARM            | Cortex-A72      | #853709         | N/A                     |
>> -| ARM            | MMU-500         | #841119,#826419 | N/A                     |
>> -|                |                 |                 |                         |
>> -| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>> -| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144    |
>> -| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>> -| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>> -| Cavium         | ThunderX SMMUv2 | #27704          | N/A		       |
>> -|                |                 |                 |                         |
>> -| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>> +| Implementor   | Component       | Erratum ID      | Kconfig                  |
>> ++---------------+-----------------+-----------------+--------------------------+
>> +| ARM           | Cortex-A53      | #826319         | ARM64_ERRATUM_826319     |
>> +| ARM           | Cortex-A53      | #827319         | ARM64_ERRATUM_827319     |
>> +| ARM           | Cortex-A53      | #824069         | ARM64_ERRATUM_824069     |
>> +| ARM           | Cortex-A53      | #819472         | ARM64_ERRATUM_819472     |
>> +| ARM           | Cortex-A53      | #845719         | ARM64_ERRATUM_845719     |
>> +| ARM           | Cortex-A53      | #843419         | ARM64_ERRATUM_843419     |
>> +| ARM           | Cortex-A57      | #832075         | ARM64_ERRATUM_832075     |
>> +| ARM           | Cortex-A57      | #852523         | N/A                      |
>> +| ARM           | Cortex-A57      | #834220         | ARM64_ERRATUM_834220     |
>> +| ARM           | Cortex-A72      | #853709         | N/A                      |
>> +| ARM           | MMU-500         | #841119,#826419 | N/A                      |
>> +|               |                 |                 |                          |
>> +| Cavium        | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375     |
>> +| Cavium        | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144     |
>> +| Cavium        | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154     |
>> +| Cavium        | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456     |
>> +| Cavium        | ThunderX SMMUv2 | #27704          | N/A                      |
>> +|               |                 |                 |                          |
>> +| Freescale/NXP | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585      |
>> +| Qualcomm      | Falkor v1       | E1003           | QCOM_FALKOR_ERRATUM_1003 |
> 
> Please don't change the "Implementor" column width, there is no point
> and it makes the patch harder to read (i.e. this hunk should only have
> one line).
> 
>> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
>> index 4c63cb1..5a0a82a 100644
>> --- a/arch/arm64/mm/context.c
>> +++ b/arch/arm64/mm/context.c
>> @@ -87,6 +87,11 @@ static void flush_context(unsigned int cpu)
>>  	/* Update the list of reserved ASIDs and the ASID bitmap. */
>>  	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
>>  
>> +	/* Reserve ASID for Falkor erratum 1003 */
>> +	if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) &&
>> +	    cpus_have_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003))
>> +		__set_bit(FALKOR_RESERVED_ASID, asid_map);
>> +
>>  	/*
>>  	 * Ensure the generation bump is observed before we xchg the
>>  	 * active_asids.
>> @@ -244,6 +249,11 @@ static int asids_init(void)
>>  		panic("Failed to allocate bitmap for %lu ASIDs\n",
>>  		      NUM_USER_ASIDS);
>>  
>> +	/* Reserve ASID for Falkor erratum 1003 */
>> +	if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) &&
>> +	    cpus_have_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003))
>> +		__set_bit(FALKOR_RESERVED_ASID, asid_map);
>> +
>>  	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
>>  	return 0;
>>  }
> 
> You could as well write a small static function in this file and call it
> twice.
> 
>> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
>> index 32682be..9ee46df 100644
>> --- a/arch/arm64/mm/proc.S
>> +++ b/arch/arm64/mm/proc.S
>> @@ -23,6 +23,7 @@
>>  #include <asm/assembler.h>
>>  #include <asm/asm-offsets.h>
>>  #include <asm/hwcap.h>
>> +#include <asm/mmu_context.h>
>>  #include <asm/pgtable.h>
>>  #include <asm/pgtable-hwdef.h>
>>  #include <asm/cpufeature.h>
>> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
>>  ENTRY(cpu_do_switch_mm)
>>  	mmid	x1, x1				// get mm->context.id
>>  	bfi	x0, x1, #48, #16		// set the ASID
>> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
>> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
>> +	mrs     x2, ttbr0_el1
>> +	mov     x3, #FALKOR_RESERVED_ASID
>> +	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
>> +	msr     ttbr0_el1, x2
>> +	isb
>> +	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
>> +	msr     ttbr0_el1, x2
>> +	isb
>> +alternative_else_nop_endif
>> +#endif
>>  	msr	ttbr0_el1, x0			// set TTBR0
>>  	isb
>>  	post_ttbr0_update_workaround
> 
> Please move the above hunk to a pre_ttbr0_update_workaround macro for
> consistency with post_ttbr0_update_workaround.

In which case (and also for consistency), should we add that pre_ttbr0
macro to entry.S, just before __uaccess_ttbr0_enable? It may not be
needed in the SW pan case, but it is probably worth entertaining the
idea that there may be something to do there...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:06   ` Catalin Marinas
  2017-01-11 18:22     ` Marc Zyngier
@ 2017-01-11 18:33     ` Mark Rutland
  2017-01-11 18:35       ` Timur Tabi
  2017-01-24 14:54     ` Christopher Covington
  2 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2017-01-11 18:33 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Christopher Covington, Mark Langsdorf, linux-doc, kvm,
	Marc Zyngier, Jon Masters, timur, Jonathan Corbet, Will Deacon,
	linux-kernel, linux-arm-kernel, Paolo Bonzini, kvmarm

On Wed, Jan 11, 2017 at 06:06:27PM +0000, Catalin Marinas wrote:
> On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote:

> > -| Implementor    | Component       | Erratum ID      | Kconfig                 |

> > +| Implementor   | Component       | Erratum ID      | Kconfig                  |

> > +| Qualcomm      | Falkor v1       | E1003           | QCOM_FALKOR_ERRATUM_1003 |
> 
> Please don't change the "Implementor" column width, there is no point
> and it makes the patch harder to read (i.e. this hunk should only have
> one line).

It'll need to affect all lines since the kconfig column needs to expand
by at least one character to fit QCOM_FALKOR_ERRATUM_1003.

I beleive the intent here was to keep the table fitting into a width of
80 characters.

IMO we should allow the table to expand past 80 chars (everyone reading
this file should be able to resize tehir terminal), and only expand the
kconfig column.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:33     ` Mark Rutland
@ 2017-01-11 18:35       ` Timur Tabi
  2017-01-11 18:37         ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Timur Tabi @ 2017-01-11 18:35 UTC (permalink / raw)
  To: Mark Rutland, Catalin Marinas
  Cc: Christopher Covington, Mark Langsdorf, linux-doc, kvm,
	Marc Zyngier, Jon Masters, Jonathan Corbet, Will Deacon,
	linux-kernel, linux-arm-kernel, Paolo Bonzini, kvmarm

On 01/11/2017 12:33 PM, Mark Rutland wrote:
> It'll need to affect all lines since the kconfig column needs to expand
> by at least one character to fit QCOM_FALKOR_ERRATUM_1003.

Or we can make the macro shorter.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:35       ` Timur Tabi
@ 2017-01-11 18:37         ` Mark Rutland
  2017-01-11 18:40           ` Timur Tabi
  2017-01-12  9:59           ` Catalin Marinas
  0 siblings, 2 replies; 26+ messages in thread
From: Mark Rutland @ 2017-01-11 18:37 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Catalin Marinas, Christopher Covington, Mark Langsdorf,
	linux-doc, kvm, Marc Zyngier, Jon Masters, Jonathan Corbet,
	Will Deacon, linux-kernel, linux-arm-kernel, Paolo Bonzini,
	kvmarm

On Wed, Jan 11, 2017 at 12:35:55PM -0600, Timur Tabi wrote:
> On 01/11/2017 12:33 PM, Mark Rutland wrote:
> >It'll need to affect all lines since the kconfig column needs to expand
> >by at least one character to fit QCOM_FALKOR_ERRATUM_1003.
> 
> Or we can make the macro shorter.

The name, as it is, is perfectly descriptive.

Let's not sacrifice legibility over a non-issue.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:37         ` Mark Rutland
@ 2017-01-11 18:40           ` Timur Tabi
  2017-01-11 18:45             ` Mark Rutland
  2017-01-11 18:50             ` Marc Zyngier
  2017-01-12  9:59           ` Catalin Marinas
  1 sibling, 2 replies; 26+ messages in thread
From: Timur Tabi @ 2017-01-11 18:40 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Christopher Covington, Mark Langsdorf,
	linux-doc, kvm, Marc Zyngier, Jon Masters, Jonathan Corbet,
	Will Deacon, linux-kernel, linux-arm-kernel, Paolo Bonzini,
	kvmarm

On 01/11/2017 12:37 PM, Mark Rutland wrote:
> The name, as it is, is perfectly descriptive.
>
> Let's not sacrifice legibility over a non-issue.

I don't want to kick a dead horse or anything, but changing it to 
QCOM_FLKR_ERRATUM_1003 would eliminate all the spacing problems without 
sacrificing anything.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:22     ` Marc Zyngier
@ 2017-01-11 18:40       ` Mark Rutland
  2017-01-12 15:45         ` Catalin Marinas
  2017-01-12 15:55       ` Catalin Marinas
  1 sibling, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2017-01-11 18:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Christopher Covington, Mark Langsdorf,
	linux-doc, kvm, Jon Masters, timur, Jonathan Corbet, Will Deacon,
	linux-kernel, Paolo Bonzini, kvmarm, linux-arm-kernel

On Wed, Jan 11, 2017 at 06:22:08PM +0000, Marc Zyngier wrote:
> On 11/01/17 18:06, Catalin Marinas wrote:
> > On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote:
> >> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> >> index 32682be..9ee46df 100644
> >> --- a/arch/arm64/mm/proc.S
> >> +++ b/arch/arm64/mm/proc.S
> >> @@ -23,6 +23,7 @@
> >>  #include <asm/assembler.h>
> >>  #include <asm/asm-offsets.h>
> >>  #include <asm/hwcap.h>
> >> +#include <asm/mmu_context.h>
> >>  #include <asm/pgtable.h>
> >>  #include <asm/pgtable-hwdef.h>
> >>  #include <asm/cpufeature.h>
> >> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
> >>  ENTRY(cpu_do_switch_mm)
> >>  	mmid	x1, x1				// get mm->context.id
> >>  	bfi	x0, x1, #48, #16		// set the ASID
> >> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
> >> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
> >> +	mrs     x2, ttbr0_el1
> >> +	mov     x3, #FALKOR_RESERVED_ASID
> >> +	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
> >> +	msr     ttbr0_el1, x2
> >> +	isb
> >> +	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
> >> +	msr     ttbr0_el1, x2
> >> +	isb
> >> +alternative_else_nop_endif
> >> +#endif
> >>  	msr	ttbr0_el1, x0			// set TTBR0
> >>  	isb
> >>  	post_ttbr0_update_workaround
> > 
> > Please move the above hunk to a pre_ttbr0_update_workaround macro for
> > consistency with post_ttbr0_update_workaround.
> 
> In which case (and also for consistency), should we add that pre_ttbr0
> macro to entry.S, just before __uaccess_ttbr0_enable? It may not be
> needed in the SW pan case, but it is probably worth entertaining the
> idea that there may be something to do there...

Likewise, I beleive we may need to modify cpu_set_reserved_ttbr0().

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:40           ` Timur Tabi
@ 2017-01-11 18:45             ` Mark Rutland
  2017-01-16 14:26               ` Christopher Covington
  2017-01-11 18:50             ` Marc Zyngier
  1 sibling, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2017-01-11 18:45 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Catalin Marinas, Christopher Covington, Mark Langsdorf,
	linux-doc, kvm, Marc Zyngier, Jon Masters, Jonathan Corbet,
	Will Deacon, linux-kernel, linux-arm-kernel, Paolo Bonzini,
	kvmarm

On Wed, Jan 11, 2017 at 12:40:42PM -0600, Timur Tabi wrote:
> On 01/11/2017 12:37 PM, Mark Rutland wrote:
> >The name, as it is, is perfectly descriptive.
> >
> >Let's not sacrifice legibility over a non-issue.
> 
> I don't want to kick a dead horse or anything, but changing it to
> QCOM_FLKR_ERRATUM_1003 would eliminate all the spacing problems
> without sacrificing anything.

The CPU is called "Falkor", not "FLKR", and we're not coming up with an
ACPI table name...

The ARM Ltd. erratum numbers are global to all parts, so we don't
include the part name. Is the 1003 erratum number specific to Falkor?

If it's global, you could use QCOM_ERRATUM_1003 instead.

Otherwise, QCOM_FALKOR_ERRATUM_1003 is preferable.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:40           ` Timur Tabi
  2017-01-11 18:45             ` Mark Rutland
@ 2017-01-11 18:50             ` Marc Zyngier
  1 sibling, 0 replies; 26+ messages in thread
From: Marc Zyngier @ 2017-01-11 18:50 UTC (permalink / raw)
  To: Timur Tabi, Mark Rutland
  Cc: Catalin Marinas, Christopher Covington, Mark Langsdorf,
	linux-doc, kvm, Jon Masters, Jonathan Corbet, Will Deacon,
	linux-kernel, linux-arm-kernel, Paolo Bonzini, kvmarm

[finally, some proper bikeshedding]

On 11/01/17 18:40, Timur Tabi wrote:
> On 01/11/2017 12:37 PM, Mark Rutland wrote:
>> The name, as it is, is perfectly descriptive.
>>
>> Let's not sacrifice legibility over a non-issue.
> 
> I don't want to kick a dead horse or anything, but changing it to 
> QCOM_FLKR_ERRATUM_1003 would eliminate all the spacing problems without 
> sacrificing anything.

Other than not being able to grep for the core name in the source tree,
how do you suggest we pronounce FLKR? Because so far, it rolls off the
tongue in an interesting way...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:37         ` Mark Rutland
  2017-01-11 18:40           ` Timur Tabi
@ 2017-01-12  9:59           ` Catalin Marinas
  1 sibling, 0 replies; 26+ messages in thread
From: Catalin Marinas @ 2017-01-12  9:59 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Timur Tabi, Mark Langsdorf, kvm, linux-doc, Marc Zyngier,
	Jonathan Corbet, Will Deacon, linux-kernel,
	Christopher Covington, Jon Masters, Paolo Bonzini, kvmarm,
	linux-arm-kernel

On Wed, Jan 11, 2017 at 06:37:39PM +0000, Mark Rutland wrote:
> On Wed, Jan 11, 2017 at 12:35:55PM -0600, Timur Tabi wrote:
> > On 01/11/2017 12:33 PM, Mark Rutland wrote:
> > >It'll need to affect all lines since the kconfig column needs to expand
> > >by at least one character to fit QCOM_FALKOR_ERRATUM_1003.
> > 
> > Or we can make the macro shorter.
> 
> The name, as it is, is perfectly descriptive.
> 
> Let's not sacrifice legibility over a non-issue.

I agree, I didn't realise that the we expand the last column already.
It's a non-issue indeed.

-- 
Catalin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:40       ` Mark Rutland
@ 2017-01-12 15:45         ` Catalin Marinas
  2017-01-12 16:12           ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Catalin Marinas @ 2017-01-12 15:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Mark Langsdorf, kvm, linux-doc, timur,
	Jonathan Corbet, Will Deacon, linux-kernel,
	Christopher Covington, Jon Masters, Paolo Bonzini, kvmarm,
	linux-arm-kernel

On Wed, Jan 11, 2017 at 06:40:52PM +0000, Mark Rutland wrote:
> On Wed, Jan 11, 2017 at 06:22:08PM +0000, Marc Zyngier wrote:
> > On 11/01/17 18:06, Catalin Marinas wrote:
> > > On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote:
> > >> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> > >> index 32682be..9ee46df 100644
> > >> --- a/arch/arm64/mm/proc.S
> > >> +++ b/arch/arm64/mm/proc.S
> > >> @@ -23,6 +23,7 @@
> > >>  #include <asm/assembler.h>
> > >>  #include <asm/asm-offsets.h>
> > >>  #include <asm/hwcap.h>
> > >> +#include <asm/mmu_context.h>
> > >>  #include <asm/pgtable.h>
> > >>  #include <asm/pgtable-hwdef.h>
> > >>  #include <asm/cpufeature.h>
> > >> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
> > >>  ENTRY(cpu_do_switch_mm)
> > >>  	mmid	x1, x1				// get mm->context.id
> > >>  	bfi	x0, x1, #48, #16		// set the ASID
> > >> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
> > >> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
> > >> +	mrs     x2, ttbr0_el1
> > >> +	mov     x3, #FALKOR_RESERVED_ASID
> > >> +	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
> > >> +	msr     ttbr0_el1, x2
> > >> +	isb
> > >> +	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
> > >> +	msr     ttbr0_el1, x2
> > >> +	isb
> > >> +alternative_else_nop_endif
> > >> +#endif
> > >>  	msr	ttbr0_el1, x0			// set TTBR0
> > >>  	isb
> > >>  	post_ttbr0_update_workaround
> > > 
> > > Please move the above hunk to a pre_ttbr0_update_workaround macro for
> > > consistency with post_ttbr0_update_workaround.
> > 
> > In which case (and also for consistency), should we add that pre_ttbr0
> > macro to entry.S, just before __uaccess_ttbr0_enable? It may not be
> > needed in the SW pan case, but it is probably worth entertaining the
> > idea that there may be something to do there...
> 
> Likewise, I beleive we may need to modify cpu_set_reserved_ttbr0().

This may be fine if my assumptions about this erratum are correct. In
the cpu_set_reserved_ttbr0() case we set TTBR0_EL1 to a table without
any entries, so no new entries could be tagged with the old ASID.

-- 
Catalin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:22     ` Marc Zyngier
  2017-01-11 18:40       ` Mark Rutland
@ 2017-01-12 15:55       ` Catalin Marinas
  2017-01-12 16:07         ` Will Deacon
  1 sibling, 1 reply; 26+ messages in thread
From: Catalin Marinas @ 2017-01-12 15:55 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christopher Covington, Mark Langsdorf, linux-doc, kvm,
	Radim Krčmář,
	Jon Masters, timur, Jonathan Corbet, Will Deacon, linux-kernel,
	shankerd, Christoffer Dall, Mark Salter, Paolo Bonzini, kvmarm,
	linux-arm-kernel

On Wed, Jan 11, 2017 at 06:22:08PM +0000, Marc Zyngier wrote:
> On 11/01/17 18:06, Catalin Marinas wrote:
> > On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote:
> >> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> >> index 32682be..9ee46df 100644
> >> --- a/arch/arm64/mm/proc.S
> >> +++ b/arch/arm64/mm/proc.S
> >> @@ -23,6 +23,7 @@
> >>  #include <asm/assembler.h>
> >>  #include <asm/asm-offsets.h>
> >>  #include <asm/hwcap.h>
> >> +#include <asm/mmu_context.h>
> >>  #include <asm/pgtable.h>
> >>  #include <asm/pgtable-hwdef.h>
> >>  #include <asm/cpufeature.h>
> >> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
> >>  ENTRY(cpu_do_switch_mm)
> >>  	mmid	x1, x1				// get mm->context.id
> >>  	bfi	x0, x1, #48, #16		// set the ASID
> >> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
> >> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
> >> +	mrs     x2, ttbr0_el1
> >> +	mov     x3, #FALKOR_RESERVED_ASID
> >> +	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
> >> +	msr     ttbr0_el1, x2
> >> +	isb
> >> +	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
> >> +	msr     ttbr0_el1, x2
> >> +	isb
> >> +alternative_else_nop_endif
> >> +#endif
> >>  	msr	ttbr0_el1, x0			// set TTBR0
> >>  	isb
> >>  	post_ttbr0_update_workaround
> > 
> > Please move the above hunk to a pre_ttbr0_update_workaround macro for
> > consistency with post_ttbr0_update_workaround.
> 
> In which case (and also for consistency), should we add that pre_ttbr0
> macro to entry.S, just before __uaccess_ttbr0_enable? It may not be
> needed in the SW pan case, but it is probably worth entertaining the
> idea that there may be something to do there...

It may actually be needed in entry.S as well. With SW PAN, we move the
context switching from cpu_do_switch_mm to the kernel_exit macro when
returning to user. In this case we are switching from the reserved ASID
0 and reserved TTBR0_EL1 (pointing to a zeroed page) to the user's
TTBR0_EL1 and ASID. If the ASID switch isn't taken into account, we may
end up with new TLB entries being tagged with the reserved ASID. Apart
from a potential loss of protection with TTBR0 PAN, is there anything
else that could go wrong? Maybe a TLB conflict if we mix TLBs from
multiple address spaces tagged with the same reserved ASID.

If the above is an issue, we would need to patch
__uaccess_ttbr0_enable() as well, though I'm more inclined to make this
erratum not selectable when TTBR0 PAN is enabled.

-- 
Catalin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-12 15:55       ` Catalin Marinas
@ 2017-01-12 16:07         ` Will Deacon
  0 siblings, 0 replies; 26+ messages in thread
From: Will Deacon @ 2017-01-12 16:07 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Christopher Covington, Mark Langsdorf, linux-doc,
	kvm, Radim Krčmář,
	Jon Masters, timur, Jonathan Corbet, linux-kernel, shankerd,
	Christoffer Dall, Mark Salter, Paolo Bonzini, kvmarm,
	linux-arm-kernel

On Thu, Jan 12, 2017 at 03:55:58PM +0000, Catalin Marinas wrote:
> On Wed, Jan 11, 2017 at 06:22:08PM +0000, Marc Zyngier wrote:
> > On 11/01/17 18:06, Catalin Marinas wrote:
> > > On Wed, Jan 11, 2017 at 09:41:15AM -0500, Christopher Covington wrote:
> > >> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> > >> index 32682be..9ee46df 100644
> > >> --- a/arch/arm64/mm/proc.S
> > >> +++ b/arch/arm64/mm/proc.S
> > >> @@ -23,6 +23,7 @@
> > >>  #include <asm/assembler.h>
> > >>  #include <asm/asm-offsets.h>
> > >>  #include <asm/hwcap.h>
> > >> +#include <asm/mmu_context.h>
> > >>  #include <asm/pgtable.h>
> > >>  #include <asm/pgtable-hwdef.h>
> > >>  #include <asm/cpufeature.h>
> > >> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
> > >>  ENTRY(cpu_do_switch_mm)
> > >>  	mmid	x1, x1				// get mm->context.id
> > >>  	bfi	x0, x1, #48, #16		// set the ASID
> > >> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
> > >> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
> > >> +	mrs     x2, ttbr0_el1
> > >> +	mov     x3, #FALKOR_RESERVED_ASID
> > >> +	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
> > >> +	msr     ttbr0_el1, x2
> > >> +	isb
> > >> +	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
> > >> +	msr     ttbr0_el1, x2
> > >> +	isb
> > >> +alternative_else_nop_endif
> > >> +#endif
> > >>  	msr	ttbr0_el1, x0			// set TTBR0
> > >>  	isb
> > >>  	post_ttbr0_update_workaround
> > > 
> > > Please move the above hunk to a pre_ttbr0_update_workaround macro for
> > > consistency with post_ttbr0_update_workaround.
> > 
> > In which case (and also for consistency), should we add that pre_ttbr0
> > macro to entry.S, just before __uaccess_ttbr0_enable? It may not be
> > needed in the SW pan case, but it is probably worth entertaining the
> > idea that there may be something to do there...
> 
> It may actually be needed in entry.S as well. With SW PAN, we move the
> context switching from cpu_do_switch_mm to the kernel_exit macro when
> returning to user. In this case we are switching from the reserved ASID
> 0 and reserved TTBR0_EL1 (pointing to a zeroed page) to the user's
> TTBR0_EL1 and ASID. If the ASID switch isn't taken into account, we may
> end up with new TLB entries being tagged with the reserved ASID. Apart
> from a potential loss of protection with TTBR0 PAN, is there anything
> else that could go wrong? Maybe a TLB conflict if we mix TLBs from
> multiple address spaces tagged with the same reserved ASID.
> 
> If the above is an issue, we would need to patch
> __uaccess_ttbr0_enable() as well, though I'm more inclined to make this
> erratum not selectable when TTBR0 PAN is enabled.

I don't think that's a reasonable approach. By all means change the
default, but we need to support kernel images with both of these kconfig
options enabled.

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-12 15:45         ` Catalin Marinas
@ 2017-01-12 16:12           ` Mark Rutland
  2017-01-24 14:27             ` Christopher Covington
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2017-01-12 16:12 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Mark Langsdorf, kvm, linux-doc, timur,
	Jonathan Corbet, Will Deacon, linux-kernel,
	Christopher Covington, Jon Masters, Paolo Bonzini, kvmarm,
	linux-arm-kernel

On Thu, Jan 12, 2017 at 03:45:48PM +0000, Catalin Marinas wrote:
> On Wed, Jan 11, 2017 at 06:40:52PM +0000, Mark Rutland wrote:

> > Likewise, I beleive we may need to modify cpu_set_reserved_ttbr0().
> 
> This may be fine if my assumptions about this erratum are correct. In
> the cpu_set_reserved_ttbr0() case we set TTBR0_EL1 to a table without
> any entries, so no new entries could be tagged with the old ASID.

For some reason, I was under the impression that the issue was old table
entries being allocated to the new ASID. Looking over the series again,
it's not clear to me precisely which cases can occur.

It would be good to see that clarified.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros
  2017-01-11 14:41 ` [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros Christopher Covington
@ 2017-01-12 16:58   ` Will Deacon
  2017-01-13 15:12     ` Christopher Covington
  0 siblings, 1 reply; 26+ messages in thread
From: Will Deacon @ 2017-01-12 16:58 UTC (permalink / raw)
  To: Christopher Covington
  Cc: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, kvm,
	linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur,
	Mark Langsdorf, Mark Salter, Jon Masters

Hi Christopher,

On Wed, Jan 11, 2017 at 09:41:16AM -0500, Christopher Covington wrote:
> This refactoring will allow an errata workaround that repeats tlbi dsb
> sequences to only change one location. This is not intended to change the
> generated assembly and comparison of before and after preprocessor output
> of arch/arm64/mm/mmu.c and vmlinux objdump shows no functional changes.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> ---
>  arch/arm64/include/asm/tlbflush.h | 104 +++++++++++++++++++++++++-------------
>  1 file changed, 69 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index deab523..f28813c 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -25,22 +25,69 @@
>  #include <asm/cputype.h>
>  
>  /*
> - * Raw TLBI operations.
> + * Raw TLBI, DSB operations
>   *
> - * Where necessary, use the __tlbi() macro to avoid asm()
> - * boilerplate. Drivers and most kernel code should use the TLB
> - * management routines in preference to the macro below.
> + * Where necessary, use __tlbi_*dsb() macros to avoid asm() boilerplate.
> + * Drivers and most kernel code should use the TLB management routines in
> + * preference to the macros below.
>   *
> - * The macro can be used as __tlbi(op) or __tlbi(op, arg), depending
> - * on whether a particular TLBI operation takes an argument or
> - * not. The macros handles invoking the asm with or without the
> - * register argument as appropriate.
> + * The __tlbi_dsb() macro handles invoking the asm without any register
> + * argument, with a single register argument, and with start (included)
> + * and end (excluded) range of register arguments. For example:
> + *
> + * __tlbi_dsb(op, attr)
> + *
> + * 	tlbi op
> + *	dsb attr
> + *
> + * __tlbi_dsb(op, attr, addr)
> + *
> + *	mov %[addr], =addr
> + *	tlbi op, %[addr]
> + *	dsb attr
> + *
> + * __tlbi_range_dsb(op, attr, start, end)
> + *
> + * 	mov %[arg], =start
> + *	mov %[end], =end
> + * for:
> + * 	tlbi op, %[addr]
> + * 	add %[addr], %[addr], #(1 << (PAGE_SHIFT - 12))
> + * 	cmp %[addr], %[end]
> + * 	b.ne for
> + * 	dsb attr
>   */
> -#define __TLBI_0(op, arg)		asm ("tlbi " #op)
> -#define __TLBI_1(op, arg)		asm ("tlbi " #op ", %0" : : "r" (arg))
> -#define __TLBI_N(op, arg, n, ...)	__TLBI_##n(op, arg)
>  
> -#define __tlbi(op, ...)		__TLBI_N(op, ##__VA_ARGS__, 1, 0)
> +#define __TLBI_FOR_0(ig0, ig1, ig2)
> +#define __TLBI_INSTR_0(op, ig1, ig2)	"tlbi " #op
> +#define __TLBI_IO_0(ig0, ig1, ig2)	: :
> +
> +#define __TLBI_FOR_1(ig0, ig1, ig2)
> +#define __TLBI_INSTR_1(op, ig0, ig1)	"tlbi " #op ", %0"
> +#define __TLBI_IO_1(ig0, arg, ig1)	: : "r" (arg)
> +
> +#define __TLBI_FOR_2(ig0, start, ig1)	unsigned long addr;		       \
> +					for (addr = start; addr < end;	       \
> +						addr += 1 << (PAGE_SHIFT - 12))
> +#define __TLBI_INSTR_2(op, ig0, ig1)	"tlbi " #op ", %0"
> +#define __TLBI_IO_2(ig0, ig1, ig2)	: : "r" (addr)
> +
> +#define __TLBI_FOR_N(op, a1, a2, n, ...)	__TLBI_FOR_##n(op, a1, a2)
> +#define __TLBI_INSTR_N(op, a1, a2, n, ...)	__TLBI_INSTR_##n(op, a1, a2)
> +#define __TLBI_IO_N(op, a1, a2, n, ...)	__TLBI_IO_##n(op, a1, a2)
> +
> +#define __TLBI_FOR(op, ...)		__TLBI_FOR_N(op, ##__VA_ARGS__, 2, 1, 0)
> +#define __TLBI_INSTR(op, ...)		__TLBI_INSTR_N(op, ##__VA_ARGS__, 2, 1, 0)
> +#define __TLBI_IO(op, ...)		__TLBI_IO_N(op, ##__VA_ARGS__, 2, 1, 0)
> +
> +#define __tlbi_asm_dsb(as, op, attr, ...) do {				       \
> +		__TLBI_FOR(op, ##__VA_ARGS__)				       \
> +			asm (__TLBI_INSTR(op, ##__VA_ARGS__)		       \
> +			__TLBI_IO(op, ##__VA_ARGS__));			       \
> +		asm volatile (	     as			"\ndsb " #attr "\n"    \
> +		: : : "memory"); } while (0)
> +
> +#define __tlbi_dsb(...)	__tlbi_asm_dsb("", ##__VA_ARGS__)

I can't deny that this is cool, but ultimately it's completely unreadable.
What I was thinking you'd do would be make __tlbi expand to:

  tlbi
  dsb
  tlbi
  dsb

for Falkor, and:

  tlbi
  nop
  nop
  nop

for everybody else.

Wouldn't that localise this change sufficiently that you wouldn't need
to change all the callers and encode the looping in your cpp macros?

I realise you get an extra dsb in some places with that change, but I'd
like to see numbers for the impact of that on top of the workaround. If
it's an issue, then an alternative sequence would be:

  tlbi
  dsb
  tlbi

and you'd rely on the existing dsb to complete that.

Having said that, I don't understand how your current loop code works
when the workaround is applied. AFAICT, you end up emitting something
like:

dsb ishst
for i in 0 to n
	tlbi va+i
dsb
tlbi va+n
dsb

which looks wrong to me. Am I misreading something here?

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros
  2017-01-12 16:58   ` Will Deacon
@ 2017-01-13 15:12     ` Christopher Covington
  2017-01-13 16:12       ` Will Deacon
  0 siblings, 1 reply; 26+ messages in thread
From: Christopher Covington @ 2017-01-13 15:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, kvm,
	linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur,
	Mark Langsdorf, Mark Salter, Jon Masters

Hi Will,

On 01/12/2017 11:58 AM, Will Deacon wrote:
> Hi Christopher,
> 
> On Wed, Jan 11, 2017 at 09:41:16AM -0500, Christopher Covington wrote:
>> This refactoring will allow an errata workaround that repeats tlbi dsb
>> sequences to only change one location. This is not intended to change the
>> generated assembly and comparison of before and after preprocessor output
>> of arch/arm64/mm/mmu.c and vmlinux objdump shows no functional changes.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> ---
>>  arch/arm64/include/asm/tlbflush.h | 104 +++++++++++++++++++++++++-------------
>>  1 file changed, 69 insertions(+), 35 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>> index deab523..f28813c 100644
>> --- a/arch/arm64/include/asm/tlbflush.h
>> +++ b/arch/arm64/include/asm/tlbflush.h
>> @@ -25,22 +25,69 @@
>>  #include <asm/cputype.h>
>>  
>>  /*
>> - * Raw TLBI operations.
>> + * Raw TLBI, DSB operations
>>   *
>> - * Where necessary, use the __tlbi() macro to avoid asm()
>> - * boilerplate. Drivers and most kernel code should use the TLB
>> - * management routines in preference to the macro below.
>> + * Where necessary, use __tlbi_*dsb() macros to avoid asm() boilerplate.
>> + * Drivers and most kernel code should use the TLB management routines in
>> + * preference to the macros below.
>>   *
>> - * The macro can be used as __tlbi(op) or __tlbi(op, arg), depending
>> - * on whether a particular TLBI operation takes an argument or
>> - * not. The macros handles invoking the asm with or without the
>> - * register argument as appropriate.
>> + * The __tlbi_dsb() macro handles invoking the asm without any register
>> + * argument, with a single register argument, and with start (included)
>> + * and end (excluded) range of register arguments. For example:
>> + *
>> + * __tlbi_dsb(op, attr)
>> + *
>> + * 	tlbi op
>> + *	dsb attr
>> + *
>> + * __tlbi_dsb(op, attr, addr)
>> + *
>> + *	mov %[addr], =addr
>> + *	tlbi op, %[addr]
>> + *	dsb attr
>> + *
>> + * __tlbi_range_dsb(op, attr, start, end)
>> + *
>> + * 	mov %[arg], =start
>> + *	mov %[end], =end
>> + * for:
>> + * 	tlbi op, %[addr]
>> + * 	add %[addr], %[addr], #(1 << (PAGE_SHIFT - 12))
>> + * 	cmp %[addr], %[end]
>> + * 	b.ne for
>> + * 	dsb attr
>>   */
>> -#define __TLBI_0(op, arg)		asm ("tlbi " #op)
>> -#define __TLBI_1(op, arg)		asm ("tlbi " #op ", %0" : : "r" (arg))
>> -#define __TLBI_N(op, arg, n, ...)	__TLBI_##n(op, arg)
>>  
>> -#define __tlbi(op, ...)		__TLBI_N(op, ##__VA_ARGS__, 1, 0)
>> +#define __TLBI_FOR_0(ig0, ig1, ig2)
>> +#define __TLBI_INSTR_0(op, ig1, ig2)	"tlbi " #op
>> +#define __TLBI_IO_0(ig0, ig1, ig2)	: :
>> +
>> +#define __TLBI_FOR_1(ig0, ig1, ig2)
>> +#define __TLBI_INSTR_1(op, ig0, ig1)	"tlbi " #op ", %0"
>> +#define __TLBI_IO_1(ig0, arg, ig1)	: : "r" (arg)
>> +
>> +#define __TLBI_FOR_2(ig0, start, ig1)	unsigned long addr;		       \
>> +					for (addr = start; addr < end;	       \
>> +						addr += 1 << (PAGE_SHIFT - 12))
>> +#define __TLBI_INSTR_2(op, ig0, ig1)	"tlbi " #op ", %0"
>> +#define __TLBI_IO_2(ig0, ig1, ig2)	: : "r" (addr)
>> +
>> +#define __TLBI_FOR_N(op, a1, a2, n, ...)	__TLBI_FOR_##n(op, a1, a2)
>> +#define __TLBI_INSTR_N(op, a1, a2, n, ...)	__TLBI_INSTR_##n(op, a1, a2)
>> +#define __TLBI_IO_N(op, a1, a2, n, ...)	__TLBI_IO_##n(op, a1, a2)
>> +
>> +#define __TLBI_FOR(op, ...)		__TLBI_FOR_N(op, ##__VA_ARGS__, 2, 1, 0)
>> +#define __TLBI_INSTR(op, ...)		__TLBI_INSTR_N(op, ##__VA_ARGS__, 2, 1, 0)
>> +#define __TLBI_IO(op, ...)		__TLBI_IO_N(op, ##__VA_ARGS__, 2, 1, 0)
>> +
>> +#define __tlbi_asm_dsb(as, op, attr, ...) do {				       \
>> +		__TLBI_FOR(op, ##__VA_ARGS__)				       \
>> +			asm (__TLBI_INSTR(op, ##__VA_ARGS__)		       \
>> +			__TLBI_IO(op, ##__VA_ARGS__));			       \
>> +		asm volatile (	     as			"\ndsb " #attr "\n"    \
>> +		: : : "memory"); } while (0)
>> +
>> +#define __tlbi_dsb(...)	__tlbi_asm_dsb("", ##__VA_ARGS__)
> 
> I can't deny that this is cool, but ultimately it's completely unreadable.
> What I was thinking you'd do would be make __tlbi expand to:
> 
>   tlbi
>   dsb
>   tlbi
>   dsb
> 
> for Falkor, and:
> 
>   tlbi
>   nop
>   nop
>   nop
> 
> for everybody else.

Thanks for the suggestion. So would __tlbi take a dsb sharability argument in
your proposal? Or would it be communicated in some other fashion, maybe inferred
from the tlbi argument? Or would the workaround dsbs all be the worst/broadest
case?

> Wouldn't that localise this change sufficiently that you wouldn't need
> to change all the callers and encode the looping in your cpp macros?
> 
> I realise you get an extra dsb in some places with that change, but I'd
> like to see numbers for the impact of that on top of the workaround. If
> it's an issue, then an alternative sequence would be:
> 
>   tlbi
>   dsb
>   tlbi
> 
> and you'd rely on the existing dsb to complete that.
> 
> Having said that, I don't understand how your current loop code works
> when the workaround is applied. AFAICT, you end up emitting something
> like:
> 
> dsb ishst
> for i in 0 to n
> 	tlbi va+i
> dsb
> tlbi va+n
> dsb
> 
> which looks wrong to me. Am I misreading something here?

You're right, I am off by 1 << (PAGE_SHIFT - 12) here. I would need to
increment, compare, not take the loop branch (regular for loop stuff),
then decrement (missing) and perform TLB invalidation again (present but
using incorrect value).

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros
  2017-01-13 15:12     ` Christopher Covington
@ 2017-01-13 16:12       ` Will Deacon
  2017-01-24 22:03         ` Christopher Covington
  0 siblings, 1 reply; 26+ messages in thread
From: Will Deacon @ 2017-01-13 16:12 UTC (permalink / raw)
  To: Christopher Covington
  Cc: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Catalin Marinas, kvm,
	linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur,
	Mark Langsdorf, Mark Salter, Jon Masters

On Fri, Jan 13, 2017 at 10:12:36AM -0500, Christopher Covington wrote:
> On 01/12/2017 11:58 AM, Will Deacon wrote:
> > On Wed, Jan 11, 2017 at 09:41:16AM -0500, Christopher Covington wrote:
> >> +#define __tlbi_asm_dsb(as, op, attr, ...) do {				       \
> >> +		__TLBI_FOR(op, ##__VA_ARGS__)				       \
> >> +			asm (__TLBI_INSTR(op, ##__VA_ARGS__)		       \
> >> +			__TLBI_IO(op, ##__VA_ARGS__));			       \
> >> +		asm volatile (	     as			"\ndsb " #attr "\n"    \
> >> +		: : : "memory"); } while (0)
> >> +
> >> +#define __tlbi_dsb(...)	__tlbi_asm_dsb("", ##__VA_ARGS__)
> > 
> > I can't deny that this is cool, but ultimately it's completely unreadable.
> > What I was thinking you'd do would be make __tlbi expand to:
> > 
> >   tlbi
> >   dsb
> >   tlbi
> >   dsb
> > 
> > for Falkor, and:
> > 
> >   tlbi
> >   nop
> >   nop
> >   nop
> > 
> > for everybody else.
> 
> Thanks for the suggestion. So would __tlbi take a dsb sharability argument in
> your proposal? Or would it be communicated in some other fashion, maybe inferred
> from the tlbi argument? Or would the workaround dsbs all be the worst/broadest
> case?

I think always using inner-shareable should be ok. If you wanted to optimise
this, you'd want to avoid the workaround altogether for non-shareable
invalidation, but that's fairly rare and I doubt you'd be able to measure
the impact.

> > Wouldn't that localise this change sufficiently that you wouldn't need
> > to change all the callers and encode the looping in your cpp macros?
> > 
> > I realise you get an extra dsb in some places with that change, but I'd
> > like to see numbers for the impact of that on top of the workaround. If
> > it's an issue, then an alternative sequence would be:
> > 
> >   tlbi
> >   dsb
> >   tlbi
> > 
> > and you'd rely on the existing dsb to complete that.
> > 
> > Having said that, I don't understand how your current loop code works
> > when the workaround is applied. AFAICT, you end up emitting something
> > like:
> > 
> > dsb ishst
> > for i in 0 to n
> > 	tlbi va+i
> > dsb
> > tlbi va+n
> > dsb
> > 
> > which looks wrong to me. Am I misreading something here?
> 
> You're right, I am off by 1 << (PAGE_SHIFT - 12) here. I would need to
> increment, compare, not take the loop branch (regular for loop stuff),
> then decrement (missing) and perform TLB invalidation again (present but
> using incorrect value).

It also strikes me as odd that you only need one extra TLBI after the loop
has finished, as opposed to a tlbi; dsb; tlbi loop body (which is what you'd
get if you modified __tlbi as I suggest).

Is it sufficient to have one extra TLBI after the loop and, if so, is the
performance impact of my suggestion therefore unacceptable?

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:45             ` Mark Rutland
@ 2017-01-16 14:26               ` Christopher Covington
  0 siblings, 0 replies; 26+ messages in thread
From: Christopher Covington @ 2017-01-16 14:26 UTC (permalink / raw)
  To: Mark Rutland, Timur Tabi
  Cc: Catalin Marinas, Mark Langsdorf, linux-doc, kvm, Marc Zyngier,
	Jon Masters, Jonathan Corbet, Will Deacon, linux-kernel,
	linux-arm-kernel, Paolo Bonzini, kvmarm

Hi Mark,

On 01/11/2017 01:45 PM, Mark Rutland wrote:
> On Wed, Jan 11, 2017 at 12:40:42PM -0600, Timur Tabi wrote:
>> On 01/11/2017 12:37 PM, Mark Rutland wrote:
>>> The name, as it is, is perfectly descriptive.
>>>
>>> Let's not sacrifice legibility over a non-issue.
>>
>> I don't want to kick a dead horse or anything, but changing it to
>> QCOM_FLKR_ERRATUM_1003 would eliminate all the spacing problems
>> without sacrificing anything.
> 
> The CPU is called "Falkor", not "FLKR", and we're not coming up with an
> ACPI table name...
> 
> The ARM Ltd. erratum numbers are global to all parts, so we don't
> include the part name. Is the 1003 erratum number specific to Falkor?
>
> If it's global, you could use QCOM_ERRATUM_1003 instead.

E1003 is specific to Falkor, and hopefully just its first major revision.
Qualcomm Technology's first/previous generation ARMv8 custom
microarchitecture used errata numbers below 1000. I am not aware of
global coordination in the numbering, unfortunately.

> Otherwise, QCOM_FALKOR_ERRATUM_1003 is preferable.

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-12 16:12           ` Mark Rutland
@ 2017-01-24 14:27             ` Christopher Covington
  0 siblings, 0 replies; 26+ messages in thread
From: Christopher Covington @ 2017-01-24 14:27 UTC (permalink / raw)
  To: Mark Rutland, Catalin Marinas
  Cc: Mark Langsdorf, kvm, Jonathan Corbet, Marc Zyngier, Jon Masters,
	timur, linux-doc, Will Deacon, linux-kernel, Paolo Bonzini,
	kvmarm, linux-arm-kernel

On 01/12/2017 11:12 AM, Mark Rutland wrote:
> On Thu, Jan 12, 2017 at 03:45:48PM +0000, Catalin Marinas wrote:
>> On Wed, Jan 11, 2017 at 06:40:52PM +0000, Mark Rutland wrote:
> 
>>> Likewise, I beleive we may need to modify cpu_set_reserved_ttbr0().
>>
>> This may be fine if my assumptions about this erratum are correct. In
>> the cpu_set_reserved_ttbr0() case we set TTBR0_EL1 to a table without
>> any entries, so no new entries could be tagged with the old ASID.
> 
> For some reason, I was under the impression that the issue was old table
> entries being allocated to the new ASID. Looking over the series again,
> it's not clear to me precisely which cases can occur.
> 
> It would be good to see that clarified.

I'll add more general background the commit message in the next revision
and I've also asked directly about this. The answer is that page table
entries using the new translation table base address will be allocated
into the TLB using the old ASID. If employing the workaround, it's
possible for page table entries using the new translation table base
to be allocated into the TLB using the reserved ASID.

Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/5] arm64: Work around Falkor erratum 1003
  2017-01-11 18:06   ` Catalin Marinas
  2017-01-11 18:22     ` Marc Zyngier
  2017-01-11 18:33     ` Mark Rutland
@ 2017-01-24 14:54     ` Christopher Covington
  2 siblings, 0 replies; 26+ messages in thread
From: Christopher Covington @ 2017-01-24 14:54 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Paolo Bonzini, Radim Krčmář,
	Christoffer Dall, Marc Zyngier, Will Deacon, kvm,
	linux-arm-kernel, kvmarm, linux-kernel, shankerd, timur,
	Jonathan Corbet, linux-doc, Jon Masters, Mark Langsdorf,
	Mark Salter

Hi Catalin,

On 01/11/2017 01:06 PM, Catalin Marinas wrote:
> Some minor comments below, nothing fundamental (as long as you say the
> new sequence doesn't have the speculative TLB load problem I mentioned
> on a previous version).

This workaround is documented as providing functional correctness for
both explicit and speculative memory accesses.

>> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
>> index 32682be..9ee46df 100644
>> --- a/arch/arm64/mm/proc.S
>> +++ b/arch/arm64/mm/proc.S
>> @@ -23,6 +23,7 @@
>>  #include <asm/assembler.h>
>>  #include <asm/asm-offsets.h>
>>  #include <asm/hwcap.h>
>> +#include <asm/mmu_context.h>
>>  #include <asm/pgtable.h>
>>  #include <asm/pgtable-hwdef.h>
>>  #include <asm/cpufeature.h>
>> @@ -140,6 +141,18 @@ ENDPROC(cpu_do_resume)
>>  ENTRY(cpu_do_switch_mm)
>>  	mmid	x1, x1				// get mm->context.id
>>  	bfi	x0, x1, #48, #16		// set the ASID
>> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
>> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
>> +	mrs     x2, ttbr0_el1
>> +	mov     x3, #FALKOR_RESERVED_ASID
>> +	bfi     x2, x3, #48, #16                // reserved ASID + old BADDR
>> +	msr     ttbr0_el1, x2
>> +	isb
>> +	bfi     x2, x0, #0, #48                 // reserved ASID + new BADDR
>> +	msr     ttbr0_el1, x2
>> +	isb
>> +alternative_else_nop_endif
>> +#endif
>>  	msr	ttbr0_el1, x0			// set TTBR0
>>  	isb
>>  	post_ttbr0_update_workaround
> 
> Please move the above hunk to a pre_ttbr0_update_workaround macro for
> consistency with post_ttbr0_update_workaround.

How should I deal with inputs to the macro?

A) Use no input parameters and hard code x0 usage
B) Use a macro input parameter for the new TTBR value (x0)
C) Something else

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros
  2017-01-13 16:12       ` Will Deacon
@ 2017-01-24 22:03         ` Christopher Covington
  0 siblings, 0 replies; 26+ messages in thread
From: Christopher Covington @ 2017-01-24 22:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Langsdorf, Jon Masters, kvm, Radim Krčmář,
	Marc Zyngier, Catalin Marinas, timur, linux-kernel, shankerd,
	linux-arm-kernel, Mark Salter, Paolo Bonzini, kvmarm,
	Christoffer Dall

Hi Will,

On 01/13/2017 11:12 AM, Will Deacon wrote:
> On Fri, Jan 13, 2017 at 10:12:36AM -0500, Christopher Covington wrote:
>> On 01/12/2017 11:58 AM, Will Deacon wrote:
>>> On Wed, Jan 11, 2017 at 09:41:16AM -0500, Christopher Covington wrote:
>>>> +#define __tlbi_asm_dsb(as, op, attr, ...) do {				       \
>>>> +		__TLBI_FOR(op, ##__VA_ARGS__)				       \
>>>> +			asm (__TLBI_INSTR(op, ##__VA_ARGS__)		       \
>>>> +			__TLBI_IO(op, ##__VA_ARGS__));			       \
>>>> +		asm volatile (	     as			"\ndsb " #attr "\n"    \
>>>> +		: : : "memory"); } while (0)
>>>> +
>>>> +#define __tlbi_dsb(...)	__tlbi_asm_dsb("", ##__VA_ARGS__)
>>>
>>> I can't deny that this is cool, but ultimately it's completely unreadable.
>>> What I was thinking you'd do would be make __tlbi expand to:
>>>
>>>   tlbi
>>>   dsb
>>>   tlbi
>>>   dsb
>>>
>>> for Falkor, and:
>>>
>>>   tlbi
>>>   nop
>>>   nop
>>>   nop
>>>
>>> for everybody else.

I've implemented this (minus the last dsb / nop) in the next revision.

>> Thanks for the suggestion. So would __tlbi take a dsb sharability argument in
>> your proposal? Or would it be communicated in some other fashion, maybe inferred
>> from the tlbi argument? Or would the workaround dsbs all be the worst/broadest
>> case?
> 
> I think always using inner-shareable should be ok. If you wanted to optimise
> this, you'd want to avoid the workaround altogether for non-shareable
> invalidation, but that's fairly rare and I doubt you'd be able to measure
> the impact.

I did not originally notice that Shanker's original workaround implementation
unnecessarily applies the workaround to non-shareable invalidations. They're
not affected by the erratum. But as you say, it's simpler to modify __tlbi for
all cases. I'm not currently worried about that performance impact.

>>> Wouldn't that localise this change sufficiently that you wouldn't need
>>> to change all the callers and encode the looping in your cpp macros?
>>>
>>> I realise you get an extra dsb in some places with that change, but I'd
>>> like to see numbers for the impact of that on top of the workaround. If
>>> it's an issue, then an alternative sequence would be:
>>>
>>>   tlbi
>>>   dsb
>>>   tlbi
>>>
>>> and you'd rely on the existing dsb to complete that.
>>>
>>> Having said that, I don't understand how your current loop code works
>>> when the workaround is applied. AFAICT, you end up emitting something
>>> like:
>>>
>>> dsb ishst
>>> for i in 0 to n
>>> 	tlbi va+i
>>> dsb
>>> tlbi va+n
>>> dsb
>>>
>>> which looks wrong to me. Am I misreading something here?
>>
>> You're right, I am off by 1 << (PAGE_SHIFT - 12) here. I would need to
>> increment, compare, not take the loop branch (regular for loop stuff),
>> then decrement (missing) and perform TLB invalidation again (present but
>> using incorrect value).
> 
> It also strikes me as odd that you only need one extra TLBI after the loop
> has finished, as opposed to a tlbi; dsb; tlbi loop body (which is what you'd
> get if you modified __tlbi as I suggest).
> 
> Is it sufficient to have one extra TLBI after the loop and, if so, is the
> performance impact of my suggestion therefore unacceptable?

One is sufficient according to the errata documentation. I've described that
aspect in the commit message of the next revision.

I've suggested colleagues follow up regarding performance. But reliable
functionality comes first.

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-01-24 22:04 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-11 14:41 [PATCH v3 1/5] arm64: Define Falkor v1 CPU Christopher Covington
2017-01-11 14:41 ` [PATCH v3 2/5] arm64: Work around Falkor erratum 1003 Christopher Covington
2017-01-11 18:06   ` Catalin Marinas
2017-01-11 18:22     ` Marc Zyngier
2017-01-11 18:40       ` Mark Rutland
2017-01-12 15:45         ` Catalin Marinas
2017-01-12 16:12           ` Mark Rutland
2017-01-24 14:27             ` Christopher Covington
2017-01-12 15:55       ` Catalin Marinas
2017-01-12 16:07         ` Will Deacon
2017-01-11 18:33     ` Mark Rutland
2017-01-11 18:35       ` Timur Tabi
2017-01-11 18:37         ` Mark Rutland
2017-01-11 18:40           ` Timur Tabi
2017-01-11 18:45             ` Mark Rutland
2017-01-16 14:26               ` Christopher Covington
2017-01-11 18:50             ` Marc Zyngier
2017-01-12  9:59           ` Catalin Marinas
2017-01-24 14:54     ` Christopher Covington
2017-01-11 14:41 ` [PATCH v3 3/5] arm64: Create and use __tlbi_dsb() macros Christopher Covington
2017-01-12 16:58   ` Will Deacon
2017-01-13 15:12     ` Christopher Covington
2017-01-13 16:12       ` Will Deacon
2017-01-24 22:03         ` Christopher Covington
2017-01-11 14:41 ` [PATCH v3 4/5] arm64: Use __tlbi_dsb() macros in KVM code Christopher Covington
2017-01-11 14:41 ` [PATCH v3 5/5] arm64: Work around Falkor erratum 1009 Christopher Covington

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).