* [PATCH v2 00/10] arm64 switch_mm improvements
@ 2015-10-06 17:46 Will Deacon
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

This is version two of the patches previously posted here:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370720.html

Changes since v1 include:

  * More comments
  * Added reviewed-by tags

Cheers,

Will

--->8

Will Deacon (10):
  arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
  arm64: proc: de-scope TLBI operation during cold boot
  arm64: flush: use local TLB and I-cache invalidation
  arm64: mm: rewrite ASID allocator and MM context-switching code
  arm64: tlbflush: remove redundant ASID casts to (unsigned long)
  arm64: tlbflush: avoid flushing when fullmm == 1
  arm64: switch_mm: simplify mm and CPU checks
  arm64: mm: kill mm_cpumask usage
  arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
  arm64: mm: remove dsb from update_mmu_cache

 arch/arm64/include/asm/cacheflush.h  |   7 ++
 arch/arm64/include/asm/mmu.h         |  15 +--
 arch/arm64/include/asm/mmu_context.h | 113 ++++-------------
 arch/arm64/include/asm/pgtable.h     |   6 +-
 arch/arm64/include/asm/thread_info.h |   1 -
 arch/arm64/include/asm/tlb.h         |  26 ++--
 arch/arm64/include/asm/tlbflush.h    |  18 ++-
 arch/arm64/kernel/asm-offsets.c      |   2 +-
 arch/arm64/kernel/efi.c              |   5 +-
 arch/arm64/kernel/smp.c              |   9 +-
 arch/arm64/kernel/suspend.c          |   2 +-
 arch/arm64/mm/context.c              | 236 +++++++++++++++++++++--------------
 arch/arm64/mm/mmu.c                  |   2 +-
 arch/arm64/mm/proc.S                 |   6 +-
 14 files changed, 222 insertions(+), 226 deletions(-)

-- 
2.1.4

* [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-07  6:17   ` Ard Biesheuvel
  2015-10-06 17:46 ` [PATCH v2 02/10] arm64: proc: de-scope TLBI operation during cold boot Will Deacon
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

With commit b08d4640a3dc ("arm64: remove dead code"),
cpu_set_idmap_tcr_t0sz is no longer called and can therefore be removed
from the kernel.

This patch removes the function and effectively inlines the helper
function __cpu_set_tcr_t0sz into cpu_set_default_tcr_t0sz.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 8ec41e5f56f0..549b89554ce8 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -77,34 +77,23 @@ static inline bool __cpu_uses_extended_idmap(void)
 		unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
 }
 
-static inline void __cpu_set_tcr_t0sz(u64 t0sz)
-{
-	unsigned long tcr;
-
-	if (__cpu_uses_extended_idmap())
-		asm volatile (
-		"	mrs	%0, tcr_el1	;"
-		"	bfi	%0, %1, %2, %3	;"
-		"	msr	tcr_el1, %0	;"
-		"	isb"
-		: "=&r" (tcr)
-		: "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
-}
-
-/*
- * Set TCR.T0SZ to the value appropriate for activating the identity map.
- */
-static inline void cpu_set_idmap_tcr_t0sz(void)
-{
-	__cpu_set_tcr_t0sz(idmap_t0sz);
-}
-
 /*
  * Set TCR.T0SZ to its default value (based on VA_BITS)
  */
 static inline void cpu_set_default_tcr_t0sz(void)
 {
-	__cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
+	unsigned long tcr;
+
+	if (!__cpu_uses_extended_idmap())
+		return;
+
+	asm volatile (
+	"	mrs	%0, tcr_el1	;"
+	"	bfi	%0, %1, %2, %3	;"
+	"	msr	tcr_el1, %0	;"
+	"	isb"
+	: "=&r" (tcr)
+	: "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
 }
 
 static inline void switch_new_context(struct mm_struct *mm)
-- 
2.1.4

* [PATCH v2 02/10] arm64: proc: de-scope TLBI operation during cold boot
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

When cold-booting a CPU, we must invalidate any junk entries from the
local TLB prior to enabling the MMU. This doesn't require broadcasting
within the inner-shareable domain, so de-scope the operation to apply
only to the local CPU.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/proc.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index e4ee7bd8830a..bbde13d77da5 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -146,8 +146,8 @@ ENDPROC(cpu_do_switch_mm)
  *	value of the SCTLR_EL1 register.
  */
 ENTRY(__cpu_setup)
-	tlbi	vmalle1is			// invalidate I + D TLBs
-	dsb	ish
+	tlbi	vmalle1				// Invalidate local TLB
+	dsb	nsh
 
 	mov	x0, #3 << 20
 	msr	cpacr_el1, x0			// Enable FP/ASIMD
-- 
2.1.4

* [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
  2015-10-06 17:46 ` [PATCH v2 02/10] arm64: proc: de-scope TLBI operation during cold boot Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-07  1:18   ` David Daney
  2015-10-07  6:18   ` Ard Biesheuvel
  2015-10-06 17:46 ` [PATCH v2 04/10] arm64: mm: rewrite ASID allocator and MM context-switching code Will Deacon
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

There are a number of places where a single CPU is running with a
private page-table and we need to perform maintenance on the TLB and
I-cache in order to ensure correctness, but do not require the operation
to be broadcast to other CPUs.

This patch adds local variants of flush_tlb_all and __flush_icache_all
to support these use-cases and updates the callers accordingly.
__local_flush_icache_all also implies an isb, since it is intended to be
used synchronously.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/cacheflush.h | 7 +++++++
 arch/arm64/include/asm/tlbflush.h   | 8 ++++++++
 arch/arm64/kernel/efi.c             | 4 ++--
 arch/arm64/kernel/smp.c             | 2 +-
 arch/arm64/kernel/suspend.c         | 2 +-
 arch/arm64/mm/context.c             | 4 ++--
 arch/arm64/mm/mmu.c                 | 2 +-
 7 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index c75b8d027eb1..54efedaf331f 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -115,6 +115,13 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
 
+static inline void __local_flush_icache_all(void)
+{
+	asm("ic iallu");
+	dsb(nsh);
+	isb();
+}
+
 static inline void __flush_icache_all(void)
 {
 	asm("ic	ialluis");
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 7bd2da021658..96f944e75dc4 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -63,6 +63,14 @@
  *		only require the D-TLB to be invalidated.
  *		- kaddr - Kernel virtual memory address
  */
+static inline void local_flush_tlb_all(void)
+{
+	dsb(nshst);
+	asm("tlbi	vmalle1");
+	dsb(nsh);
+	isb();
+}
+
 static inline void flush_tlb_all(void)
 {
 	dsb(ishst);
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 13671a9cf016..4d12926ea40d 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -344,9 +344,9 @@ static void efi_set_pgd(struct mm_struct *mm)
 	else
 		cpu_switch_mm(mm->pgd, mm);
 
-	flush_tlb_all();
+	local_flush_tlb_all();
 	if (icache_is_aivivt())
-		__flush_icache_all();
+		__local_flush_icache_all();
 }
 
 void efi_virtmap_load(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dbdaacddd9a5..fdd4d4dbd64f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -152,7 +152,7 @@ asmlinkage void secondary_start_kernel(void)
 	 * point to zero page to avoid speculatively fetching new entries.
 	 */
 	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
+	local_flush_tlb_all();
 	cpu_set_default_tcr_t0sz();
 
 	preempt_disable();
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 8297d502217e..3c5e4e6dcf68 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -90,7 +90,7 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 		else
 			cpu_switch_mm(mm->pgd, mm);
 
-		flush_tlb_all();
+		local_flush_tlb_all();
 
 		/*
 		 * Restore per-cpu offset before any kernel
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index d70ff14dbdbd..48b53fb381af 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -48,9 +48,9 @@ static void flush_context(void)
 {
 	/* set the reserved TTBR0 before flushing the TLB */
 	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
+	local_flush_tlb_all();
 	if (icache_is_aivivt())
-		__flush_icache_all();
+		__local_flush_icache_all();
 }
 
 static void set_mm_context(struct mm_struct *mm, unsigned int asid)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9211b8527f25..71a310478c9e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -456,7 +456,7 @@ void __init paging_init(void)
 	 * point to zero page to avoid speculatively fetching new entries.
 	 */
 	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
+	local_flush_tlb_all();
 	cpu_set_default_tcr_t0sz();
 }
 
-- 
2.1.4

* [PATCH v2 04/10] arm64: mm: rewrite ASID allocator and MM context-switching code
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (2 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 05/10] arm64: tlbflush: remove redundant ASID casts to (unsigned long) Will Deacon
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

Our current switch_mm implementation suffers from a number of problems:

  (1) The ASID allocator relies on IPIs to synchronise the CPUs on a
      rollover event

  (2) Because of (1), we cannot allocate ASIDs with interrupts disabled
      and therefore make use of a TIF_SWITCH_MM flag to postpone the
      actual switch to finish_arch_post_lock_switch

  (3) We run context switch with a reserved (invalid) TTBR0 value, even
      though the ASID and pgd are updated atomically

  (4) We take a global spinlock (cpu_asid_lock) during context-switch

  (5) We use h/w broadcast TLB operations when they are not required
      (e.g. in flush_context)

This patch addresses these problems by rewriting the ASID algorithm to
match the bitmap-based arch/arm/ implementation more closely. This in
turn allows us to remove much of the complexity surrounding switch_mm,
including the ugly thread flag.
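
For a rough picture of the new scheme (a sketch, not part of the patch):
mm->context.id packs the allocator generation into the bits above
asid_bits and the hardware ASID into the bits below, so a single
xor-and-shift decides whether an mm still holds a current-generation ASID:

	/* Sketch only: context.id == generation | hardware ASID */
	static bool asid_gen_is_current(u64 ctx_id, u64 generation, u32 asid_bits)
	{
		return ((ctx_id ^ generation) >> asid_bits) == 0;
	}

This is the test used on the switch_mm fast path before falling back to
taking cpu_asid_lock in the slow path.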

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu.h         |  15 +--
 arch/arm64/include/asm/mmu_context.h |  76 ++---------
 arch/arm64/include/asm/thread_info.h |   1 -
 arch/arm64/kernel/asm-offsets.c      |   2 +-
 arch/arm64/kernel/efi.c              |   1 -
 arch/arm64/mm/context.c              | 238 +++++++++++++++++++++--------------
 arch/arm64/mm/proc.S                 |   2 +-
 7 files changed, 166 insertions(+), 169 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 030208767185..990124a67eeb 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -17,15 +17,16 @@
 #define __ASM_MMU_H
 
 typedef struct {
-	unsigned int id;
-	raw_spinlock_t id_lock;
-	void *vdso;
+	atomic64_t	id;
+	void		*vdso;
 } mm_context_t;
 
-#define INIT_MM_CONTEXT(name) \
-	.context.id_lock = __RAW_SPIN_LOCK_UNLOCKED(name.context.id_lock),
-
-#define ASID(mm)	((mm)->context.id & 0xffff)
+/*
+ * This macro is only used by the TLBI code, which cannot race with an
+ * ASID change and therefore doesn't need to reload the counter using
+ * atomic64_read.
+ */
+#define ASID(mm)	((mm)->context.id.counter & 0xffff)
 
 extern void paging_init(void);
 extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 549b89554ce8..f4c74a951b6c 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -28,13 +28,6 @@
 #include <asm/cputype.h>
 #include <asm/pgtable.h>
 
-#define MAX_ASID_BITS	16
-
-extern unsigned int cpu_last_asid;
-
-void __init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void __new_context(struct mm_struct *mm);
-
 #ifdef CONFIG_PID_IN_CONTEXTIDR
 static inline void contextidr_thread_switch(struct task_struct *next)
 {
@@ -96,66 +89,19 @@ static inline void cpu_set_default_tcr_t0sz(void)
 	: "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
 }
 
-static inline void switch_new_context(struct mm_struct *mm)
-{
-	unsigned long flags;
-
-	__new_context(mm);
-
-	local_irq_save(flags);
-	cpu_switch_mm(mm->pgd, mm);
-	local_irq_restore(flags);
-}
-
-static inline void check_and_switch_context(struct mm_struct *mm,
-					    struct task_struct *tsk)
-{
-	/*
-	 * Required during context switch to avoid speculative page table
-	 * walking with the wrong TTBR.
-	 */
-	cpu_set_reserved_ttbr0();
-
-	if (!((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS))
-		/*
-		 * The ASID is from the current generation, just switch to the
-		 * new pgd. This condition is only true for calls from
-		 * context_switch() and interrupts are already disabled.
-		 */
-		cpu_switch_mm(mm->pgd, mm);
-	else if (irqs_disabled())
-		/*
-		 * Defer the new ASID allocation until after the context
-		 * switch critical region since __new_context() cannot be
-		 * called with interrupts disabled.
-		 */
-		set_ti_thread_flag(task_thread_info(tsk), TIF_SWITCH_MM);
-	else
-		/*
-		 * That is a direct call to switch_mm() or activate_mm() with
-		 * interrupts enabled and a new context.
-		 */
-		switch_new_context(mm);
-}
-
-#define init_new_context(tsk,mm)	(__init_new_context(tsk,mm),0)
+/*
+ * It would be nice to return ASIDs back to the allocator, but unfortunately
+ * that introduces a race with a generation rollover where we could erroneously
+ * free an ASID allocated in a future generation. We could workaround this by
+ * freeing the ASID from the context of the dying mm (e.g. in arch_exit_mmap),
+ * but we'd then need to make sure that we didn't dirty any TLBs afterwards.
+ * Setting a reserved TTBR0 or EPD0 would work, but it all gets ugly when you
+ * take CPU migration into account.
+ */
 #define destroy_context(mm)		do { } while(0)
+void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
-#define finish_arch_post_lock_switch \
-	finish_arch_post_lock_switch
-static inline void finish_arch_post_lock_switch(void)
-{
-	if (test_and_clear_thread_flag(TIF_SWITCH_MM)) {
-		struct mm_struct *mm = current->mm;
-		unsigned long flags;
-
-		__new_context(mm);
-
-		local_irq_save(flags);
-		cpu_switch_mm(mm->pgd, mm);
-		local_irq_restore(flags);
-	}
-}
+#define init_new_context(tsk,mm)	({ atomic64_set(&mm->context.id, 0); 0; })
 
 /*
  * This is called when "tsk" is about to enter lazy TLB mode.
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index dcd06d18a42a..555c6dec5ef2 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -111,7 +111,6 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	20
 #define TIF_SINGLESTEP		21
 #define TIF_32BIT		22	/* 32bit process */
-#define TIF_SWITCH_MM		23	/* deferred switch_mm */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8dae55..25de8b244961 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -60,7 +60,7 @@ int main(void)
   DEFINE(S_SYSCALLNO,		offsetof(struct pt_regs, syscallno));
   DEFINE(S_FRAME_SIZE,		sizeof(struct pt_regs));
   BLANK();
-  DEFINE(MM_CONTEXT_ID,		offsetof(struct mm_struct, context.id));
+  DEFINE(MM_CONTEXT_ID,		offsetof(struct mm_struct, context.id.counter));
   BLANK();
   DEFINE(VMA_VM_MM,		offsetof(struct vm_area_struct, vm_mm));
   DEFINE(VMA_VM_FLAGS,		offsetof(struct vm_area_struct, vm_flags));
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 4d12926ea40d..a48d1f477b2e 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -48,7 +48,6 @@ static struct mm_struct efi_mm = {
 	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
 	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
 	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
-	INIT_MM_CONTEXT(efi_mm)
 };
 
 static int uefi_debug __initdata;
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 48b53fb381af..e902229b1a3d 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -17,135 +17,187 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
-#include <linux/init.h>
+#include <linux/bitops.h>
 #include <linux/sched.h>
+#include <linux/slab.h>
 #include <linux/mm.h>
-#include <linux/smp.h>
-#include <linux/percpu.h>
 
+#include <asm/cpufeature.h>
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
-#include <asm/cachetype.h>
 
-#define asid_bits(reg) \
-	(((read_cpuid(ID_AA64MMFR0_EL1) & 0xf0) >> 2) + 8)
+static u32 asid_bits;
+static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
 
-#define ASID_FIRST_VERSION	(1 << MAX_ASID_BITS)
+static atomic64_t asid_generation;
+static unsigned long *asid_map;
 
-static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
-unsigned int cpu_last_asid = ASID_FIRST_VERSION;
+static DEFINE_PER_CPU(atomic64_t, active_asids);
+static DEFINE_PER_CPU(u64, reserved_asids);
+static cpumask_t tlb_flush_pending;
 
-/*
- * We fork()ed a process, and we need a new context for the child to run in.
- */
-void __init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+#define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
+#define ASID_FIRST_VERSION	(1UL << asid_bits)
+#define NUM_USER_ASIDS		ASID_FIRST_VERSION
+
+static void flush_context(unsigned int cpu)
 {
-	mm->context.id = 0;
-	raw_spin_lock_init(&mm->context.id_lock);
+	int i;
+	u64 asid;
+
+	/* Update the list of reserved ASIDs and the ASID bitmap. */
+	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+
+	/*
+	 * Ensure the generation bump is observed before we xchg the
+	 * active_asids.
+	 */
+	smp_wmb();
+
+	for_each_possible_cpu(i) {
+		asid = atomic64_xchg_relaxed(&per_cpu(active_asids, i), 0);
+		/*
+		 * If this CPU has already been through a
+		 * rollover, but hasn't run another task in
+		 * the meantime, we must preserve its reserved
+		 * ASID, as this is the only trace we have of
+		 * the process it is still running.
+		 */
+		if (asid == 0)
+			asid = per_cpu(reserved_asids, i);
+		__set_bit(asid & ~ASID_MASK, asid_map);
+		per_cpu(reserved_asids, i) = asid;
+	}
+
+	/* Queue a TLB invalidate and flush the I-cache if necessary. */
+	cpumask_setall(&tlb_flush_pending);
+
+	if (icache_is_aivivt())
+		__flush_icache_all();
 }
 
-static void flush_context(void)
+static int is_reserved_asid(u64 asid)
 {
-	/* set the reserved TTBR0 before flushing the TLB */
-	cpu_set_reserved_ttbr0();
-	local_flush_tlb_all();
-	if (icache_is_aivivt())
-		__local_flush_icache_all();
+	int cpu;
+	for_each_possible_cpu(cpu)
+		if (per_cpu(reserved_asids, cpu) == asid)
+			return 1;
+	return 0;
 }
 
-static void set_mm_context(struct mm_struct *mm, unsigned int asid)
+static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 {
-	unsigned long flags;
+	static u32 cur_idx = 1;
+	u64 asid = atomic64_read(&mm->context.id);
+	u64 generation = atomic64_read(&asid_generation);
 
-	/*
-	 * Locking needed for multi-threaded applications where the same
-	 * mm->context.id could be set from different CPUs during the
-	 * broadcast. This function is also called via IPI so the
-	 * mm->context.id_lock has to be IRQ-safe.
-	 */
-	raw_spin_lock_irqsave(&mm->context.id_lock, flags);
-	if (likely((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS)) {
+	if (asid != 0) {
 		/*
-		 * Old version of ASID found. Set the new one and reset
-		 * mm_cpumask(mm).
+		 * If our current ASID was active during a rollover, we
+		 * can continue to use it and this was just a false alarm.
 		 */
-		mm->context.id = asid;
-		cpumask_clear(mm_cpumask(mm));
+		if (is_reserved_asid(asid))
+			return generation | (asid & ~ASID_MASK);
+
+		/*
+		 * We had a valid ASID in a previous life, so try to re-use
+		 * it if possible.
+		 */
+		asid &= ~ASID_MASK;
+		if (!__test_and_set_bit(asid, asid_map))
+			goto bump_gen;
 	}
-	raw_spin_unlock_irqrestore(&mm->context.id_lock, flags);
 
 	/*
-	 * Set the mm_cpumask(mm) bit for the current CPU.
+	 * Allocate a free ASID. If we can't find one, take a note of the
+	 * currently active ASIDs and mark the TLBs as requiring flushes.
+	 * We always count from ASID #1, as we use ASID #0 when setting a
+	 * reserved TTBR0 for the init_mm.
 	 */
-	cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+	asid = find_next_zero_bit(asid_map, NUM_USER_ASIDS, cur_idx);
+	if (asid != NUM_USER_ASIDS)
+		goto set_asid;
+
+	/* We're out of ASIDs, so increment the global generation count */
+	generation = atomic64_add_return_relaxed(ASID_FIRST_VERSION,
+						 &asid_generation);
+	flush_context(cpu);
+
+	/* We have at least 1 ASID per CPU, so this will always succeed */
+	asid = find_next_zero_bit(asid_map, NUM_USER_ASIDS, 1);
+
+set_asid:
+	__set_bit(asid, asid_map);
+	cur_idx = asid;
+
+bump_gen:
+	asid |= generation;
+	cpumask_clear(mm_cpumask(mm));
+	return asid;
 }
 
-/*
- * Reset the ASID on the current CPU. This function call is broadcast from the
- * CPU handling the ASID rollover and holding cpu_asid_lock.
- */
-static void reset_context(void *info)
+void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 {
-	unsigned int asid;
-	unsigned int cpu = smp_processor_id();
-	struct mm_struct *mm = current->active_mm;
+	unsigned long flags;
+	u64 asid;
+
+	asid = atomic64_read(&mm->context.id);
 
 	/*
-	 * current->active_mm could be init_mm for the idle thread immediately
-	 * after secondary CPU boot or hotplug. TTBR0_EL1 is already set to
-	 * the reserved value, so no need to reset any context.
+	 * The memory ordering here is subtle. We rely on the control
+	 * dependency between the generation read and the update of
+	 * active_asids to ensure that we are synchronised with a
+	 * parallel rollover (i.e. this pairs with the smp_wmb() in
+	 * flush_context).
 	 */
-	if (mm == &init_mm)
-		return;
+	if (!((asid ^ atomic64_read(&asid_generation)) >> asid_bits)
+	    && atomic64_xchg_relaxed(&per_cpu(active_asids, cpu), asid))
+		goto switch_mm_fastpath;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+	/* Check that our ASID belongs to the current generation. */
+	asid = atomic64_read(&mm->context.id);
+	if ((asid ^ atomic64_read(&asid_generation)) >> asid_bits) {
+		asid = new_context(mm, cpu);
+		atomic64_set(&mm->context.id, asid);
+	}
 
-	smp_rmb();
-	asid = cpu_last_asid + cpu;
+	if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
+		local_flush_tlb_all();
 
-	flush_context();
-	set_mm_context(mm, asid);
+	atomic64_set(&per_cpu(active_asids, cpu), asid);
+	cpumask_set_cpu(cpu, mm_cpumask(mm));
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
 
-	/* set the new ASID */
+switch_mm_fastpath:
 	cpu_switch_mm(mm->pgd, mm);
 }
 
-void __new_context(struct mm_struct *mm)
+static int asids_init(void)
 {
-	unsigned int asid;
-	unsigned int bits = asid_bits();
-
-	raw_spin_lock(&cpu_asid_lock);
-	/*
-	 * Check the ASID again, in case the change was broadcast from another
-	 * CPU before we acquired the lock.
-	 */
-	if (!unlikely((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS)) {
-		cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
-		raw_spin_unlock(&cpu_asid_lock);
-		return;
-	}
-	/*
-	 * At this point, it is guaranteed that the current mm (with an old
-	 * ASID) isn't active on any other CPU since the ASIDs are changed
-	 * simultaneously via IPI.
-	 */
-	asid = ++cpu_last_asid;
-
-	/*
-	 * If we've used up all our ASIDs, we need to start a new version and
-	 * flush the TLB.
-	 */
-	if (unlikely((asid & ((1 << bits) - 1)) == 0)) {
-		/* increment the ASID version */
-		cpu_last_asid += (1 << MAX_ASID_BITS) - (1 << bits);
-		if (cpu_last_asid == 0)
-			cpu_last_asid = ASID_FIRST_VERSION;
-		asid = cpu_last_asid + smp_processor_id();
-		flush_context();
-		smp_wmb();
-		smp_call_function(reset_context, NULL, 1);
-		cpu_last_asid += NR_CPUS - 1;
+	int fld = cpuid_feature_extract_field(read_cpuid(ID_AA64MMFR0_EL1), 4);
+
+	switch (fld) {
+	default:
+		pr_warn("Unknown ASID size (%d); assuming 8-bit\n", fld);
+		/* Fallthrough */
+	case 0:
+		asid_bits = 8;
+		break;
+	case 2:
+		asid_bits = 16;
 	}
 
-	set_mm_context(mm, asid);
-	raw_spin_unlock(&cpu_asid_lock);
+	/* If we end up with more CPUs than ASIDs, expect things to crash */
+	WARN_ON(NUM_USER_ASIDS < num_possible_cpus());
+	atomic64_set(&asid_generation, ASID_FIRST_VERSION);
+	asid_map = kzalloc(BITS_TO_LONGS(NUM_USER_ASIDS) * sizeof(*asid_map),
+			   GFP_KERNEL);
+	if (!asid_map)
+		panic("Failed to allocate bitmap for %lu ASIDs\n",
+		      NUM_USER_ASIDS);
+
+	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
+	return 0;
 }
+early_initcall(asids_init);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index bbde13d77da5..91cb2eaac256 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -130,7 +130,7 @@ ENDPROC(cpu_do_resume)
  *	- pgd_phys - physical address of new TTB
  */
 ENTRY(cpu_do_switch_mm)
-	mmid	w1, x1				// get mm->context.id
+	mmid	x1, x1				// get mm->context.id
 	bfi	x0, x1, #48, #16		// set the ASID
 	msr	ttbr0_el1, x0			// set TTBR0
 	isb
-- 
2.1.4

* [PATCH v2 05/10] arm64: tlbflush: remove redundant ASID casts to (unsigned long)
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (3 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 04/10] arm64: mm: rewrite ASID allocator and MM context-switching code Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 06/10] arm64: tlbflush: avoid flushing when fullmm == 1 Will Deacon
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

The ASID macro returns a 64-bit (long long) value, so there is no need
to cast to (unsigned long) before shifting prior to a TLBI operation.
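
For reference, after the allocator rewrite earlier in this series (see the
mmu.h hunk in patch 04), the macro reads:

	#define ASID(mm)	((mm)->context.id.counter & 0xffff)

and context.id is an atomic64_t, so its .counter field is a 64-bit value
and the subsequent << 48 is already performed in 64-bit arithmetic.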

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 96f944e75dc4..93e9f964805c 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -81,7 +81,7 @@ static inline void flush_tlb_all(void)
 
 static inline void flush_tlb_mm(struct mm_struct *mm)
 {
-	unsigned long asid = (unsigned long)ASID(mm) << 48;
+	unsigned long asid = ASID(mm) << 48;
 
 	dsb(ishst);
 	asm("tlbi	aside1is, %0" : : "r" (asid));
@@ -91,8 +91,7 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
 static inline void flush_tlb_page(struct vm_area_struct *vma,
 				  unsigned long uaddr)
 {
-	unsigned long addr = uaddr >> 12 |
-		((unsigned long)ASID(vma->vm_mm) << 48);
+	unsigned long addr = uaddr >> 12 | (ASID(vma->vm_mm) << 48);
 
 	dsb(ishst);
 	asm("tlbi	vale1is, %0" : : "r" (addr));
@@ -109,7 +108,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
 				     bool last_level)
 {
-	unsigned long asid = (unsigned long)ASID(vma->vm_mm) << 48;
+	unsigned long asid = ASID(vma->vm_mm) << 48;
 	unsigned long addr;
 
 	if ((end - start) > MAX_TLB_RANGE) {
@@ -162,7 +161,7 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
 static inline void __flush_tlb_pgtable(struct mm_struct *mm,
 				       unsigned long uaddr)
 {
-	unsigned long addr = uaddr >> 12 | ((unsigned long)ASID(mm) << 48);
+	unsigned long addr = uaddr >> 12 | (ASID(mm) << 48);
 
 	dsb(ishst);
 	asm("tlbi	vae1is, %0" : : "r" (addr));
-- 
2.1.4

* [PATCH v2 06/10] arm64: tlbflush: avoid flushing when fullmm == 1
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (4 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 05/10] arm64: tlbflush: remove redundant ASID casts to (unsigned long) Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 07/10] arm64: switch_mm: simplify mm and CPU checks Will Deacon
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

The TLB gather code sets fullmm=1 when tearing down the entire address
space for an mm_struct on exit or execve. Given that the ASID allocator
will never re-allocate a dirty ASID, this flushing is not needed and can
simply be avoided in the flushing code.
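
For context, fullmm is derived in the core mm code roughly as follows
(sketched from the tlb_gather_mmu() of this era, so treat the body as
illustrative):

	void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
			    unsigned long start, unsigned long end)
	{
		tlb->mm = mm;
		/* Is it from 0 to ~0? */
		tlb->fullmm = !(start | (end + 1));
		/* ... remainder of the mmu_gather initialisation ... */
	}

exit_mmap() and the execve teardown path pass the full 0..~0 range, which
is what lets tlb_flush() below skip the TLBI entirely.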

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlb.h | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index d6e6b6660380..ffdaea7954bb 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -37,17 +37,21 @@ static inline void __tlb_remove_table(void *_table)
 
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	if (tlb->fullmm) {
-		flush_tlb_mm(tlb->mm);
-	} else {
-		struct vm_area_struct vma = { .vm_mm = tlb->mm, };
-		/*
-		 * The intermediate page table levels are already handled by
-		 * the __(pte|pmd|pud)_free_tlb() functions, so last level
-		 * TLBI is sufficient here.
-		 */
-		__flush_tlb_range(&vma, tlb->start, tlb->end, true);
-	}
+	struct vm_area_struct vma = { .vm_mm = tlb->mm, };
+
+	/*
+	 * The ASID allocator will either invalidate the ASID or mark
+	 * it as used.
+	 */
+	if (tlb->fullmm)
+		return;
+
+	/*
+	 * The intermediate page table levels are already handled by
+	 * the __(pte|pmd|pud)_free_tlb() functions, so last level
+	 * TLBI is sufficient here.
+	 */
+	__flush_tlb_range(&vma, tlb->start, tlb->end, true);
 }
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-- 
2.1.4

* [PATCH v2 07/10] arm64: switch_mm: simplify mm and CPU checks
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (5 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 06/10] arm64: tlbflush: avoid flushing when fullmm == 1 Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 08/10] arm64: mm: kill mm_cpumask usage Will Deacon
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

switch_mm performs some checks to try and avoid entering the ASID
allocator:

  (1) If we're switching to the init_mm (no user mappings), then simply
      set a reserved TTBR0 value with no page table (the zero page)

  (2) If prev == next *and* the mm_cpumask indicates that we've run on
      this CPU before, then we can skip the allocator.

However, there is plenty of redundancy here. With the new ASID allocator,
if prev == next, then we know that our ASID is valid and do not need to
worry about re-allocation. Consequently, we can drop the mm_cpumask check
in (2) and move the prev == next check before the init_mm check, since
if prev == next == init_mm then there's nothing to do.
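
For reference, the resulting switch_mm() after this patch looks roughly
like the sketch below (reconstructed from the hunk that follows, so treat
it as illustrative rather than authoritative):

	static inline void
	switch_mm(struct mm_struct *prev, struct mm_struct *next,
		  struct task_struct *tsk)
	{
		unsigned int cpu = smp_processor_id();

		if (prev == next)
			return;

		/*
		 * init_mm.pgd does not contain any user mappings and it is
		 * always active for kernel addresses in TTBR1. Just set the
		 * reserved TTBR0.
		 */
		if (next == &init_mm) {
			cpu_set_reserved_ttbr0();
			return;
		}

		check_and_switch_context(next, cpu);
	}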

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 6 ++++--
 arch/arm64/mm/context.c              | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index f4c74a951b6c..c0e87898ba96 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -129,6 +129,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 {
 	unsigned int cpu = smp_processor_id();
 
+	if (prev == next)
+		return;
+
 	/*
 	 * init_mm.pgd does not contain any user mappings and it is always
 	 * active for kernel addresses in TTBR1. Just set the reserved TTBR0.
@@ -138,8 +141,7 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		return;
 	}
 
-	if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next)
-		check_and_switch_context(next, tsk);
+	check_and_switch_context(next, cpu);
 }
 
 #define deactivate_mm(tsk,mm)	do { } while (0)
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index e902229b1a3d..4b9ec4484e3f 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -166,10 +166,10 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 		local_flush_tlb_all();
 
 	atomic64_set(&per_cpu(active_asids, cpu), asid);
-	cpumask_set_cpu(cpu, mm_cpumask(mm));
 	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
 
 switch_mm_fastpath:
+	cpumask_set_cpu(cpu, mm_cpumask(mm));
 	cpu_switch_mm(mm->pgd, mm);
 }
 
-- 
2.1.4

* [PATCH v2 08/10] arm64: mm: kill mm_cpumask usage
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (6 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 07/10] arm64: switch_mm: simplify mm and CPU checks Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 09/10] arm64: tlb: remove redundant barrier from __flush_tlb_pgtable Will Deacon
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

mm_cpumask isn't actually used for anything on arm64, so remove all the
code trying to keep it up-to-date.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/smp.c | 7 -------
 arch/arm64/mm/context.c | 2 --
 2 files changed, 9 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index fdd4d4dbd64f..03b0aa28ea61 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -142,7 +142,6 @@ asmlinkage void secondary_start_kernel(void)
 	 */
 	atomic_inc(&mm->mm_count);
 	current->active_mm = mm;
-	cpumask_set_cpu(cpu, mm_cpumask(mm));
 
 	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
 	printk("CPU%u: Booted secondary processor\n", cpu);
@@ -233,12 +232,6 @@ int __cpu_disable(void)
 	 * OK - migrate IRQs away from this CPU
 	 */
 	migrate_irqs();
-
-	/*
-	 * Remove this CPU from the vm mask set of all processes.
-	 */
-	clear_tasks_mm_cpumask(cpu);
-
 	return 0;
 }
 
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 4b9ec4484e3f..f636a2639f03 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -132,7 +132,6 @@ set_asid:
 
 bump_gen:
 	asid |= generation;
-	cpumask_clear(mm_cpumask(mm));
 	return asid;
 }
 
@@ -169,7 +168,6 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
 
 switch_mm_fastpath:
-	cpumask_set_cpu(cpu, mm_cpumask(mm));
 	cpu_switch_mm(mm->pgd, mm);
 }
 
-- 
2.1.4

* [PATCH v2 09/10] arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (7 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 08/10] arm64: mm: kill mm_cpumask usage Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-06 17:46 ` [PATCH v2 10/10] arm64: mm: remove dsb from update_mmu_cache Will Deacon
  2015-10-07 10:59 ` [PATCH v2 00/10] arm64 switch_mm improvements Catalin Marinas
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

__flush_tlb_pgtable is used to invalidate intermediate page table
entries after they have been cleared and are about to be freed. Since the
pXd_clear routines already imply memory barriers, we don't need the extra
one here.
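
For context (a sketch from memory of the contemporary arm64 pgtable.h, so
treat it as illustrative): the pXd_clear helpers go through set_pmd() and
set_pud(), which already publish the page-table update with a dsb, e.g.:

	static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
	{
		*pmdp = pmd;
		dsb(ishst);
		isb();
	}

	static inline void pmd_clear(pmd_t *pmdp)
	{
		set_pmd(pmdp, __pmd(0));
	}

so the cleared entry is already visible to the table walker by the time
__flush_tlb_pgtable() issues its TLBI, and only the trailing dsb(ish) is
still required.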

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 93e9f964805c..b460ae28e346 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -163,7 +163,6 @@ static inline void __flush_tlb_pgtable(struct mm_struct *mm,
 {
 	unsigned long addr = uaddr >> 12 | (ASID(mm) << 48);
 
-	dsb(ishst);
 	asm("tlbi	vae1is, %0" : : "r" (addr));
 	dsb(ish);
 }
-- 
2.1.4

* [PATCH v2 10/10] arm64: mm: remove dsb from update_mmu_cache
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (8 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 09/10] arm64: tlb: remove redundant barrier from __flush_tlb_pgtable Will Deacon
@ 2015-10-06 17:46 ` Will Deacon
  2015-10-07 10:59 ` [PATCH v2 00/10] arm64 switch_mm improvements Catalin Marinas
  10 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2015-10-06 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

update_mmu_cache() consists of a dsb(ishst) instruction so that new user
mappings are guaranteed to be visible to the page table walker on
exception return.

In reality this can be a very expensive operation which is rarely needed.
Removing this barrier shows a modest improvement in hackbench scores and,
in the worst case, we re-take the user fault and establish that there was
nothing to do.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 26b066690593..0d18e88e1cfa 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -646,10 +646,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep)
 {
 	/*
-	 * set_pte() does not have a DSB for user mappings, so make sure that
-	 * the page table write is visible.
+	 * We don't do anything here, so there's a very small chance of
+	 * us retaking a user fault which we just fixed up. The alternative
+	 * is doing a dsb(ishst), but that penalises the fastpath.
 	 */
-	dsb(ishst);
 }
 
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
-- 
2.1.4

* [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
@ 2015-10-07  1:18   ` David Daney
  2015-10-07  6:18   ` Ard Biesheuvel
  1 sibling, 0 replies; 15+ messages in thread
From: David Daney @ 2015-10-07  1:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/06/2015 10:46 AM, Will Deacon wrote:
> There are a number of places where a single CPU is running with a
> private page-table and we need to perform maintenance on the TLB and
> I-cache in order to ensure correctness, but do not require the operation
> to be broadcast to other CPUs.
>
> This patch adds local variants of flush_tlb_all and __flush_icache_all
> to support these use-cases and updates the callers accordingly.
> __local_flush_icache_all also implies an isb, since it is intended to be
> used synchronously.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

I like this one.  It is similar to what some of my earlier patches did.

Acked-by: David Daney <david.daney@cavium.com>


> ---
>   arch/arm64/include/asm/cacheflush.h | 7 +++++++
>   arch/arm64/include/asm/tlbflush.h   | 8 ++++++++
>   arch/arm64/kernel/efi.c             | 4 ++--
>   arch/arm64/kernel/smp.c             | 2 +-
>   arch/arm64/kernel/suspend.c         | 2 +-
>   arch/arm64/mm/context.c             | 4 ++--
>   arch/arm64/mm/mmu.c                 | 2 +-
>   7 files changed, 22 insertions(+), 7 deletions(-)
>
[...]

* [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
  2015-10-06 17:46 ` [PATCH v2 01/10] arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function Will Deacon
@ 2015-10-07  6:17   ` Ard Biesheuvel
  0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2015-10-07  6:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 October 2015 at 18:46, Will Deacon <will.deacon@arm.com> wrote:
> With commit b08d4640a3dc ("arm64: remove dead code"),
> cpu_set_idmap_tcr_t0sz is no longer called and can therefore be removed
> from the kernel.
>
> This patch removes the function and effectively inlines the helper
> function __cpu_set_tcr_t0sz into cpu_set_default_tcr_t0sz.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/include/asm/mmu_context.h | 35 ++++++++++++-----------------------
>  1 file changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index 8ec41e5f56f0..549b89554ce8 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -77,34 +77,23 @@ static inline bool __cpu_uses_extended_idmap(void)
>                 unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
>  }
>
> -static inline void __cpu_set_tcr_t0sz(u64 t0sz)
> -{
> -       unsigned long tcr;
> -
> -       if (__cpu_uses_extended_idmap())
> -               asm volatile (
> -               "       mrs     %0, tcr_el1     ;"
> -               "       bfi     %0, %1, %2, %3  ;"
> -               "       msr     tcr_el1, %0     ;"
> -               "       isb"
> -               : "=&r" (tcr)
> -               : "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
> -}
> -
> -/*
> - * Set TCR.T0SZ to the value appropriate for activating the identity map.
> - */
> -static inline void cpu_set_idmap_tcr_t0sz(void)
> -{
> -       __cpu_set_tcr_t0sz(idmap_t0sz);
> -}
> -
>  /*
>   * Set TCR.T0SZ to its default value (based on VA_BITS)
>   */
>  static inline void cpu_set_default_tcr_t0sz(void)
>  {
> -       __cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
> +       unsigned long tcr;
> +
> +       if (!__cpu_uses_extended_idmap())
> +               return;
> +
> +       asm volatile (
> +       "       mrs     %0, tcr_el1     ;"
> +       "       bfi     %0, %1, %2, %3  ;"
> +       "       msr     tcr_el1, %0     ;"
> +       "       isb"
> +       : "=&r" (tcr)
> +       : "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
>  }
>
>  static inline void switch_new_context(struct mm_struct *mm)
> --
> 2.1.4
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation
  2015-10-06 17:46 ` [PATCH v2 03/10] arm64: flush: use local TLB and I-cache invalidation Will Deacon
  2015-10-07  1:18   ` David Daney
@ 2015-10-07  6:18   ` Ard Biesheuvel
  1 sibling, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2015-10-07  6:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 October 2015 at 18:46, Will Deacon <will.deacon@arm.com> wrote:
> There are a number of places where a single CPU is running with a
> private page-table and we need to perform maintenance on the TLB and
> I-cache in order to ensure correctness, but do not require the operation
> to be broadcast to other CPUs.
>
> This patch adds local variants of flush_tlb_all and __flush_icache_all
> to support these use-cases and updates the callers accordingly.
> __local_flush_icache_all also implies an isb, since it is intended to be
> used synchronously.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/include/asm/cacheflush.h | 7 +++++++
>  arch/arm64/include/asm/tlbflush.h   | 8 ++++++++
>  arch/arm64/kernel/efi.c             | 4 ++--
>  arch/arm64/kernel/smp.c             | 2 +-
>  arch/arm64/kernel/suspend.c         | 2 +-
>  arch/arm64/mm/context.c             | 4 ++--
>  arch/arm64/mm/mmu.c                 | 2 +-
>  7 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index c75b8d027eb1..54efedaf331f 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -115,6 +115,13 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
>
> +static inline void __local_flush_icache_all(void)
> +{
> +       asm("ic iallu");
> +       dsb(nsh);
> +       isb();
> +}
> +
>  static inline void __flush_icache_all(void)
>  {
>         asm("ic ialluis");
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 7bd2da021658..96f944e75dc4 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -63,6 +63,14 @@
>   *             only require the D-TLB to be invalidated.
>   *             - kaddr - Kernel virtual memory address
>   */
> +static inline void local_flush_tlb_all(void)
> +{
> +       dsb(nshst);
> +       asm("tlbi       vmalle1");
> +       dsb(nsh);
> +       isb();
> +}
> +
>  static inline void flush_tlb_all(void)
>  {
>         dsb(ishst);
> diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> index 13671a9cf016..4d12926ea40d 100644
> --- a/arch/arm64/kernel/efi.c
> +++ b/arch/arm64/kernel/efi.c
> @@ -344,9 +344,9 @@ static void efi_set_pgd(struct mm_struct *mm)
>         else
>                 cpu_switch_mm(mm->pgd, mm);
>
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         if (icache_is_aivivt())
> -               __flush_icache_all();
> +               __local_flush_icache_all();
>  }
>
>  void efi_virtmap_load(void)
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index dbdaacddd9a5..fdd4d4dbd64f 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -152,7 +152,7 @@ asmlinkage void secondary_start_kernel(void)
>          * point to zero page to avoid speculatively fetching new entries.
>          */
>         cpu_set_reserved_ttbr0();
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         cpu_set_default_tcr_t0sz();
>
>         preempt_disable();
> diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
> index 8297d502217e..3c5e4e6dcf68 100644
> --- a/arch/arm64/kernel/suspend.c
> +++ b/arch/arm64/kernel/suspend.c
> @@ -90,7 +90,7 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
>                 else
>                         cpu_switch_mm(mm->pgd, mm);
>
> -               flush_tlb_all();
> +               local_flush_tlb_all();
>
>                 /*
>                  * Restore per-cpu offset before any kernel
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index d70ff14dbdbd..48b53fb381af 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -48,9 +48,9 @@ static void flush_context(void)
>  {
>         /* set the reserved TTBR0 before flushing the TLB */
>         cpu_set_reserved_ttbr0();
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         if (icache_is_aivivt())
> -               __flush_icache_all();
> +               __local_flush_icache_all();
>  }
>
>  static void set_mm_context(struct mm_struct *mm, unsigned int asid)
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 9211b8527f25..71a310478c9e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -456,7 +456,7 @@ void __init paging_init(void)
>          * point to zero page to avoid speculatively fetching new entries.
>          */
>         cpu_set_reserved_ttbr0();
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>         cpu_set_default_tcr_t0sz();
>  }
>
> --
> 2.1.4
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* [PATCH v2 00/10] arm64 switch_mm improvements
  2015-10-06 17:46 [PATCH v2 00/10] arm64 switch_mm improvements Will Deacon
                   ` (9 preceding siblings ...)
  2015-10-06 17:46 ` [PATCH v2 10/10] arm64: mm: remove dsb from update_mmu_cache Will Deacon
@ 2015-10-07 10:59 ` Catalin Marinas
  10 siblings, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2015-10-07 10:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 06, 2015 at 06:46:20PM +0100, Will Deacon wrote:
> Will Deacon (10):
>   arm64: mm: remove unused cpu_set_idmap_tcr_t0sz function
>   arm64: proc: de-scope TLBI operation during cold boot
>   arm64: flush: use local TLB and I-cache invalidation
>   arm64: mm: rewrite ASID allocator and MM context-switching code
>   arm64: tlbflush: remove redundant ASID casts to (unsigned long)
>   arm64: tlbflush: avoid flushing when fullmm == 1
>   arm64: switch_mm: simplify mm and CPU checks
>   arm64: mm: kill mm_cpumask usage
>   arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
>   arm64: mm: remove dsb from update_mmu_cache

Series queued for 4.4. Thanks.

-- 
Catalin
