* [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

Hello again,

This is v1 of the RFC I previously posted here:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2018-August/597821.html

The main changes include:

  * Rewrite the comment in tlbflush.h to explain the various functions
    and justify the barrier semantics

  * Fix the "flush entire ASID" heuristic to work with !4K page sizes

  * Fix the build on sh (well, it fails somewhere else that isn't my fault)

  * Report PxD_SHIFT instead of PxD_SIZE via tlb_get_unmap_shift()

It's also had a lot more testing, but has held up nicely so far on arm64.
I haven't figured out how to merge this yet, but I'll probably end up pulling
the core changes out onto a separate branch.

Cheers,

Will

--->8

Peter Zijlstra (1):
  asm-generic/tlb: Track freeing of page-table directories in struct
    mmu_gather

Will Deacon (11):
  arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range()
  arm64: tlb: Add DSB ISHST prior to TLBI in
    __flush_tlb_[kernel_]pgtable()
  arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()
  arm64: tlb: Justify non-leaf invalidation in flush_tlb_range()
  arm64: tlbflush: Allow stride to be specified for __flush_tlb_range()
  arm64: tlb: Remove redundant !CONFIG_HAVE_RCU_TABLE_FREE code
  asm-generic/tlb: Guard with #ifdef CONFIG_MMU
  asm-generic/tlb: Track which levels of the page tables have been
    cleared
  arm64: tlb: Adjust stride and type of TLBI according to mmu_gather
  arm64: tlb: Avoid synchronous TLBIs when freeing page tables
  arm64: tlb: Rewrite stale comment in asm/tlbflush.h

 arch/arm64/Kconfig                |   1 +
 arch/arm64/include/asm/pgtable.h  |  10 +++-
 arch/arm64/include/asm/tlb.h      |  34 +++++-------
 arch/arm64/include/asm/tlbflush.h | 112 ++++++++++++++++++++++++--------------
 include/asm-generic/tlb.h         |  85 +++++++++++++++++++++++++----
 mm/memory.c                       |   4 +-
 6 files changed, 168 insertions(+), 78 deletions(-)

-- 
2.1.4



* [PATCH 01/12] arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range()
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

flush_tlb_kernel_range() is only ever used to invalidate last-level
entries, so we can restrict the scope of the TLB invalidation
instruction.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index a4a1901140ee..7e2a35424ca4 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -199,7 +199,7 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
 
 	dsb(ishst);
 	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
-		__tlbi(vaae1is, addr);
+		__tlbi(vaale1is, addr);
 	dsb(ish);
 	isb();
 }
-- 
2.1.4



* [PATCH 02/12] arm64: tlb: Add DSB ISHST prior to TLBI in __flush_tlb_[kernel_]pgtable()
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

__flush_tlb_[kernel_]pgtable() rely on set_pXd() having issued a DSB after
writing the new table entry, and therefore omit the barrier prior to the
TLBI instruction.

In preparation for delaying our walk-cache invalidation on the unmap()
path, move the DSB into the TLB invalidation routines.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 7e2a35424ca4..e257f8655b84 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -213,6 +213,7 @@ static inline void __flush_tlb_pgtable(struct mm_struct *mm,
 {
 	unsigned long addr = __TLBI_VADDR(uaddr, ASID(mm));
 
+	dsb(ishst);
 	__tlbi(vae1is, addr);
 	__tlbi_user(vae1is, addr);
 	dsb(ish);
@@ -222,6 +223,7 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
 {
 	unsigned long addr = __TLBI_VADDR(kaddr, 0);
 
+	dsb(ishst);
 	__tlbi(vaae1is, addr);
 	dsb(ish);
 }
-- 
2.1.4



* [PATCH 03/12] arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

Now that our walk-cache invalidation routines imply a DSB before the
invalidation, we no longer need one when we are clearing an entry during
unmap.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 1bdeca8918a6..2ab2031b778c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -360,6 +360,7 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_present(pmd)	pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
+#define pmd_valid(pmd)		pte_valid(pmd_pte(pmd))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkold(pmd)		pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
@@ -431,7 +432,9 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
 	WRITE_ONCE(*pmdp, pmd);
-	dsb(ishst);
+
+	if (pmd_valid(pmd))
+		dsb(ishst);
 }
 
 static inline void pmd_clear(pmd_t *pmdp)
@@ -477,11 +480,14 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
 #define pud_present(pud)	pte_present(pud_pte(pud))
+#define pud_valid(pud)		pte_valid(pud_pte(pud))
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
 {
 	WRITE_ONCE(*pudp, pud);
-	dsb(ishst);
+
+	if (pud_valid(pud))
+		dsb(ishst);
 }
 
 static inline void pud_clear(pud_t *pudp)
-- 
2.1.4
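
For illustration only (this is not part of the series), a minimal sketch of
the ordering that patches 2 and 3 combine to provide on the unmap path. The
function name below is hypothetical; pmd_clear() and __flush_tlb_pgtable()
are the helpers touched above, and an arm64 kernel context is assumed:

static inline void example_clear_pmd_and_invalidate(struct mm_struct *mm,
						    pmd_t *pmdp,
						    unsigned long addr)
{
	/* Invalid entry: set_pmd() now skips the DSB (patch 3) */
	pmd_clear(pmdp);

	/*
	 * The dsb(ishst) added in patch 2 publishes the cleared entry
	 * before the TLBI; the trailing dsb(ish) then waits for the
	 * walk-cache invalidation to complete.
	 */
	__flush_tlb_pgtable(mm, addr);
}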



* [PATCH 04/12] arm64: tlb: Justify non-leaf invalidation in flush_tlb_range()
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

Add a comment to explain why we can't get away with last-level
invalidation in flush_tlb_range().

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index e257f8655b84..ddbf1718669d 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -182,6 +182,10 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 static inline void flush_tlb_range(struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end)
 {
+	/*
+	 * We cannot use leaf-only invalidation here, since we may be invalidating
+	 * table entries as part of collapsing hugepages or moving page tables.
+	 */
 	__flush_tlb_range(vma, start, end, false);
 }
 
-- 
2.1.4



* [PATCH 05/12] arm64: tlbflush: Allow stride to be specified for __flush_tlb_range()
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

When we are unmapping intermediate page-table entries or huge pages, we
don't need to issue a TLBI instruction for every PAGE_SIZE chunk in the
VA range being unmapped.

Allow the invalidation stride to be passed to __flush_tlb_range(), and
adjust our "just nuke the ASID" heuristic to take this into account.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlb.h      |  2 +-
 arch/arm64/include/asm/tlbflush.h | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index a3233167be60..1e1f68ce28f4 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -53,7 +53,7 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 	 * the __(pte|pmd|pud)_free_tlb() functions, so last level
 	 * TLBI is sufficient here.
 	 */
-	__flush_tlb_range(&vma, tlb->start, tlb->end, true);
+	__flush_tlb_range(&vma, tlb->start, tlb->end, PAGE_SIZE, true);
 }
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index ddbf1718669d..37ccdb246b20 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -149,25 +149,28 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
  * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
  * necessarily a performance improvement.
  */
-#define MAX_TLB_RANGE	(1024UL << PAGE_SHIFT)
+#define MAX_TLBI_OPS	1024UL
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
-				     bool last_level)
+				     unsigned long stride, bool last_level)
 {
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
 
-	if ((end - start) > MAX_TLB_RANGE) {
+	if ((end - start) > (MAX_TLBI_OPS * stride)) {
 		flush_tlb_mm(vma->vm_mm);
 		return;
 	}
 
+	/* Convert the stride into units of 4k */
+	stride >>= 12;
+
 	start = __TLBI_VADDR(start, asid);
 	end = __TLBI_VADDR(end, asid);
 
 	dsb(ishst);
-	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12)) {
+	for (addr = start; addr < end; addr += stride) {
 		if (last_level) {
 			__tlbi(vale1is, addr);
 			__tlbi_user(vale1is, addr);
@@ -186,14 +189,14 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 	 * We cannot use leaf-only invalidation here, since we may be invalidating
 	 * table entries as part of collapsing hugepages or moving page tables.
 	 */
-	__flush_tlb_range(vma, start, end, false);
+	__flush_tlb_range(vma, start, end, PAGE_SIZE, false);
 }
 
 static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
 	unsigned long addr;
 
-	if ((end - start) > MAX_TLB_RANGE) {
+	if ((end - start) > (MAX_TLBI_OPS * PAGE_SIZE)) {
 		flush_tlb_all();
 		return;
 	}
-- 
2.1.4
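
As a rough, standalone model (not kernel code) of the effect of the stride
and the reworked MAX_TLBI_OPS heuristic, assuming 4K pages and therefore
2M PMD-sized huge pages:

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SIZE	(1UL << 21)		/* 2M, assuming 4K pages */
#define MAX_TLBI_OPS	1024UL

/*
 * Returns the number of TLBI operations the loop above would issue, or 0
 * when the heuristic falls back to invalidating the whole ASID instead.
 */
static unsigned long tlbi_ops(unsigned long start, unsigned long end,
			      unsigned long stride)
{
	if ((end - start) > (MAX_TLBI_OPS * stride))
		return 0;
	return (end - start) / stride;
}

int main(void)
{
	unsigned long len = 64UL << 20;	/* unmap 64M of huge pages */

	/* PMD stride: 32 TLBIs for the whole range */
	printf("stride=2M -> %lu TLBIs\n", tlbi_ops(0, len, PMD_SIZE));
	/* page stride: exceeds MAX_TLBI_OPS * 4K, so nuke the ASID */
	printf("stride=4K -> %lu TLBIs (0 == flush_tlb_mm)\n",
	       tlbi_ops(0, len, PAGE_SIZE));
	return 0;
}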



* [PATCH 06/12] arm64: tlb: Remove redundant !CONFIG_HAVE_RCU_TABLE_FREE code
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

If there's one thing the RCU-based table freeing doesn't need, it's more
ifdeffery.

Remove the redundant !CONFIG_HAVE_RCU_TABLE_FREE code, since this option
is unconditionally selected in our Kconfig.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlb.h | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 1e1f68ce28f4..bd00017d529a 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -22,16 +22,10 @@
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-
-#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
 static inline void __tlb_remove_table(void *_table)
 {
 	free_page_and_swap_cache((struct page *)_table);
 }
-#else
-#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
-#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
 
 static void tlb_flush(struct mmu_gather *tlb);
 
@@ -61,7 +55,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 {
 	__flush_tlb_pgtable(tlb->mm, addr);
 	pgtable_page_dtor(pte);
-	tlb_remove_entry(tlb, pte);
+	tlb_remove_table(tlb, pte);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 2
@@ -69,7 +63,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
 	__flush_tlb_pgtable(tlb->mm, addr);
-	tlb_remove_entry(tlb, virt_to_page(pmdp));
+	tlb_remove_table(tlb, virt_to_page(pmdp));
 }
 #endif
 
@@ -78,7 +72,7 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
 	__flush_tlb_pgtable(tlb->mm, addr);
-	tlb_remove_entry(tlb, virt_to_page(pudp));
+	tlb_remove_table(tlb, virt_to_page(pudp));
 }
 #endif
 
-- 
2.1.4



* [PATCH 07/12] asm-generic/tlb: Guard with #ifdef CONFIG_MMU
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

The inner workings of the mmu_gather-based TLB invalidation mechanism
are not relevant to nommu configurations, so guard them with an #ifdef.
This allows us to implement future functions using static inlines
without breaking the build.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/tlb.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b3353e21f3b3..a25e236f7a7f 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -20,6 +20,8 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
+#ifdef CONFIG_MMU
+
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 /*
  * Semi RCU freeing of the page directories.
@@ -310,6 +312,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #endif
 #endif
 
+#endif /* CONFIG_MMU */
+
 #define tlb_migrate_finish(mm) do {} while (0)
 
 #endif /* _ASM_GENERIC__TLB_H */
-- 
2.1.4



* [PATCH 08/12] asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

From: Peter Zijlstra <peterz@infradead.org>

Some architectures require different TLB invalidation instructions
depending on whether it is only the last level of the page table being
changed, or whether there are also changes to the intermediate
(directory) entries higher up the tree.

Add a new bit to the flags bitfield in struct mmu_gather so that the
architecture code can operate accordingly if it's the intermediate
levels being invalidated.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/tlb.h | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index a25e236f7a7f..2b444ad94566 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -99,12 +99,22 @@ struct mmu_gather {
 #endif
 	unsigned long		start;
 	unsigned long		end;
-	/* we are in the middle of an operation to clear
-	 * a full mm and can make some optimizations */
-	unsigned int		fullmm : 1,
-	/* we have performed an operation which
-	 * requires a complete flush of the tlb */
-				need_flush_all : 1;
+	/*
+	 * we are in the middle of an operation to clear
+	 * a full mm and can make some optimizations
+	 */
+	unsigned int		fullmm : 1;
+
+	/*
+	 * we have performed an operation which
+	 * requires a complete flush of the tlb
+	 */
+	unsigned int		need_flush_all : 1;
+
+	/*
+	 * we have removed page directories
+	 */
+	unsigned int		freed_tables : 1;
 
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
@@ -139,6 +149,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 		tlb->start = TASK_SIZE;
 		tlb->end = 0;
 	}
+	tlb->freed_tables = 0;
 }
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -280,6 +291,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pte_free_tlb(tlb, ptep, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
 #endif
@@ -287,7 +299,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef pmd_free_tlb
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
 #endif
@@ -297,6 +310,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pud_free_tlb(tlb, pudp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__pud_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
@@ -306,7 +320,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef p4d_free_tlb
 #define p4d_free_tlb(tlb, pudp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->freed_tables = 1;			\
 		__p4d_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
-- 
2.1.4
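
A tiny standalone model (not kernel code) of how an architecture can consume
the new bit; arm64 does exactly this in its tlb_flush() later in the series:

#include <stdbool.h>
#include <stdio.h>

struct gather_model {
	bool freed_tables;	/* set by the p??_free_tlb() hunks above */
};

/* Leaf-only invalidation is safe only if no table pages were freed. */
static const char *required_tlbi(const struct gather_model *tlb)
{
	return tlb->freed_tables ? "non-leaf (also hits the walk cache)"
				 : "last-level only";
}

int main(void)
{
	struct gather_model munmap_pmd = { .freed_tables = true };
	struct gather_model zap_ptes   = { .freed_tables = false };

	printf("freed a page table: %s\n", required_tlbi(&munmap_pmd));
	printf("cleared PTEs only : %s\n", required_tlbi(&zap_ptes));
	return 0;
}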



* [PATCH 09/12] asm-generic/tlb: Track which levels of the page tables have been cleared
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

It is common for architectures with hugepage support to require only a
single TLB invalidation operation per hugepage during unmap(), rather than
iterating through the mapping at a PAGE_SIZE increment. Currently,
however, the level in the page table where the unmap() operation occurs
is not stored in the mmu_gather structure, therefore forcing
architectures to issue additional TLB invalidation operations or to give
up and over-invalidate by e.g. invalidating the entire TLB.

Ideally, we could add an interval rbtree to the mmu_gather structure,
which would allow us to associate the correct mapping granule with the
various sub-mappings within the range being invalidated. However, this
is costly in terms of book-keeping and memory management, so instead we
approximate by keeping track of the page table levels that are cleared
and provide a means to query the smallest granule required for invalidation.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/tlb.h | 58 ++++++++++++++++++++++++++++++++++++++++-------
 mm/memory.c               |  4 +++-
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 2b444ad94566..9791e98122a0 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -116,6 +116,14 @@ struct mmu_gather {
 	 */
 	unsigned int		freed_tables : 1;
 
+	/*
+	 * at which levels have we cleared entries?
+	 */
+	unsigned int		cleared_ptes : 1;
+	unsigned int		cleared_pmds : 1;
+	unsigned int		cleared_puds : 1;
+	unsigned int		cleared_p4ds : 1;
+
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
@@ -150,6 +158,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 		tlb->end = 0;
 	}
 	tlb->freed_tables = 0;
+	tlb->cleared_ptes = 0;
+	tlb->cleared_pmds = 0;
+	tlb->cleared_puds = 0;
+	tlb->cleared_p4ds = 0;
 }
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -199,6 +211,25 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 }
 #endif
 
+static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb)
+{
+	if (tlb->cleared_ptes)
+		return PAGE_SHIFT;
+	if (tlb->cleared_pmds)
+		return PMD_SHIFT;
+	if (tlb->cleared_puds)
+		return PUD_SHIFT;
+	if (tlb->cleared_p4ds)
+		return P4D_SHIFT;
+
+	return PAGE_SHIFT;
+}
+
+static inline unsigned long tlb_get_unmap_size(struct mmu_gather *tlb)
+{
+	return 1UL << tlb_get_unmap_shift(tlb);
+}
+
 /*
  * In the case of tlb vma handling, we can optimise these away in the
  * case where we're doing a full MM flush.  When we're doing a munmap,
@@ -232,13 +263,19 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_tlb_entry(tlb, ptep, address)		\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		tlb->cleared_ptes = 1;				\
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	     \
-	do {							     \
-		__tlb_adjust_range(tlb, address, huge_page_size(h)); \
-		__tlb_remove_tlb_entry(tlb, ptep, address);	     \
+#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+	do {							\
+		unsigned long _sz = huge_page_size(h);		\
+		__tlb_adjust_range(tlb, address, _sz);		\
+		if (_sz == PMD_SIZE)				\
+			tlb->cleared_pmds = 1;			\
+		else if (_sz == PUD_SIZE)			\
+			tlb->cleared_puds = 1;			\
+		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
 /**
@@ -252,6 +289,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE);	\
+		tlb->cleared_pmds = 1;					\
 		__tlb_remove_pmd_tlb_entry(tlb, pmdp, address);		\
 	} while (0)
 
@@ -266,6 +304,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pud_tlb_entry(tlb, pudp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PUD_SIZE);	\
+		tlb->cleared_puds = 1;					\
 		__tlb_remove_pud_tlb_entry(tlb, pudp, address);		\
 	} while (0)
 
@@ -291,7 +330,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pte_free_tlb(tlb, ptep, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
+		tlb->cleared_pmds = 1;				\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
 #endif
@@ -300,7 +340,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
+		tlb->cleared_puds = 1;				\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
 #endif
@@ -310,7 +351,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define pud_free_tlb(tlb, pudp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
+		tlb->cleared_p4ds = 1;				\
 		__pud_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
@@ -321,7 +363,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define p4d_free_tlb(tlb, pudp, address)			\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
-		tlb->freed_tables = 1;			\
+		tlb->freed_tables = 1;				\
 		__p4d_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..9135f48e8d84 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -267,8 +267,10 @@ void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 {
 	struct mmu_gather_batch *batch, *next;
 
-	if (force)
+	if (force) {
+		__tlb_reset_range(tlb);
 		__tlb_adjust_range(tlb, start, end - start);
+	}
 
 	tlb_flush_mmu(tlb);
 
-- 
2.1.4
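
Again as a standalone model (not kernel code) of the granule selection above,
assuming 4K pages (so PMD_SHIFT is 21 and PUD_SHIFT is 30):

#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21	/* assumes 4K pages */
#define PUD_SHIFT	30

struct gather_model {
	unsigned int cleared_ptes : 1;
	unsigned int cleared_pmds : 1;
	unsigned int cleared_puds : 1;
};

/* Mirrors tlb_get_unmap_shift(): the smallest level cleared wins. */
static unsigned int unmap_shift(const struct gather_model *tlb)
{
	if (tlb->cleared_ptes)
		return PAGE_SHIFT;
	if (tlb->cleared_pmds)
		return PMD_SHIFT;
	if (tlb->cleared_puds)
		return PUD_SHIFT;
	return PAGE_SHIFT;
}

int main(void)
{
	/* Zapping a THP marks only the PMD level ... */
	struct gather_model thp = { .cleared_pmds = 1 };
	/* ... whereas an unmap that also frees the PTE table marks both. */
	struct gather_model unmap = { .cleared_ptes = 1, .cleared_pmds = 1 };

	printf("THP zap    -> invalidate in %uK steps\n",
	       (1U << unmap_shift(&thp)) >> 10);
	printf("unmap+free -> invalidate in %uK steps\n",
	       (1U << unmap_shift(&unmap)) >> 10);
	return 0;
}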



* [PATCH 10/12] arm64: tlb: Adjust stride and type of TLBI according to mmu_gather
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

Now that the core mmu_gather code keeps track of both the levels of page
table cleared and also whether or not these entries correspond to
intermediate entries, we can use this in our tlb_flush() callback to
reduce the number of invalidations we issue as well as their scope.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlb.h | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index bd00017d529a..b078fdec10d5 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -34,20 +34,21 @@ static void tlb_flush(struct mmu_gather *tlb);
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
 	struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
+	bool last_level = !tlb->freed_tables;
+	unsigned long stride = tlb_get_unmap_size(tlb);
 
 	/*
-	 * The ASID allocator will either invalidate the ASID or mark
-	 * it as used.
+	 * If we're tearing down the address space then we only care about
+	 * invalidating the walk-cache, since the ASID allocator won't
+	 * reallocate our ASID without invalidating the entire TLB.
 	 */
-	if (tlb->fullmm)
+	if (tlb->fullmm) {
+		if (!last_level)
+			flush_tlb_mm(tlb->mm);
 		return;
+	}
 
-	/*
-	 * The intermediate page table levels are already handled by
-	 * the __(pte|pmd|pud)_free_tlb() functions, so last level
-	 * TLBI is sufficient here.
-	 */
-	__flush_tlb_range(&vma, tlb->start, tlb->end, PAGE_SIZE, true);
+	__flush_tlb_range(&vma, tlb->start, tlb->end, stride, last_level);
 }
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-- 
2.1.4



* [PATCH 11/12] arm64: tlb: Avoid synchronous TLBIs when freeing page tables
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

By selecting HAVE_RCU_TABLE_INVALIDATE, we can rely on tlb_flush() being
called if we fail to batch table pages for freeing. This in turn allows
us to postpone walk-cache invalidation until tlb_finish_mmu(), which
avoids lots of unnecessary DSBs and means we can shoot down the ASID if
the range is large enough.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig                |  1 +
 arch/arm64/include/asm/tlb.h      |  3 ---
 arch/arm64/include/asm/tlbflush.h | 11 -----------
 3 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 29e75b47becd..89059ee1eccc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,6 +142,7 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RCU_TABLE_FREE
+	select HAVE_RCU_TABLE_INVALIDATE
 	select HAVE_RSEQ
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index b078fdec10d5..106fdc951b6e 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -54,7 +54,6 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 				  unsigned long addr)
 {
-	__flush_tlb_pgtable(tlb->mm, addr);
 	pgtable_page_dtor(pte);
 	tlb_remove_table(tlb, pte);
 }
@@ -63,7 +62,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
-	__flush_tlb_pgtable(tlb->mm, addr);
 	tlb_remove_table(tlb, virt_to_page(pmdp));
 }
 #endif
@@ -72,7 +70,6 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
-	__flush_tlb_pgtable(tlb->mm, addr);
 	tlb_remove_table(tlb, virt_to_page(pudp));
 }
 #endif
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 37ccdb246b20..c98ed8871030 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -215,17 +215,6 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
  * Used to invalidate the TLB (walk caches) corresponding to intermediate page
  * table levels (pgd/pud/pmd).
  */
-static inline void __flush_tlb_pgtable(struct mm_struct *mm,
-				       unsigned long uaddr)
-{
-	unsigned long addr = __TLBI_VADDR(uaddr, ASID(mm));
-
-	dsb(ishst);
-	__tlbi(vae1is, addr);
-	__tlbi_user(vae1is, addr);
-	dsb(ish);
-}
-
 static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
 {
 	unsigned long addr = __TLBI_VADDR(kaddr, 0);
-- 
2.1.4



* [PATCH 12/12] arm64: tlb: Rewrite stale comment in asm/tlbflush.h
From: Will Deacon @ 2018-08-30 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, benh, torvalds, npiggin, catalin.marinas,
	linux-arm-kernel, Will Deacon

Peter Z asked me to justify the barrier usage in asm/tlbflush.h, but
actually that whole block comment needs to be rewritten.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 80 +++++++++++++++++++++++++++------------
 1 file changed, 55 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index c98ed8871030..c3c0387aee18 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -70,43 +70,73 @@
 	})
 
 /*
- *	TLB Management
- *	==============
+ *	TLB Invalidation
+ *	================
  *
- *	The TLB specific code is expected to perform whatever tests it needs
- *	to determine if it should invalidate the TLB for each call.  Start
- *	addresses are inclusive and end addresses are exclusive; it is safe to
- *	round these addresses down.
+ * 	This header file implements the low-level TLB invalidation routines
+ *	(sometimes referred to as "flushing" in the kernel) for arm64.
  *
- *	flush_tlb_all()
+ *	Every invalidation operation uses the following template:
+ *
+ *	DSB ISHST	// Ensure prior page-table updates have completed
+ *	TLBI ...	// Invalidate the TLB
+ *	DSB ISH		// Ensure the TLB invalidation has completed
+ *      if (invalidated kernel mappings)
+ *		ISB	// Discard any instructions fetched from the old mapping
+ *
+ *
+ *	The following functions form part of the "core" TLB invalidation API,
+ *	as documented in Documentation/core-api/cachetlb.rst:
  *
- *		Invalidate the entire TLB.
+ *	flush_tlb_all()
+ *		Invalidate the entire TLB (kernel + user) on all CPUs
  *
  *	flush_tlb_mm(mm)
+ *		Invalidate an entire user address space on all CPUs.
+ *		The 'mm' argument identifies the ASID to invalidate.
+ *
+ *	flush_tlb_range(vma, start, end)
+ *		Invalidate the virtual-address range '[start, end)' on all
+ *		CPUs for the user address space corresponding to 'vma->mm'.
+ *		Note that this operation also invalidates any walk-cache
+ *		entries associated with translations for the specified address
+ *		range.
+ *
+ *	flush_tlb_kernel_range(start, end)
+ *		Same as flush_tlb_range(..., start, end), but applies to
+ * 		kernel mappings rather than a particular user address space.
+ *		Whilst not explicitly documented, this function is used when
+ *		unmapping pages from vmalloc/io space.
+ *
+ *	flush_tlb_page(vma, addr)
+ *		Invalidate a single user mapping for address 'addr' in the
+ *		address space corresponding to 'vma->mm'.  Note that this
+ *		operation only invalidates a single, last-level page-table
+ *		entry and therefore does not affect any walk-caches.
  *
- *		Invalidate all TLB entries in a particular address space.
- *		- mm	- mm_struct describing address space
  *
- *	flush_tlb_range(mm,start,end)
+ *	Next, we have some undocumented invalidation routines that you probably
+ *	don't want to call unless you know what you're doing:
  *
- *		Invalidate a range of TLB entries in the specified address
- *		space.
- *		- mm	- mm_struct describing address space
- *		- start - start address (may not be aligned)
- *		- end	- end address (exclusive, may not be aligned)
+ *	local_flush_tlb_all()
+ *		Same as flush_tlb_all(), but only applies to the calling CPU.
  *
- *	flush_tlb_page(vaddr,vma)
+ *	__flush_tlb_kernel_pgtable(addr)
+ *		Invalidate a single kernel mapping for address 'addr' on all
+ *		CPUs, ensuring that any walk-cache entries associated with the
+ *		translation are also invalidated.
  *
- *		Invalidate the specified page in the specified address range.
- *		- vaddr - virtual address (may not be aligned)
- *		- vma	- vma_struct describing address range
+ *	__flush_tlb_range(vma, start, end, stride, last_level)
+ *		Invalidate the virtual-address range '[start, end)' on all
+ *		CPUs for the user address space corresponding to 'vma->mm'.
+ *		The invalidation operations are issued at a granularity
+ *		determined by 'stride' and only affect any walk-cache entries
+ *		if 'last_level' is equal to false.
  *
- *	flush_kern_tlb_page(kaddr)
  *
- *		Invalidate the TLB entry for the specified page.  The address
- *		will be in the kernels virtual memory space.  Current uses
- *		only require the D-TLB to be invalidated.
- *		- kaddr - Kernel virtual memory address
+ *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
+ *	on top of these routines, since that is our interface to the mmu_gather
+ *	API as used by munmap() and friends.
  */
 static inline void local_flush_tlb_all(void)
 {
-- 
2.1.4
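
To make the template above concrete, a minimal sketch (not part of the patch;
the function name is hypothetical) of invalidating a single kernel mapping
using the helpers visible elsewhere in this file -- __TLBI_VADDR(), __tlbi(),
dsb() and isb():

static inline void example_invalidate_kernel_page(unsigned long kaddr)
{
	unsigned long addr = __TLBI_VADDR(kaddr, 0);	/* no ASID for kernel */

	dsb(ishst);		/* prior page-table updates are visible   */
	__tlbi(vaale1is, addr);	/* last-level, all ASIDs, inner shareable */
	dsb(ish);		/* wait for the invalidation to complete  */
	isb();			/* kernel mapping: discard stale fetches  */
}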



* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
From: Linus Torvalds @ 2018-08-30 16:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linux Kernel Mailing List, Peter Zijlstra,
	Benjamin Herrenschmidt, Nick Piggin, Catalin Marinas,
	linux-arm-kernel

On Thu, Aug 30, 2018 at 9:15 AM Will Deacon <will.deacon@arm.com> wrote:
>
> It's also had a lot more testing, but has held up nicely so far on arm64.
> I haven't figured out how to merge this yet, but I'll probably end up pulling
> the core changes out onto a separate branch.

This looks fine, and I'm actually ok getting the core changes through
the arm64 branch, since this has been discussed across architectures,
and I think "whoever does the work gets to drive the car".

After all, a lot of the core changes originally came from x86 people
(pretty much all of it, historically). No reason why arm64 can't get
some of that too.

But with the glory comes the blame when something breaks ;)

                Linus


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
From: Peter Zijlstra @ 2018-08-30 17:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, benh, torvalds, npiggin, catalin.marinas, linux-arm-kernel

On Thu, Aug 30, 2018 at 05:15:34PM +0100, Will Deacon wrote:
> Peter Zijlstra (1):
>   asm-generic/tlb: Track freeing of page-table directories in struct
>     mmu_gather
> 
> Will Deacon (11):
>   arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range()
>   arm64: tlb: Add DSB ISHST prior to TLBI in
>     __flush_tlb_[kernel_]pgtable()
>   arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()
>   arm64: tlb: Justify non-leaf invalidation in flush_tlb_range()
>   arm64: tlbflush: Allow stride to be specified for __flush_tlb_range()
>   arm64: tlb: Remove redundant !CONFIG_HAVE_RCU_TABLE_FREE code
>   asm-generic/tlb: Guard with #ifdef CONFIG_MMU
>   asm-generic/tlb: Track which levels of the page tables have been
>     cleared
>   arm64: tlb: Adjust stride and type of TLBI according to mmu_gather
>   arm64: tlb: Avoid synchronous TLBIs when freeing page tables
>   arm64: tlb: Rewrite stale comment in asm/tlbflush.h
> 
>  arch/arm64/Kconfig                |   1 +
>  arch/arm64/include/asm/pgtable.h  |  10 +++-
>  arch/arm64/include/asm/tlb.h      |  34 +++++-------
>  arch/arm64/include/asm/tlbflush.h | 112 ++++++++++++++++++++++++--------------
>  include/asm-generic/tlb.h         |  85 +++++++++++++++++++++++++----
>  mm/memory.c                       |   4 +-
>  6 files changed, 168 insertions(+), 78 deletions(-)

These patches look good to me, thanks!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
From: Nicholas Piggin @ 2018-08-31  1:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Will Deacon, Linux Kernel Mailing List, Peter Zijlstra,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel

On Thu, 30 Aug 2018 09:39:38 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Aug 30, 2018 at 9:15 AM Will Deacon <will.deacon@arm.com> wrote:
> >
> > It's also had a lot more testing, but has held up nicely so far on arm64.
> > I haven't figured out how to merge this yet, but I'll probably end up pulling
> > the core changes out onto a separate branch.  
> 
> This looks fine, and I'm actually ok getting the core changes through
> the arm64 branch, since this has been discussed across architectures,
> and I think "whoever does the work gets to drive the car".

Well, it would help if powerpc, say, wanted to start using them without a
merge cycle lag. Not a huge issue because powerpc already does
reasonably well here and there's other work that can be done.

I will try to review the core changes carefully next week.

Thanks,
Nick


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
From: Linus Torvalds @ 2018-08-31  1:04 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Will Deacon, Linux Kernel Mailing List, Peter Zijlstra,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel

On Thu, Aug 30, 2018 at 6:01 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Well it would help if powerpc say wanted to start using them without a
> merge cycle lag. Not a huge issue because powerpc already does
> reasonably well here and there's other work that can be done.

Sure. If somebody wants to send the generic changes I can just take
them directly to make it easier for people to work on this.

               Linus


* Re: [PATCH 09/12] asm-generic/tlb: Track which levels of the page tables have been cleared
From: Nicholas Piggin @ 2018-08-31  1:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, peterz, benh, torvalds, catalin.marinas, linux-arm-kernel

On Thu, 30 Aug 2018 17:15:43 +0100
Will Deacon <will.deacon@arm.com> wrote:

> It is common for architectures with hugepage support to require only a
> single TLB invalidation operation per hugepage during unmap(), rather than
> iterating through the mapping at a PAGE_SIZE increment. Currently,
> however, the level in the page table where the unmap() operation occurs
> is not stored in the mmu_gather structure, therefore forcing
> architectures to issue additional TLB invalidation operations or to give
> up and over-invalidate by e.g. invalidating the entire TLB.
> 
> Ideally, we could add an interval rbtree to the mmu_gather structure,
> which would allow us to associate the correct mapping granule with the
> various sub-mappings within the range being invalidated. However, this
> is costly in terms of book-keeping and memory management, so instead we
> approximate by keeping track of the page table levels that are cleared
> and provide a means to query the smallest granule required for invalidation.

Actually the generic patches are pretty simple, and they look okay to
me. powerpc *should* be able to switch to Peter's patch with a few
lines of code with unchanged functionality as far as I can see.

These flags we may use as well, but even if not if x86 and arm64 are
using it, it seems reasonable to go in generic code for now. For the
3 generic patches,

Acked-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
From: Will Deacon @ 2018-08-31  9:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Linux Kernel Mailing List, Peter Zijlstra,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel

On Thu, Aug 30, 2018 at 06:04:12PM -0700, Linus Torvalds wrote:
> On Thu, Aug 30, 2018 at 6:01 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Well it would help if powerpc say wanted to start using them without a
> > merge cycle lag. Not a huge issue because powerpc already does
> > reasonably well here and there's other work that can be done.
> 
> Sure. If somebody wants to send the generic changes I can just take
> them directly to make it easier for people to work on this.

Tell you what: how about I stick the following patches (with Nick's and
Peter's acks) on a separate, stable branch:

  asm-generic/tlb: Track which levels of the page tables have been cleared
  asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
  asm-generic/tlb: Guard with #ifdef CONFIG_MMU

and then anybody who needs them can just pull that in for the merge window?

Also, how would people feel about adding a MAINTAINERS entry for all the
tlb.h files? A big part of the recent "fun" was us figuring out what the
code is actually doing ("It used to do foo() but that may have changed"),
and it certainly took me the best part of a day to figure things out again.
If we're trying to do more in the generic code and less in the arch code,
it would help if we're on top of the changes in this area.

Proposal below (omitted Linus because that seems to be the pattern elsewhere
in the file and he's not going to shout at himself when things break :)
Anybody I've missed?

Will

--->8

diff --git a/MAINTAINERS b/MAINTAINERS
index a5b256b25905..7224b5618883 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9681,6 +9681,15 @@ S:	Maintained
 F:	arch/arm/boot/dts/mmp*
 F:	arch/arm/mach-mmp/
 
+MMU GATHER AND TLB INVALIDATION
+M:	Will Deacon <will.deacon@arm.com>
+M:	Nick Piggin <npiggin@gmail.com>
+M:	Peter Zijlstra <peterz@infradead.org>
+L:	linux-arch@vger.kernel.org
+S:	Maintained
+F:	include/asm-generic/tlb.h
+F:	arch/*/include/asm/tlb.h
+
 MN88472 MEDIA DRIVER
 M:	Antti Palosaari <crope@iki.fi>
 L:	linux-media@vger.kernel.org


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
From: Peter Zijlstra @ 2018-08-31 10:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, Nick Piggin, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel

On Fri, Aug 31, 2018 at 10:54:18AM +0100, Will Deacon wrote:

> Proposal below (omitted Linus because that seems to be the pattern elsewhere
> in the file and he's not going to shout at himself when things break :)
> Anybody I've missed?
> 
> Will
> 
> --->8
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a5b256b25905..7224b5618883 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9681,6 +9681,15 @@ S:	Maintained
>  F:	arch/arm/boot/dts/mmp*
>  F:	arch/arm/mach-mmp/
>  
> +MMU GATHER AND TLB INVALIDATION
> +M:	Will Deacon <will.deacon@arm.com>
> +M:	Nick Piggin <npiggin@gmail.com>
> +M:	Peter Zijlstra <peterz@infradead.org>
> +L:	linux-arch@vger.kernel.org
> +S:	Maintained
> +F:	include/asm-generic/tlb.h
> +F:	arch/*/include/asm/tlb.h
> +
>  MN88472 MEDIA DRIVER
>  M:	Antti Palosaari <crope@iki.fi>
>  L:	linux-media@vger.kernel.org

If we're going to do that (and I'm not opposed); it might make sense to
do something like the below and add:

 F:  mm/mmu_gather.c

---
 b/mm/mmu_gather.c         |  250 ++++++++++++++++++++++++++++++++++++++++++++++
 include/asm-generic/tlb.h |    2 
 mm/Makefile               |    2 
 mm/memory.c               |  247 ---------------------------------------------
 4 files changed, 253 insertions(+), 248 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -149,6 +149,8 @@ static inline void tlb_flush_mmu_tlbonly
 	__tlb_reset_range(tlb);
 }
 
+extern void tlb_flush_mmu_free(struct mmu_gather *tlb);
+
 static inline void tlb_remove_page_size(struct mmu_gather *tlb,
 					struct page *page, int page_size)
 {
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -22,7 +22,7 @@ KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
 
 mmu-y			:= nommu.o
-mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
+mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mmu_gather.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o \
 			   page_vma_mapped.o pagewalk.o pgtable-generic.o \
 			   rmap.o vmalloc.o
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -186,253 +186,6 @@ static void check_sync_rss_stat(struct t
 
 #endif /* SPLIT_RSS_COUNTING */
 
-#ifdef HAVE_GENERIC_MMU_GATHER
-
-static bool tlb_next_batch(struct mmu_gather *tlb)
-{
-	struct mmu_gather_batch *batch;
-
-	batch = tlb->active;
-	if (batch->next) {
-		tlb->active = batch->next;
-		return true;
-	}
-
-	if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
-		return false;
-
-	batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-	if (!batch)
-		return false;
-
-	tlb->batch_count++;
-	batch->next = NULL;
-	batch->nr   = 0;
-	batch->max  = MAX_GATHER_BATCH;
-
-	tlb->active->next = batch;
-	tlb->active = batch;
-
-	return true;
-}
-
-void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-				unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-
-	/* Is it from 0 to ~0? */
-	tlb->fullmm     = !(start | (end+1));
-	tlb->need_flush_all = 0;
-	tlb->local.next = NULL;
-	tlb->local.nr   = 0;
-	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
-	tlb->active     = &tlb->local;
-	tlb->batch_count = 0;
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
-#endif
-	tlb->page_size = 0;
-
-	__tlb_reset_range(tlb);
-}
-
-static void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	struct mmu_gather_batch *batch;
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
-	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
-		free_pages_and_swap_cache(batch->pages, batch->nr);
-		batch->nr = 0;
-	}
-	tlb->active = &tlb->local;
-}
-
-void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
-/* tlb_finish_mmu
- *	Called at the end of the shootdown operation to free up any resources
- *	that were required.
- */
-void arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	struct mmu_gather_batch *batch, *next;
-
-	if (force)
-		__tlb_adjust_range(tlb, start, end - start);
-
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	for (batch = tlb->local.next; batch; batch = next) {
-		next = batch->next;
-		free_pages((unsigned long)batch, 0);
-	}
-	tlb->local.next = NULL;
-}
-
-/* __tlb_remove_page
- *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)), while
- *	handling the additional races in SMP caused by other CPUs caching valid
- *	mappings in their TLBs. Returns the number of free page slots left.
- *	When out of page slots we must call tlb_flush_mmu().
- *returns true if the caller should flush.
- */
-bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size)
-{
-	struct mmu_gather_batch *batch;
-
-	VM_BUG_ON(!tlb->end);
-	VM_WARN_ON(tlb->page_size != page_size);
-
-	batch = tlb->active;
-	/*
-	 * Add the page and check if we are full. If so
-	 * force a flush.
-	 */
-	batch->pages[batch->nr++] = page;
-	if (batch->nr == batch->max) {
-		if (!tlb_next_batch(tlb))
-			return true;
-		batch = tlb->active;
-	}
-	VM_BUG_ON_PAGE(batch->nr > batch->max, page);
-
-	return false;
-}
-
-#endif /* HAVE_GENERIC_MMU_GATHER */
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-
-/*
- * See the comment near struct mmu_table_batch.
- */
-
-/*
- * If we want tlb_remove_table() to imply TLB invalidates.
- */
-static inline void tlb_table_invalidate(struct mmu_gather *tlb)
-{
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
-	/*
-	 * Invalidate page-table caches used by hardware walkers. Then we still
-	 * need to RCU-sched wait while freeing the pages because software
-	 * walkers can still be in-flight.
-	 */
-	tlb_flush_mmu_tlbonly(tlb);
-#endif
-}
-
-static void tlb_remove_table_smp_sync(void *arg)
-{
-	/* Simply deliver the interrupt */
-}
-
-static void tlb_remove_table_one(void *table)
-{
-	/*
-	 * This isn't an RCU grace period and hence the page-tables cannot be
-	 * assumed to be actually RCU-freed.
-	 *
-	 * It is however sufficient for software page-table walkers that rely on
-	 * IRQ disabling. See the comment near struct mmu_table_batch.
-	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
-	__tlb_remove_table(table);
-}
-
-static void tlb_remove_table_rcu(struct rcu_head *head)
-{
-	struct mmu_table_batch *batch;
-	int i;
-
-	batch = container_of(head, struct mmu_table_batch, rcu);
-
-	for (i = 0; i < batch->nr; i++)
-		__tlb_remove_table(batch->tables[i]);
-
-	free_page((unsigned long)batch);
-}
-
-void tlb_table_flush(struct mmu_gather *tlb)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch) {
-		tlb_table_invalidate(tlb);
-		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
-		*batch = NULL;
-	}
-}
-
-void tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch == NULL) {
-		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-		if (*batch == NULL) {
-			tlb_table_invalidate(tlb);
-			tlb_remove_table_one(table);
-			return;
-		}
-		(*batch)->nr = 0;
-	}
-
-	(*batch)->tables[(*batch)->nr++] = table;
-	if ((*batch)->nr == MAX_TABLE_BATCH)
-		tlb_table_flush(tlb);
-}
-
-#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
-
-/**
- * tlb_gather_mmu - initialize an mmu_gather structure for page-table tear-down
- * @tlb: the mmu_gather structure to initialize
- * @mm: the mm_struct of the target address space
- * @start: start of the region that will be removed from the page-table
- * @end: end of the region that will be removed from the page-table
- *
- * Called to initialize an (on-stack) mmu_gather structure for page-table
- * tear-down from @mm. The @start and @end are set to 0 and -1
- * respectively when @mm is without users and we're going to destroy
- * the full address space (exit/execve).
- */
-void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	arch_tlb_gather_mmu(tlb, mm, start, end);
-	inc_tlb_flush_pending(tlb->mm);
-}
-
-void tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end)
-{
-	/*
-	 * If there are parallel threads are doing PTE changes on same range
-	 * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
-	 * flush by batching, a thread has stable TLB entry can fail to flush
-	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
-	 * forcefully if we detect parallel PTE batching threads.
-	 */
-	bool force = mm_tlb_flush_nested(tlb->mm);
-
-	arch_tlb_finish_mmu(tlb, start, end, force);
-	dec_tlb_flush_pending(tlb->mm);
-}
-
 /*
  * Note: this doesn't free the actual pages themselves. That
  * has been handled earlier when unmapping all the memory regions.
--- /dev/null
+++ b/mm/mmu_gather.c
@@ -0,0 +1,250 @@
+#include <linux/smp.h>
+#include <asm/tlb.h>
+
+#ifdef HAVE_GENERIC_MMU_GATHER
+
+static bool tlb_next_batch(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch;
+
+	batch = tlb->active;
+	if (batch->next) {
+		tlb->active = batch->next;
+		return true;
+	}
+
+	if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
+		return false;
+
+	batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+	if (!batch)
+		return false;
+
+	tlb->batch_count++;
+	batch->next = NULL;
+	batch->nr   = 0;
+	batch->max  = MAX_GATHER_BATCH;
+
+	tlb->active->next = batch;
+	tlb->active = batch;
+
+	return true;
+}
+
+void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
+				unsigned long start, unsigned long end)
+{
+	tlb->mm = mm;
+
+	/* Is it from 0 to ~0? */
+	tlb->fullmm     = !(start | (end+1));
+	tlb->need_flush_all = 0;
+	tlb->local.next = NULL;
+	tlb->local.nr   = 0;
+	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
+	tlb->active     = &tlb->local;
+	tlb->batch_count = 0;
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb->batch = NULL;
+#endif
+	tlb->page_size = 0;
+
+	__tlb_reset_range(tlb);
+}
+
+void tlb_flush_mmu_free(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch;
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
+	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
+		free_pages_and_swap_cache(batch->pages, batch->nr);
+		batch->nr = 0;
+	}
+	tlb->active = &tlb->local;
+}
+
+void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+	tlb_flush_mmu_tlbonly(tlb);
+	tlb_flush_mmu_free(tlb);
+}
+
+/* tlb_finish_mmu
+ *	Called at the end of the shootdown operation to free up any resources
+ *	that were required.
+ */
+void arch_tlb_finish_mmu(struct mmu_gather *tlb,
+		unsigned long start, unsigned long end, bool force)
+{
+	struct mmu_gather_batch *batch, *next;
+
+	if (force)
+		__tlb_adjust_range(tlb, start, end - start);
+
+	tlb_flush_mmu(tlb);
+
+	/* keep the page table cache within bounds */
+	check_pgt_cache();
+
+	for (batch = tlb->local.next; batch; batch = next) {
+		next = batch->next;
+		free_pages((unsigned long)batch, 0);
+	}
+	tlb->local.next = NULL;
+}
+
+/* __tlb_remove_page
+ *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)), while
+ *	handling the additional races in SMP caused by other CPUs caching valid
+ *	mappings in their TLBs. Returns the number of free page slots left.
+ *	When out of page slots we must call tlb_flush_mmu().
+ *returns true if the caller should flush.
+ */
+bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size)
+{
+	struct mmu_gather_batch *batch;
+
+	VM_BUG_ON(!tlb->end);
+	VM_WARN_ON(tlb->page_size != page_size);
+
+	batch = tlb->active;
+	/*
+	 * Add the page and check if we are full. If so
+	 * force a flush.
+	 */
+	batch->pages[batch->nr++] = page;
+	if (batch->nr == batch->max) {
+		if (!tlb_next_batch(tlb))
+			return true;
+		batch = tlb->active;
+	}
+	VM_BUG_ON_PAGE(batch->nr > batch->max, page);
+
+	return false;
+}
+
+#endif /* HAVE_GENERIC_MMU_GATHER */
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
+/*
+ * See the comment near struct mmu_table_batch.
+ */
+
+/*
+ * If we want tlb_remove_table() to imply TLB invalidates.
+ */
+static inline void tlb_table_invalidate(struct mmu_gather *tlb)
+{
+#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+	/*
+	 * Invalidate page-table caches used by hardware walkers. Then we still
+	 * need to RCU-sched wait while freeing the pages because software
+	 * walkers can still be in-flight.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+#endif
+}
+
+static void tlb_remove_table_smp_sync(void *arg)
+{
+	/* Simply deliver the interrupt */
+}
+
+static void tlb_remove_table_one(void *table)
+{
+	/*
+	 * This isn't an RCU grace period and hence the page-tables cannot be
+	 * assumed to be actually RCU-freed.
+	 *
+	 * It is however sufficient for software page-table walkers that rely on
+	 * IRQ disabling. See the comment near struct mmu_table_batch.
+	 */
+	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+	__tlb_remove_table(table);
+}
+
+static void tlb_remove_table_rcu(struct rcu_head *head)
+{
+	struct mmu_table_batch *batch;
+	int i;
+
+	batch = container_of(head, struct mmu_table_batch, rcu);
+
+	for (i = 0; i < batch->nr; i++)
+		__tlb_remove_table(batch->tables[i]);
+
+	free_page((unsigned long)batch);
+}
+
+void tlb_table_flush(struct mmu_gather *tlb)
+{
+	struct mmu_table_batch **batch = &tlb->batch;
+
+	if (*batch) {
+		tlb_table_invalidate(tlb);
+		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
+		*batch = NULL;
+	}
+}
+
+void tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	struct mmu_table_batch **batch = &tlb->batch;
+
+	if (*batch == NULL) {
+		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+		if (*batch == NULL) {
+			tlb_table_invalidate(tlb);
+			tlb_remove_table_one(table);
+			return;
+		}
+		(*batch)->nr = 0;
+	}
+
+	(*batch)->tables[(*batch)->nr++] = table;
+	if ((*batch)->nr == MAX_TABLE_BATCH)
+		tlb_table_flush(tlb);
+}
+
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
+/**
+ * tlb_gather_mmu - initialize an mmu_gather structure for page-table tear-down
+ * @tlb: the mmu_gather structure to initialize
+ * @mm: the mm_struct of the target address space
+ * @start: start of the region that will be removed from the page-table
+ * @end: end of the region that will be removed from the page-table
+ *
+ * Called to initialize an (on-stack) mmu_gather structure for page-table
+ * tear-down from @mm. The @start and @end are set to 0 and -1
+ * respectively when @mm is without users and we're going to destroy
+ * the full address space (exit/execve).
+ */
+void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+	arch_tlb_gather_mmu(tlb, mm, start, end);
+	inc_tlb_flush_pending(tlb->mm);
+}
+
+void tlb_finish_mmu(struct mmu_gather *tlb,
+		unsigned long start, unsigned long end)
+{
+	/*
+	 * If there are parallel threads are doing PTE changes on same range
+	 * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
+	 * flush by batching, a thread has stable TLB entry can fail to flush
+	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
+	 * forcefully if we detect parallel PTE batching threads.
+	 */
+	bool force = mm_tlb_flush_nested(tlb->mm);
+
+	arch_tlb_finish_mmu(tlb, start, end, force);
+	dec_tlb_flush_pending(tlb->mm);
+}
+
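
For readers less familiar with the interface being moved, the typical
caller pattern is roughly the sketch below (loosely modelled on
unmap_region() in mm/mmap.c; the floor/ceiling values and error handling
are elided, so treat it as illustrative rather than a verbatim caller):

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, start, end);	  /* set up on-stack gather  */
	unmap_vmas(&tlb, vma, start, end);	  /* clear PTEs, batch pages */
	free_pgtables(&tlb, vma, floor, ceiling); /* queue page-table pages  */
	tlb_finish_mmu(&tlb, start, end);	  /* flush TLB, free batches */

The move itself is intended to be purely mechanical: the same gather and
flush machinery, relocated from mm/memory.c into its own file.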


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
  2018-08-31 10:10         ` Peter Zijlstra
@ 2018-08-31 10:32           ` Nicholas Piggin
  2018-08-31 10:49             ` Peter Zijlstra
  2018-09-03 12:52             ` Will Deacon
  0 siblings, 2 replies; 26+ messages in thread
From: Nicholas Piggin @ 2018-08-31 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Linus Torvalds, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel,
	Aneesh Kumar K.V

On Fri, 31 Aug 2018 12:10:14 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Aug 31, 2018 at 10:54:18AM +0100, Will Deacon wrote:
> 
> > Proposal below (omitted Linus because that seems to be the pattern elsewhere
> > in the file and he's not going to shout at himself when things break :)
> > Anybody I've missed?
> > 
> > Will
> >   
> > --->8  
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index a5b256b25905..7224b5618883 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -9681,6 +9681,15 @@ S:	Maintained
> >  F:	arch/arm/boot/dts/mmp*
> >  F:	arch/arm/mach-mmp/
> >  
> > +MMU GATHER AND TLB INVALIDATION
> > +M:	Will Deacon <will.deacon@arm.com>
> > +M:	Nick Piggin <npiggin@gmail.com>

Oh gee, I suppose. powerpc hash is kind of interesting because it's
crazy, Aneesh knows that code a lot better than I do. radix modulo
some minor details of exact instructions is fairly like x86 (he 
wrote a lot of that code too AFAIK).

> > +M:	Peter Zijlstra <peterz@infradead.org>
> > +L:	linux-arch@vger.kernel.org

Maybe put linux-mm as well? Or should there just be one list?

> > +S:	Maintained
> > +F:	include/asm-generic/tlb.h
> > +F:	arch/*/include/asm/tlb.h
> > +
> >  MN88472 MEDIA DRIVER
> >  M:	Antti Palosaari <crope@iki.fi>
> >  L:	linux-media@vger.kernel.org  
> 
> If we're going to do that (and I'm not opposed); it might make sense to
> do something like the below and add:
> 
>  F:  mm/mmu_gather.c

I think that is a good idea regardless. How do you feel about calling it
tlb.c? Easier to type and it autocompletes sooner.

> 
> ---
>  b/mm/mmu_gather.c         |  250 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/asm-generic/tlb.h |    2 
>  mm/Makefile               |    2 
>  mm/memory.c               |  247 ---------------------------------------------


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
  2018-08-31 10:32           ` Nicholas Piggin
@ 2018-08-31 10:49             ` Peter Zijlstra
  2018-08-31 11:12               ` Will Deacon
  2018-08-31 11:50               ` Nicholas Piggin
  2018-09-03 12:52             ` Will Deacon
  1 sibling, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2018-08-31 10:49 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Will Deacon, Linus Torvalds, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel,
	Aneesh Kumar K.V

On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> Oh gee, I suppose. powerpc hash is kind of interesting because it's
> crazy, Aneesh knows that code a lot better than I do. radix modulo
> some minor details of exact instructions is fairly like x86 

The whole TLB broadcast vs explicit IPIs is a fairly big difference in
my book.

Anyway, have you guys tried the explicit IPI approach? Depending on how
IPIs are routed vs broadcasts it might save a little bus traffic. No
point in getting all CPUs to process the TLBI when there's only a hand
full that really need it.

OTOH, I suppose the broadcast thing has been optimized to death on the
hardware side, so who knows..
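
For concreteness, the explicit-IPI scheme being described looks roughly
like the sketch below. The flush_tlb_range_ipi()/ipi_flush_*() names and
the local_flush_tlb_range() helper are made up for illustration; the real
primitives being leaned on are mm_cpumask() and on_each_cpu_mask():

	struct ipi_flush_info {			/* illustrative only */
		unsigned long start, end;
	};

	static void ipi_flush_func(void *info)
	{
		struct ipi_flush_info *f = info;

		/* Invalidate this CPU's TLB entries for the range, locally. */
		local_flush_tlb_range(f->start, f->end);
	}

	static void flush_tlb_range_ipi(struct mm_struct *mm,
					unsigned long start, unsigned long end)
	{
		struct ipi_flush_info info = { .start = start, .end = end };

		/* Interrupt only the CPUs that have actually run this mm. */
		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_func, &info, 1);
	}

Whether that wins over a hardware broadcast then comes down to how the
IPIs are routed and how many CPUs end up in mm_cpumask().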


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
  2018-08-31 10:49             ` Peter Zijlstra
@ 2018-08-31 11:12               ` Will Deacon
  2018-08-31 11:20                 ` Peter Zijlstra
  2018-08-31 11:50               ` Nicholas Piggin
  1 sibling, 1 reply; 26+ messages in thread
From: Will Deacon @ 2018-08-31 11:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicholas Piggin, Linus Torvalds, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel,
	Aneesh Kumar K.V

On Fri, Aug 31, 2018 at 12:49:45PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> > Oh gee, I suppose. powerpc hash is kind of interesting because it's
> > crazy, Aneesh knows that code a lot better than I do. radix modulo
> > some minor details of exact instructions is fairly like x86 
> 
> The whole TLB broadcast vs explicit IPIs is a fairly big difference in
> my book.
> 
> Anyway, have you guys tried the explicit IPI approach? Depending on how
> IPIs are routed vs broadcasts it might save a little bus traffic. No
> point in getting all CPUs to process the TLBI when there's only a handful
> that really need it.
> 
> OTOH, I suppose the broadcast thing has been optimized to death on the
> hardware side, so who knows..

You also can't IPI an IOMMU or a GPU ;)

Will


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
  2018-08-31 11:12               ` Will Deacon
@ 2018-08-31 11:20                 ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2018-08-31 11:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nicholas Piggin, Linus Torvalds, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel,
	Aneesh Kumar K.V

On Fri, Aug 31, 2018 at 12:12:48PM +0100, Will Deacon wrote:
> On Fri, Aug 31, 2018 at 12:49:45PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> > > Oh gee, I suppose. powerpc hash is kind of interesting because it's
> > > crazy, Aneesh knows that code a lot better than I do. radix modulo
> > > some minor details of exact instructions is fairly like x86 
> > 
> > The whole TLB broadcast vs explicit IPIs is a fairly big difference in
> > my book.
> > 
> > Anyway, have you guys tried the explicit IPI approach? Depending on how
> > IPIs are routed vs broadcasts it might save a little bus traffic. No
> > point in getting all CPUs to process the TLBI when there's only a handful
> > that really need it.
> > 
> > OTOH, I suppose the broadcast thing has been optimized to death on the
> > hardware side, so who knows..
> 
> You also can't IPI an IOMMU or a GPU ;)

Oh, right you are. I suppose that is why x86-iommu is using those mmu_notifiers.
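
Roughly how that works, for anyone following along: the IOMMU driver
registers an mmu_notifier on the process mm and forwards invalidations to
the device from the notifier callback. The my_iommu_* names and the bond
structure below are made up; struct mmu_notifier_ops, .invalidate_range
and mmu_notifier_register() are the real interface:

	/* Assumes struct my_iommu_bond embeds a struct mmu_notifier mn. */
	static void my_iommu_invalidate_range(struct mmu_notifier *mn,
					      struct mm_struct *mm,
					      unsigned long start,
					      unsigned long end)
	{
		struct my_iommu_bond *bond =
			container_of(mn, struct my_iommu_bond, mn);

		/* Tell the device to drop cached translations for the range. */
		my_iommu_flush_iotlb(bond, start, end);
	}

	static const struct mmu_notifier_ops my_iommu_mn_ops = {
		.invalidate_range	= my_iommu_invalidate_range,
	};

	/* At bind time, roughly:
	 *	bond->mn.ops = &my_iommu_mn_ops;
	 *	mmu_notifier_register(&bond->mn, mm);
	 */

That way the device's IOTLB is told about unmaps without any IPI being
involved.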


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
  2018-08-31 10:49             ` Peter Zijlstra
  2018-08-31 11:12               ` Will Deacon
@ 2018-08-31 11:50               ` Nicholas Piggin
  1 sibling, 0 replies; 26+ messages in thread
From: Nicholas Piggin @ 2018-08-31 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Linus Torvalds, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel,
	Aneesh Kumar K.V

On Fri, 31 Aug 2018 12:49:45 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> > Oh gee, I suppose. powerpc hash is kind of interesting because it's
> > crazy, Aneesh knows that code a lot better than I do. radix modulo
> > some minor details of exact instructions is fairly like x86   
> 
> The whole TLB broadcast vs explicit IPIs is a fairly big difference in
> my book.

That's true I guess. Maybe arm64 is closer.

> Anyway, have you guys tried the explicit IPI approach? Depending on how
> IPIs are routed vs broadcasts it might save a little bus traffic. No
> point in getting all CPUs to process the TLBI when there's only a hand
> full that really need it.

It has been looked at now and again; there are a lot of variables to
weigh. And things are also sized and spec'd to cover various
hypervisors, OSes, hash and radix, etc. This is something we need to
evaluate on radix a bit better.

> 
> OTOH, I suppose the broadcast thing has been optimized to death on the
> hardware side, so who knows..

There are some advantages to doing it in hardware, and also some to
doing IPIs. The "problem" is actually that Linux is well optimised and
it can be hard to notice much impact until you get to big systems. At
least I don't know of any problem workloads outside micro-benchmarks or
stress tests.

Thanks,
Nick


* Re: [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
  2018-08-31 10:32           ` Nicholas Piggin
  2018-08-31 10:49             ` Peter Zijlstra
@ 2018-09-03 12:52             ` Will Deacon
  1 sibling, 0 replies; 26+ messages in thread
From: Will Deacon @ 2018-09-03 12:52 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Benjamin Herrenschmidt, Catalin Marinas, linux-arm-kernel,
	Aneesh Kumar K.V

On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> On Fri, 31 Aug 2018 12:10:14 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Aug 31, 2018 at 10:54:18AM +0100, Will Deacon wrote:
> > 
> > > Proposal below (omitted Linus because that seems to be the pattern elsewhere
> > > in the file and he's not going to shout at himself when things break :)
> > > Anybody I've missed?
> > > 
> > > Will
> > >   
> > > --->8  
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index a5b256b25905..7224b5618883 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -9681,6 +9681,15 @@ S:	Maintained
> > >  F:	arch/arm/boot/dts/mmp*
> > >  F:	arch/arm/mach-mmp/
> > >  
> > > +MMU GATHER AND TLB INVALIDATION
> > > +M:	Will Deacon <will.deacon@arm.com>
> > > +M:	Nick Piggin <npiggin@gmail.com>
> 
> Oh gee, I suppose. powerpc hash is kind of interesting because it's
> crazy, Aneesh knows that code a lot better than I do. radix modulo
> some minor details of exact instructions is fairly like x86 (he 
> wrote a lot of that code too AFAIK).

Sure, as long as we have Power represented here. Would you rather add Aneesh
instead of yourself, or shall we just add you both?

> 
> > > +M:	Peter Zijlstra <peterz@infradead.org>
> > > +L:	linux-arch@vger.kernel.org
> 
> Maybe put linux-mm as well? Or should there just be one list?

If we do the landgrab on mmu_gather (which I think makes sense), then adding
both lists makes sense to me. I'll spin this as a proper patch, along with
Peter's code move.

> > > +S:	Maintained
> > > +F:	include/asm-generic/tlb.h
> > > +F:	arch/*/include/asm/tlb.h
> > > +
> > >  MN88472 MEDIA DRIVER
> > >  M:	Antti Palosaari <crope@iki.fi>
> > >  L:	linux-media@vger.kernel.org  
> > 
> > If we're going to do that (and I'm not opposed); it might make sense to
> > do something like the below and add:
> > 
> >  F:  mm/mmu_gather.c
> 
> I think that is a good idea regardless. How do you feel about calling it
> tlb.c? Easier to type and it autocompletes sooner.

No strong opinion on the name, but I slightly prefer mmu_gather.c so that
it avoids any remote possibility of confusion with tlb.c vs hugetlb.c.

Will


end of thread

Thread overview: 26+ messages
2018-08-30 16:15 [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64 Will Deacon
2018-08-30 16:15 ` [PATCH 01/12] arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range() Will Deacon
2018-08-30 16:15 ` [PATCH 02/12] arm64: tlb: Add DSB ISHST prior to TLBI in __flush_tlb_[kernel_]pgtable() Will Deacon
2018-08-30 16:15 ` [PATCH 03/12] arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d() Will Deacon
2018-08-30 16:15 ` [PATCH 04/12] arm64: tlb: Justify non-leaf invalidation in flush_tlb_range() Will Deacon
2018-08-30 16:15 ` [PATCH 05/12] arm64: tlbflush: Allow stride to be specified for __flush_tlb_range() Will Deacon
2018-08-30 16:15 ` [PATCH 06/12] arm64: tlb: Remove redundant !CONFIG_HAVE_RCU_TABLE_FREE code Will Deacon
2018-08-30 16:15 ` [PATCH 07/12] asm-generic/tlb: Guard with #ifdef CONFIG_MMU Will Deacon
2018-08-30 16:15 ` [PATCH 08/12] asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather Will Deacon
2018-08-30 16:15 ` [PATCH 09/12] asm-generic/tlb: Track which levels of the page tables have been cleared Will Deacon
2018-08-31  1:23   ` Nicholas Piggin
2018-08-30 16:15 ` [PATCH 10/12] arm64: tlb: Adjust stride and type of TLBI according to mmu_gather Will Deacon
2018-08-30 16:15 ` [PATCH 11/12] arm64: tlb: Avoid synchronous TLBIs when freeing page tables Will Deacon
2018-08-30 16:15 ` [PATCH 12/12] arm64: tlb: Rewrite stale comment in asm/tlbflush.h Will Deacon
2018-08-30 16:39 ` [PATCH 00/12] Avoid synchronous TLB invalidation for intermediate page-table entries on arm64 Linus Torvalds
2018-08-31  1:00   ` Nicholas Piggin
2018-08-31  1:04     ` Linus Torvalds
2018-08-31  9:54       ` Will Deacon
2018-08-31 10:10         ` Peter Zijlstra
2018-08-31 10:32           ` Nicholas Piggin
2018-08-31 10:49             ` Peter Zijlstra
2018-08-31 11:12               ` Will Deacon
2018-08-31 11:20                 ` Peter Zijlstra
2018-08-31 11:50               ` Nicholas Piggin
2018-09-03 12:52             ` Will Deacon
2018-08-30 17:11 ` Peter Zijlstra
