* [PATCH 00/17] Fixup page directory freeing
@ 2019-12-11 12:07 Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 01/17] sh/tlb: Fix PGTABLE_LEVELS > 2 Peter Zijlstra
                   ` (16 more replies)
  0 siblings, 17 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Hi All,

While fixing a silly bug on SH (patch #1), I realized that even with the
trivial patch to restore prior behaviour, page directory freeing was still
broken.

The thing is, on anything SMP, freeing page directories should observe the
exact same order as normal page freeing:

 1) unhook page/directory
 2) TLB invalidate
 3) free page/directory

Without this any concurrent page-table walk could end up with a
use-after-free. This is especially easy to hit for anything that has
software page-table walkers (HAVE_FAST_GUP / software TLB fill) or
hardware that caches partial page-walks (i.e. caches page directories).
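
To make that concrete, a minimal sketch contrasting the two patterns
(this is illustrative, mirroring the __pmd_free_tlb() variants touched
by the patches below):

	/*
	 * Broken: frees the directory at unhook time (step 1), while a
	 * concurrent gup_fast() or hardware walker can still reach it
	 * until the TLB invalidate (step 2) has completed.
	 */
	#define __pmd_free_tlb(tlb, pmdp, addr)	pmd_free((tlb)->mm, (pmdp))

	/*
	 * Correct: queue the directory on the mmu_gather so it is only
	 * freed (step 3) after the TLB invalidate (step 2).
	 */
	#define __pmd_free_tlb(tlb, pmdp, addr)	tlb_remove_table((tlb), (pmdp))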

Even on UP this might give issues, since mmu_gather is preemptible these
days. An interrupt or preempted task accessing user pages might stumble
into the freed page if the hardware caches page directories.

So I've converted everything to always observe the above order, simply so we
don't have to worry about it.

If, however, I've been overzealous and your arch/mmu really doesn't need
this and you're offended by this potentially superfluous code, please
let me know and I'll replace the patch with one that adds a comment
describing your rationale for why it is not needed.

Also included are some patches that rename/document some of the
mmu_gather options.



* [PATCH 01/17] sh/tlb: Fix PGTABLE_LEVELS > 2
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 02/17] asm-generic/tlb: Remove stray function declarations Peter Zijlstra
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley,
	Geert Uytterhoeven

Geert reported that his SH7722-based Migo-R board failed to boot after
commit:

  c5b27a889da9 ("sh/tlb: Convert SH to generic mmu_gather")

That commit fell victim to copying the wrong pattern -- __pmd_free_tlb()
used to be implemented with pmd_free(), since SH allocates its PMDs from
a kmem_cache rather than the page allocator.

Fixes: c5b27a889da9 ("sh/tlb: Convert SH to generic mmu_gather")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -12,6 +12,7 @@ extern void pgd_free(struct mm_struct *m
 extern void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd);
 extern pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address);
 extern void pmd_free(struct mm_struct *mm, pmd_t *pmd);
+#define __pmd_free_tlb(tlb, pmdp, addr)		pmd_free((tlb)->mm, (pmdp))
 #endif
 
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
@@ -33,13 +34,4 @@ do {							\
 	tlb_remove_page((tlb), (pte));			\
 } while (0)
 
-#if CONFIG_PGTABLE_LEVELS > 2
-#define __pmd_free_tlb(tlb, pmdp, addr)			\
-do {							\
-	struct page *page = virt_to_page(pmdp);		\
-	pgtable_pmd_page_dtor(page);			\
-	tlb_remove_page((tlb), page);			\
-} while (0);
-#endif
-
 #endif /* __ASM_SH_PGALLOC_H */




* [PATCH 02/17] asm-generic/tlb: Remove stray function declarations
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 01/17] sh/tlb: Fix PGTABLE_LEVELS > 2 Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 13:19   ` Geert Uytterhoeven
  2019-12-11 12:07 ` [PATCH 03/17] asm-generic/tlb: Add missing CONFIG symbol Peter Zijlstra
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

We removed the actual functions a while ago.

Fixes: 1808d65b55e4 ("asm-generic/tlb: Remove arch_tlb*_mmu()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |    4 ----
 1 file changed, 4 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -277,11 +277,7 @@ struct mmu_gather {
 #endif
 };
 
-void arch_tlb_gather_mmu(struct mmu_gather *tlb,
-	struct mm_struct *mm, unsigned long start, unsigned long end);
 void tlb_flush_mmu(struct mmu_gather *tlb);
-void arch_tlb_finish_mmu(struct mmu_gather *tlb,
-			 unsigned long start, unsigned long end, bool force);
 
 static inline void __tlb_adjust_range(struct mmu_gather *tlb,
 				      unsigned long address,




* [PATCH 03/17] asm-generic/tlb: Add missing CONFIG symbol
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 01/17] sh/tlb: Fix PGTABLE_LEVELS > 2 Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 02/17] asm-generic/tlb: Remove stray function declarations Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 04/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_FREE Peter Zijlstra
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Without this, the symbol will not actually end up in .config files.
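
For instance, an architecture doing:

	select MMU_GATHER_NO_RANGE if MMU

(as m68k does later in this series) only gets
CONFIG_MMU_GATHER_NO_RANGE=y in its .config once the symbol is declared
in arch/Kconfig as above.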

Fixes: a30e32bd79e9 ("asm-generic/tlb: Provide generic tlb_flush() based on flush_tlb_mm()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -402,6 +402,9 @@ config HAVE_RCU_TABLE_NO_INVALIDATE
 config HAVE_MMU_GATHER_PAGE_SIZE
 	bool
 
+config MMU_GATHER_NO_RANGE
+	bool
+
 config HAVE_MMU_GATHER_NO_GATHER
 	bool
 




* [PATCH 04/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_FREE
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (2 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 03/17] asm-generic/tlb: Add missing CONFIG symbol Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE Peter Zijlstra
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Towards a more consistent naming scheme.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig               |    2 +-
 arch/arm/Kconfig           |    2 +-
 arch/arm/include/asm/tlb.h |    2 +-
 arch/arm64/Kconfig         |    2 +-
 arch/powerpc/Kconfig       |    4 ++--
 arch/s390/Kconfig          |    2 +-
 arch/sparc/Kconfig         |    4 ++--
 arch/x86/Kconfig           |    2 +-
 arch/x86/include/asm/tlb.h |    4 ++--
 include/asm-generic/tlb.h  |   10 +++++-----
 mm/gup.c                   |    2 +-
 mm/mmu_gather.c            |    8 ++++----
 12 files changed, 22 insertions(+), 22 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -393,7 +393,7 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_ARCH_JUMP_LABEL_RELATIVE
 	bool
 
-config HAVE_RCU_TABLE_FREE
+config MMU_GATHER_RCU_TABLE_FREE
 	bool
 
 config HAVE_RCU_TABLE_NO_INVALIDATE
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -101,7 +101,7 @@ config ARM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
-	select HAVE_RCU_TABLE_FREE if SMP && ARM_LPAE
+	select MMU_GATHER_RCU_TABLE_FREE if SMP && ARM_LPAE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RSEQ
 	select HAVE_STACKPROTECTOR
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -37,7 +37,7 @@ static inline void __tlb_remove_table(vo
 
 #include <asm-generic/tlb.h>
 
-#ifndef CONFIG_HAVE_RCU_TABLE_FREE
+#ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 #define tlb_remove_table(tlb, entry) tlb_remove_page(tlb, entry)
 #endif
 
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -161,7 +161,7 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_FUNCTION_ARG_ACCESS_API
-	select HAVE_RCU_TABLE_FREE
+	select MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_RSEQ
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -222,8 +222,8 @@ config PPC
 	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
-	select HAVE_RCU_TABLE_FREE		if SMP
-	select HAVE_RCU_TABLE_NO_INVALIDATE	if HAVE_RCU_TABLE_FREE
+	select MMU_GATHER_RCU_TABLE_FREE		if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -168,7 +168,7 @@ config S390
 	select HAVE_OPROFILE
 	select HAVE_PCI
 	select HAVE_PERF_EVENTS
-	select HAVE_RCU_TABLE_FREE
+	select MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,8 +64,8 @@ config SPARC64
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
-	select HAVE_RCU_TABLE_FREE if SMP
-	select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
+	select MMU_GATHER_RCU_TABLE_FREE if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE if MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -200,7 +200,7 @@ config X86
 	select HAVE_PCI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
-	select HAVE_RCU_TABLE_FREE		if PARAVIRT
+	select MMU_GATHER_RCU_TABLE_FREE		if PARAVIRT
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
 	select HAVE_FUNCTION_ARG_ACCESS_API
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -29,8 +29,8 @@ static inline void tlb_flush(struct mmu_
  * shootdown, enablement code for several hypervisors overrides
  * .flush_tlb_others hook in pv_mmu_ops and implements it by issuing
  * a hypercall. To keep software pagetable walkers safe in this case we
- * switch to RCU based table free (HAVE_RCU_TABLE_FREE). See the comment
- * below 'ifdef CONFIG_HAVE_RCU_TABLE_FREE' in include/asm-generic/tlb.h
+ * switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the comment
+ * below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h
  * for more details.
  */
 static inline void __tlb_remove_table(void *table)
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -126,7 +126,7 @@
  *  This ensures we call tlb_flush() every time tlb_change_page_size() actually
  *  changes the size and provides mmu_gather::page_size to tlb_flush().
  *
- *  HAVE_RCU_TABLE_FREE
+ *  MMU_GATHER_RCU_TABLE_FREE
  *
  *  This provides tlb_remove_table(), to be used instead of tlb_remove_page()
  *  for page directores (__p*_free_tlb()). This provides separate freeing of
@@ -139,9 +139,9 @@
  *
  *  HAVE_RCU_TABLE_NO_INVALIDATE
  *
- *  This makes HAVE_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
+ *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
  *  freeing the page-table pages. This can be avoided if you use
- *  HAVE_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
+ *  MMU_GATHER_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
  *  page-tables natively.
  *
  *  MMU_GATHER_NO_RANGE
@@ -149,7 +149,7 @@
  *  Use this if your architecture lacks an efficient flush_tlb_range().
  */
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 /*
  * Semi RCU freeing of the page directories.
  *
@@ -227,7 +227,7 @@ extern bool __tlb_remove_page_size(struc
 struct mmu_gather {
 	struct mm_struct	*mm;
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	struct mmu_table_batch	*batch;
 #endif
 
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1729,7 +1729,7 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
  * Before activating this code, please be aware that the following assumptions
  * are currently made:
  *
- *  *) Either HAVE_RCU_TABLE_FREE is enabled, and tlb_remove_table() is used to
+ *  *) Either MMU_GATHER_RCU_TABLE_FREE is enabled, and tlb_remove_table() is used to
  *  free pages containing page tables or TLB flushing requires IPI broadcast.
  *
  *  *) ptes can be read atomically by the architecture.
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -91,7 +91,7 @@ bool __tlb_remove_page_size(struct mmu_g
 
 #endif /* HAVE_MMU_GATHER_NO_GATHER */
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 
 /*
  * See the comment near struct mmu_table_batch.
@@ -173,11 +173,11 @@ void tlb_remove_table(struct mmu_gather
 		tlb_table_flush(tlb);
 }
 
-#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
 
 static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
 #endif
 #ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
@@ -220,7 +220,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->batch_count = 0;
 #endif
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
 #ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE




* [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (3 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 04/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_FREE Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-16 12:31   ` Aneesh Kumar K.V
  2019-12-11 12:07 ` [PATCH 06/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Towards a more consistent naming scheme.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig              |    3 ++-
 arch/powerpc/Kconfig      |    2 +-
 arch/sparc/Kconfig        |    2 +-
 include/asm-generic/tlb.h |    2 +-
 mm/mmu_gather.c           |    2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -396,8 +396,9 @@ config HAVE_ARCH_JUMP_LABEL_RELATIVE
 config MMU_GATHER_RCU_TABLE_FREE
 	bool
 
-config HAVE_RCU_TABLE_NO_INVALIDATE
+config MMU_GATHER_NO_TABLE_INVALIDATE
 	bool
+	depends on MMU_GATHER_RCU_TABLE_FREE
 
 config HAVE_MMU_GATHER_PAGE_SIZE
 	bool
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -223,7 +223,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select MMU_GATHER_RCU_TABLE_FREE		if SMP
-	select HAVE_RCU_TABLE_NO_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
+	select MMU_GATHER_NO_TABLE_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -65,7 +65,7 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select MMU_GATHER_RCU_TABLE_FREE if SMP
-	select HAVE_RCU_TABLE_NO_INVALIDATE if MMU_GATHER_RCU_TABLE_FREE
+	select MMU_GATHER_NO_TABLE_INVALIDATE if MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -137,7 +137,7 @@
  *  When used, an architecture is expected to provide __tlb_remove_table()
  *  which does the actual freeing of these pages.
  *
- *  HAVE_RCU_TABLE_NO_INVALIDATE
+ *  MMU_GATHER_NO_TABLE_INVALIDATE
  *
  *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
  *  freeing the page-table pages. This can be avoided if you use
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -102,7 +102,7 @@ bool __tlb_remove_page_size(struct mmu_g
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
+#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
 	/*
 	 * Invalidate page-table caches used by hardware walkers. Then we still
 	 * need to RCU-sched wait while freeing the pages because software




* [PATCH 06/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_PAGE_SIZE
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (4 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 07/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_NO_GATHER Peter Zijlstra
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Towards a more consistent naming scheme.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig              |    2 +-
 arch/powerpc/Kconfig      |    2 +-
 include/asm-generic/tlb.h |    9 ++++++---
 mm/mmu_gather.c           |    4 ++--
 4 files changed, 10 insertions(+), 7 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -400,7 +400,7 @@ config MMU_GATHER_NO_TABLE_INVALIDATE
 	bool
 	depends on MMU_GATHER_RCU_TABLE_FREE
 
-config HAVE_MMU_GATHER_PAGE_SIZE
+config MMU_GATHER_PAGE_SIZE
 	bool
 
 config MMU_GATHER_NO_RANGE
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -224,7 +224,7 @@ config PPC
 	select HAVE_PERF_USER_STACK_DUMP
 	select MMU_GATHER_RCU_TABLE_FREE		if SMP
 	select MMU_GATHER_NO_TABLE_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
-	select HAVE_MMU_GATHER_PAGE_SIZE
+	select MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -121,11 +121,14 @@
  *
  * Additionally there are a few opt-in features:
  *
- *  HAVE_MMU_GATHER_PAGE_SIZE
+ *  MMU_GATHER_PAGE_SIZE
  *
  *  This ensures we call tlb_flush() every time tlb_change_page_size() actually
  *  changes the size and provides mmu_gather::page_size to tlb_flush().
  *
+ *  This might be useful if your architecture has size specific TLB
+ *  invalidation instructions.
+ *
  *  MMU_GATHER_RCU_TABLE_FREE
  *
  *  This provides tlb_remove_table(), to be used instead of tlb_remove_page()
@@ -271,7 +274,7 @@ struct mmu_gather {
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
 
-#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+#ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	unsigned int page_size;
 #endif
 #endif
@@ -422,7 +425,7 @@ static inline void tlb_remove_page(struc
 static inline void tlb_change_page_size(struct mmu_gather *tlb,
 						     unsigned int page_size)
 {
-#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+#ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	if (tlb->page_size && tlb->page_size != page_size) {
 		if (!tlb->fullmm && !tlb->need_flush_all)
 			tlb_flush_mmu(tlb);
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -69,7 +69,7 @@ bool __tlb_remove_page_size(struct mmu_g
 
 	VM_BUG_ON(!tlb->end);
 
-#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+#ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	VM_WARN_ON(tlb->page_size != page_size);
 #endif
 
@@ -223,7 +223,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
-#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+#ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	tlb->page_size = 0;
 #endif
 




* [PATCH 07/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_NO_GATHER
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (5 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 06/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE Peter Zijlstra
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Towards a more consistent naming scheme.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig              |    2 +-
 arch/s390/Kconfig         |    2 +-
 include/asm-generic/tlb.h |   14 ++++++++++++--
 mm/mmu_gather.c           |   10 +++++-----
 4 files changed, 19 insertions(+), 9 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -406,7 +406,7 @@ config MMU_GATHER_PAGE_SIZE
 config MMU_GATHER_NO_RANGE
 	bool
 
-config HAVE_MMU_GATHER_NO_GATHER
+config MMU_GATHER_NO_GATHER
 	bool
 
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -162,7 +162,7 @@ config S390
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_MEMBLOCK_PHYS_MAP
-	select HAVE_MMU_GATHER_NO_GATHER
+	select MMU_GATHER_NO_GATHER
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_NOP_MCOUNT
 	select HAVE_OPROFILE
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -147,6 +147,16 @@
  *  MMU_GATHER_NO_RANGE
  *
  *  Use this if your architecture lacks an efficient flush_tlb_range().
+ *
+ *  MMU_GATHER_NO_GATHER
+ *
+ *  If the option is set the mmu_gather will not track individual pages for
+ *  delayed page free anymore. A platform that enables the option needs to
+ *  provide its own implementation of the __tlb_remove_page_size() function to
+ *  free pages.
+ *
+ *  This is useful if your architecture already flushes TLB entries in the
+ *  various ptep_get_and_clear() functions.
  */
 
 #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
@@ -191,7 +201,7 @@ extern void tlb_remove_table(struct mmu_
 
 #endif
 
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+#ifndef CONFIG_MMU_GATHER_NO_GATHER
 /*
  * If we can't allocate a page to make a big batch of page pointers
  * to work on, then just handle a few from the on-stack structure.
@@ -266,7 +276,7 @@ struct mmu_gather {
 
 	unsigned int		batch_count;
 
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+#ifndef CONFIG_MMU_GATHER_NO_GATHER
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -11,7 +11,7 @@
 #include <asm/pgalloc.h>
 #include <asm/tlb.h>
 
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+#ifndef CONFIG_MMU_GATHER_NO_GATHER
 
 static bool tlb_next_batch(struct mmu_gather *tlb)
 {
@@ -89,7 +89,7 @@ bool __tlb_remove_page_size(struct mmu_g
 	return false;
 }
 
-#endif /* HAVE_MMU_GATHER_NO_GATHER */
+#endif /* MMU_GATHER_NO_GATHER */
 
 #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 
@@ -180,7 +180,7 @@ static void tlb_flush_mmu_free(struct mm
 #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
 #endif
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+#ifndef CONFIG_MMU_GATHER_NO_GATHER
 	tlb_batch_pages_flush(tlb);
 #endif
 }
@@ -211,7 +211,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 	/* Is it from 0 to ~0? */
 	tlb->fullmm     = !(start | (end+1));
 
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+#ifndef CONFIG_MMU_GATHER_NO_GATHER
 	tlb->need_flush_all = 0;
 	tlb->local.next = NULL;
 	tlb->local.nr   = 0;
@@ -271,7 +271,7 @@ void tlb_finish_mmu(struct mmu_gather *t
 
 	tlb_flush_mmu(tlb);
 
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+#ifndef CONFIG_MMU_GATHER_NO_GATHER
 	tlb_batch_list_free(tlb);
 #endif
 	dec_tlb_flush_pending(tlb->mm);




* [PATCH 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (6 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 07/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_NO_GATHER Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-12  9:30   ` Peter Zijlstra
  2019-12-12  9:32   ` [PATCH mk-II " Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 09/17] sh/tlb: Fix __pmd_free_tlb() Peter Zijlstra
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

As described in the comment, the correct order for freeing pages is:

 1) unhook page
 2) TLB invalidate page
 3) free page

This order equally applies to page directories.

Currently there are two correct options:

 - use tlb_remove_page(), when all page directories are full pages and
   there are no further constraints placed by things like software
   walkers (HAVE_FAST_GUP).

 - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
   architecture does not do IPI based TLB invalidate and has
   HAVE_FAST_GUP (or software TLB fill).

This however leaves architectures whose page directories are not plain
pages, but which don't need RCU, in a bind. For those, provide
MMU_GATHER_TABLE_FREE, which gives the independent batching for
directories without the additional RCU freeing.
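
As a rough sketch, for a hypothetical architecture "foo" with
kmem_cache based PMDs (the names are illustrative; this mirrors the
SuperH conversion later in this series):

	# arch/foo/Kconfig
	select MMU_GATHER_TABLE_FREE

	/* arch/foo/include/asm/pgalloc.h */
	extern void __tlb_remove_table(void *table);
	#define __pmd_free_tlb(tlb, pmdp, addr) tlb_remove_table((tlb), (pmdp))

	/* arch/foo/mm/pgtable.c -- only called after the TLB invalidate */
	void __tlb_remove_table(void *table)
	{
		pmd_free(NULL, table);
	}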

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig              |    5 +
 include/asm-generic/tlb.h |   76 ++++++++++++++---------------
 mm/mmu_gather.c           |  120 +++++++++++++++++++++++++++++++++-------------
 3 files changed, 131 insertions(+), 70 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -393,8 +393,12 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_ARCH_JUMP_LABEL_RELATIVE
 	bool
 
+config MMU_GATHER_TABLE_FREE
+	bool
+
 config MMU_GATHER_RCU_TABLE_FREE
 	bool
+	select MMU_GATHER_TABLE_FREE
 
 config MMU_GATHER_NO_TABLE_INVALIDATE
 	bool
@@ -408,6 +412,7 @@ config MMU_GATHER_NO_RANGE
 
 config MMU_GATHER_NO_GATHER
 	bool
+	depends on MMU_GATHER_TABLE_FREE
 
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -56,6 +56,15 @@
  *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
  *    there's large holes between the VMAs.
  *
+ *  - tlb_remove_table()
+ *
+ *    tlb_remove_table() is the basic primitive to free page-table directories
+ *    (__p*_free_tlb()).  In its most primitive form it is an alias for
+ *    tlb_remove_page() below, for when page directories are pages and have no
+ *    additional constraints.
+ *
+ *    See also MMU_GATHER_TABLE_FREE and MMU_GATHER_RCU_TABLE_FREE.
+ *
  *  - tlb_remove_page() / __tlb_remove_page()
  *  - tlb_remove_page_size() / __tlb_remove_page_size()
  *
@@ -129,21 +138,28 @@
  *  This might be useful if your architecture has size specific TLB
  *  invalidation instructions.
  *
- *  MMU_GATHER_RCU_TABLE_FREE
+ *  MMU_GATHER_TABLE_FREE
  *
  *  This provides tlb_remove_table(), to be used instead of tlb_remove_page()
- *  for page directores (__p*_free_tlb()). This provides separate freeing of
- *  the page-table pages themselves in a semi-RCU fashion (see comment below).
- *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
- *  and therefore doesn't naturally serialize with software page-table walkers.
+ *  for page directories (__p*_free_tlb()).
+ *
+ *  Useful if your architecture has non-page page directories.
  *
  *  When used, an architecture is expected to provide __tlb_remove_table()
  *  which does the actual freeing of these pages.
  *
+ *  MMU_GATHER_RCU_TABLE_FREE
+ *
+ *  Like MMU_GATHER_TABLE_FREE, and adds semi-RCU semantics to the free (see
+ *  comment below).
+ *
+ *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
+ *  and therefore doesn't naturally serialize with software page-table walkers.
+ *
  *  MMU_GATHER_NO_TABLE_INVALIDATE
  *
- *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
- *  freeing the page-table pages. This can be avoided if you use
+ *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly()
+ *  before freeing the page-table pages. This can be avoided if you use
  *  MMU_GATHER_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
  *  page-tables natively.
  *
@@ -162,37 +178,12 @@
  *  various ptep_get_and_clear() functions.
  */
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
-/*
- * Semi RCU freeing of the page directories.
- *
- * This is needed by some architectures to implement software pagetable walkers.
- *
- * gup_fast() and other software pagetable walkers do a lockless page-table
- * walk and therefore needs some synchronization with the freeing of the page
- * directories. The chosen means to accomplish that is by disabling IRQs over
- * the walk.
- *
- * Architectures that use IPIs to flush TLBs will then automagically DTRT,
- * since we unlink the page, flush TLBs, free the page. Since the disabling of
- * IRQs delays the completion of the TLB flush we can never observe an already
- * freed page.
- *
- * Architectures that do not have this (PPC) need to delay the freeing by some
- * other means, this is that means.
- *
- * What we do is batch the freed directory pages (tables) and RCU free them.
- * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling
- * holds off grace periods.
- *
- * However, in order to batch these pages we need to allocate storage, this
- * allocation is deep inside the MM code and can thus easily fail on memory
- * pressure. To guarantee progress we fall back to single table freeing, see
- * the implementation of tlb_remove_table_one().
- *
- */
+#ifdef CONFIG_MMU_GATHER_TABLE_FREE
+
 struct mmu_table_batch {
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	struct rcu_head		rcu;
+#endif
 	unsigned int		nr;
 	void			*tables[0];
 };
@@ -202,7 +193,16 @@ struct mmu_table_batch {
 
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
-#endif
+#else /* !CONFIG_MMU_GATHER_TABLE_FREE */
+
+/*
+ * Without CONFIG_MMU_GATHER_TABLE_FREE the architecture is assumed to
+ * have page based page directories and we can use the normal page
+ * batching to free them.
+ */
+#define tlb_remove_table(tlb, page) tlb_remove_page((tlb), (page))
+
+#endif /* CONFIG_MMU_GATHER_TABLE_FREE */
 
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
 /*
@@ -240,7 +240,7 @@ extern bool __tlb_remove_page_size(struc
 struct mmu_gather {
 	struct mm_struct	*mm;
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_TABLE_FREE
 	struct mmu_table_batch	*batch;
 #endif
 
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -91,56 +91,106 @@ bool __tlb_remove_page_size(struct mmu_g
 
 #endif /* MMU_GATHER_NO_GATHER */
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_TABLE_FREE
 
-/*
- * See the comment near struct mmu_table_batch.
- */
+static void __tlb_remove_table_free(struct mmu_table_batch *batch)
+{
+	int i;
+
+	for (i = 0; i < batch->nr; i++)
+		__tlb_remove_table(batch->tables[i]);
+
+	free_page((unsigned long)batch);
+}
+
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 
 /*
- * If we want tlb_remove_table() to imply TLB invalidates.
+ * Semi RCU freeing of the page directories.
+ *
+ * This is needed by some architectures to implement software pagetable walkers.
+ *
+ * gup_fast() and other software pagetable walkers do a lockless page-table
+ * walk and therefore needs some synchronization with the freeing of the page
+ * directories. The chosen means to accomplish that is by disabling IRQs over
+ * the walk.
+ *
+ * Architectures that use IPIs to flush TLBs will then automagically DTRT,
+ * since we unlink the page, flush TLBs, free the page. Since the disabling of
+ * IRQs delays the completion of the TLB flush we can never observe an already
+ * freed page.
+ *
+ * Architectures that do not have this (PPC) need to delay the freeing by some
+ * other means, this is that means.
+ *
+ * What we do is batch the freed directory pages (tables) and RCU free them.
+ * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling
+ * holds off grace periods.
+ *
+ * However, in order to batch these pages we need to allocate storage, this
+ * allocation is deep inside the MM code and can thus easily fail on memory
+ * pressure. To guarantee progress we fall back to single table freeing, see
+ * the implementation of tlb_remove_table_one().
+ *
  */
-static inline void tlb_table_invalidate(struct mmu_gather *tlb)
-{
-#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
-	/*
-	 * Invalidate page-table caches used by hardware walkers. Then we still
-	 * need to RCU-sched wait while freeing the pages because software
-	 * walkers can still be in-flight.
-	 */
-	tlb_flush_mmu_tlbonly(tlb);
-#endif
-}
 
 static void tlb_remove_table_smp_sync(void *arg)
 {
 	/* Simply deliver the interrupt */
 }
 
-static void tlb_remove_table_one(void *table)
+static void tlb_remove_table_sync_one(void)
 {
 	/*
 	 * This isn't an RCU grace period and hence the page-tables cannot be
 	 * assumed to be actually RCU-freed.
 	 *
 	 * It is however sufficient for software page-table walkers that rely on
-	 * IRQ disabling. See the comment near struct mmu_table_batch.
+	 * IRQ disabling.
 	 */
 	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
-	__tlb_remove_table(table);
 }
 
 static void tlb_remove_table_rcu(struct rcu_head *head)
 {
-	struct mmu_table_batch *batch;
-	int i;
+	__tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
+}
 
-	batch = container_of(head, struct mmu_table_batch, rcu);
+static void tlb_remove_table_free(struct mmu_table_batch *batch)
+{
+	call_rcu(&batch->rcu, tlb_remove_table_rcu);
+}
 
-	for (i = 0; i < batch->nr; i++)
-		__tlb_remove_table(batch->tables[i]);
+#else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */
 
-	free_page((unsigned long)batch);
+static void tlb_remove_table_sync_one(void) { }
+
+static void tlb_remove_table_free(struct mmu_table_batch *batch)
+{
+	__tlb_remove_table_free(batch);
+}
+
+#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
+
+/*
+ * If we want tlb_remove_table() to imply TLB invalidates.
+ */
+static inline void tlb_table_invalidate(struct mmu_gather *tlb)
+{
+#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
+	/*
+	 * Invalidate page-table caches used by hardware walkers. Then we still
+	 * need to RCU-sched wait while freeing the pages because software
+	 * walkers can still be in-flight.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+#endif
+}
+
+static void tlb_remove_table_one(void *table)
+{
+	tlb_remove_table_sync_one();
+	__tlb_remove_table(table);
 }
 
 static void tlb_table_flush(struct mmu_gather *tlb)
@@ -149,7 +199,7 @@ static void tlb_table_flush(struct mmu_g
 
 	if (*batch) {
 		tlb_table_invalidate(tlb);
-		call_rcu(&(*batch)->rcu, tlb_remove_table_rcu);
+		tlb_remove_table_free(*batch);
 		*batch = NULL;
 	}
 }
@@ -173,13 +223,21 @@ void tlb_remove_table(struct mmu_gather
 		tlb_table_flush(tlb);
 }
 
-#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
+static inline void tlb_table_init(struct mmu_gather *tlb)
+{
+	tlb->batch = NULL;
+}
+
+#else /* !CONFIG_MMU_GATHER_TABLE_FREE */
+
+static inline void tlb_table_flush(struct mmu_gather *tlb) { }
+static inline void tlb_table_init(struct mmu_gather *tlb) { }
+
+#endif /* CONFIG_MMU_GATHER_TABLE_FREE */
 
 static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
-#endif
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
 	tlb_batch_pages_flush(tlb);
 #endif
@@ -220,9 +278,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->batch_count = 0;
 #endif
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
-	tlb->batch = NULL;
-#endif
+	tlb_table_init(tlb);
 #ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	tlb->page_size = 0;
 #endif




* [PATCH 09/17] sh/tlb: Fix __pmd_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (7 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 10/17] sparc32/tlb: Fix __p*_free_tlb() Peter Zijlstra
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

SH violates the freeing order for its PMD page directories. Its
__pmd_free_tlb() does not ensure there is a TLB invalidation between
itself and the eventual freeing of the page.

Further complicating the situation is that SH uses non-page-based
allocation for its PMDs.

Use the shiny new MMU_GATHER_TABLE_FREE option to enable a custom page
table freer.

(SuperH uses IPI based TLB invalidation and therefore doesn't need
MMU_GATHER_RCU_TABLE_FREE for its HAVE_FAST_GUP.)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/sh/Kconfig               |    1 +
 arch/sh/include/asm/pgalloc.h |    3 ++-
 arch/sh/mm/pgtable.c          |    6 ++++++
 3 files changed, 9 insertions(+), 1 deletion(-)

--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -15,6 +15,7 @@ config SUPERH
 	select HAVE_PERF_EVENTS
 	select HAVE_DEBUG_BUGVERBOSE
 	select HAVE_FAST_GUP if MMU
+	select MMU_GATHER_TABLE_FREE if X2TLB
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
 	select ARCH_HAS_GCOV_PROFILE_ALL
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -12,7 +12,8 @@ extern void pgd_free(struct mm_struct *m
 extern void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd);
 extern pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address);
 extern void pmd_free(struct mm_struct *mm, pmd_t *pmd);
-#define __pmd_free_tlb(tlb, pmdp, addr)		pmd_free((tlb)->mm, (pmdp))
+extern void __tlb_remove_table(void *table);
+#define __pmd_free_tlb(tlb, pmdp, addr)	tlb_remove_table((tlb), (pmdp))
 #endif
 
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
--- a/arch/sh/mm/pgtable.c
+++ b/arch/sh/mm/pgtable.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/mm.h>
 #include <linux/slab.h>
+#include <asm/pgalloc.h>
 
 #define PGALLOC_GFP GFP_KERNEL | __GFP_ZERO
 
@@ -55,4 +56,9 @@ void pmd_free(struct mm_struct *mm, pmd_
 {
 	kmem_cache_free(pmd_cachep, pmd);
 }
+
+void __tlb_remove_table(void *table)
+{
+	pmd_free(NULL, table);
+}
 #endif /* PAGETABLE_LEVELS > 2 */




* [PATCH 10/17] sparc32/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (8 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 09/17] sh/tlb: Fix __pmd_free_tlb() Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 11/17] parisc/tlb: " Peter Zijlstra
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

Because Sparc32 has non-page-based page directories, use a custom
table freer.
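
The low bits of the naturally aligned table pointer encode which
directory type is being freed; a minimal sketch of the round-trip (the
values are illustrative):

	void *tagged = (void *)((unsigned long)pmd | 1);	/* 1 == PMD */
	/* ... batched, TLB invalidated, then in __tlb_remove_table(): */
	int idx = (unsigned long)tagged & TLB_IDX_MASK;		/* -> 1 */
	void *dir = (void *)((unsigned long)tagged & ~TLB_IDX_MASK); /* -> pmd */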

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/sparc/Kconfig                  |    1 +
 arch/sparc/include/asm/pgalloc_32.h |    7 +++++--
 arch/sparc/mm/srmmu.c               |   28 ++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -57,6 +57,7 @@ config SPARC32
 	select CLZ_TAB
 	select HAVE_UID16
 	select OLD_SIGACTION
+	select MMU_GATHER_TABLE_FREE
 
 config SPARC64
 	def_bool 64BIT
--- a/arch/sparc/include/asm/pgalloc_32.h
+++ b/arch/sparc/include/asm/pgalloc_32.h
@@ -12,6 +12,9 @@
 
 struct page;
 
+extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int idx);
+extern void __tlb_remove_table(void *table);
+
 void *srmmu_get_nocache(int size, int align);
 void srmmu_free_nocache(void *addr, int size);
 
@@ -48,7 +51,7 @@ static inline void free_pmd_fast(pmd_t *
 }
 
 #define pmd_free(mm, pmd)		free_pmd_fast(pmd)
-#define __pmd_free_tlb(tlb, pmd, addr)	pmd_free((tlb)->mm, pmd)
+#define __pmd_free_tlb(tlb, pmd, addr)	pgtable_free_tlb((tlb), (pmd), 1)
 
 void pmd_populate(struct mm_struct *mm, pmd_t *pmdp, struct page *ptep);
 #define pmd_pgtable(pmd) pmd_page(pmd)
@@ -72,6 +75,6 @@ static inline void free_pte_fast(pte_t *
 #define pte_free_kernel(mm, pte)	free_pte_fast(pte)
 
 void pte_free(struct mm_struct * mm, pgtable_t pte);
-#define __pte_free_tlb(tlb, pte, addr)	pte_free((tlb)->mm, pte)
+#define __pte_free_tlb(tlb, pte, addr)	pgtable_free_tlb((tlb), (pte), 0)
 
 #endif /* _SPARC_PGALLOC_H */
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -38,6 +38,7 @@
 #include <asm/page.h>
 #include <asm/asi.h>
 #include <asm/smp.h>
+#include <asm/tlb.h>
 #include <asm/io.h>
 
 /* Now the cpu specific definitions. */
@@ -1849,3 +1850,30 @@ void __init load_mmu(void)
 		sun4m_init_smp();
 #endif
 }
+
+#define TLB_IDX_MASK	1UL
+
+void __tlb_remove_table(void *table)
+{
+	void *dir = (void *)((unsigned long)table & ~TLB_IDX_MASK);
+	int idx = (unsigned long)table & TLB_IDX_MASK;
+
+	switch (idx) {
+	case 1: /* PMD */
+		pmd_free(NULL, dir);
+		break;
+	case 0: /* PTE */
+		pte_free(NULL, dir);
+		break;
+	}
+}
+
+void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int idx)
+{
+	unsigned long pgf = (unsigned long)table;
+	BUG_ON(idx > TLB_IDX_MASK);
+	BUG_ON(pgf & TLB_IDX_MASK);
+	pgf |= idx;
+	tlb_remove_table(tlb, (void *)pgf);
+}
+




* [PATCH 11/17] parisc/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (9 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 10/17] sparc32/tlb: Fix __p*_free_tlb() Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 12/17] m68k/tlb: " Peter Zijlstra
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

Because PARISC has non-page-size PMDs, use a custom table freer.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/parisc/Kconfig               |    1 +
 arch/parisc/include/asm/pgalloc.h |   25 ++++++++++++++++++++++---
 arch/parisc/include/asm/tlb.h     |    7 +++++--
 3 files changed, 28 insertions(+), 5 deletions(-)

--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -62,6 +62,7 @@ config PARISC
 	select HAVE_FTRACE_MCOUNT_RECORD if HAVE_DYNAMIC_FTRACE
 	select HAVE_KPROBES_ON_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
+	select MMU_GATHER_TABLE_FREE if PGTABLE_LEVELS >= 3
 
 	help
 	  The PA-RISC microprocessor is designed by Hewlett-Packard and used
--- a/arch/parisc/include/asm/pgalloc.h
+++ b/arch/parisc/include/asm/pgalloc.h
@@ -73,7 +73,7 @@ static inline pmd_t *pmd_alloc_one(struc
 	return pmd;
 }
 
-static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+static inline bool __pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
 	if (pmd_flag(*pmd) & PxD_FLAG_ATTACHED) {
 		/*
@@ -83,11 +83,28 @@ static inline void pmd_free(struct mm_st
 		 * done by generic mm code.
 		 */
 		mm_inc_nr_pmds(mm);
-		return;
+		return false;
 	}
-	free_pages((unsigned long)pmd, PMD_ORDER);
+	return true;
+}
+
+static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+{
+	if (__pmd_free(mm, pmd))
+		free_pages((unsigned long)pmd, PMD_ORDER);
 }
 
+static inline void __tlb_remove_table(void *table)
+{
+	free_pages((unsigned long)table, PMD_ORDER);
+}
+
+#define __pmd_free_tlb(tlb, pmd, addr)		\
+do {						\
+	if (__pmd_free((tlb)->mm, (pmd)))	\
+		tlb_remove_table((tlb), (pmd));	\
+} while (0)
+
 #else
 
 /* Two Level Page Table Support for pmd's */
@@ -101,6 +118,8 @@ static inline void pmd_free(struct mm_st
 #define pmd_free(mm, x)			do { } while (0)
 #define pgd_populate(mm, pmd, pte)	BUG()
 
+#define __pmd_free_tlb(tlb, pmd, addr)	do { } while (0)
+
 #endif
 
 static inline void
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -4,7 +4,10 @@
 
 #include <asm-generic/tlb.h>
 
-#define __pmd_free_tlb(tlb, pmd, addr)	pmd_free((tlb)->mm, pmd)
-#define __pte_free_tlb(tlb, pte, addr)	pte_free((tlb)->mm, pte)
+#define __pte_free_tlb(tlb,pte,addr)			\
+do {							\
+	pgtable_pte_page_dtor(pte);			\
+	tlb_remove_page((tlb), (pte));			\
+} while (0)
 
 #endif




* [PATCH 12/17] m68k/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (10 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 11/17] parisc/tlb: " Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 13/17] mips/tlb: " Peter Zijlstra
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

The Motorola MMU has funny PMD stuff, so use a custom table freer.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/m68k/Kconfig                        |    1 +
 arch/m68k/include/asm/mcf_pgalloc.h      |   11 +++++------
 arch/m68k/include/asm/motorola_pgalloc.h |   22 ++++++++++------------
 3 files changed, 16 insertions(+), 18 deletions(-)

--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -31,6 +31,7 @@ config M68K
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
 	select MMU_GATHER_NO_RANGE if MMU
+	select MMU_GATHER_TABLE_FREE if MMU_MOTOROLA
 
 config CPU_BIG_ENDIAN
 	def_bool y
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -38,12 +38,11 @@ extern inline pmd_t *pmd_alloc_kernel(pg
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
 
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t page,
-				  unsigned long address)
-{
-	pgtable_pte_page_dtor(page);
-	__free_page(page);
-}
+#define __pte_free_tlb(tlb, pte, address)	\
+do {						\
+	pgtable_pte_page_dtor(pte);		\
+	tlb_remove_page((tlb), (pte));		\
+} while (0)
 
 #define __pmd_free_tlb(tlb, pmd, address) do { } while (0)
 
--- a/arch/m68k/include/asm/motorola_pgalloc.h
+++ b/arch/m68k/include/asm/motorola_pgalloc.h
@@ -57,15 +57,13 @@ static inline void pte_free(struct mm_st
 	__free_page(page);
 }
 
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t page,
-				  unsigned long address)
-{
-	pgtable_pte_page_dtor(page);
-	cache_page(kmap(page));
-	kunmap(page);
-	__free_page(page);
-}
-
+#define __pte_free_tlb(tlb, pte, address)		\
+do {							\
+	pgtable_pte_page_dtor(pte);			\
+	cache_page(kmap(pte));				\
+	kunmap(pte);					\
+	tlb_remove_page((tlb), (pte));			\
+} while (0)
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
@@ -77,12 +75,12 @@ static inline int pmd_free(struct mm_str
 	return free_pointer_table(pmd);
 }
 
-static inline int __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
-				 unsigned long address)
+static inline void __tlb_remove_table(void *table)
 {
-	return free_pointer_table(pmd);
+	free_pointer_table(table);
 }
 
+#define __pmd_free_tlb(tlb, pmd, address) tlb_remove_table((tlb), (pmd))
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {




* [PATCH 13/17] mips/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (11 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 12/17] m68k/tlb: " Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 14/17] ia64/tlb: " Peter Zijlstra
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley,
	Paul Burton

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

MIPS has (thanks to Paul Burton for setting me straight):

 - HAVE_FAST_GUP, which means we need to consider software
   walkers (many MIPS also have software TLB-fill, which is a similar
   software walker).

 - non-page, page directories, which requires a custom table free.

 - Mostly IPI-based TLB invalidate, but it wouldn't be MIPS if it
   didn't also have broadcast TLB invalidate (I6500, GINVT).

This means that generic MIPS needs MMU_GATHER_RCU_TABLE_FREE.

Cc: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/Kconfig               |    1 +
 arch/mips/include/asm/pgalloc.h |   13 ++++++-------
 arch/mips/mm/pgtable.c          |   34 ++++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+), 7 deletions(-)

--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -57,6 +57,7 @@ config MIPS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP
+	select MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_FTRACE_MCOUNT_RECORD
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -41,6 +41,9 @@ static inline void pud_populate(struct m
 }
 #endif
 
+extern void __tlb_remove_table(void *table);
+extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int idx);
+
 /*
  * Initialize a new pgd / pmd table with invalid pointers.
  */
@@ -52,11 +55,7 @@ static inline void pgd_free(struct mm_st
 	free_pages((unsigned long)pgd, PGD_ORDER);
 }
 
-#define __pte_free_tlb(tlb,pte,address)			\
-do {							\
-	pgtable_pte_page_dtor(pte);			\
-	tlb_remove_page((tlb), pte);			\
-} while (0)
+#define __pte_free_tlb(tlb,pte,address)	pgtable_free_tlb((tlb), (pte), 0)
 
 #ifndef __PAGETABLE_PMD_FOLDED
 
@@ -75,7 +74,7 @@ static inline void pmd_free(struct mm_st
 	free_pages((unsigned long)pmd, PMD_ORDER);
 }
 
-#define __pmd_free_tlb(tlb, x, addr)	pmd_free((tlb)->mm, x)
+#define __pmd_free_tlb(tlb, x, addr)	pgtable_free_tlb((tlb), (x), 1)
 
 #endif
 
@@ -101,7 +100,7 @@ static inline void p4d_populate(struct m
 	set_p4d(p4d, __p4d((unsigned long)pud));
 }
 
-#define __pud_free_tlb(tlb, x, addr)	pud_free((tlb)->mm, x)
+#define __pud_free_tlb(tlb, x, addr)	pgtable_free_tlb((tlb), (x), 2)
 
 #endif /* __PAGETABLE_PUD_FOLDED */
 
--- a/arch/mips/mm/pgtable.c
+++ b/arch/mips/mm/pgtable.c
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/string.h>
 #include <asm/pgalloc.h>
+#include <asm-generic/tlb.h>
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
@@ -23,3 +24,36 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(pgd_alloc);
+
+#define TLB_IDX_MASK	0x3
+
+void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int idx)
+{
+	unsigned long pgf = (unsigned long)table;
+	BUG_ON(idx > TLB_IDX_MASK);
+	BUG_ON(pgf & TLB_IDX_MASK);
+	pgf |= idx;
+	tlb_remove_table(tlb, (void *)pgf);
+}
+
+void __tlb_remove_table(void *table)
+{
+	void *dir = (void *)((unsigned long)table & ~TLB_IDX_MASK);
+	int idx = (unsigned long)table & TLB_IDX_MASK;
+
+	switch (idx) {
+#ifndef __PAGETABLE_PUD_FOLDED
+	case 2: /* PUD */
+		pud_free(NULL, dir);
+		break;
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	case 1: /* PMD */
+		pmd_free(NULL, dir);
+		break;
+#endif
+	case 0: /* PTE */
+		pte_free(NULL, dir);
+		break;
+	}
+}




* [PATCH 14/17] ia64/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (12 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 13/17] mips/tlb: " Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 15/17] alpha/tlb: " Peter Zijlstra
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

Since IA64 has page based P[UM]Ds, no software walkers and IPI based
TLB invalidation, it can use the simple tlb_remove_page() based
freer.
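
As a minimal sketch of the window this ordering closes (illustrative
names only, not the actual ia64 code):

	pmd_clear(pmdp);		/* 1) unhook the pte page	*/
	flush_tlb_range(vma, ...);	/* 2) invalidate; concurrent	*/
					/*    walkers finish before	*/
					/*    this completes		*/
	__free_page(page);		/* 3) only now is reuse safe	*/

Freeing before step 2) completes would let a concurrent walker, or a
hardware walk cache, chase a pointer into a page that may already have
been reused.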

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/ia64/include/asm/pgalloc.h |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/arch/ia64/include/asm/pgalloc.h
+++ b/arch/ia64/include/asm/pgalloc.h
@@ -50,7 +50,9 @@ static inline void pud_free(struct mm_st
 {
 	free_page((unsigned long)pud);
 }
-#define __pud_free_tlb(tlb, pud, address)	pud_free((tlb)->mm, pud)
+
+#define __pud_free_tlb(tlb, pud, address)	\
+	tlb_remove_table((tlb), virt_to_page(pud))
 #endif /* CONFIG_PGTABLE_LEVELS == 4 */
 
 static inline void
@@ -69,7 +71,8 @@ static inline void pmd_free(struct mm_st
 	free_page((unsigned long)pmd);
 }
 
-#define __pmd_free_tlb(tlb, pmd, address)	pmd_free((tlb)->mm, pmd)
+#define __pmd_free_tlb(tlb, pmd, address)	\
+	tlb_remove_table((tlb), virt_to_page(pmd))
 
 static inline void
 pmd_populate(struct mm_struct *mm, pmd_t * pmd_entry, pgtable_t pte)
@@ -84,6 +87,10 @@ pmd_populate_kernel(struct mm_struct *mm
 	pmd_val(*pmd_entry) = __pa(pte);
 }
 
-#define __pte_free_tlb(tlb, pte, address)	pte_free((tlb)->mm, pte)
+#define __pte_free_tlb(tlb, pte, address)	\
+do {						\
+	pgtable_pte_page_dtor(pte);		\
+	tlb_remove_table((tlb), (pte));		\
+} while (0)
 
 #endif				/* _ASM_IA64_PGALLOC_H */



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 15/17] alpha/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (13 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 14/17] ia64/tlb: " Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 16/17] nds32/tlb: " Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 17/17] riscv/tlb: " Peter Zijlstra
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

Since Alpha has page based PMDs, no software walkers and IPI based TLB
invalidation, it can use the simple tlb_remove_page() based freer.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/include/asm/tlb.h |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/arch/alpha/include/asm/tlb.h
+++ b/arch/alpha/include/asm/tlb.h
@@ -4,7 +4,16 @@
 
 #include <asm-generic/tlb.h>
 
-#define __pte_free_tlb(tlb, pte, address)		pte_free((tlb)->mm, pte)
-#define __pmd_free_tlb(tlb, pmd, address)		pmd_free((tlb)->mm, pmd)
+extern void ___pte_free_tlb(struct mmu_gather *tlb, pte_t *pte);
+extern void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
+
+#define __pte_free_tlb(tlb, pte, address)	\
+do {						\
+	pgtable_pte_page_dtor(pte);		\
+	tlb_remove_table((tlb), (pte));		\
+} while (0)
+
+#define __pmd_free_tlb(tlb, pmd, address)	\
+	tlb_remove_table((tlb), virt_to_page(pmd))
  
 #endif



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 16/17] nds32/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (14 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 15/17] alpha/tlb: " Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  2019-12-11 12:07 ` [PATCH 17/17] riscv/tlb: " Peter Zijlstra
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

Even though NDS32 is UP only, we still need to observe this order
because mmu_gather is preemptible.

NDS32 does not in fact have PMDs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/nds32/include/asm/tlb.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/arch/nds32/include/asm/tlb.h
+++ b/arch/nds32/include/asm/tlb.h
@@ -6,7 +6,12 @@
 
 #include <asm-generic/tlb.h>
 
-#define __pte_free_tlb(tlb, pte, addr)	pte_free((tlb)->mm, pte)
+#define __pte_free_tlb(tlb, pte, address)	\
+do {						\
+	pgtable_pte_page_dtor(pte);		\
+	tlb_remove_table((tlb), (pte));		\
+} while (0)
+
 #define __pmd_free_tlb(tlb, pmd, addr)	pmd_free((tln)->mm, pmd)
 
 #endif



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 17/17] riscv/tlb: Fix __p*_free_tlb()
  2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
                   ` (15 preceding siblings ...)
  2019-12-11 12:07 ` [PATCH 16/17] nds32/tlb: " Peter Zijlstra
@ 2019-12-11 12:07 ` Peter Zijlstra
  16 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-11 12:07 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Just like regular pages, page directories need to observe the
following order:

 1) unhook
 2) TLB invalidate
 3) free

to ensure it is safe against concurrent accesses.

Since RISC-V has page based PMDs, no software walkers and IPI based
TLB invalidation, it can use the simple tlb_remove_page() based
freer.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/riscv/include/asm/pgalloc.h |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -73,14 +73,15 @@ static inline void pmd_free(struct mm_st
 	free_page((unsigned long)pmd);
 }
 
-#define __pmd_free_tlb(tlb, pmd, addr)  pmd_free((tlb)->mm, pmd)
+#define __pmd_free_tlb(tlb, pmd, addr)	\
+	tlb_remove_table((tlb), virt_to_page(pmd))
 
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-#define __pte_free_tlb(tlb, pte, buf)   \
-do {                                    \
-	pgtable_pte_page_dtor(pte);     \
-	tlb_remove_page((tlb), pte);    \
+#define __pte_free_tlb(tlb, pte, buf)	\
+do {					\
+	pgtable_pte_page_dtor(pte);	\
+	tlb_remove_table((tlb), (pte));	\
 } while (0)
 #endif /* CONFIG_MMU */
 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/17] asm-gemeric/tlb: Remove stray function declarations
  2019-12-11 12:07 ` [PATCH 02/17] asm-gemeric/tlb: Remove stray function declarations Peter Zijlstra
@ 2019-12-11 13:19   ` Geert Uytterhoeven
  0 siblings, 0 replies; 38+ messages in thread
From: Geert Uytterhoeven @ 2019-12-11 13:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Linux-Arch, Linux MM, Linux Kernel Mailing List, Yoshinori Sato,
	Rich Felker, David S. Miller, Helge Deller, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Wed, Dec 11, 2019 at 1:31 PM Peter Zijlstra <peterz@infradead.org> wrote:
> We removed the actual functions a while ago.
>
> Fixes: 1808d65b55e4 ("asm-generic/tlb: Remove arch_tlb*_mmu()")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2019-12-11 12:07 ` [PATCH 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE Peter Zijlstra
@ 2019-12-12  9:30   ` Peter Zijlstra
  2019-12-12  9:32   ` [PATCH mk-II " Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-12  9:30 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Wed, Dec 11, 2019 at 01:07:21PM +0100, Peter Zijlstra wrote:
> @@ -56,6 +56,15 @@
>   *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
>   *    there's large holes between the VMAs.
>   *
> + *  - tlb_remove_table()
> + *
> + *    tlb_remove_table() is the basic primitive to free page-table directories
> + *    (__p*_free_tlb()).  In its most primitive form it is an alias for
> + *    tlb_remove_page() below, for when page directories are pages and have no
> + *    additional constraints.
> + *
> + *    See also MMU_GATHER_TABLE_FREE and MMU_GATHER_RCU_TABLE_FREE.
> + *
>   *  - tlb_remove_page() / __tlb_remove_page()
>   *  - tlb_remove_page_size() / __tlb_remove_page_size()
>   *

> @@ -202,7 +193,16 @@ struct mmu_table_batch {
>  
>  extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
>  
> -#endif
> +#else /* !CONFIG_MMU_GATHER_TABLE_FREE */
> +
> +/*
> + * Without either HAVE_TABLE_FREE || CONFIG_HAVE_RCU_TABLE_FREE the
> + * architecture is assumed to have page based page directories and
> + * we can use the normal page batching to free them.
> + */
> +#define tlb_remove_table(tlb, page) tlb_remove_page((tlb), (page))
> +
> +#endif /* CONFIG_MMU_GATHER_TABLE_FREE */

The build robot kindly notified me that this breaks ARM because it does
the exact same #define for the same reason.

(and I noticed the comment is stale)

I'll post a new version of this patch with the below delta.

---
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -37,10 +37,6 @@ static inline void __tlb_remove_table(vo
 
 #include <asm-generic/tlb.h>
 
-#ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
-#define tlb_remove_table(tlb, entry) tlb_remove_page(tlb, entry)
-#endif
-
 static inline void
 __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -196,9 +196,8 @@ extern void tlb_remove_table(struct mmu_
 #else /* !CONFIG_MMU_GATHER_TABLE_FREE */
 
 /*
- * Without either HAVE_TABLE_FREE || CONFIG_HAVE_RCU_TABLE_FREE the
- * architecture is assumed to have page based page directories and
- * we can use the normal page batching to free them.
+ * Without MMU_GATHER_TABLE_FREE the architecture is assumed to have page based
+ * page directories and we can use the normal page batching to free them.
  */
 #define tlb_remove_table(tlb, page) tlb_remove_page((tlb), (page))
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH mk-II 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2019-12-11 12:07 ` [PATCH 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE Peter Zijlstra
  2019-12-12  9:30   ` Peter Zijlstra
@ 2019-12-12  9:32   ` Peter Zijlstra
  2020-01-26 15:52     ` Guenter Roeck
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-12  9:32 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley


As described in the comment, the correct order for freeing pages is:

 1) unhook page
 2) TLB invalidate page
 3) free page

This order equally applies to page directories.

Currently there are two correct options:

 - use tlb_remove_page(), when all page directories are full pages and
   there are no further constraints placed by things like software
   walkers (HAVE_FAST_GUP).

 - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
   architecture does not do IPI based TLB invalidate and has
   HAVE_FAST_GUP (or software TLB fill).

This however leaves architectures that don't have page based
directories but don't need RCU in a bind. For those, provide
MMU_GATHER_TABLE_FREE, which provides the independent batching for
directories without the additional RCU freeing.
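
As a sketch of what opting in looks like (hypothetical architecture
"foo"; only __tlb_remove_table() is the interface the patch actually
requires, foo_pgtable_free() stands in for the arch's own free routine):

	# arch/foo/Kconfig
	config FOO
		select MMU_GATHER_TABLE_FREE

	/* arch/foo/mm/pgtable.c */
	void __tlb_remove_table(void *table)
	{
		/* free the (possibly non-page) page directory */
		foo_pgtable_free(table);
	}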

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig               |    5 +
 arch/arm/include/asm/tlb.h |    4 -
 include/asm-generic/tlb.h  |   75 +++++++++++++---------------
 mm/mmu_gather.c            |  120 +++++++++++++++++++++++++++++++++------------
 4 files changed, 130 insertions(+), 74 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -393,8 +393,12 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_ARCH_JUMP_LABEL_RELATIVE
 	bool
 
+config MMU_GATHER_TABLE_FREE
+	bool
+
 config MMU_GATHER_RCU_TABLE_FREE
 	bool
+	select MMU_GATHER_TABLE_FREE
 
 config MMU_GATHER_NO_TABLE_INVALIDATE
 	bool
@@ -408,6 +412,7 @@ config MMU_GATHER_NO_RANGE
 
 config MMU_GATHER_NO_GATHER
 	bool
+	depends on MMU_GATHER_TABLE_FREE
 
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -37,10 +37,6 @@ static inline void __tlb_remove_table(vo
 
 #include <asm-generic/tlb.h>
 
-#ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
-#define tlb_remove_table(tlb, entry) tlb_remove_page(tlb, entry)
-#endif
-
 static inline void
 __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -56,6 +56,15 @@
  *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
  *    there's large holes between the VMAs.
  *
+ *  - tlb_remove_table()
+ *
+ *    tlb_remove_table() is the basic primitive to free page-table directories
+ *    (__p*_free_tlb()).  In its most primitive form it is an alias for
+ *    tlb_remove_page() below, for when page directories are pages and have no
+ *    additional constraints.
+ *
+ *    See also MMU_GATHER_TABLE_FREE and MMU_GATHER_RCU_TABLE_FREE.
+ *
  *  - tlb_remove_page() / __tlb_remove_page()
  *  - tlb_remove_page_size() / __tlb_remove_page_size()
  *
@@ -129,21 +138,28 @@
  *  This might be useful if your architecture has size specific TLB
  *  invalidation instructions.
  *
- *  MMU_GATHER_RCU_TABLE_FREE
+ *  MMU_GATHER_TABLE_FREE
  *
  *  This provides tlb_remove_table(), to be used instead of tlb_remove_page()
- *  for page directores (__p*_free_tlb()). This provides separate freeing of
- *  the page-table pages themselves in a semi-RCU fashion (see comment below).
- *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
- *  and therefore doesn't naturally serialize with software page-table walkers.
+ *  for page directories (__p*_free_tlb()).
+ *
+ *  Useful if your architecture has non-page page directories.
  *
  *  When used, an architecture is expected to provide __tlb_remove_table()
  *  which does the actual freeing of these pages.
  *
+ *  MMU_GATHER_RCU_TABLE_FREE
+ *
+ *  Like MMU_GATHER_TABLE_FREE, and adds semi-RCU semantics to the free (see
+ *  comment below).
+ *
+ *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
+ *  and therefore doesn't naturally serialize with software page-table walkers.
+ *
  *  MMU_GATHER_NO_TABLE_INVALIDATE
  *
- *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
- *  freeing the page-table pages. This can be avoided if you use
+ *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly()
+ *  before freeing the page-table pages. This can be avoided if you use
  *  MMU_GATHER_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
  *  page-tables natively.
  *
@@ -162,37 +178,12 @@
  *  various ptep_get_and_clear() functions.
  */
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
-/*
- * Semi RCU freeing of the page directories.
- *
- * This is needed by some architectures to implement software pagetable walkers.
- *
- * gup_fast() and other software pagetable walkers do a lockless page-table
- * walk and therefore needs some synchronization with the freeing of the page
- * directories. The chosen means to accomplish that is by disabling IRQs over
- * the walk.
- *
- * Architectures that use IPIs to flush TLBs will then automagically DTRT,
- * since we unlink the page, flush TLBs, free the page. Since the disabling of
- * IRQs delays the completion of the TLB flush we can never observe an already
- * freed page.
- *
- * Architectures that do not have this (PPC) need to delay the freeing by some
- * other means, this is that means.
- *
- * What we do is batch the freed directory pages (tables) and RCU free them.
- * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling
- * holds off grace periods.
- *
- * However, in order to batch these pages we need to allocate storage, this
- * allocation is deep inside the MM code and can thus easily fail on memory
- * pressure. To guarantee progress we fall back to single table freeing, see
- * the implementation of tlb_remove_table_one().
- *
- */
+#ifdef CONFIG_MMU_GATHER_TABLE_FREE
+
 struct mmu_table_batch {
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	struct rcu_head		rcu;
+#endif
 	unsigned int		nr;
 	void			*tables[0];
 };
@@ -202,7 +193,15 @@ struct mmu_table_batch {
 
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
-#endif
+#else /* !CONFIG_MMU_GATHER_TABLE_FREE */
+
+/*
+ * Without MMU_GATHER_TABLE_FREE the architecture is assumed to have page based
+ * page directories and we can use the normal page batching to free them.
+ */
+#define tlb_remove_table(tlb, page) tlb_remove_page((tlb), (page))
+
+#endif /* CONFIG_MMU_GATHER_TABLE_FREE */
 
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
 /*
@@ -240,7 +239,7 @@ extern bool __tlb_remove_page_size(struc
 struct mmu_gather {
 	struct mm_struct	*mm;
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_TABLE_FREE
 	struct mmu_table_batch	*batch;
 #endif
 
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -91,56 +91,106 @@ bool __tlb_remove_page_size(struct mmu_g
 
 #endif /* MMU_GATHER_NO_GATHER */
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
+#ifdef CONFIG_MMU_GATHER_TABLE_FREE
 
-/*
- * See the comment near struct mmu_table_batch.
- */
+static void __tlb_remove_table_free(struct mmu_table_batch *batch)
+{
+	int i;
+
+	for (i = 0; i < batch->nr; i++)
+		__tlb_remove_table(batch->tables[i]);
+
+	free_page((unsigned long)batch);
+}
+
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 
 /*
- * If we want tlb_remove_table() to imply TLB invalidates.
+ * Semi RCU freeing of the page directories.
+ *
+ * This is needed by some architectures to implement software pagetable walkers.
+ *
+ * gup_fast() and other software pagetable walkers do a lockless page-table
+ * walk and therefore need some synchronization with the freeing of the page
+ * directories. The chosen means to accomplish that is by disabling IRQs over
+ * the walk.
+ *
+ * Architectures that use IPIs to flush TLBs will then automagically DTRT,
+ * since we unlink the page, flush TLBs, free the page. Since the disabling of
+ * IRQs delays the completion of the TLB flush we can never observe an already
+ * freed page.
+ *
+ * Architectures that do not have this (PPC) need to delay the freeing by some
+ * other means, this is that means.
+ *
+ * What we do is batch the freed directory pages (tables) and RCU free them.
+ * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling
+ * holds off grace periods.
+ *
+ * However, in order to batch these pages we need to allocate storage, this
+ * allocation is deep inside the MM code and can thus easily fail on memory
+ * pressure. To guarantee progress we fall back to single table freeing, see
+ * the implementation of tlb_remove_table_one().
+ *
  */
-static inline void tlb_table_invalidate(struct mmu_gather *tlb)
-{
-#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
-	/*
-	 * Invalidate page-table caches used by hardware walkers. Then we still
-	 * need to RCU-sched wait while freeing the pages because software
-	 * walkers can still be in-flight.
-	 */
-	tlb_flush_mmu_tlbonly(tlb);
-#endif
-}
 
 static void tlb_remove_table_smp_sync(void *arg)
 {
 	/* Simply deliver the interrupt */
 }
 
-static void tlb_remove_table_one(void *table)
+static void tlb_remove_table_sync_one(void)
 {
 	/*
 	 * This isn't an RCU grace period and hence the page-tables cannot be
 	 * assumed to be actually RCU-freed.
 	 *
 	 * It is however sufficient for software page-table walkers that rely on
-	 * IRQ disabling. See the comment near struct mmu_table_batch.
+	 * IRQ disabling.
 	 */
 	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
-	__tlb_remove_table(table);
 }
 
 static void tlb_remove_table_rcu(struct rcu_head *head)
 {
-	struct mmu_table_batch *batch;
-	int i;
+	__tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
+}
 
-	batch = container_of(head, struct mmu_table_batch, rcu);
+static void tlb_remove_table_free(struct mmu_table_batch *batch)
+{
+	call_rcu(&batch->rcu, tlb_remove_table_rcu);
+}
 
-	for (i = 0; i < batch->nr; i++)
-		__tlb_remove_table(batch->tables[i]);
+#else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */
 
-	free_page((unsigned long)batch);
+static void tlb_remove_table_sync_one(void) { }
+
+static void tlb_remove_table_free(struct mmu_table_batch *batch)
+{
+	__tlb_remove_table_free(batch);
+}
+
+#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
+
+/*
+ * If we want tlb_remove_table() to imply TLB invalidates.
+ */
+static inline void tlb_table_invalidate(struct mmu_gather *tlb)
+{
+#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
+	/*
+	 * Invalidate page-table caches used by hardware walkers. Then we still
+	 * need to RCU-sched wait while freeing the pages because software
+	 * walkers can still be in-flight.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+#endif
+}
+
+static void tlb_remove_table_one(void *table)
+{
+	tlb_remove_table_sync_one();
+	__tlb_remove_table(table);
 }
 
 static void tlb_table_flush(struct mmu_gather *tlb)
@@ -149,7 +199,7 @@ static void tlb_table_flush(struct mmu_g
 
 	if (*batch) {
 		tlb_table_invalidate(tlb);
-		call_rcu(&(*batch)->rcu, tlb_remove_table_rcu);
+		tlb_remove_table_free(*batch);
 		*batch = NULL;
 	}
 }
@@ -173,13 +223,21 @@ void tlb_remove_table(struct mmu_gather
 		tlb_table_flush(tlb);
 }
 
-#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
+static inline void tlb_table_init(struct mmu_gather *tlb)
+{
+	tlb->batch = NULL;
+}
+
+#else /* !CONFIG_MMU_GATHER_TABLE_FREE */
+
+static inline void tlb_table_flush(struct mmu_gather *tlb) { }
+static inline void tlb_table_init(struct mmu_gather *tlb) { }
+
+#endif /* CONFIG_MMU_GATHER_TABLE_FREE */
 
 static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
-#endif
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
 	tlb_batch_pages_flush(tlb);
 #endif
@@ -220,9 +278,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->batch_count = 0;
 #endif
 
-#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
-	tlb->batch = NULL;
-#endif
+	tlb_table_init(tlb);
 #ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	tlb->page_size = 0;
 #endif
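
For reference, the walker-side discipline the semi-RCU comment above
relies on looks roughly like this (a sketch of the gup_fast() pattern,
not the literal mm/gup.c code; lockless_walk() is a placeholder):

	local_irq_save(flags);
	/*
	 * With IRQs disabled, an IPI-based TLB flush cannot complete and
	 * an RCU-sched grace period cannot elapse, so any page directory
	 * still reachable through the page tables cannot yet be freed.
	 */
	lockless_walk(mm, addr);
	local_irq_restore(flags);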

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-11 12:07 ` [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE Peter Zijlstra
@ 2019-12-16 12:31   ` Aneesh Kumar K.V
  2019-12-16 12:37     ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Aneesh Kumar K.V @ 2019-12-16 12:31 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Andrew Morton, Nick Piggin, Peter Zijlstra
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

Peter Zijlstra <peterz@infradead.org> writes:

> Towards a more consistent naming scheme.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/Kconfig              |    3 ++-
>  arch/powerpc/Kconfig      |    2 +-
>  arch/sparc/Kconfig        |    2 +-
>  include/asm-generic/tlb.h |    2 +-
>  mm/mmu_gather.c           |    2 +-
>  5 files changed, 6 insertions(+), 5 deletions(-)
>
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -396,8 +396,9 @@ config HAVE_ARCH_JUMP_LABEL_RELATIVE
>  config MMU_GATHER_RCU_TABLE_FREE
>  	bool
>  
> -config HAVE_RCU_TABLE_NO_INVALIDATE
> +config MMU_GATHER_NO_TABLE_INVALIDATE
>  	bool
> +	depends on MMU_GATHER_RCU_TABLE_FREE


Can we drop this kernel config option and instead use
MMU_GATHER_RCU_TABLE_FREE? IMHO reducing the number of kernel config
options related to mmu_gather reduces the complexity.

>  
>  config HAVE_MMU_GATHER_PAGE_SIZE
>  	bool
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -223,7 +223,7 @@ config PPC
>  	select HAVE_PERF_REGS
>  	select HAVE_PERF_USER_STACK_DUMP
>  	select MMU_GATHER_RCU_TABLE_FREE		if SMP
> -	select HAVE_RCU_TABLE_NO_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
> +	select MMU_GATHER_NO_TABLE_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
>  	select HAVE_MMU_GATHER_PAGE_SIZE
>  	select HAVE_REGS_AND_STACK_ACCESS_API
>  	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -65,7 +65,7 @@ config SPARC64
>  	select HAVE_KRETPROBES
>  	select HAVE_KPROBES
>  	select MMU_GATHER_RCU_TABLE_FREE if SMP
> -	select HAVE_RCU_TABLE_NO_INVALIDATE if MMU_GATHER_RCU_TABLE_FREE
> +	select MMU_GATHER_NO_TABLE_INVALIDATE if MMU_GATHER_RCU_TABLE_FREE
>  	select HAVE_MEMBLOCK_NODE_MAP
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select HAVE_DYNAMIC_FTRACE
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -137,7 +137,7 @@
>   *  When used, an architecture is expected to provide __tlb_remove_table()
>   *  which does the actual freeing of these pages.
>   *
> - *  HAVE_RCU_TABLE_NO_INVALIDATE
> + *  MMU_GATHER_NO_TABLE_INVALIDATE
>   *
>   *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
>   *  freeing the page-table pages. This can be avoided if you use
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -102,7 +102,7 @@ bool __tlb_remove_page_size(struct mmu_g
>   */
>  static inline void tlb_table_invalidate(struct mmu_gather *tlb)
>  {
> -#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
> +#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
>  	/*
>  	 * Invalidate page-table caches used by hardware walkers. Then we still
>  	 * need to RCU-sched wait while freeing the pages because software

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 12:31   ` Aneesh Kumar K.V
@ 2019-12-16 12:37     ` Peter Zijlstra
  2019-12-16 13:13       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-16 12:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Will Deacon, Andrew Morton, Nick Piggin, linux-arch, linux-mm,
	linux-kernel, Yoshinori Sato, Rich Felker, David S. Miller,
	Helge Deller, Geert Uytterhoeven, Paul Burton, Tony Luck,
	Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Dec 16, 2019 at 06:01:58PM +0530, Aneesh Kumar K.V wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > Towards a more consistent naming scheme.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/Kconfig              |    3 ++-
> >  arch/powerpc/Kconfig      |    2 +-
> >  arch/sparc/Kconfig        |    2 +-
> >  include/asm-generic/tlb.h |    2 +-
> >  mm/mmu_gather.c           |    2 +-
> >  5 files changed, 6 insertions(+), 5 deletions(-)
> >
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -396,8 +396,9 @@ config HAVE_ARCH_JUMP_LABEL_RELATIVE
> >  config MMU_GATHER_RCU_TABLE_FREE
> >  	bool
> >  
> > -config HAVE_RCU_TABLE_NO_INVALIDATE
> > +config MMU_GATHER_NO_TABLE_INVALIDATE
> >  	bool
> > +	depends on MMU_GATHER_RCU_TABLE_FREE
> 
> 
> Can we drop this kernel config option and instead use
> MMU_GATHER_RCU_TABLE_FREE? IMHO reducing the number of kernel config
> options related to mmu_gather reduces the complexity.

I'm confused, are you saying you're happy to have PowerPC eat the extra
TLB invalidates? I thought you cared about PPC performance :-)



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 12:37     ` Peter Zijlstra
@ 2019-12-16 13:13       ` Aneesh Kumar K.V
  2019-12-16 13:20         ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Aneesh Kumar K.V @ 2019-12-16 13:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Andrew Morton, Nick Piggin, linux-arch, linux-mm,
	linux-kernel, Yoshinori Sato, Rich Felker, David S. Miller,
	Helge Deller, Geert Uytterhoeven, Paul Burton, Tony Luck,
	Richard Henderson, Nick Hu, Paul Walmsley

On 12/16/19 6:07 PM, Peter Zijlstra wrote:
> On Mon, Dec 16, 2019 at 06:01:58PM +0530, Aneesh Kumar K.V wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> Towards a more consistent naming scheme.
>>>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> ---
>>>   arch/Kconfig              |    3 ++-
>>>   arch/powerpc/Kconfig      |    2 +-
>>>   arch/sparc/Kconfig        |    2 +-
>>>   include/asm-generic/tlb.h |    2 +-
>>>   mm/mmu_gather.c           |    2 +-
>>>   5 files changed, 6 insertions(+), 5 deletions(-)
>>>
>>> --- a/arch/Kconfig
>>> +++ b/arch/Kconfig
>>> @@ -396,8 +396,9 @@ config HAVE_ARCH_JUMP_LABEL_RELATIVE
>>>   config MMU_GATHER_RCU_TABLE_FREE
>>>   	bool
>>>   
>>> -config HAVE_RCU_TABLE_NO_INVALIDATE
>>> +config MMU_GATHER_NO_TABLE_INVALIDATE
>>>   	bool
>>> +	depends on MMU_GATHER_RCU_TABLE_FREE
>>
>>
>> Can we drop this kernel config option and instead use
>> MMU_GATHER_RCU_TABLE_FREE? IMHO reducing the number of kernel config
>> options related to mmu_gather reduces the complexity.
> 
> I'm confused, are you saying you're happy to have PowerPC eat the extra
> TLB invalidates? I thought you cared about PPC performance :-)
> 
> 

Instead can we do

static inline void tlb_table_invalidate(struct mmu_gather *tlb)
{
#ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
	/*
	 * Invalidate page-table caches used by hardware walkers. Then we still
	 * need to RCU-sched wait while freeing the pages because software
	 * walkers can still be in-flight.
	 */
	tlb_flush_mmu_tlbonly(tlb);
#endif
}

-aneesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 13:13       ` Aneesh Kumar K.V
@ 2019-12-16 13:20         ` Peter Zijlstra
  2019-12-16 13:40           ` Aneesh Kumar K.V
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-16 13:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Will Deacon, Andrew Morton, Nick Piggin, linux-arch, linux-mm,
	linux-kernel, Yoshinori Sato, Rich Felker, David S. Miller,
	Helge Deller, Geert Uytterhoeven, Paul Burton, Tony Luck,
	Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Dec 16, 2019 at 06:43:53PM +0530, Aneesh Kumar K.V wrote:
> On 12/16/19 6:07 PM, Peter Zijlstra wrote:

> > I'm confused, are you saing you're happy to have PowerPC eat the extra
> > TLB invalidates? I thought you cared about PPC performance :-)
> > 
> > 
> 
> Instead can we do
> 
> static inline void tlb_table_invalidate(struct mmu_gather *tlb)
> {
> #ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
> 	/*
> 	 * Invalidate page-table caches used by hardware walkers. Then we still
> 	 * need to RCU-sched wait while freeing the pages because software
> 	 * walkers can still be in-flight.
> 	 */
> 	tlb_flush_mmu_tlbonly(tlb);
> #endif
> }

How does that not break ARM/ARM64/s390 and x86 ?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 13:20         ` Peter Zijlstra
@ 2019-12-16 13:40           ` Aneesh Kumar K.V
  2019-12-16 13:54             ` Aneesh Kumar K.V
  2019-12-16 14:00             ` Peter Zijlstra
  0 siblings, 2 replies; 38+ messages in thread
From: Aneesh Kumar K.V @ 2019-12-16 13:40 UTC (permalink / raw)
  To: Peter Zijlstra, Michael Ellerman
  Cc: Will Deacon, Andrew Morton, Nick Piggin, linux-arch, linux-mm,
	linux-kernel, Yoshinori Sato, Rich Felker, David S. Miller,
	Helge Deller, Geert Uytterhoeven, Paul Burton, Tony Luck,
	Richard Henderson, Nick Hu, Paul Walmsley

On 12/16/19 6:50 PM, Peter Zijlstra wrote:
> On Mon, Dec 16, 2019 at 06:43:53PM +0530, Aneesh Kumar K.V wrote:
>> On 12/16/19 6:07 PM, Peter Zijlstra wrote:
> 
>>> I'm confused, are you saying you're happy to have PowerPC eat the extra
>>> TLB invalidates? I thought you cared about PPC performance :-)
>>>
>>>
>>
>> Instead can we do
>>
>> static inline void tlb_table_invalidate(struct mmu_gather *tlb)
>> {
>> #ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
>> 	/*
>> 	 * Invalidate page-table caches used by hardware walkers. Then we still
>> 	 * need to RCU-sched wait while freeing the pages because software
>> 	 * walkers can still be in-flight.
>> 	 */
>> 	tlb_flush_mmu_tlbonly(tlb);
>> #endif
>> }
> 
> How does that not break ARM/ARM64/s390 and x86 ?
> 

Hmm I missed that usage of RCU_TABLE_NO_INVALIDATE.

Ok I guess we need to revert this change that went upstream this merge 
window then

commit 52162ec784fa05f3a4b1d8e84421279998be3773
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Thu Oct 24 13:28:00 2019 +0530

     powerpc/mm/book3s64/radix: Use freed_tables instead of need_flush_all



I will review the change closely.

-aneesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 13:40           ` Aneesh Kumar K.V
@ 2019-12-16 13:54             ` Aneesh Kumar K.V
  2019-12-16 14:50               ` Peter Zijlstra
  2019-12-16 14:00             ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Aneesh Kumar K.V @ 2019-12-16 13:54 UTC (permalink / raw)
  To: Peter Zijlstra, Michael Ellerman
  Cc: Will Deacon, Andrew Morton, Nick Piggin, linux-arch, linux-mm,
	linux-kernel, Yoshinori Sato, Rich Felker, David S. Miller,
	Helge Deller, Geert Uytterhoeven, Paul Burton, Tony Luck,
	Richard Henderson, Nick Hu, Paul Walmsley

On 12/16/19 7:10 PM, Aneesh Kumar K.V wrote:
> On 12/16/19 6:50 PM, Peter Zijlstra wrote:
>> On Mon, Dec 16, 2019 at 06:43:53PM +0530, Aneesh Kumar K.V wrote:
>>> On 12/16/19 6:07 PM, Peter Zijlstra wrote:
>>
>>>> I'm confused, are you saying you're happy to have PowerPC eat the extra
>>>> TLB invalidates? I thought you cared about PPC performance :-)
>>>>
>>>>
>>>
>>> Instead can we do
>>>
>>> static inline void tlb_table_invalidate(struct mmu_gather *tlb)
>>> {
>>> #ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
>>>     /*
>>>      * Invalidate page-table caches used by hardware walkers. Then we
>>> still
>>>      * need to RCU-sched wait while freeing the pages because software
>>>      * walkers can still be in-flight.
>>>      */
>>>     tlb_flush_mmu_tlbonly(tlb);
>>> #endif
>>> }
>>
>> How does that not break ARM/ARM64/s390 and x86 ?
>>
> 
> Hmm I missed that usage of RCU_TABLE_NO_INVALIDATE.
> 
> Ok I guess we need to revert this change that went upstream this merge 
> window then
> 
> commit 52162ec784fa05f3a4b1d8e84421279998be3773
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Date:   Thu Oct 24 13:28:00 2019 +0530
> 
>      powerpc/mm/book3s64/radix: Use freed_tables instead of need_flush_all
> 
> 
> 
> I will review the change closely.


So __p*_free_tlb() routines on ppc64 just mark that we need a page walk
cache flush and the actual flush is done in tlb_flush_mmu. As per

d86564a2f085b79ec046a5cba90188e61235280 (mm/tlb, x86/mm: Support 
invalidating TLB caches for RCU_TABLE_FREE ) that is not sufficient?

-aneesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 13:40           ` Aneesh Kumar K.V
  2019-12-16 13:54             ` Aneesh Kumar K.V
@ 2019-12-16 14:00             ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-16 14:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Dec 16, 2019 at 07:10:30PM +0530, Aneesh Kumar K.V wrote:
> On 12/16/19 6:50 PM, Peter Zijlstra wrote:
> > On Mon, Dec 16, 2019 at 06:43:53PM +0530, Aneesh Kumar K.V wrote:
> > > On 12/16/19 6:07 PM, Peter Zijlstra wrote:
> > 
> > > > I'm confused, are you saying you're happy to have PowerPC eat the extra
> > > > TLB invalidates? I thought you cared about PPC performance :-)
> > > > 
> > > > 
> > > 
> > > Instead can we do
> > > 
> > > static inline void tlb_table_invalidate(struct mmu_gather *tlb)
> > > {
> > > #ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE
> > > 	/*
> > > 	 * Invalidate page-table caches used by hardware walkers. Then we still
> > > 	 * need to RCU-sched wait while freeing the pages because software
> > > 	 * walkers can still be in-flight.
> > > 	 */
> > > 	tlb_flush_mmu_tlbonly(tlb);
> > > #endif
> > > }
> > 
> > How does that not break ARM/ARM64/s390 and x86 ?
> > 
> 
> Hmm I missed that usage of RCU_TABLE_NO_INVALIDATE.

What use? Only PPC and SPARC64 use that option. The reason they can use
it is because they don't have a hardware walker (with the exception of
PPC-Radix, which I suppose your below patch fudges ?!).

So HAVE_RCU_TABLE_FREE will provide tlb_remove_table() for use in
freeing page-tables/directories. This is required for all architectures
that have software walkers and !IPI TLBI.

Arm, Arm64, Power, Sparc64, s390 and x86 use this option. While x86
natively has IPI based TLBI, a bunch of the virtualization solutions got
rid of the IPI for performance.

Of those, Arm, Arm64, s390, x86 (and PPC-Radix) also have hardware
walkers on those page-tables, and thus _must_ TLBI in between unhooking
and freeing these pages.

PPC-Hash and Sparc64 OTOH only ever access the linux page-tables through
the software walker and thus can forgo this TLBI, and _THAT_ is what
TABLE_NO_INVALIDATE is about (there actually is a comment that clearly
states this).

> Ok I guess we need to revert this change that went upstream this merge
> window then
> 
> commit 52162ec784fa05f3a4b1d8e84421279998be3773
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Date:   Thu Oct 24 13:28:00 2019 +0530
> 
>     powerpc/mm/book3s64/radix: Use freed_tables instead of need_flush_all
> 

This really looks like you've got PPC-Radix wrong. As soon as you got
hardware walkers on the linux page-tables, you must not use
TABLE_NO_INVALIDATE.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 13:54             ` Aneesh Kumar K.V
@ 2019-12-16 14:50               ` Peter Zijlstra
  2019-12-16 15:14                 ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-16 14:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Dec 16, 2019 at 07:24:24PM +0530, Aneesh Kumar K.V wrote:

> So __p*_free_tlb() routines on ppc64 just mark that we need a page walk
> cache flush and the actual flush is done in tlb_flush_mmu.

Not quite, your __p*_free_tlb() goes to pgtable_free_tlb() which calls
tlb_remove_table().

> As per
> 
> d86564a2f085b79ec046a5cba90188e61235280 (mm/tlb, x86/mm: Support
> invalidating TLB caches for RCU_TABLE_FREE ) that is not sufficient?

96bc9567cbe1 ("asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE")

And no. Since you have TABLE_NO_INVALIDATE set, tlb_remove_table() will
not TLBI when it fails to allocate a batch page, which is an error for
PPC-Radix.

There is also no TLBI when the batch page is full and the RCU callback
happens, which is also a bug on PPC-Radix.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 14:50               ` Peter Zijlstra
@ 2019-12-16 15:14                 ` Peter Zijlstra
  2019-12-16 15:30                   ` Peter Zijlstra
  2019-12-17  8:51                   ` Peter Zijlstra
  0 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-16 15:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Dec 16, 2019 at 03:50:41PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 16, 2019 at 07:24:24PM +0530, Aneesh Kumar K.V wrote:
> 
> > So __p*_free_tlb() routines on ppc64 just mark that we need a page walk
> > > cache flush and the actual flush is done in tlb_flush_mmu.
> 
> > Not quite, your __p*_free_tlb() goes to pgtable_free_tlb() which calls
> tlb_remove_table().
> 
> > As per
> > 
> > d86564a2f085b79ec046a5cba90188e61235280 (mm/tlb, x86/mm: Support
> > invalidating TLB caches for RCU_TABLE_FREE ) that is not sufficient?
> 
> 96bc9567cbe1 ("asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE")
> 
> And no. Since you have TABLE_NO_INVALIDATE set, tlb_remove_table() will
> not TLBI when it fails to allocate a batch page, which is an error for
> PPC-Radix.
> 
> There is also no TLBI when the batch page is full and the RCU callback
> happens, which is also a bug on PPC-Radix.

It seems to me you need something like this here patch, all you need to
add is a suitable definition of tlb_needs_table_invalidate() for Power.

---

diff --git a/arch/Kconfig b/arch/Kconfig
index c44ef15866a3..98de654b79b3 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -400,10 +400,6 @@ config MMU_GATHER_RCU_TABLE_FREE
 	bool
 	select MMU_GATHER_TABLE_FREE
 
-config MMU_GATHER_NO_TABLE_INVALIDATE
-	bool
-	depends on MMU_GATHER_RCU_TABLE_FREE
-
 config MMU_GATHER_PAGE_SIZE
 	bool
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3dea4c8d39f2..2ddf24822d5b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -223,7 +223,6 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select MMU_GATHER_RCU_TABLE_FREE		if SMP
-	select MMU_GATHER_NO_TABLE_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
 	select MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a76e915ab207..acf20b6c0a54 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -66,7 +66,6 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select MMU_GATHER_RCU_TABLE_FREE if SMP
-	select MMU_GATHER_NO_TABLE_INVALIDATE if MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h
index a2f3fa61ee36..ac8e74a96122 100644
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -28,6 +28,12 @@ void flush_tlb_pending(void);
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 #define tlb_flush(tlb)	flush_tlb_pending()
 
+/*
+ * SPARC64's hardware TLB fill does not use the Linux page-tables
+ * and therefore we don't need a TLBI when freeing page-table pages.
+ */
+#define tlb_needs_table_invalidate()	(false)
+
 #include <asm-generic/tlb.h>
 
 #endif /* _SPARC64_TLB_H */
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index fe0ea6ff3636..4108d6d18ca5 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -156,13 +156,6 @@
  *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
  *  and therefore doesn't naturally serialize with software page-table walkers.
  *
- *  MMU_GATHER_NO_TABLE_INVALIDATE
- *
- *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly()
- *  before freeing the page-table pages. This can be avoided if you use
- *  MMU_GATHER_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
- *  page-tables natively.
- *
  *  MMU_GATHER_NO_RANGE
  *
  *  Use this if your architecture lacks an efficient flush_tlb_range().
@@ -203,6 +196,24 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
 #endif /* CONFIG_MMU_GATHER_TABLE_FREE */
 
+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
+
+/*
+ * This allows an architecture that does not use the linux page-tables for
+ * hardware to skip the TLBI when freeing page tables.
+ */
+#ifndef tlb_needs_table_invalidate
+#define tlb_needs_table_invalidate() (true)
+#endif
+
+#else
+
+#ifdef tlb_needs_table_invalidate
+#error tlb_needs_table_invalidate() requires MMU_GATHER_RCU_TABLE_FREE
+#endif
+
+#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
+
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
 /*
  * If we can't allocate a page to make a big batch of page pointers
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 9d103031568d..a3538cb2bcbe 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -177,14 +177,14 @@ static void tlb_remove_table_free(struct mmu_table_batch *batch)
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
-	/*
-	 * Invalidate page-table caches used by hardware walkers. Then we still
-	 * need to RCU-sched wait while freeing the pages because software
-	 * walkers can still be in-flight.
-	 */
-	tlb_flush_mmu_tlbonly(tlb);
-#endif
+	if (tlb_needs_table_invalidate()) {
+		/*
+		 * Invalidate page-table caches used by hardware walkers. Then
+		 * we still need to RCU-sched wait while freeing the pages
+		 * because software walkers can still be in-flight.
+		 */
+		tlb_flush_mmu_tlbonly(tlb);
+	}
 }
 
 static void tlb_remove_table_one(void *table)

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 15:14                 ` Peter Zijlstra
@ 2019-12-16 15:30                   ` Peter Zijlstra
  2019-12-16 17:00                     ` Aneesh Kumar K.V
  2019-12-17  8:51                   ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-16 15:30 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Dec 16, 2019 at 04:14:19PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 16, 2019 at 03:50:41PM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 16, 2019 at 07:24:24PM +0530, Aneesh Kumar K.V wrote:
> > 
> > > So __p*_free_tlb() routines on ppc64 just mark that we need a page walk
> > > cache flush and the actual flush is done in tlb_flush_mmu.
> > 
> > Not quite, your __p*_free_tlb() goes to pgtable_free_tlb() which calls
> > tlb_remove_table().
> > 
> > > As per
> > > 
> > > d86564a2f085b79ec046a5cba90188e61235280 (mm/tlb, x86/mm: Support
> > > invalidating TLB caches for RCU_TABLE_FREE ) that is not sufficient?
> > 
> > 96bc9567cbe1 ("asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE")
> > 
> > And no. Since you have TABLE_NO_INVALIDATE set, tlb_remove_table() will
> > not TLBI when it fails to allocate a batch page, which is an error for
> > PPC-Radix.
> > 
> > There is also no TLBI when the batch page is full and the RCU callback
> > happens, which is also a bug on PPC-Radix.
> 
> It seems to me you need something like this here patch, all you need to
> add is a suitable definition of tlb_needs_table_invalidate() for Power.

I'm thinking this:

#define tlb_needs_table_invalidate()	radix_enabled()

should work for you. When you have Radix you need that TLBI, if you have
Hash you don't.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 15:30                   ` Peter Zijlstra
@ 2019-12-16 17:00                     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 38+ messages in thread
From: Aneesh Kumar K.V @ 2019-12-16 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Ellerman, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On 12/16/19 9:00 PM, Peter Zijlstra wrote:
> On Mon, Dec 16, 2019 at 04:14:19PM +0100, Peter Zijlstra wrote:
>> On Mon, Dec 16, 2019 at 03:50:41PM +0100, Peter Zijlstra wrote:
>>> On Mon, Dec 16, 2019 at 07:24:24PM +0530, Aneesh Kumar K.V wrote:
>>>
>>>> So __p*_free_tlb() routines on ppc64 just mark that we need a page walk
>>>> cache flush and the actual flush is done in tlb_flush_mmu.
>>>
>>> Not quite, your __p*_free_tlb() goes to pgtable_free_tlb() which calls
>>> tlb_remove_table().
>>>
>>>> As per
>>>>
>>>> d86564a2f085b79ec046a5cba90188e61235280 (mm/tlb, x86/mm: Support
>>>> invalidating TLB caches for RCU_TABLE_FREE ) that is not sufficient?
>>>
>>> 96bc9567cbe1 ("asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE")
>>>
>>> And no. Since you have TABLE_NO_INVALIDATE set, tlb_remove_table() will
>>> not TLBI when it fails to allocate a batch page, which is an error for
>>> PPC-Radix.
>>>
>>> There is also no TLBI when the batch page is full and the RCU callback
>>> happens, which is also a bug on PPC-Radix.
>>
>> It seems to me you need something like this here patch, all you need to
>> add is a suitable definition of tlb_needs_table_invalidate() for Power.
> 
> I'm thinking this:
> 
> #define tlb_needs_table_invalidate()	radix_enabled()
> 
> should work for you. When you have Radix you need that TLBI, if you have
> Hash you don't.
> 

Yes. I was doing something similar with #ifdef around
tlb_table_invalidate(). I will take your approach rather than an arch
override of tlb_table_invalidate().

-aneesh

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE
  2019-12-16 15:14                 ` Peter Zijlstra
  2019-12-16 15:30                   ` Peter Zijlstra
@ 2019-12-17  8:51                   ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2019-12-17  8:51 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Dec 16, 2019 at 04:14:19PM +0100, Peter Zijlstra wrote:
> It seems to me you need something like this here patch, all you need to
> add is a suitable definition of tlb_needs_table_invalidate() for Power.

FWIW, Paul (Burton), MIPS should be able to have
tlb_needs_table_invalidate() return false when it has pure software TLB
fill. I tried to have a quick look for P5600 and P6600 to see if I could
find the right state that indicates hardware TLB, but couldn't find
anything.
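
If such state exists, the override would presumably be a one-liner along
these lines (untested sketch; assuming cpu_has_htw is the right
indicator for a hardware page-table walker on MIPS):

	/* arch/mips/include/asm/tlb.h */
	#define tlb_needs_table_invalidate()	cpu_has_htw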

> ---
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index c44ef15866a3..98de654b79b3 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -400,10 +400,6 @@ config MMU_GATHER_RCU_TABLE_FREE
>  	bool
>  	select MMU_GATHER_TABLE_FREE
>  
> -config MMU_GATHER_NO_TABLE_INVALIDATE
> -	bool
> -	depends on MMU_GATHER_RCU_TABLE_FREE
> -
>  config MMU_GATHER_PAGE_SIZE
>  	bool
>  
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 3dea4c8d39f2..2ddf24822d5b 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -223,7 +223,6 @@ config PPC
>  	select HAVE_PERF_REGS
>  	select HAVE_PERF_USER_STACK_DUMP
>  	select MMU_GATHER_RCU_TABLE_FREE		if SMP
> -	select MMU_GATHER_NO_TABLE_INVALIDATE	if MMU_GATHER_RCU_TABLE_FREE
>  	select MMU_GATHER_PAGE_SIZE
>  	select HAVE_REGS_AND_STACK_ACCESS_API
>  	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index a76e915ab207..acf20b6c0a54 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -66,7 +66,6 @@ config SPARC64
>  	select HAVE_KRETPROBES
>  	select HAVE_KPROBES
>  	select MMU_GATHER_RCU_TABLE_FREE if SMP
> -	select MMU_GATHER_NO_TABLE_INVALIDATE if MMU_GATHER_RCU_TABLE_FREE
>  	select HAVE_MEMBLOCK_NODE_MAP
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select HAVE_DYNAMIC_FTRACE
> diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h
> index a2f3fa61ee36..ac8e74a96122 100644
> --- a/arch/sparc/include/asm/tlb_64.h
> +++ b/arch/sparc/include/asm/tlb_64.h
> @@ -28,6 +28,12 @@ void flush_tlb_pending(void);
>  #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
>  #define tlb_flush(tlb)	flush_tlb_pending()
>  
> +/*
> + * SPARC64's hardware TLB fill does not use the Linux page-tables
> + * and therefore we don't need a TLBI when freeing page-table pages.
> + */
> +#define tlb_needs_table_invalidate()	(false)
> +
>  #include <asm-generic/tlb.h>
>  
>  #endif /* _SPARC64_TLB_H */
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index fe0ea6ff3636..4108d6d18ca5 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -156,13 +156,6 @@
>   *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
>   *  and therefore doesn't naturally serialize with software page-table walkers.
>   *
> - *  MMU_GATHER_NO_TABLE_INVALIDATE
> - *
> - *  This makes MMU_GATHER_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly()
> - *  before freeing the page-table pages. This can be avoided if you use
> - *  MMU_GATHER_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
> - *  page-tables natively.
> - *
>   *  MMU_GATHER_NO_RANGE
>   *
>   *  Use this if your architecture lacks an efficient flush_tlb_range().
> @@ -203,6 +196,24 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
>  
>  #endif /* CONFIG_MMU_GATHER_TABLE_FREE */
>  
> +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
> +
> +/*
> + * This allows an architecture that does not use the linux page-tables for
> + * hardware to skip the TLBI when freeing page tables.
> + */
> +#ifndef tlb_needs_table_invalidate
> +#define tlb_needs_table_invalidate() (true)
> +#endif
> +
> +#else
> +
> +#ifdef tlb_needs_table_invalidate
> +#error tlb_needs_table_invalidate() requires MMU_GATHER_RCU_TABLE_FREE
> +#endif
> +
> +#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
> +
>  #ifndef CONFIG_MMU_GATHER_NO_GATHER
>  /*
>   * If we can't allocate a page to make a big batch of page pointers
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 9d103031568d..a3538cb2bcbe 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -177,14 +177,14 @@ static void tlb_remove_table_free(struct mmu_table_batch *batch)
>   */
>  static inline void tlb_table_invalidate(struct mmu_gather *tlb)
>  {
> -#ifndef CONFIG_MMU_GATHER_NO_TABLE_INVALIDATE
> -	/*
> -	 * Invalidate page-table caches used by hardware walkers. Then we still
> -	 * need to RCU-sched wait while freeing the pages because software
> -	 * walkers can still be in-flight.
> -	 */
> -	tlb_flush_mmu_tlbonly(tlb);
> -#endif
> +	if (tlb_needs_table_invalidate()) {
> +		/*
> +		 * Invalidate page-table caches used by hardware walkers. Then
> +		 * we still need to RCU-sched wait while freeing the pages
> +		 * because software walkers can still be in-flight.
> +		 */
> +		tlb_flush_mmu_tlbonly(tlb);
> +	}
>  }
>  
>  static void tlb_remove_table_one(void *table)
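
With the Kconfig option gone, an architecture can make the invalidate
decision at run time instead of build time. A minimal sketch, assuming a
hypothetical architecture "foo" where only one MMU mode walks the Linux
page-tables in hardware (foo_radix_enabled() is a made-up predicate for
that mode):

	/* arch/foo/include/asm/tlb.h (illustrative sketch only) */

	/*
	 * Only the radix-style MMU mode walks the Linux page-tables in
	 * hardware, so only that mode needs the TLBI before page-table
	 * pages are freed.
	 */
	#define tlb_needs_table_invalidate()	foo_radix_enabled()

	#include <asm-generic/tlb.h>

This has the same shape as the sparc64 hunk above, except that the
predicate is evaluated at run time rather than being a compile-time
constant.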


* Re: [PATCH mk-II 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2019-12-12  9:32   ` [PATCH mk-II " Peter Zijlstra
@ 2020-01-26 15:52     ` Guenter Roeck
  2020-01-27  8:11       ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Guenter Roeck @ 2020-01-26 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Thu, Dec 12, 2019 at 10:32:05AM +0100, Peter Zijlstra wrote:
> As described in the comment, the correct order for freeing pages is:
> 
>  1) unhook page
>  2) TLB invalidate page
>  3) free page
> 
> This order equally applies to page directories.
> 
> Currently there are two correct options:
> 
>  - use tlb_remove_page(), when all page directories are full pages and
>    there are no further constraints placed by things like software
>    walkers (HAVE_FAST_GUP).
> 
>  - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
>    architecture does not do IPI-based TLB invalidate and has
>    HAVE_FAST_GUP (or software TLB fill).
> 
> This however leaves architectures that don't have page-based
> directories but don't need RCU in a bind. For those, provide
> MMU_GATHER_TABLE_FREE, which provides the independent batching for
> directories without the additional RCU freeing.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Various sparc64 builds (allnoconfig, tinyconfig, as well as builds
with SMP disabled) fail with:

mm/mmu_gather.c: In function '__tlb_remove_table_free':
mm/mmu_gather.c:101:3: error: implicit declaration of function '__tlb_remove_table'; did you mean 'tlb_remove_table'?

Guenter
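
The symbol the compiler trips over is the per-architecture hook that the
generic batching code calls to actually free a page-table page once it is
safe to do so; the failing configurations never define it because sparc64
selects the table-free machinery only under SMP. For context, the hook's
expected shape is roughly the following (a sketch modelled on the x86
definition, not part of this series):

	/* arch/xxx/include/asm/tlb.h: sketch of the arch-supplied hook */
	static inline void __tlb_remove_table(void *table)
	{
		/* the table page is now safe to free */
		free_page_and_swap_cache(table);
	}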


* Re: [PATCH mk-II 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2020-01-26 15:52     ` Guenter Roeck
@ 2020-01-27  8:11       ` Peter Zijlstra
  2020-01-27  8:13         ` Aneesh Kumar K.V
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-01-27  8:11 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Sun, Jan 26, 2020 at 07:52:05AM -0800, Guenter Roeck wrote:
> On Thu, Dec 12, 2019 at 10:32:05AM +0100, Peter Zijlstra wrote:
> > As described in the comment, the correct order for freeing pages is:
> > 
> >  1) unhook page
> >  2) TLB invalidate page
> >  3) free page
> > 
> > This order equally applies to page directories.
> > 
> > Currently there are two correct options:
> > 
> >  - use tlb_remove_page(), when all page directories are full pages and
> >    there are no further constraints placed by things like software
> >    walkers (HAVE_FAST_GUP).
> > 
> >  - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
> >    architecture does not do IPI-based TLB invalidate and has
> >    HAVE_FAST_GUP (or software TLB fill).
> > 
> > This however leaves architectures that don't have page-based
> > directories but don't need RCU in a bind. For those, provide
> > MMU_GATHER_TABLE_FREE, which provides the independent batching for
> > directories without the additional RCU freeing.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> 
> Various sparc64 builds (allnoconfig, tinyconfig, as well as builds
> with SMP disabled):
> 
> mm/mmu_gather.c: In function '__tlb_remove_table_free':
> mm/mmu_gather.c:101:3: error: implicit declaration of function '__tlb_remove_table'; did you mean 'tlb_remove_table'?

Thanks; I'll respin these patches against Aneesh's pile and make sure to
look into this when I do so.




* Re: [PATCH mk-II 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2020-01-27  8:11       ` Peter Zijlstra
@ 2020-01-27  8:13         ` Aneesh Kumar K.V
  2020-01-27 13:05           ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-27  8:13 UTC (permalink / raw)
  To: Peter Zijlstra, Guenter Roeck
  Cc: Will Deacon, Andrew Morton, Nick Piggin, linux-arch, linux-mm,
	linux-kernel, Yoshinori Sato, Rich Felker, David S. Miller,
	Helge Deller, Geert Uytterhoeven, Paul Burton, Tony Luck,
	Richard Henderson, Nick Hu, Paul Walmsley

On 1/27/20 1:41 PM, Peter Zijlstra wrote:
> On Sun, Jan 26, 2020 at 07:52:05AM -0800, Guenter Roeck wrote:
>> On Thu, Dec 12, 2019 at 10:32:05AM +0100, Peter Zijlstra wrote:
>>> As described in the comment, the correct order for freeing pages is:
>>>
>>>   1) unhook page
>>>   2) TLB invalidate page
>>>   3) free page
>>>
>>> This order equally applies to page directories.
>>>
>>> Currently there are two correct options:
>>>
>>>   - use tlb_remove_page(), when all page directories are full pages and
>>>     there are no further constraints placed by things like software
>>>     walkers (HAVE_FAST_GUP).
>>>
>>>   - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
>>>     architecture does not do IPI-based TLB invalidate and has
>>>     HAVE_FAST_GUP (or software TLB fill).
>>>
>>> This however leaves architectures that don't have page-based
>>> directories but don't need RCU in a bind. For those, provide
>>> MMU_GATHER_TABLE_FREE, which provides the independent batching for
>>> directories without the additional RCU freeing.
>>>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> ---
>>
>> Various sparc64 builds (allnoconfig, tinyconfig, as well as builds
>> with SMP disabled):
>>
>> mm/mmu_gather.c: In function '__tlb_remove_table_free':
>> mm/mmu_gather.c:101:3: error: implicit declaration of function '__tlb_remove_table'; did you mean 'tlb_remove_table'?
> 
> Thanks; I'll respin these patches against Aneesh' pile and make sure to
> look into this when I do so.
> 
> 

I did send a change to fix that; it drops the !SMP change in the patch:

https://lore.kernel.org/linux-mm/87v9p9mhnr.fsf@linux.ibm.com
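
In essence (a reconstruction for readability; the authoritative change is
in the linked mail), the fix is to select the table-free machinery
unconditionally, so that __tlb_remove_table() is defined even on !SMP
builds:

	--- a/arch/sparc/Kconfig
	+++ b/arch/sparc/Kconfig
	@@ ... @@ config SPARC64
	-	select MMU_GATHER_RCU_TABLE_FREE if SMP
	+	select MMU_GATHER_RCU_TABLE_FREE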


-aneesh



* Re: [PATCH mk-II 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2020-01-27  8:13         ` Aneesh Kumar K.V
@ 2020-01-27 13:05           ` Peter Zijlstra
  2020-01-27 13:42             ` Aneesh Kumar K.V
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-01-27 13:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Guenter Roeck, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On Mon, Jan 27, 2020 at 01:43:34PM +0530, Aneesh Kumar K.V wrote:

> I did send a change to fix that. it is to drop !SMP change in the patch
> 
> https://lore.kernel.org/linux-mm/87v9p9mhnr.fsf@linux.ibm.com

Indeed you did. Did those patches land anywhere, or is it all still up
in the air? (I was hoping to find those patches in a tree somewhere)


* Re: [PATCH mk-II 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE
  2020-01-27 13:05           ` Peter Zijlstra
@ 2020-01-27 13:42             ` Aneesh Kumar K.V
  0 siblings, 0 replies; 38+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-27 13:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Guenter Roeck, Will Deacon, Andrew Morton, Nick Piggin,
	linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	David S. Miller, Helge Deller, Geert Uytterhoeven, Paul Burton,
	Tony Luck, Richard Henderson, Nick Hu, Paul Walmsley

On 1/27/20 6:35 PM, Peter Zijlstra wrote:
> On Mon, Jan 27, 2020 at 01:43:34PM +0530, Aneesh Kumar K.V wrote:
> 
>> I did send a change to fix that. it is to drop !SMP change in the patch
>>
>> https://lore.kernel.org/linux-mm/87v9p9mhnr.fsf@linux.ibm.com
> 
> Indeed you did. Did those patches land anywhere, or is it all still up
> in the air? (I was hoping to find those patches in a tree somewhere)
> 

Andrew did pick up the series. I am not sure whether he also picked up
the build fix.

Guenter,

Can you confirm that the patch fixed the build issue?


-aneesh



end of thread

Thread overview: 38+ messages
2019-12-11 12:07 [PATCH 00/17] Fixup page directory freeing Peter Zijlstra
2019-12-11 12:07 ` [PATCH 01/17] sh/tlb: Fix PGTABLE_LEVELS > 2 Peter Zijlstra
2019-12-11 12:07 ` [PATCH 02/17] asm-gemeric/tlb: Remove stray function declarations Peter Zijlstra
2019-12-11 13:19   ` Geert Uytterhoeven
2019-12-11 12:07 ` [PATCH 03/17] asm-generic/tlb: Add missing CONFIG symbol Peter Zijlstra
2019-12-11 12:07 ` [PATCH 04/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_FREE Peter Zijlstra
2019-12-11 12:07 ` [PATCH 05/17] asm-generic/tlb: Rename HAVE_RCU_TABLE_NO_INVALIDATE Peter Zijlstra
2019-12-16 12:31   ` Aneesh Kumar K.V
2019-12-16 12:37     ` Peter Zijlstra
2019-12-16 13:13       ` Aneesh Kumar K.V
2019-12-16 13:20         ` Peter Zijlstra
2019-12-16 13:40           ` Aneesh Kumar K.V
2019-12-16 13:54             ` Aneesh Kumar K.V
2019-12-16 14:50               ` Peter Zijlstra
2019-12-16 15:14                 ` Peter Zijlstra
2019-12-16 15:30                   ` Peter Zijlstra
2019-12-16 17:00                     ` Aneesh Kumar K.V
2019-12-17  8:51                   ` Peter Zijlstra
2019-12-16 14:00             ` Peter Zijlstra
2019-12-11 12:07 ` [PATCH 06/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
2019-12-11 12:07 ` [PATCH 07/17] asm-generic/tlb: Rename HAVE_MMU_GATHER_NO_GATHER Peter Zijlstra
2019-12-11 12:07 ` [PATCH 08/17] asm-generic/tlb: Provide MMU_GATHER_TABLE_FREE Peter Zijlstra
2019-12-12  9:30   ` Peter Zijlstra
2019-12-12  9:32   ` [PATCH mk-II " Peter Zijlstra
2020-01-26 15:52     ` Guenter Roeck
2020-01-27  8:11       ` Peter Zijlstra
2020-01-27  8:13         ` Aneesh Kumar K.V
2020-01-27 13:05           ` Peter Zijlstra
2020-01-27 13:42             ` Aneesh Kumar K.V
2019-12-11 12:07 ` [PATCH 09/17] sh/tlb: Fix __pmd_free_tlb() Peter Zijlstra
2019-12-11 12:07 ` [PATCH 10/17] sparc32/tlb: Fix __p*_free_tlb() Peter Zijlstra
2019-12-11 12:07 ` [PATCH 11/17] parisc/tlb: " Peter Zijlstra
2019-12-11 12:07 ` [PATCH 12/17] m68k/tlb: " Peter Zijlstra
2019-12-11 12:07 ` [PATCH 13/17] mips/tlb: " Peter Zijlstra
2019-12-11 12:07 ` [PATCH 14/17] ia64/tlb: " Peter Zijlstra
2019-12-11 12:07 ` [PATCH 15/17] alpha/tlb: " Peter Zijlstra
2019-12-11 12:07 ` [PATCH 16/17] nds32/tlb: " Peter Zijlstra
2019-12-11 12:07 ` [PATCH 17/17] riscv/tlb: " Peter Zijlstra
