linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 00/11] my generic mmu_gather patches
@ 2018-09-13  9:21 Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment Peter Zijlstra
                   ` (10 more replies)
  0 siblings, 11 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens

Hi,

Here are some further mmu_gather/tlb patches I have that go on top of Will's
current tlb branch.

I mostly wrote them 2 weeks ago and haven't been able to get back to them; but
Will offered to have a wee look.

Esp. the full arch conversions (ARM, SH, UM, IA64) were based on patches I did
7 years ago and haven't been tested other than with a compiler.

The notable exception is s390, which, after this series, is the only remaining
architecture with a private mmu_gather implementation. I didn't get around to
converting that.

Anyway, have a look; hopefully there are a few good bits in it :-)


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-13 10:30   ` Martin Schwidefsky
  2018-09-14 16:48   ` Will Deacon
  2018-09-13  9:21 ` [RFC][PATCH 02/11] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens

Write a comment explaining some of this..

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |  120 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 117 insertions(+), 3 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -22,6 +22,119 @@
 
 #ifdef CONFIG_MMU
 
+/*
+ * Generic MMU-gather implementation.
+ *
+ * The mmu_gather data structure is used by the mm code to implement the
+ * correct and efficient ordering of freeing pages and TLB invalidations.
+ *
+ * This correct ordering is:
+ *
+ *  1) unhook page
+ *  2) TLB invalidate page
+ *  3) free page
+ *
+ * That is, we must never free a page before we have ensured there are no live
+ * translations left to it. Otherwise it might be possible to observe (or
+ * worse, change) the page content after it has been reused.
+ *
+ * The mmu_gather API consists of:
+ *
+ *  - tlb_gather_mmu() / tlb_finish_mmu(); start and finish a mmu_gather
+ *
+ *    Finish in particular will issue a (final) TLB invalidate and free
+ *    all (remaining) queued pages.
+ *
+ *  - tlb_start_vma() / tlb_end_vma(); marks the start / end of a VMA
+ *
+ *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
+ *    there are large holes between the VMAs.
+ *
+ *  - tlb_remove_page() / __tlb_remove_page()
+ *  - tlb_remove_page_size() / __tlb_remove_page_size()
+ *
+ *    __tlb_remove_page_size() is the basic primitive that queues a page for
+ *    freeing. __tlb_remove_page() assumes PAGE_SIZE. Both will return a
+ *    boolean indicating if the queue is (now) full and a call to
+ *    tlb_flush_mmu() is required.
+ *
+ *    tlb_remove_page() and tlb_remove_page_size() imply the call to
+ *    tlb_flush_mmu() when required and have no return value.
+ *
+ *  - tlb_change_page_size()
+ *
+ *    call before __tlb_remove_page*() to set the current page-size; implies a
+ *    possible tlb_flush_mmu() call.
+ *
+ *  - tlb_flush_mmu() / tlb_flush_mmu_tlbonly() / tlb_flush_mmu_free()
+ *
+ *    tlb_flush_mmu_tlbonly() - does the TLB invalidate (and resets
+ *                              related state, like the range)
+ *
+ *    tlb_flush_mmu_free() - frees the queued pages; make absolutely
+ *			     sure no additional tlb_remove_page()
+ *			     calls happen between _tlbonly() and this.
+ *
+ *    tlb_flush_mmu() - the above two calls.
+ *
+ *  - mmu_gather::fullmm
+ *
+ *    A flag set by tlb_gather_mmu() to indicate we're going to free
+ *    the entire mm; this allows a number of optimizations.
+ *
+ *    XXX list optimizations
+ *
+ *  - mmu_gather::need_flush_all
+ *
+ *    A flag that can be set by the arch code if it wants to force
+ *    flush the entire TLB irrespective of the range. For instance
+ *    x86-PAE needs this when changing top-level entries.
+ *
+ * And requires the architecture to provide and implement tlb_flush().
+ *
+ * tlb_flush() may, in addition to the above mentioned mmu_gather fields, make
+ * use of:
+ *
+ *  - mmu_gather::start / mmu_gather::end
+ *
+ *    which (when !need_flush_all; fullmm will have start = end = ~0UL) provides
+ *    the range that needs to be flushed to cover the pages to be freed.
+ *
+ *  - mmu_gather::freed_tables
+ *
+ *    set when we freed page table pages
+ *
+ *  - tlb_get_unmap_shift() / tlb_get_unmap_size()
+ *
+ *    returns the smallest TLB entry size unmapped in this range
+ *
+ * Additionally there are a few opt-in features:
+ *
+ *  HAVE_MMU_GATHER_PAGE_SIZE
+ *
+ *  This ensures we call tlb_flush() every time tlb_change_page_size() actually
+ *  changes the size and provides mmu_gather::page_size to tlb_flush().
+ *
+ *  HAVE_RCU_TABLE_FREE
+ *
+ *  This provides tlb_remove_table(), to be used instead of tlb_remove_page()
+ *  for page directories (__p*_free_tlb()). This provides separate freeing of
+ *  the page-table pages themselves in a semi-RCU fashion (see comment below).
+ *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
+ *  and therefore doesn't naturally serialize with software page-table walkers.
+ *
+ *  When used, an architecture is expected to provide __tlb_remove_table()
+ *  which does the actual freeing of these pages.
+ *
+ *  HAVE_RCU_TABLE_INVALIDATE
+ *
+ *  This makes HAVE_RCU_TABLE_FREE call tlb_flush_mmu_tlbonly() before freeing
+ *  the page-table pages. Required if you use HAVE_RCU_TABLE_FREE and your
+ *  architecture uses the Linux page-tables natively.
+ *
+ */
+#define HAVE_GENERIC_MMU_GATHER
+
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 /*
  * Semi RCU freeing of the page directories.
@@ -89,14 +202,17 @@ struct mmu_gather_batch {
  */
 #define MAX_GATHER_BATCH_COUNT	(10000UL/MAX_GATHER_BATCH)
 
-/* struct mmu_gather is an opaque type used by the mm code for passing around
+/*
+ * struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
  */
 struct mmu_gather {
 	struct mm_struct	*mm;
+
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	struct mmu_table_batch	*batch;
 #endif
+
 	unsigned long		start;
 	unsigned long		end;
 	/*
@@ -131,8 +247,6 @@ struct mmu_gather {
 	int page_size;
 };
 
-#define HAVE_GENERIC_MMU_GATHER
-
 void arch_tlb_gather_mmu(struct mmu_gather *tlb,
 	struct mm_struct *mm, unsigned long start, unsigned long end);
 void tlb_flush_mmu(struct mmu_gather *tlb);
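
For orientation, a minimal caller of the API documented above could look like
the sketch below. This is illustrative only and not part of the patch;
zap_one_page() is a hypothetical stand-in for the real page-table walk, while
the mmu_gather calls match the interfaces described in the comment.

	static void example_zap_range(struct mm_struct *mm,
				      struct vm_area_struct *vma,
				      unsigned long start, unsigned long end)
	{
		struct mmu_gather tlb;
		unsigned long addr;

		tlb_gather_mmu(&tlb, mm, start, end);
		tlb_start_vma(&tlb, vma);

		for (addr = start; addr < end; addr += PAGE_SIZE) {
			pte_t *ptep;
			struct page *page;

			/* 1) unhook the page (hypothetical helper) */
			page = zap_one_page(mm, addr, &ptep);
			if (!page)
				continue;

			/* track the range that needs invalidating */
			tlb_remove_tlb_entry(&tlb, ptep, addr);

			/* queue the page for freeing; true means the batch is full */
			if (__tlb_remove_page(&tlb, page))
				tlb_flush_mmu(&tlb);	/* 2) invalidate, 3) free */
		}

		tlb_end_vma(&tlb, vma);
		tlb_finish_mmu(&tlb, start, end);	/* final invalidate + free */
	}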



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 02/11] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-14 16:56   ` Will Deacon
  2018-09-13  9:21 ` [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range() Peter Zijlstra
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens

Move the mmu_gather::page_size things into the generic code instead of
powerpc specific bits.

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig                   |    3 +++
 arch/arm/include/asm/tlb.h     |    3 +--
 arch/ia64/include/asm/tlb.h    |    3 +--
 arch/powerpc/Kconfig           |    1 +
 arch/powerpc/include/asm/tlb.h |   17 -----------------
 arch/s390/include/asm/tlb.h    |    4 +---
 arch/sh/include/asm/tlb.h      |    4 +---
 arch/um/include/asm/tlb.h      |    4 +---
 include/asm-generic/tlb.h      |   25 +++++++++++++------------
 mm/huge_memory.c               |    4 ++--
 mm/hugetlb.c                   |    2 +-
 mm/madvise.c                   |    2 +-
 mm/memory.c                    |    4 ++--
 mm/mmu_gather.c                |    5 +++++
 14 files changed, 33 insertions(+), 48 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -365,6 +365,9 @@ config HAVE_RCU_TABLE_FREE
 config HAVE_RCU_TABLE_INVALIDATE
 	bool
 
+config HAVE_MMU_GATHER_PAGE_SIZE
+	bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -286,8 +286,7 @@ tlb_remove_pmd_tlb_entry(struct mmu_gath
 
 #define tlb_migrate_finish(mm)		do { } while (0)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
+static inline void tlb_change_page_size(struct mmu_gather *tlb,
 						     unsigned int page_size)
 {
 }
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -282,8 +282,7 @@ do {							\
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	tlb_remove_tlb_entry(tlb, ptep, address)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
+static inline void tlb_change_page_size(struct mmu_gather *tlb,
 						     unsigned int page_size)
 {
 }
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
+	select HAVE_MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -27,7 +27,6 @@
 #define tlb_start_vma(tlb, vma)	do { } while (0)
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
 
 extern void tlb_flush(struct mmu_gather *tlb);
 
@@ -46,22 +45,6 @@ static inline void __tlb_remove_tlb_entr
 #endif
 }
 
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
-{
-	if (!tlb->page_size)
-		tlb->page_size = page_size;
-	else if (tlb->page_size != page_size) {
-		if (!tlb->fullmm)
-			tlb_flush_mmu(tlb);
-		/*
-		 * update the page size after flush for the new
-		 * mmu_gather.
-		 */
-		tlb->page_size = page_size;
-	}
-}
-
 #ifdef CONFIG_SMP
 static inline int mm_is_core_local(struct mm_struct *mm)
 {
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -180,9 +180,7 @@ static inline void pud_free_tlb(struct m
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	tlb_remove_tlb_entry(tlb, ptep, address)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
+static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
 {
 }
 
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -127,9 +127,7 @@ static inline void tlb_remove_page_size(
 	return tlb_remove_page(tlb, page);
 }
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
+static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
 {
 }
 
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -146,9 +146,7 @@ static inline void tlb_remove_page_size(
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	tlb_remove_tlb_entry(tlb, ptep, address)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
+static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
 {
 }
 
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -240,11 +240,15 @@ struct mmu_gather {
 	unsigned int		cleared_puds : 1;
 	unsigned int		cleared_p4ds : 1;
 
+	unsigned int		batch_count;
+
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
-	unsigned int		batch_count;
-	int page_size;
+
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+	unsigned int page_size;
+#endif
 };
 
 void arch_tlb_gather_mmu(struct mmu_gather *tlb,
@@ -310,21 +314,18 @@ static inline void tlb_remove_page(struc
 	return tlb_remove_page_size(tlb, page, PAGE_SIZE);
 }
 
-#ifndef tlb_remove_check_page_size_change
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
+static inline void tlb_change_page_size(struct mmu_gather *tlb,
 						     unsigned int page_size)
 {
-	/*
-	 * We don't care about page size change, just update
-	 * mmu_gather page size here so that debug checks
-	 * doesn't throw false warning.
-	 */
-#ifdef CONFIG_DEBUG_VM
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+	if (tlb->page_size && tlb->page_size != page_size) {
+		if (!tlb->fullmm)
+			tlb_flush_mmu(tlb);
+	}
+
 	tlb->page_size = page_size;
 #endif
 }
-#endif
 
 static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb)
 {
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1617,7 +1617,7 @@ bool madvise_free_huge_pmd(struct mmu_ga
 	struct mm_struct *mm = tlb->mm;
 	bool ret = false;
 
-	tlb_remove_check_page_size_change(tlb, HPAGE_PMD_SIZE);
+	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (!ptl)
@@ -1693,7 +1693,7 @@ int zap_huge_pmd(struct mmu_gather *tlb,
 	pmd_t orig_pmd;
 	spinlock_t *ptl;
 
-	tlb_remove_check_page_size_change(tlb, HPAGE_PMD_SIZE);
+	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 
 	ptl = __pmd_trans_huge_lock(pmd, vma);
 	if (!ptl)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3337,7 +3337,7 @@ void __unmap_hugepage_range(struct mmu_g
 	 * This is a hugetlb vma, all the pte entries should point
 	 * to huge page.
 	 */
-	tlb_remove_check_page_size_change(tlb, sz);
+	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	address = start;
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -328,7 +328,7 @@ static int madvise_free_pte_range(pmd_t
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
-	tlb_remove_check_page_size_change(tlb, PAGE_SIZE);
+	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
 	arch_enter_lazy_mmu_mode();
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -355,7 +355,7 @@ void free_pgd_range(struct mmu_gather *t
 	 * We add page table cache pages with PAGE_SIZE,
 	 * (see pte_free_tlb()), flush the tlb if we need
 	 */
-	tlb_remove_check_page_size_change(tlb, PAGE_SIZE);
+	tlb_change_page_size(tlb, PAGE_SIZE);
 	pgd = pgd_offset(tlb->mm, addr);
 	do {
 		next = pgd_addr_end(addr, end);
@@ -1046,7 +1046,7 @@ static unsigned long zap_pte_range(struc
 	pte_t *pte;
 	swp_entry_t entry;
 
-	tlb_remove_check_page_size_change(tlb, PAGE_SIZE);
+	tlb_change_page_size(tlb, PAGE_SIZE);
 again:
 	init_rss_vec(rss);
 	start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -58,7 +58,9 @@ void arch_tlb_gather_mmu(struct mmu_gath
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
 	tlb->page_size = 0;
+#endif
 
 	__tlb_reset_range(tlb);
 }
@@ -121,7 +123,10 @@ bool __tlb_remove_page_size(struct mmu_g
 	struct mmu_gather_batch *batch;
 
 	VM_BUG_ON(!tlb->end);
+
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
 	VM_WARN_ON(tlb->page_size != page_size);
+#endif
 
 	batch = tlb->active;
 	/*
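
An architecture opts in by selecting the new Kconfig symbol (as powerpc does
in the hunk above), after which mmu_gather::page_size is kept up to date and
can be consulted from tlb_flush(). A rough sketch of the arch side; only the
select and the page_size field come from this patch, the flush helper is
hypothetical:

	# arch/myarch/Kconfig
	config MYARCH
		select HAVE_MMU_GATHER_PAGE_SIZE

	/* arch/myarch/include/asm/tlb.h */
	static inline void tlb_flush(struct mmu_gather *tlb)
	{
		/* tlb->page_size is valid because of the select above */
		myarch_flush_range(tlb->mm, tlb->start, tlb->end,
				   tlb->page_size);	/* hypothetical arch helper */
	}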



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 02/11] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-13 17:22   ` Dave Hansen
  2018-09-13  9:21 ` [RFC][PATCH 04/11] asm-generic/tlb: Provide generic VIPT cache flush Peter Zijlstra
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, Dave Hansen

Use the new tlb_get_unmap_shift() to determine the stride of the
INVLPG loop.

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/tlb.h      |   21 ++++++++++++++-------
 arch/x86/include/asm/tlbflush.h |   10 ++++++----
 arch/x86/mm/tlb.c               |   10 +++++-----
 3 files changed, 25 insertions(+), 16 deletions(-)

--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -6,16 +6,23 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
-#define tlb_flush(tlb)							\
-{									\
-	if (!tlb->fullmm && !tlb->need_flush_all) 			\
-		flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end, 0UL);	\
-	else								\
-		flush_tlb_mm_range(tlb->mm, 0UL, TLB_FLUSH_ALL, 0UL);	\
-}
+static inline void tlb_flush(struct mmu_gather *tlb);
 
 #include <asm-generic/tlb.h>
 
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+	unsigned long start = 0UL, end = TLB_FLUSH_ALL;
+	unsigned int invl_shift = tlb_get_unmap_shift(tlb);
+
+	if (!tlb->fullmm && !tlb->need_flush_all) {
+		start = tlb->start;
+		end = tlb->end;
+	}
+
+	flush_tlb_mm_range(tlb->mm, start, end, invl_shift);
+}
+
 /*
  * While x86 architecture in general requires an IPI to perform TLB
  * shootdown, enablement code for several hypervisors overrides
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -507,23 +507,25 @@ struct flush_tlb_info {
 	unsigned long		start;
 	unsigned long		end;
 	u64			new_tlb_gen;
+	unsigned int		invl_shift;
 };
 
 #define local_flush_tlb() __flush_tlb()
 
 #define flush_tlb_mm(mm)	flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL)
 
-#define flush_tlb_range(vma, start, end)	\
-		flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
+#define flush_tlb_range(vma, start, end)			\
+		flush_tlb_mm_range((vma)->vm_mm, start, end,	\
+				(vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)
 
 extern void flush_tlb_all(void);
 extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, unsigned long vmflag);
+				unsigned long end, unsigned int invl_shift);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 {
-	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, VM_NONE);
+	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT);
 }
 
 void native_flush_tlb_others(const struct cpumask *cpumask,
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -522,12 +522,12 @@ static void flush_tlb_func_common(const
 	    f->new_tlb_gen == mm_tlb_gen) {
 		/* Partial flush */
 		unsigned long addr;
-		unsigned long nr_pages = (f->end - f->start) >> PAGE_SHIFT;
+		unsigned long nr_pages = (f->end - f->start) >> f->invl_shift;
 
 		addr = f->start;
 		while (addr < f->end) {
 			__flush_tlb_one_user(addr);
-			addr += PAGE_SIZE;
+			addr += 1UL << f->invl_shift;
 		}
 		if (local)
 			count_vm_tlb_events(NR_TLB_LOCAL_FLUSH_ONE, nr_pages);
@@ -616,12 +616,13 @@ void native_flush_tlb_others(const struc
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 
 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, unsigned long vmflag)
+				unsigned long end, unsigned int invl_shift)
 {
 	int cpu;
 
 	struct flush_tlb_info info __aligned(SMP_CACHE_BYTES) = {
 		.mm = mm,
+		.invl_shift = invl_shift,
 	};
 
 	cpu = get_cpu();
@@ -631,8 +632,7 @@ void flush_tlb_mm_range(struct mm_struct
 
 	/* Should we flush just the requested range? */
 	if ((end != TLB_FLUSH_ALL) &&
-	    !(vmflag & VM_HUGETLB) &&
-	    ((end - start) >> PAGE_SHIFT) <= tlb_single_page_flush_ceiling) {
+	    ((end - start) >> invl_shift) <= tlb_single_page_flush_ceiling) {
 		info.start = start;
 		info.end = end;
 	} else {
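
A quick worked example of what the stride buys, assuming 4KiB base pages,
2MiB THP and the default tlb_single_page_flush_ceiling of 33: unmapping 16
THP entries (32MiB) used to exceed the ceiling and nuke the whole TLB, but
now stays a targeted flush.

	(end - start) >> PAGE_SHIFT == 8192	/* old: > 33, flush everything       */
	(end - start) >> PMD_SHIFT  ==   16	/* new: <= 33, 16 INVLPGs at a 2MiB  */
						/* stride, the rest of the TLB lives */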



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 04/11] asm-generic/tlb: Provide generic VIPT cache flush
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (2 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range() Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-14 16:56   ` Will Deacon
  2018-09-13  9:21 ` [RFC][PATCH 05/11] asm-generic/tlb: Provide generic tlb_flush Peter Zijlstra
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, David Miller, Guan Xuetao

The one obvious thing SH and ARM want is a sensible default for
tlb_start_vma(). (also: https://lkml.org/lkml/2004/1/15/6 )

Avoid having all VIPT architectures provide their own tlb_start_vma()
implementation, and instead rely on architectures to provide a no-op
flush_cache_range() when it is not relevant.

The below makes tlb_start_vma() default to flush_cache_range(), which
should be right and sufficient. The only exceptions that I found were
(oddly):

  - m68k-mmu
  - sparc64
  - unicore

Those architectures appear to have flush_cache_range(), but their
current tlb_start_vma() does not call it.

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arc/include/asm/tlb.h      |    9 ---------
 arch/mips/include/asm/tlb.h     |    9 ---------
 arch/nds32/include/asm/tlb.h    |    6 ------
 arch/nios2/include/asm/tlb.h    |   10 ----------
 arch/parisc/include/asm/tlb.h   |    5 -----
 arch/sparc/include/asm/tlb_32.h |    5 -----
 arch/xtensa/include/asm/tlb.h   |    9 ---------
 include/asm-generic/tlb.h       |   19 +++++++++++--------
 8 files changed, 11 insertions(+), 61 deletions(-)

--- a/arch/arc/include/asm/tlb.h
+++ b/arch/arc/include/asm/tlb.h
@@ -23,15 +23,6 @@ do {						\
  *
  * Note, read http://lkml.org/lkml/2004/1/15/6
  */
-#ifndef CONFIG_ARC_CACHE_VIPT_ALIASING
-#define tlb_start_vma(tlb, vma)
-#else
-#define tlb_start_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
-} while(0)
-#endif
 
 #define tlb_end_vma(tlb, vma)						\
 do {									\
--- a/arch/mips/include/asm/tlb.h
+++ b/arch/mips/include/asm/tlb.h
@@ -5,15 +5,6 @@
 #include <asm/cpu-features.h>
 #include <asm/mipsregs.h>
 
-/*
- * MIPS doesn't need any special per-pte or per-vma handling, except
- * we need to flush cache for area to be unmapped.
- */
-#define tlb_start_vma(tlb, vma)					\
-	do {							\
-		if (!tlb->fullmm)				\
-			flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-	}  while (0)
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
--- a/arch/nds32/include/asm/tlb.h
+++ b/arch/nds32/include/asm/tlb.h
@@ -4,12 +4,6 @@
 #ifndef __ASMNDS32_TLB_H
 #define __ASMNDS32_TLB_H
 
-#define tlb_start_vma(tlb,vma)						\
-	do {								\
-		if (!tlb->fullmm)					\
-			flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-	} while (0)
-
 #define tlb_end_vma(tlb,vma)				\
 	do { 						\
 		if(!tlb->fullmm)			\
--- a/arch/nios2/include/asm/tlb.h
+++ b/arch/nios2/include/asm/tlb.h
@@ -15,16 +15,6 @@
 
 extern void set_mmu_pid(unsigned long pid);
 
-/*
- * NiosII doesn't need any special per-pte or per-vma handling, except
- * we need to flush cache for the area to be unmapped.
- */
-#define tlb_start_vma(tlb, vma)					\
-	do {							\
-		if (!tlb->fullmm)				\
-			flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-	}  while (0)
-
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -7,11 +7,6 @@ do {	if ((tlb)->fullmm)		\
 		flush_tlb_mm((tlb)->mm);\
 } while (0)
 
-#define tlb_start_vma(tlb, vma) \
-do {	if (!(tlb)->fullmm)	\
-		flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-} while (0)
-
 #define tlb_end_vma(tlb, vma)	\
 do {	if (!(tlb)->fullmm)	\
 		flush_tlb_range(vma, vma->vm_start, vma->vm_end); \
--- a/arch/sparc/include/asm/tlb_32.h
+++ b/arch/sparc/include/asm/tlb_32.h
@@ -2,11 +2,6 @@
 #ifndef _SPARC_TLB_H
 #define _SPARC_TLB_H
 
-#define tlb_start_vma(tlb, vma) \
-do {								\
-	flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
 #define tlb_end_vma(tlb, vma) \
 do {								\
 	flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
--- a/arch/xtensa/include/asm/tlb.h
+++ b/arch/xtensa/include/asm/tlb.h
@@ -16,19 +16,10 @@
 
 #if (DCACHE_WAY_SIZE <= PAGE_SIZE)
 
-/* Note, read http://lkml.org/lkml/2004/1/15/6 */
-
-# define tlb_start_vma(tlb,vma)			do { } while (0)
 # define tlb_end_vma(tlb,vma)			do { } while (0)
 
 #else
 
-# define tlb_start_vma(tlb, vma)					      \
-	do {								      \
-		if (!tlb->fullmm)					      \
-			flush_cache_range(vma, vma->vm_start, vma->vm_end);   \
-	} while(0)
-
 # define tlb_end_vma(tlb, vma)						      \
 	do {								      \
 		if (!tlb->fullmm)					      \
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -19,6 +19,7 @@
 #include <linux/swap.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 
 #ifdef CONFIG_MMU
 
@@ -351,17 +352,19 @@ static inline unsigned long tlb_get_unma
  * the vmas are adjusted to only cover the region to be torn down.
  */
 #ifndef tlb_start_vma
-#define tlb_start_vma(tlb, vma) do { } while (0)
+#define tlb_start_vma(tlb, vma)						\
+do {									\
+	if (!tlb->fullmm)						\
+		flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
+} while (0)
 #endif
 
-#define __tlb_end_vma(tlb, vma)					\
-	do {							\
-		if (!tlb->fullmm)				\
-			tlb_flush_mmu_tlbonly(tlb);		\
-	} while (0)
-
 #ifndef tlb_end_vma
-#define tlb_end_vma	__tlb_end_vma
+#define tlb_end_vma(tlb, vma)						\
+do {									\
+	if (!tlb->fullmm)						\
+		tlb_flush_mmu_tlbonly(tlb);				\
+} while (0)
 #endif
 
 #ifndef __tlb_remove_tlb_entry
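
On architectures whose caches need no per-range maintenance the new default
costs nothing, provided flush_cache_range() compiles away. A typical no-op
definition (a sketch of the pattern asm-generic/cacheflush.h already uses for
such architectures):

	/* asm/cacheflush.h on a cache-coherent architecture */
	#define flush_cache_range(vma, start, end)	do { } while (0)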



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 05/11] asm-generic/tlb: Provide generic tlb_flush
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (3 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 04/11] asm-generic/tlb: Provide generic VIPT cache flush Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-13 13:09   ` Jann Horn
  2018-09-13  9:21 ` [RFC][PATCH 06/11] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens

Provide a generic tlb_flush() implementation that relies on
flush_tlb_range(). This is a little awkward because flush_tlb_range()
assumes a VMA for range invalidation, but we no longer have one.

An audit of all flush_tlb_range() implementations shows that only
vma->vm_mm and vma->vm_flags are used, and of the latter only VM_EXEC
(I-TLB invalidates) and VM_HUGETLB (large-page TLB invalidate).

Therefore, track VM_EXEC and VM_HUGETLB in two more bits, and create a
'fake' VMA.

This allows architectures that have a reasonably efficient
flush_tlb_range() to not require any additional effort.

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/tlb.h   |    1 
 arch/powerpc/include/asm/tlb.h |    1 
 arch/riscv/include/asm/tlb.h   |    1 
 arch/x86/include/asm/tlb.h     |    1 
 include/asm-generic/tlb.h      |   80 +++++++++++++++++++++++++++++++++++------
 5 files changed, 74 insertions(+), 10 deletions(-)

--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -27,6 +27,7 @@ static inline void __tlb_remove_table(vo
 	free_page_and_swap_cache((struct page *)_table);
 }
 
+#define tlb_flush tlb_flush
 static void tlb_flush(struct mmu_gather *tlb);
 
 #include <asm-generic/tlb.h>
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -28,6 +28,7 @@
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
 
+#define tlb_flush tlb_flush
 extern void tlb_flush(struct mmu_gather *tlb);
 
 /* Get the generic bits... */
--- a/arch/riscv/include/asm/tlb.h
+++ b/arch/riscv/include/asm/tlb.h
@@ -18,6 +18,7 @@ struct mmu_gather;
 
 static void tlb_flush(struct mmu_gather *tlb);
 
+#define tlb_flush tlb_flush
 #include <asm-generic/tlb.h>
 
 static inline void tlb_flush(struct mmu_gather *tlb)
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -6,6 +6,7 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
+#define tlb_flush tlb_flush
 static inline void tlb_flush(struct mmu_gather *tlb);
 
 #include <asm-generic/tlb.h>
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -241,6 +241,12 @@ struct mmu_gather {
 	unsigned int		cleared_puds : 1;
 	unsigned int		cleared_p4ds : 1;
 
+	/*
+	 * tracks VM_EXEC | VM_HUGETLB in tlb_start_vma
+	 */
+	unsigned int		vma_exec : 1;
+	unsigned int		vma_huge : 1;
+
 	unsigned int		batch_count;
 
 	struct mmu_gather_batch *active;
@@ -282,7 +288,35 @@ static inline void __tlb_reset_range(str
 	tlb->cleared_pmds = 0;
 	tlb->cleared_puds = 0;
 	tlb->cleared_p4ds = 0;
+	/*
+	 * Do not reset mmu_gather::vma_* fields here, we do not
+	 * call into tlb_start_vma() again to set them if there is an
+	 * intermediate flush.
+	 */
+}
+
+#ifndef tlb_flush
+
+#if defined(tlb_start_vma) || defined(tlb_end_vma)
+#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
+#endif
+
+#define tlb_flush tlb_flush
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+	if (tlb->fullmm || tlb->need_flush_all) {
+		flush_tlb_mm(tlb->mm);
+	} else {
+		struct vm_area_struct vma = {
+			.vm_mm = tlb->mm,
+			.vm_flags = (tlb->vma_exec ? VM_EXEC    : 0) |
+				    (tlb->vma_huge ? VM_HUGETLB : 0),
+		};
+
+		flush_tlb_range(&vma, tlb->start, tlb->end);
+	}
 }
+#endif
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
@@ -353,19 +387,45 @@ static inline unsigned long tlb_get_unma
  * the vmas are adjusted to only cover the region to be torn down.
  */
 #ifndef tlb_start_vma
-#define tlb_start_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
+#define tlb_start_vma tlb_start_vma
+static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+	if (tlb->fullmm)
+		return;
+
+	/*
+	 * flush_tlb_range() implementations that look at VM_HUGETLB (tile,
+	 * mips-4k) flush only large pages.
+	 *
+	 * flush_tlb_range() implementations that flush I-TLB also flush D-TLB
+	 * (tile, xtensa, arm), so it's ok to just add VM_EXEC to an existing
+	 * range.
+	 *
+	 * We rely on tlb_end_vma() to issue a flush, such that when we reset
+	 * these values the batch is empty.
+	 */
+	tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
+	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
+
+	flush_cache_range(vma, vma->vm_start, vma->vm_end);
+}
 #endif
 
 #ifndef tlb_end_vma
-#define tlb_end_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		tlb_flush_mmu_tlbonly(tlb);				\
-} while (0)
+#define tlb_end_vma tlb_end_vma
+static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+	if (tlb->fullmm)
+		return;
+
+	/*
+	 * Do a TLB flush and reset the range at VMA boundaries; this avoids
+	 * the ranges growing with the unused space between consecutive VMAs,
+	 * but also the mmu_gather::vma_* flags from tlb_start_vma() rely on
+	 * this.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+}
 #endif
 
 #ifndef __tlb_remove_tlb_entry
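
The opt-out follows the usual pattern in this header: an architecture that
keeps its own implementation claims the symbol before including the generic
header, and the #ifndef tlb_flush block above is then skipped. Condensed from
the arch hunks earlier in this patch:

	/* arch/xxx/include/asm/tlb.h */
	#define tlb_flush tlb_flush
	static inline void tlb_flush(struct mmu_gather *tlb);

	#include <asm-generic/tlb.h>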



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 06/11] asm-generic/tlb: Conditionally provide tlb_migrate_finish()
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (4 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 05/11] asm-generic/tlb: Provide generic tlb_flush Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-14 16:57   ` Will Deacon
  2018-09-13  9:21 ` [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens

Needed for ia64 -- alternatively we drop the entire hook.

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -539,6 +539,8 @@ static inline void tlb_end_vma(struct mm
 
 #endif /* CONFIG_MMU */
 
+#ifndef tlb_migrate_finish
 #define tlb_migrate_finish(mm) do {} while (0)
+#endif
 
 #endif /* _ASM_GENERIC__TLB_H */
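
With the guard in place an architecture keeps its own hook simply by defining
the macro before the generic header is pulled in; ia64 does exactly that in
patch 08:

	/* arch/ia64/include/asm/tlb.h */
	#define tlb_migrate_finish(mm)	platform_tlb_migrate_finish(mm)

	#include <asm-generic/tlb.h>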



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (5 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 06/11] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-18 14:10   ` Will Deacon
  2018-09-13  9:21 ` [RFC][PATCH 08/11] ia64/tlb: Convert " Peter Zijlstra
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens

Generic mmu_gather provides everything that ARM needs:

 - range tracking
 - RCU table free
 - VM_EXEC tracking
 - VIPT cache flushing

The one notable curiosity is the 'funny' range tracking for classical
ARM in __pte_free_tlb().

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm/include/asm/tlb.h |  255 ++-------------------------------------------
 1 file changed, 14 insertions(+), 241 deletions(-)

--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -33,270 +33,43 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-#define MMU_GATHER_BUNDLE	8
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 static inline void __tlb_remove_table(void *_table)
 {
 	free_page_and_swap_cache((struct page *)_table);
 }
 
-struct mmu_table_batch {
-	struct rcu_head		rcu;
-	unsigned int		nr;
-	void			*tables[0];
-};
-
-#define MAX_TABLE_BATCH		\
-	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
-
-extern void tlb_table_flush(struct mmu_gather *tlb);
-extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
-
-#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
-#else
-#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
-#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
-
-/*
- * TLB handling.  This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	struct mmu_table_batch	*batch;
-	unsigned int		need_flush;
-#endif
-	unsigned int		fullmm;
-	struct vm_area_struct	*vma;
-	unsigned long		start, end;
-	unsigned long		range_start;
-	unsigned long		range_end;
-	unsigned int		nr;
-	unsigned int		max;
-	struct page		**pages;
-	struct page		*local[MMU_GATHER_BUNDLE];
-};
-
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-/*
- * This is unnecessarily complex.  There's three ways the TLB shootdown
- * code is used:
- *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().
- *     tlb->fullmm = 0, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.
- *  2. Unmapping all vmas.  See exit_mmap().
- *     tlb->fullmm = 1, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.  Additionally, page tables will be freed.
- *  3. Unmapping argument pages.  See shift_arg_pages().
- *     tlb->fullmm = 0, but tlb_start_vma/tlb_end_vma will not be called.
- *     tlb->vma will be NULL.
- */
-static inline void tlb_flush(struct mmu_gather *tlb)
-{
-	if (tlb->fullmm || !tlb->vma)
-		flush_tlb_mm(tlb->mm);
-	else if (tlb->range_end > 0) {
-		flush_tlb_range(tlb->vma, tlb->range_start, tlb->range_end);
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
-
-static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
-{
-	if (!tlb->fullmm) {
-		if (addr < tlb->range_start)
-			tlb->range_start = addr;
-		if (addr + PAGE_SIZE > tlb->range_end)
-			tlb->range_end = addr + PAGE_SIZE;
-	}
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
-	unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
-	if (addr) {
-		tlb->pages = (void *)addr;
-		tlb->max = PAGE_SIZE / sizeof(struct page *);
-	}
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	tlb_flush(tlb);
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	free_pages_and_swap_cache(tlb->pages, tlb->nr);
-	tlb->nr = 0;
-	if (tlb->pages == tlb->local)
-		__tlb_alloc_page(tlb);
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->fullmm = !(start | (end+1));
-	tlb->start = start;
-	tlb->end = end;
-	tlb->vma = NULL;
-	tlb->max = ARRAY_SIZE(tlb->local);
-	tlb->pages = tlb->local;
-	tlb->nr = 0;
-	__tlb_alloc_page(tlb);
+#include <asm-generic/tlb.h>
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
+#ifndef CONFIG_HAVE_RCU_TABLE_FREE
+#define tlb_remove_table(tlb, entry) tlb_remove_page(tlb, entry)
 #endif
-}
-
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-			unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		tlb->range_start = start;
-		tlb->range_end = end;
-	}
-
-	tlb_flush_mmu(tlb);
 
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	if (tlb->pages != tlb->local)
-		free_pages((unsigned long)tlb->pages, 0);
-}
-
-/*
- * Memorize the range for the TLB flush.
- */
 static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long addr)
-{
-	tlb_add_flush(tlb, addr);
-}
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush.  When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm) {
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);
-		tlb->vma = vma;
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm)
-		tlb_flush(tlb);
-}
-
-static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->pages[tlb->nr++] = page;
-	VM_WARN_ON(tlb->nr > tlb->max);
-	if (tlb->nr == tlb->max)
-		return true;
-	return false;
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (__tlb_remove_page(tlb, page))
-		tlb_flush_mmu(tlb);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-	unsigned long addr)
+__pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
 	pgtable_page_dtor(pte);
 
-#ifdef CONFIG_ARM_LPAE
-	tlb_add_flush(tlb, addr);
-#else
+#ifndef CONFIG_ARM_LPAE
 	/*
 	 * With the classic ARM MMU, a pte page has two corresponding pmd
 	 * entries, each covering 1MB.
 	 */
-	addr &= PMD_MASK;
-	tlb_add_flush(tlb, addr + SZ_1M - PAGE_SIZE);
-	tlb_add_flush(tlb, addr + SZ_1M);
+	addr = (addr & PMD_MASK) + SZ_1M;
+	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
 #endif
 
-	tlb_remove_entry(tlb, pte);
-}
-
-static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
-				  unsigned long addr)
-{
-#ifdef CONFIG_ARM_LPAE
-	tlb_add_flush(tlb, addr);
-	tlb_remove_entry(tlb, virt_to_page(pmdp));
-#endif
+	tlb_remove_table(tlb, pte);
 }
 
 static inline void
-tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
+__pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
 {
-	tlb_add_flush(tlb, addr);
-}
-
-#define pte_free_tlb(tlb, ptep, addr)	__pte_free_tlb(tlb, ptep, addr)
-#define pmd_free_tlb(tlb, pmdp, addr)	__pmd_free_tlb(tlb, pmdp, addr)
-#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm)		do { } while (0)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb,
-						     unsigned int page_size)
-{
-}
-
-static inline void tlb_flush_remove_tables(struct mm_struct *mm)
-{
-}
+#ifdef CONFIG_ARM_LPAE
+	struct page *page = virt_to_page(pmdp);
 
-static inline void tlb_flush_remove_tables_local(void *arg)
-{
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
+#endif
 }
 
 #endif /* CONFIG_MMU */



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 08/11] ia64/tlb: Convert to generic mmu_gather
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (6 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 09/11] sh/tlb: Convert SH " Peter Zijlstra
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, Tony Luck

Generic mmu_gather provides everything ia64 needs (range tracking).

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/ia64/include/asm/tlb.h      |  256 ---------------------------------------
 arch/ia64/include/asm/tlbflush.h |   25 +++
 arch/ia64/mm/tlb.c               |   23 +++
 3 files changed, 47 insertions(+), 257 deletions(-)

--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -47,262 +47,8 @@
 #include <asm/tlbflush.h>
 #include <asm/machvec.h>
 
-/*
- * If we can't allocate a page to make a big batch of page pointers
- * to work on, then just handle a few from the on-stack structure.
- */
-#define	IA64_GATHER_BUNDLE	8
-
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		nr;
-	unsigned int		max;
-	unsigned char		fullmm;		/* non-zero means full mm flush */
-	unsigned char		need_flush;	/* really unmapped some PTEs? */
-	unsigned long		start, end;
-	unsigned long		start_addr;
-	unsigned long		end_addr;
-	struct page		**pages;
-	struct page		*local[IA64_GATHER_BUNDLE];
-};
-
-struct ia64_tr_entry {
-	u64 ifa;
-	u64 itir;
-	u64 pte;
-	u64 rr;
-}; /*Record for tr entry!*/
-
-extern int ia64_itr_entry(u64 target_mask, u64 va, u64 pte, u64 log_size);
-extern void ia64_ptr_entry(u64 target_mask, int slot);
-
-extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
-
-/*
- region register macros
-*/
-#define RR_TO_VE(val)   (((val) >> 0) & 0x0000000000000001)
-#define RR_VE(val)	(((val) & 0x0000000000000001) << 0)
-#define RR_VE_MASK	0x0000000000000001L
-#define RR_VE_SHIFT	0
-#define RR_TO_PS(val)	(((val) >> 2) & 0x000000000000003f)
-#define RR_PS(val)	(((val) & 0x000000000000003f) << 2)
-#define RR_PS_MASK	0x00000000000000fcL
-#define RR_PS_SHIFT	2
-#define RR_RID_MASK	0x00000000ffffff00L
-#define RR_TO_RID(val) 	((val >> 8) & 0xffffff)
-
-static inline void
-ia64_tlb_flush_mmu_tlbonly(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	tlb->need_flush = 0;
-
-	if (tlb->fullmm) {
-		/*
-		 * Tearing down the entire address space.  This happens both as a result
-		 * of exit() and execve().  The latter case necessitates the call to
-		 * flush_tlb_mm() here.
-		 */
-		flush_tlb_mm(tlb->mm);
-	} else if (unlikely (end - start >= 1024*1024*1024*1024UL
-			     || REGION_NUMBER(start) != REGION_NUMBER(end - 1)))
-	{
-		/*
-		 * If we flush more than a tera-byte or across regions, we're probably
-		 * better off just flushing the entire TLB(s).  This should be very rare
-		 * and is not worth optimizing for.
-		 */
-		flush_tlb_all();
-	} else {
-		/*
-		 * flush_tlb_range() takes a vma instead of a mm pointer because
-		 * some architectures want the vm_flags for ITLB/DTLB flush.
-		 */
-		struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
-
-		/* flush the address range from the tlb: */
-		flush_tlb_range(&vma, start, end);
-		/* now flush the virt. page-table area mapping the address range: */
-		flush_tlb_range(&vma, ia64_thash(start), ia64_thash(end));
-	}
-
-}
-
-static inline void
-ia64_tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	unsigned long i;
-	unsigned int nr;
-
-	/* lastly, release the freed pages */
-	nr = tlb->nr;
-
-	tlb->nr = 0;
-	tlb->start_addr = ~0UL;
-	for (i = 0; i < nr; ++i)
-		free_page_and_swap_cache(tlb->pages[i]);
-}
-
-/*
- * Flush the TLB for address range START to END and, if not in fast mode, release the
- * freed pages that where gathered up to this point.
- */
-static inline void
-ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	if (!tlb->need_flush)
-		return;
-	ia64_tlb_flush_mmu_tlbonly(tlb, start, end);
-	ia64_tlb_flush_mmu_free(tlb);
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
-	unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
-	if (addr) {
-		tlb->pages = (void *)addr;
-		tlb->max = PAGE_SIZE / sizeof(void *);
-	}
-}
-
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->max = ARRAY_SIZE(tlb->local);
-	tlb->pages = tlb->local;
-	tlb->nr = 0;
-	tlb->fullmm = !(start | (end+1));
-	tlb->start = start;
-	tlb->end = end;
-	tlb->start_addr = ~0UL;
-}
-
-/*
- * Called at the end of the shootdown operation to free up any resources that were
- * collected.
- */
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-			unsigned long start, unsigned long end, bool force)
-{
-	if (force)
-		tlb->need_flush = 1;
-	/*
-	 * Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
-	 * tlb->end_addr.
-	 */
-	ia64_tlb_flush_mmu(tlb, start, end);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	if (tlb->pages != tlb->local)
-		free_pages((unsigned long)tlb->pages, 0);
-}
-
-/*
- * Logically, this routine frees PAGE.  On MP machines, the actual freeing of the page
- * must be delayed until after the TLB has been flushed (see comments at the beginning of
- * this file).
- */
-static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->need_flush = 1;
-
-	if (!tlb->nr && tlb->pages == tlb->local)
-		__tlb_alloc_page(tlb);
-
-	tlb->pages[tlb->nr++] = page;
-	VM_WARN_ON(tlb->nr > tlb->max);
-	if (tlb->nr == tlb->max)
-		return true;
-	return false;
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	ia64_tlb_flush_mmu_tlbonly(tlb, tlb->start_addr, tlb->end_addr);
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	ia64_tlb_flush_mmu_free(tlb);
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (__tlb_remove_page(tlb, page))
-		tlb_flush_mmu(tlb);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-/*
- * Remove TLB entry for PTE mapped at virtual address ADDRESS.  This is called for any
- * PTE, not just those pointing to (normal) physical memory.
- */
-static inline void
-__tlb_remove_tlb_entry (struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
-	if (tlb->start_addr == ~0UL)
-		tlb->start_addr = address;
-	tlb->end_addr = address + PAGE_SIZE;
-}
-
 #define tlb_migrate_finish(mm)	platform_tlb_migrate_finish(mm)
 
-#define tlb_start_vma(tlb, vma)			do { } while (0)
-#define tlb_end_vma(tlb, vma)			do { } while (0)
-
-#define tlb_remove_tlb_entry(tlb, ptep, addr)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__tlb_remove_tlb_entry(tlb, ptep, addr);	\
-} while (0)
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb,
-						     unsigned int page_size)
-{
-}
-
-#define pte_free_tlb(tlb, ptep, address)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pte_free_tlb(tlb, ptep, address);		\
-} while (0)
-
-#define pmd_free_tlb(tlb, ptep, address)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pmd_free_tlb(tlb, ptep, address);		\
-} while (0)
-
-#define pud_free_tlb(tlb, pudp, address)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pud_free_tlb(tlb, pudp, address);		\
-} while (0)
+#include <asm-generic/tlb.h>
 
 #endif /* _ASM_IA64_TLB_H */
--- a/arch/ia64/include/asm/tlbflush.h
+++ b/arch/ia64/include/asm/tlbflush.h
@@ -14,6 +14,31 @@
 #include <asm/mmu_context.h>
 #include <asm/page.h>
 
+struct ia64_tr_entry {
+	u64 ifa;
+	u64 itir;
+	u64 pte;
+	u64 rr;
+}; /*Record for tr entry!*/
+
+extern int ia64_itr_entry(u64 target_mask, u64 va, u64 pte, u64 log_size);
+extern void ia64_ptr_entry(u64 target_mask, int slot);
+extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
+
+/*
+ region register macros
+*/
+#define RR_TO_VE(val)   (((val) >> 0) & 0x0000000000000001)
+#define RR_VE(val)     (((val) & 0x0000000000000001) << 0)
+#define RR_VE_MASK     0x0000000000000001L
+#define RR_VE_SHIFT    0
+#define RR_TO_PS(val)  (((val) >> 2) & 0x000000000000003f)
+#define RR_PS(val)     (((val) & 0x000000000000003f) << 2)
+#define RR_PS_MASK     0x00000000000000fcL
+#define RR_PS_SHIFT    2
+#define RR_RID_MASK    0x00000000ffffff00L
+#define RR_TO_RID(val)         ((val >> 8) & 0xffffff)
+
 /*
  * Now for some TLB flushing routines.  This is the kind of stuff that
  * can be very expensive, so try to avoid them whenever possible.
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -297,8 +297,8 @@ local_flush_tlb_all (void)
 	ia64_srlz_i();			/* srlz.i implies srlz.d */
 }
 
-void
-flush_tlb_range (struct vm_area_struct *vma, unsigned long start,
+static void
+__flush_tlb_range (struct vm_area_struct *vma, unsigned long start,
 		 unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
@@ -335,6 +335,25 @@ flush_tlb_range (struct vm_area_struct *
 	preempt_enable();
 	ia64_srlz_i();			/* srlz.i implies srlz.d */
 }
+
+void flush_tlb_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+	if (unlikely(end - start >= 1024*1024*1024*1024UL
+			|| REGION_NUMBER(start) != REGION_NUMBER(end - 1))) {
+		/*
+		 * If we flush more than a tera-byte or across regions, we're
+		 * probably better off just flushing the entire TLB(s).  This
+		 * should be very rare and is not worth optimizing for.
+		 */
+		flush_tlb_all();
+	} else {
+		/* flush the address range from the tlb */
+		__flush_tlb_range(vma, start, end);
+		/* flush the virt. page-table area mapping the addr range */
+		__flush_tlb_range(vma, ia64_thash(start), ia64_thash(end));
+	}
+}
 EXPORT_SYMBOL(flush_tlb_range);
 
 void ia64_tlb_init(void)



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 09/11] sh/tlb: Convert SH to generic mmu_gather
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (7 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 08/11] ia64/tlb: Convert " Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 10/11] um/tlb: Convert " Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 11/11] arch/tlb: Clean up simple architectures Peter Zijlstra
  10 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, Yoshinori Sato, Rich Felker

Generic mmu_gather provides everything SH needs (range tracking and
cache coherency).

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/sh/include/asm/pgalloc.h |    7 ++
 arch/sh/include/asm/tlb.h     |  130 ------------------------------------------
 2 files changed, 8 insertions(+), 129 deletions(-)

--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -72,6 +72,15 @@ do {							\
 	tlb_remove_page((tlb), (pte));			\
 } while (0)
 
+#if CONFIG_PGTABLE_LEVELS > 2
+#define __pmd_free_tlb(tlb, pmdp, addr)			\
+do {							\
+	struct page *page = virt_to_page(pmdp);		\
+	pgtable_pmd_page_dtor(page);			\
+	tlb_remove_page((tlb), page);			\
+} while (0)
+#endif
+
 static inline void check_pgt_cache(void)
 {
 	quicklist_trim(QUICK_PT, NULL, 25, 16);
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -11,131 +11,8 @@
 
 #ifdef CONFIG_MMU
 #include <linux/swap.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-#include <asm/mmu_context.h>
-
-/*
- * TLB handling.  This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		fullmm;
-	unsigned long		start, end;
-};
 
-static inline void init_tlb_gather(struct mmu_gather *tlb)
-{
-	tlb->start = TASK_SIZE;
-	tlb->end = 0;
-
-	if (tlb->fullmm) {
-		tlb->start = 0;
-		tlb->end = TASK_SIZE;
-	}
-}
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-		unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->start = start;
-	tlb->end = end;
-	tlb->fullmm = !(start | (end+1));
-
-	init_tlb_gather(tlb);
-}
-
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	if (tlb->fullmm || force)
-		flush_tlb_mm(tlb->mm);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-}
-
-static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
-	if (tlb->start > address)
-		tlb->start = address;
-	if (tlb->end < address + PAGE_SIZE)
-		tlb->end = address + PAGE_SIZE;
-}
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush.  When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm)
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm && tlb->end) {
-		flush_tlb_range(vma, tlb->start, tlb->end);
-		init_tlb_gather(tlb);
-	}
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-}
-
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-	return false; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	__tlb_remove_page(tlb, page);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
-{
-}
-
-#define pte_free_tlb(tlb, ptep, addr)	pte_free((tlb)->mm, ptep)
-#define pmd_free_tlb(tlb, pmdp, addr)	pmd_free((tlb)->mm, pmdp)
-#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm)		do { } while (0)
+#include <asm-generic/tlb.h>
 
 #if defined(CONFIG_CPU_SH4) || defined(CONFIG_SUPERH64)
 extern void tlb_wire_entry(struct vm_area_struct *, unsigned long, pte_t);
@@ -155,11 +32,6 @@ static inline void tlb_unwire_entry(void
 
 #else /* CONFIG_MMU */
 
-#define tlb_start_vma(tlb, vma)				do { } while (0)
-#define tlb_end_vma(tlb, vma)				do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, pte, address)	do { } while (0)
-#define tlb_flush(tlb)					do { } while (0)
-
 #include <asm-generic/tlb.h>
 
 #endif /* CONFIG_MMU */



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 10/11] um/tlb: Convert to generic mmu_gather
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (8 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 09/11] sh/tlb: Convert SH " Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  2018-09-13  9:21 ` [RFC][PATCH 11/11] arch/tlb: Clean up simple architectures Peter Zijlstra
  10 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, Richard Weinberger

Generic mmu_gather provides the simple flush_tlb_range()-based range
tracking mmu_gather that UM needs.

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/um/include/asm/tlb.h |  156 ----------------------------------------------
 1 file changed, 2 insertions(+), 154 deletions(-)

--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -2,160 +2,8 @@
 #ifndef __UM_TLB_H
 #define __UM_TLB_H
 
-#include <linux/pagemap.h>
-#include <linux/swap.h>
-#include <asm/percpu.h>
-#include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
-
-#define tlb_start_vma(tlb, vma) do { } while (0)
-#define tlb_end_vma(tlb, vma) do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
-/* struct mmu_gather is an opaque type used by the mm code for passing around
- * any data needed by arch specific code for tlb_remove_page.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		need_flush; /* Really unmapped some ptes? */
-	unsigned long		start;
-	unsigned long		end;
-	unsigned int		fullmm; /* non-zero means full mm flush */
-};
-
-static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
-					  unsigned long address)
-{
-	if (tlb->start > address)
-		tlb->start = address;
-	if (tlb->end < address + PAGE_SIZE)
-		tlb->end = address + PAGE_SIZE;
-}
-
-static inline void init_tlb_gather(struct mmu_gather *tlb)
-{
-	tlb->need_flush = 0;
-
-	tlb->start = TASK_SIZE;
-	tlb->end = 0;
-
-	if (tlb->fullmm) {
-		tlb->start = 0;
-		tlb->end = TASK_SIZE;
-	}
-}
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-		unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->start = start;
-	tlb->end = end;
-	tlb->fullmm = !(start | (end+1));
-
-	init_tlb_gather(tlb);
-}
-
-extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-			       unsigned long end);
-
-static inline void
-tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end);
-}
-
-static inline void
-tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	init_tlb_gather(tlb);
-}
-
-static inline void
-tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	if (!tlb->need_flush)
-		return;
-
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
-/* arch_tlb_finish_mmu
- *	Called at the end of the shootdown operation to free up any resources
- *	that were required.
- */
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		tlb->start = start;
-		tlb->end = end;
-		tlb->need_flush = 1;
-	}
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-}
-
-/* tlb_remove_page
- *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)),
- *	while handling the additional races in SMP caused by other CPUs
- *	caching valid mappings in their TLBs.
- */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->need_flush = 1;
-	free_page_and_swap_cache(page);
-	return false; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	__tlb_remove_page(tlb, page);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-/**
- * tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation.
- *
- * Record the fact that pte's were really umapped in ->need_flush, so we can
- * later optimise away the tlb invalidate.   This helps when userspace is
- * unmapping already-unmapped pages, which happens quite a lot.
- */
-#define tlb_remove_tlb_entry(tlb, ptep, address)		\
-	do {							\
-		tlb->need_flush = 1;				\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
-	} while (0)
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
-{
-}
-
-#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
-
-#define pud_free_tlb(tlb, pudp, addr) __pud_free_tlb(tlb, pudp, addr)
-
-#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
-
-#define tlb_migrate_finish(mm) do {} while (0)
+#include <asm-generic/cacheflush.h>
+#include <asm-generic/tlb.h>
 
 #endif



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC][PATCH 11/11] arch/tlb: Clean up simple architectures
  2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
                   ` (9 preceding siblings ...)
  2018-09-13  9:21 ` [RFC][PATCH 10/11] um/tlb: Convert " Peter Zijlstra
@ 2018-09-13  9:21 ` Peter Zijlstra
  10 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13  9:21 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens

There are generally two cases:

 1) either the platform has an efficient flush_tlb_range() and
    asm-generic/tlb.h doesn't need any overrides at all.

 2) or an architecture lacks an efficient flush_tlb_range() and
    we override tlb_end_vma() and tlb_flush().

Convert all 'simple' architectures to one of these two forms.

alpha:	    has no range invalidate -> 2
arc:	    already used flush_tlb_range() -> 1
c6x:	    has no range invalidate -> 2
h8300:	    has no mmu
hexagon:    has an efficient flush_tlb_range() -> 1
            (flush_tlb_mm() is in fact a full range invalidate,
	     so no need to shoot down everything)
m68k:	    has inefficient flush_tlb_range() -> 2
microblaze: has no flush_tlb_range() -> 2
mips:	    has efficient flush_tlb_range() -> 1
	    (even though it currently seems to use flush_tlb_mm())
nds32:	    already uses flush_tlb_range() -> 1
nios2:	    has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
openrisc:   has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
parisc:	    already uses flush_tlb_range() -> 1
sparc32:    already uses flush_tlb_range() -> 1
unicore32:  has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
xtensa:	    has efficient flush_tlb_range() -> 1
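
For reference, a form-2 conversion boils down to something like the
following sketch (illustrative only; the actual per-architecture hunks
are in the diff below):

/*
 * No efficient flush_tlb_range(): suppress the range flush at
 * tlb_end_vma() and do one full mm flush from tlb_flush() instead.
 */
#define tlb_end_vma(tlb, vma)	do { } while (0)
#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)

#include <asm-generic/tlb.h>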

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/include/asm/tlb.h      |    2 --
 arch/arc/include/asm/tlb.h        |   23 -----------------------
 arch/c6x/include/asm/tlb.h        |    1 +
 arch/h8300/include/asm/tlb.h      |    2 --
 arch/hexagon/include/asm/tlb.h    |   12 ------------
 arch/m68k/include/asm/tlb.h       |    1 -
 arch/microblaze/include/asm/tlb.h |    4 +---
 arch/mips/include/asm/tlb.h       |    8 --------
 arch/nds32/include/asm/tlb.h      |   10 ----------
 arch/nios2/include/asm/tlb.h      |    8 +++++---
 arch/openrisc/include/asm/tlb.h   |    6 ++++--
 arch/parisc/include/asm/tlb.h     |   13 -------------
 arch/powerpc/include/asm/tlb.h    |    1 -
 arch/sparc/include/asm/tlb_32.h   |   13 -------------
 arch/unicore32/include/asm/tlb.h  |   10 ++++++----
 arch/xtensa/include/asm/tlb.h     |   17 -----------------
 16 files changed, 17 insertions(+), 114 deletions(-)

--- a/arch/alpha/include/asm/tlb.h
+++ b/arch/alpha/include/asm/tlb.h
@@ -4,8 +4,6 @@
 
 #define tlb_start_vma(tlb, vma)			do { } while (0)
 #define tlb_end_vma(tlb, vma)			do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, pte, addr)	do { } while (0)
-
 #define tlb_flush(tlb)				flush_tlb_mm((tlb)->mm)
 
 #include <asm-generic/tlb.h>
--- a/arch/arc/include/asm/tlb.h
+++ b/arch/arc/include/asm/tlb.h
@@ -9,29 +9,6 @@
 #ifndef _ASM_ARC_TLB_H
 #define _ASM_ARC_TLB_H
 
-#define tlb_flush(tlb)				\
-do {						\
-	if (tlb->fullmm)			\
-		flush_tlb_mm((tlb)->mm);	\
-} while (0)
-
-/*
- * This pair is called at time of munmap/exit to flush cache and TLB entries
- * for mappings being torn down.
- * 1) cache-flush part -implemented via tlb_start_vma( ) for VIPT aliasing D$
- * 2) tlb-flush part - implemted via tlb_end_vma( ) flushes the TLB range
- *
- * Note, read http://lkml.org/lkml/2004/1/15/6
- */
-
-#define tlb_end_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, ptep, address)
-
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
 
--- a/arch/c6x/include/asm/tlb.h
+++ b/arch/c6x/include/asm/tlb.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_C6X_TLB_H
 #define _ASM_C6X_TLB_H
 
+#define tlb_end_vma(tlb,vma) do { } while (0)
 #define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
 
 #include <asm-generic/tlb.h>
--- a/arch/h8300/include/asm/tlb.h
+++ b/arch/h8300/include/asm/tlb.h
@@ -2,8 +2,6 @@
 #ifndef __H8300_TLB_H__
 #define __H8300_TLB_H__
 
-#define tlb_flush(tlb)	do { } while (0)
-
 #include <asm-generic/tlb.h>
 
 #endif
--- a/arch/hexagon/include/asm/tlb.h
+++ b/arch/hexagon/include/asm/tlb.h
@@ -22,18 +22,6 @@
 #include <linux/pagemap.h>
 #include <asm/tlbflush.h>
 
-/*
- * We don't need any special per-pte or per-vma handling...
- */
-#define tlb_start_vma(tlb, vma)				do { } while (0)
-#define tlb_end_vma(tlb, vma)				do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
-
-/*
- * .. because we flush the whole mm when it fills up
- */
-#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif
--- a/arch/m68k/include/asm/tlb.h
+++ b/arch/m68k/include/asm/tlb.h
@@ -8,7 +8,6 @@
  */
 #define tlb_start_vma(tlb, vma)	do { } while (0)
 #define tlb_end_vma(tlb, vma)	do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
 /*
  * .. because we flush the whole mm when it
--- a/arch/microblaze/include/asm/tlb.h
+++ b/arch/microblaze/include/asm/tlb.h
@@ -11,14 +11,12 @@
 #ifndef _ASM_MICROBLAZE_TLB_H
 #define _ASM_MICROBLAZE_TLB_H
 
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 #include <linux/pagemap.h>
 
 #ifdef CONFIG_MMU
 #define tlb_start_vma(tlb, vma)		do { } while (0)
 #define tlb_end_vma(tlb, vma)		do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, pte, address) do { } while (0)
+#define tlb_flush(tlb)			flush_tlb_mm((tlb)->mm)
 #endif
 
 #include <asm-generic/tlb.h>
--- a/arch/mips/include/asm/tlb.h
+++ b/arch/mips/include/asm/tlb.h
@@ -5,14 +5,6 @@
 #include <asm/cpu-features.h>
 #include <asm/mipsregs.h>
 
-#define tlb_end_vma(tlb, vma) do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-
-/*
- * .. because we flush the whole mm when it fills up.
- */
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
 #define _UNIQUE_ENTRYHI(base, idx)					\
 		(((base) + ((idx) << (PAGE_SHIFT + 1))) |		\
 		 (cpu_has_tlbinv ? MIPS_ENTRYHI_EHINV : 0))
--- a/arch/nds32/include/asm/tlb.h
+++ b/arch/nds32/include/asm/tlb.h
@@ -4,16 +4,6 @@
 #ifndef __ASMNDS32_TLB_H
 #define __ASMNDS32_TLB_H
 
-#define tlb_end_vma(tlb,vma)				\
-	do { 						\
-		if(!tlb->fullmm)			\
-			flush_tlb_range(vma, vma->vm_start, vma->vm_end); \
-	} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, pte, addr) do { } while (0)
-
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #define __pte_free_tlb(tlb, pte, addr)	pte_free((tlb)->mm, pte)
--- a/arch/nios2/include/asm/tlb.h
+++ b/arch/nios2/include/asm/tlb.h
@@ -11,12 +11,14 @@
 #ifndef _ASM_NIOS2_TLB_H
 #define _ASM_NIOS2_TLB_H
 
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 extern void set_mmu_pid(unsigned long pid);
 
+/*
+ * NIOS2 does have flush_tlb_range(), but it lacks a limit and a fallback to
+ * full mm invalidation. So use flush_tlb_mm() for everything.
+ */
 #define tlb_end_vma(tlb, vma)	do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
+#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
 
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
--- a/arch/openrisc/include/asm/tlb.h
+++ b/arch/openrisc/include/asm/tlb.h
@@ -22,12 +22,14 @@
 /*
  * or32 doesn't need any special per-pte or
  * per-vma handling..
+ *
+ * OpenRISC doesn't have an efficient flush_tlb_range(), so use flush_tlb_mm()
+ * for everything.
  */
 #define tlb_start_vma(tlb, vma) do { } while (0)
 #define tlb_end_vma(tlb, vma) do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-
 #define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
+
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
 
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -2,19 +2,6 @@
 #ifndef _PARISC_TLB_H
 #define _PARISC_TLB_H
 
-#define tlb_flush(tlb)			\
-do {	if ((tlb)->fullmm)		\
-		flush_tlb_mm((tlb)->mm);\
-} while (0)
-
-#define tlb_end_vma(tlb, vma)	\
-do {	if (!(tlb)->fullmm)	\
-		flush_tlb_range(vma, vma->vm_start, vma->vm_end); \
-} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, pte, address) \
-	do { } while (0)
-
 #include <asm-generic/tlb.h>
 
 #define __pmd_free_tlb(tlb, pmd, addr)	pmd_free((tlb)->mm, pmd)
--- a/arch/sparc/include/asm/tlb_32.h
+++ b/arch/sparc/include/asm/tlb_32.h
@@ -2,19 +2,6 @@
 #ifndef _SPARC_TLB_H
 #define _SPARC_TLB_H
 
-#define tlb_end_vma(tlb, vma) \
-do {								\
-	flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, pte, address) \
-	do { } while (0)
-
-#define tlb_flush(tlb) \
-do {								\
-	flush_tlb_mm((tlb)->mm);				\
-} while (0)
-
 #include <asm-generic/tlb.h>
 
 #endif /* _SPARC_TLB_H */
--- a/arch/unicore32/include/asm/tlb.h
+++ b/arch/unicore32/include/asm/tlb.h
@@ -12,10 +12,12 @@
 #ifndef __UNICORE_TLB_H__
 #define __UNICORE_TLB_H__
 
-#define tlb_start_vma(tlb, vma)				do { } while (0)
-#define tlb_end_vma(tlb, vma)				do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
+/*
+ * unicore32 lacks an efficient flush_tlb_range(), so use flush_tlb_mm().
+ */
+#define tlb_start_vma(tlb, vma)		do { } while (0)
+#define tlb_end_vma(tlb, vma)		do { } while (0)
+#define tlb_flush(tlb)			flush_tlb_mm((tlb)->mm)
 
 #define __pte_free_tlb(tlb, pte, addr)				\
 	do {							\
--- a/arch/xtensa/include/asm/tlb.h
+++ b/arch/xtensa/include/asm/tlb.h
@@ -14,23 +14,6 @@
 #include <asm/cache.h>
 #include <asm/page.h>
 
-#if (DCACHE_WAY_SIZE <= PAGE_SIZE)
-
-# define tlb_end_vma(tlb,vma)			do { } while (0)
-
-#else
-
-# define tlb_end_vma(tlb, vma)						      \
-	do {								      \
-		if (!tlb->fullmm)					      \
-			flush_tlb_range(vma, vma->vm_start, vma->vm_end);     \
-	} while(0)
-
-#endif
-
-#define __tlb_remove_tlb_entry(tlb,pte,addr)	do { } while (0)
-#define tlb_flush(tlb)				flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #define __pte_free_tlb(tlb, pte, address)	pte_free((tlb)->mm, pte)



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-13  9:21 ` [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment Peter Zijlstra
@ 2018-09-13 10:30   ` Martin Schwidefsky
  2018-09-13 10:57     ` Peter Zijlstra
  2018-09-14 16:48   ` Will Deacon
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Schwidefsky @ 2018-09-13 10:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, 13 Sep 2018 11:21:11 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Write a comment explaining some of this..
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/asm-generic/tlb.h |  120 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 117 insertions(+), 3 deletions(-)
> 
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -22,6 +22,119 @@
> 
>  #ifdef CONFIG_MMU
> 
> +/*
> + * Generic MMU-gather implementation.
> + *
> + * The mmu_gather data structure is used by the mm code to implement the
> + * correct and efficient ordering of freeing pages and TLB invalidations.
> + *
> + * This correct ordering is:
> + *
> + *  1) unhook page
> + *  2) TLB invalidate page
> + *  3) free page
> + *
> + * That is, we must never free a page before we have ensured there are no live
> + * translations left to it. Otherwise it might be possible to observe (or
> + * worse, change) the page content after it has been reused.
> + *

This first comment already includes the reason why s390 is probably better off
with its own mmu-gather implementation. It depends on the situation if we have

1) unhook the page and do a TLB flush at the same time
2) free page

or

1) unhook page
2) free page
3) final TLB flush of the whole mm

A variant of the second order we had in the past is to do the mm TLB flush first,
then the unhooks and frees of the individual pages. There are some tricky corners
switching between the two variants, see finish_arch_post_lock_switch.

The point is: we *never* have the order 1) unhook, 2) TLB invalidate, 3) free.
If there is concurrency due to a multi-threaded application we have to do the
unhook of the page-table entry and the TLB flush with a single instruction.
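
To make that concrete, a rough sketch (illustrative only, not the actual
s390 code; on s390 the combined step is a single IPTE instruction):

/*
 * Variant 1: the unhook and the TLB invalidation happen in one step,
 * so the page can be freed immediately afterwards.
 */
pte = ptep_clear_flush(vma, addr, ptep);
free_page_and_swap_cache(pte_page(pte));

/*
 * Variant 2: unhook, free, and finish with a single flush_tlb_mm()
 * over the whole mm; only safe when no other CPU can be using the mm.
 */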

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-13 10:30   ` Martin Schwidefsky
@ 2018-09-13 10:57     ` Peter Zijlstra
  2018-09-13 12:18       ` Martin Schwidefsky
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13 10:57 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, Sep 13, 2018 at 12:30:14PM +0200, Martin Schwidefsky wrote:

> > + * The mmu_gather data structure is used by the mm code to implement the
> > + * correct and efficient ordering of freeing pages and TLB invalidations.
> > + *
> > + * This correct ordering is:
> > + *
> > + *  1) unhook page
> > + *  2) TLB invalidate page
> > + *  3) free page
> > + *
> > + * That is, we must never free a page before we have ensured there are no live
> > + * translations left to it. Otherwise it might be possible to observe (or
> > + * worse, change) the page content after it has been reused.
> > + *
> 
> This first comment already includes the reason why s390 is probably better off
> with its own mmu-gather implementation. It depends on the situation if we have
> 
> 1) unhook the page and do a TLB flush at the same time
> 2) free page
> 
> or
> 
> 1) unhook page
> 2) free page
> 3) final TLB flush of the whole mm

that's the fullmm case, right?

> A variant of the second order we had in the past is to do the mm TLB flush first,
> then the unhooks and frees of the individual pages. There are some tricky corners
> switching between the two variants, see finish_arch_post_lock_switch.
> 
> The point is: we *never* have the order 1) unhook, 2) TLB invalidate, 3) free.
> If there is concurrency due to a multi-threaded application we have to do the
> unhook of the page-table entry and the TLB flush with a single instruction.

You can still get the thing you want if for !fullmm you have a no-op
tlb_flush() implementation, assuming your arch page-table frobbing thing
has the required TLB flush in.
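
Something like this minimal sketch (assuming the arch page-table
frobbing already did the per-page invalidate):

#define tlb_flush tlb_flush
static inline void tlb_flush(struct mmu_gather *tlb)
{
	if (tlb->fullmm)
		flush_tlb_mm(tlb->mm);
}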

Note that that's not utterly unlike how the PowerPC/Sparc hash things
work, they clear and invalidate entries different from others and don't
use the mmu_gather tlb-flush.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-13 10:57     ` Peter Zijlstra
@ 2018-09-13 12:18       ` Martin Schwidefsky
  2018-09-13 12:39         ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Schwidefsky @ 2018-09-13 12:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, 13 Sep 2018 12:57:38 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 13, 2018 at 12:30:14PM +0200, Martin Schwidefsky wrote:
> 
> > > + * The mmu_gather data structure is used by the mm code to implement the
> > > + * correct and efficient ordering of freeing pages and TLB invalidations.
> > > + *
> > > + * This correct ordering is:
> > > + *
> > > + *  1) unhook page
> > > + *  2) TLB invalidate page
> > > + *  3) free page
> > > + *
> > > + * That is, we must never free a page before we have ensured there are no live
> > > + * translations left to it. Otherwise it might be possible to observe (or
> > > + * worse, change) the page content after it has been reused.
> > > + *  
> > 
> > This first comment already includes the reason why s390 is probably better off
> > with its own mmu-gather implementation. It depends on the situation if we have
> > 
> > 1) unhook the page and do a TLB flush at the same time
> > 2) free page
> > 
> > or
> > 
> > 1) unhook page
> > 2) free page
> > 3) final TLB flush of the whole mm  
> 
> that's the fullmm case, right?

That includes the fullmm case but we use it for e.g. munmap of a single-threaded
program as well.
 
> > A variant of the second order we had in the past is to do the mm TLB flush first,
> > then the unhooks and frees of the individual pages. There are some tricky corners
> > switching between the two variants, see finish_arch_post_lock_switch.
> > 
> > The point is: we *never* have the order 1) unhook, 2) TLB invalidate, 3) free.
> > If there is concurrency due to a multi-threaded application we have to do the
> > unhook of the page-table entry and the TLB flush with a single instruction.  
> 
> You can still get the thing you want if for !fullmm you have a no-op
> tlb_flush() implementation, assuming your arch page-table frobbing thing
> has the required TLB flush in.

We have a non-empty tlb_flush_mmu_tlbonly to do a full-mm flush for two cases
1) batches of page-table entries for single-threaded programs
2) flushing of the pages used for the page-table structure itself

In fact only the page-table pages are added to the mmu_gather batch; the target
page of the virtual mapping is always freed immediately.
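
In mmu_gather terms that is roughly (sketch, mirroring what the current
s390 __tlb_remove_page() already does):

static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
					  struct page *page, int page_size)
{
	free_page_and_swap_cache(page);	/* target page freed right away */
	return false;			/* no need to force tlb_flush_mmu() */
}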
 
> Note that that's not utterly unlike how the PowerPC/Sparc hash things
> work, they clear and invalidate entries different from others and don't
> use the mmu_gather tlb-flush.

We may get something working with a common code mmu_gather, but I fear the
day someone makes a "minor" change that subtly breaks s390. The debugging of
TLB related problems is just horrible..

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-13 12:18       ` Martin Schwidefsky
@ 2018-09-13 12:39         ` Peter Zijlstra
  2018-09-14 10:28           ` Martin Schwidefsky
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13 12:39 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, Sep 13, 2018 at 02:18:27PM +0200, Martin Schwidefsky wrote:
> We may get something working with a common code mmu_gather, but I fear the
> day someone makes a "minor" change that subtly breaks s390. The debugging of
> TLB related problems is just horrible..

Yes it is, not just on s390 :/

And this is not something that's easy to write sanity checks for either
AFAIK. I mean we can do a few multi-threaded mmap()/mprotect()/munmap()
proglets and catch faults, but that doesn't even get close to covering
all the 'fun' spots.

Then again, you're more than welcome to the new:

  MMU GATHER AND TLB INVALIDATION

section in MAINTAINERS.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 05/11] asm-generic/tlb: Provide generic tlb_flush
  2018-09-13  9:21 ` [RFC][PATCH 05/11] asm-generic/tlb: Provide generic tlb_flush Peter Zijlstra
@ 2018-09-13 13:09   ` Jann Horn
  2018-09-13 14:06     ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Jann Horn @ 2018-09-13 13:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Aneesh Kumar K.V, Andrew Morton, npiggin,
	linux-arch, Linux-MM, kernel list, Russell King - ARM Linux,
	Heiko Carstens

On Thu, Sep 13, 2018 at 3:01 PM Peter Zijlstra <peterz@infradead.org> wrote:
> Provide a generic tlb_flush() implementation that relies on
> flush_tlb_range(). This is a little awkward because flush_tlb_range()
> assumes a VMA for range invalidation, but we no longer have one.
>
> Audit of all flush_tlb_range() implementations shows only vma->vm_mm
> and vma->vm_flags are used, and of the latter only VM_EXEC (I-TLB
> invalidates) and VM_HUGETLB (large TLB invalidate) are used.
>
> Therefore, track VM_EXEC and VM_HUGETLB in two more bits, and create a
> 'fake' VMA.
>
> This allows architectures that have a reasonably efficient
> flush_tlb_range() to not require any additional effort.
[...]
> +#define tlb_flush tlb_flush
> +static inline void tlb_flush(struct mmu_gather *tlb)
> +{
> +       if (tlb->fullmm || tlb->need_flush_all) {
> +               flush_tlb_mm(tlb->mm);
> +       } else {
> +               struct vm_area_struct vma = {
> +                       .vm_mm = tlb->mm,
> +                       .vm_flags = tlb->vma_exec ? VM_EXEC    : 0 |
> +                                   tlb->vma_huge ? VM_HUGETLB : 0,

This looks wrong to me. Bitwise OR has higher precedence than the
ternary operator, so I think this code is equivalent to:

.vm_flags = tlb->vma_exec ? VM_EXEC    : (0 | tlb->vma_huge) ? VM_HUGETLB : 0

meaning that executable+huge mappings would only get VM_EXEC, but not
VM_HUGETLB.

> +               };
> +
> +               flush_tlb_range(&vma, tlb->start, tlb->end);
> +       }
>  }
> +#endif

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 05/11] asm-generic/tlb: Provide generic tlb_flush
  2018-09-13 13:09   ` Jann Horn
@ 2018-09-13 14:06     ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13 14:06 UTC (permalink / raw)
  To: Jann Horn
  Cc: Will Deacon, Aneesh Kumar K.V, Andrew Morton, npiggin,
	linux-arch, Linux-MM, kernel list, Russell King - ARM Linux,
	Heiko Carstens

On Thu, Sep 13, 2018 at 03:09:47PM +0200, Jann Horn wrote:
> On Thu, Sep 13, 2018 at 3:01 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > Provide a generic tlb_flush() implementation that relies on
> > flush_tlb_range(). This is a little awkward because flush_tlb_range()
> > assumes a VMA for range invalidation, but we no longer have one.
> >
> > Audit of all flush_tlb_range() implementations shows only vma->vm_mm
> > and vma->vm_flags are used, and of the latter only VM_EXEC (I-TLB
> > invalidates) and VM_HUGETLB (large TLB invalidate) are used.
> >
> > Therefore, track VM_EXEC and VM_HUGETLB in two more bits, and create a
> > 'fake' VMA.
> >
> > This allows architectures that have a reasonably efficient
> > flush_tlb_range() to not require any additional effort.
> [...]
> > +#define tlb_flush tlb_flush
> > +static inline void tlb_flush(struct mmu_gather *tlb)
> > +{
> > +       if (tlb->fullmm || tlb->need_flush_all) {
> > +               flush_tlb_mm(tlb->mm);
> > +       } else {
> > +               struct vm_area_struct vma = {
> > +                       .vm_mm = tlb->mm,
> > +                       .vm_flags = tlb->vma_exec ? VM_EXEC    : 0 |
> > +                                   tlb->vma_huge ? VM_HUGETLB : 0,
> 
> This looks wrong to me. Bitwise OR has higher precedence than the
> ternary operator, so I think this code is equivalent to:
> 
> .vm_flags = tlb->vma_exec ? VM_EXEC    : (0 | tlb->vma_huge) ? VM_HUGETLB : 0
> 
> meaning that executable+huge mappings would only get VM_EXEC, but not
> VM_HUGETLB.

Bah. Fixed that. Thanks!

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -309,8 +309,8 @@ static inline void tlb_flush(struct mmu_
 	} else {
 		struct vm_area_struct vma = {
 			.vm_mm = tlb->mm,
-			.vm_flags = tlb->vma_exec ? VM_EXEC    : 0 |
-				    tlb->vma_huge ? VM_HUGETLB : 0,
+			.vm_flags = (tlb->vma_exec ? VM_EXEC    : 0) |
+				    (tlb->vma_huge ? VM_HUGETLB : 0),
 		};
 
 		flush_tlb_range(&vma, tlb->start, tlb->end);


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()
  2018-09-13  9:21 ` [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range() Peter Zijlstra
@ 2018-09-13 17:22   ` Dave Hansen
  2018-09-13 18:42     ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Dave Hansen @ 2018-09-13 17:22 UTC (permalink / raw)
  To: Peter Zijlstra, will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, linux, heiko.carstens

> +static inline void tlb_flush(struct mmu_gather *tlb)
> +{
> +	unsigned long start = 0UL, end = TLB_FLUSH_ALL;
> +	unsigned int invl_shift = tlb_get_unmap_shift(tlb);

I had to go back and look at

	https://patchwork.kernel.org/patch/10587207/

to figure out what was going on.  I wonder if we could make the code a
bit more standalone.

This at least needs a comment about what it's getting from 'tlb'.  Maybe
just:

	/* Find the smallest page size that we unmapped: */

> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -507,23 +507,25 @@ struct flush_tlb_info {
>  	unsigned long		start;
>  	unsigned long		end;
>  	u64			new_tlb_gen;
> +	unsigned int		invl_shift;
>  };

Maybe we really should just call this flush_stride or something.

>  #define local_flush_tlb() __flush_tlb()
>  
>  #define flush_tlb_mm(mm)	flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL)
>  
> -#define flush_tlb_range(vma, start, end)	\
> -		flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
> +#define flush_tlb_range(vma, start, end)			\
> +		flush_tlb_mm_range((vma)->vm_mm, start, end,	\
> +				(vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)

This is safe.  But, Couldn't this PMD_SHIFT also be PUD_SHIFT for a 1G
hugetlb page?

>  void native_flush_tlb_others(const struct cpumask *cpumask,
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -522,12 +522,12 @@ static void flush_tlb_func_common(const
>  	    f->new_tlb_gen == mm_tlb_gen) {
>  		/* Partial flush */
>  		unsigned long addr;
> -		unsigned long nr_pages = (f->end - f->start) >> PAGE_SHIFT;
> +		unsigned long nr_pages = (f->end - f->start) >> f->invl_shift;

We might want to make this nr_invalidations or nr_flushes now so we
don't get it confused with PAGE_SIZE stuff.

Otherwise, this makes me a *tiny* bit nervous.  I think we're good about
ensuring that we fully flush 4k mappings from the TLB before going up to
a 2MB mapping because of all the errata we've had there over the years.
But, had we left 4k mappings around, the old flushing code would have
cleaned them up for us.

This certainly tightly ties the invalidations to what was in the page
tables.  If that diverged from the TLB at some point, there's certainly
more exposure here.

Looks fun, though. :)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()
  2018-09-13 17:22   ` Dave Hansen
@ 2018-09-13 18:42     ` Peter Zijlstra
  2018-09-13 18:46       ` Peter Zijlstra
  2018-09-13 18:47       ` Dave Hansen
  0 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13 18:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, Sep 13, 2018 at 10:22:58AM -0700, Dave Hansen wrote:
> > +static inline void tlb_flush(struct mmu_gather *tlb)
> > +{
> > +	unsigned long start = 0UL, end = TLB_FLUSH_ALL;
> > +	unsigned int invl_shift = tlb_get_unmap_shift(tlb);
> 
> I had to go back and look at
> 
> 	https://patchwork.kernel.org/patch/10587207/

I so hate patchwork...

> to figure out what was going on.  I wonder if we could make the code a
> bit more standalone.
> 
> This at least needs a comment about what it's getting from 'tlb'.  Maybe
> just:
> 
> 	/* Find the smallest page size that we unmapped: */
> 
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -507,23 +507,25 @@ struct flush_tlb_info {
> >  	unsigned long		start;
> >  	unsigned long		end;
> >  	u64			new_tlb_gen;
> > +	unsigned int		invl_shift;
> >  };
> 
> Maybe we really should just call this flush_stride or something.

But it's a shift, not a size. stride_shift?

> >  #define local_flush_tlb() __flush_tlb()
> >  
> >  #define flush_tlb_mm(mm)	flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL)
> >  
> > -#define flush_tlb_range(vma, start, end)	\
> > -		flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
> > +#define flush_tlb_range(vma, start, end)			\
> > +		flush_tlb_mm_range((vma)->vm_mm, start, end,	\
> > +				(vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)
> 
> This is safe.  But, Couldn't this PMD_SHIFT also be PUD_SHIFT for a 1G
> hugetlb page?

It could be, but can we tell at that point?

> >  void native_flush_tlb_others(const struct cpumask *cpumask,
> > --- a/arch/x86/mm/tlb.c
> > +++ b/arch/x86/mm/tlb.c
> > @@ -522,12 +522,12 @@ static void flush_tlb_func_common(const
> >  	    f->new_tlb_gen == mm_tlb_gen) {
> >  		/* Partial flush */
> >  		unsigned long addr;
> > -		unsigned long nr_pages = (f->end - f->start) >> PAGE_SHIFT;
> > +		unsigned long nr_pages = (f->end - f->start) >> f->invl_shift;
> 
> We might want to make this nr_invalidations or nr_flushes now so we
> don't get it confused with PAGE_SIZE stuff.

Sure, can rename.

> Otherwise, this makes me a *tiny* bit nervous.  I think we're good about
> ensuring that we fully flush 4k mappings from the TLB before going up to
> a 2MB mapping because of all the errata we've had there over the years.
> But, had we left 4k mappings around, the old flushing code would have
> cleaned them up for us.

Indeed.

> This certainly tightly ties the invalidations to what was in the page
> tables.  If that diverged from the TLB at some point, there's certainly
> more exposure here.
>
> Looks fun, though. :)

:-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()
  2018-09-13 18:42     ` Peter Zijlstra
@ 2018-09-13 18:46       ` Peter Zijlstra
  2018-09-13 18:48         ` Peter Zijlstra
  2018-09-13 18:49         ` Dave Hansen
  2018-09-13 18:47       ` Dave Hansen
  1 sibling, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13 18:46 UTC (permalink / raw)
  To: Dave Hansen
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, Sep 13, 2018 at 08:42:30PM +0200, Peter Zijlstra wrote:
> > > +#define flush_tlb_range(vma, start, end)			\
> > > +		flush_tlb_mm_range((vma)->vm_mm, start, end,	\
> > > +				(vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)
> > 
> > This is safe.  But, Couldn't this PMD_SHIFT also be PUD_SHIFT for a 1G
> > hugetlb page?
> 
> It could be, but can we tell at that point?

I had me a look in hugetlb.h, would something like so work?

#define flush_tlb_range(vma, start, end)			\
	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
			   huge_page_shift(hstate_vma(vma)))



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()
  2018-09-13 18:42     ` Peter Zijlstra
  2018-09-13 18:46       ` Peter Zijlstra
@ 2018-09-13 18:47       ` Dave Hansen
  1 sibling, 0 replies; 39+ messages in thread
From: Dave Hansen @ 2018-09-13 18:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

>>> --- a/arch/x86/include/asm/tlbflush.h
>>> +++ b/arch/x86/include/asm/tlbflush.h
>>> @@ -507,23 +507,25 @@ struct flush_tlb_info {
>>>  	unsigned long		start;
>>>  	unsigned long		end;
>>>  	u64			new_tlb_gen;
>>> +	unsigned int		invl_shift;
>>>  };
>>
>> Maybe we really should just call this flush_stride or something.
> 
> But it's a shift, not a size. stride_shift?

Yeah, sounds better than 'invl' to me.

>>>  #define local_flush_tlb() __flush_tlb()
>>>  
>>>  #define flush_tlb_mm(mm)	flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL)
>>>  
>>> -#define flush_tlb_range(vma, start, end)	\
>>> -		flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
>>> +#define flush_tlb_range(vma, start, end)			\
>>> +		flush_tlb_mm_range((vma)->vm_mm, start, end,	\
>>> +				(vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)
>>
>> This is safe.  But, Couldn't this PMD_SHIFT also be PUD_SHIFT for a 1G
>> hugetlb page?
> 
> It could be, but can we tell at that point?

We should have the page size via huge_page_shift(hstate_vma(vma)).  No
idea if it'll work in practice, though.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()
  2018-09-13 18:46       ` Peter Zijlstra
@ 2018-09-13 18:48         ` Peter Zijlstra
  2018-09-13 18:49         ` Dave Hansen
  1 sibling, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-13 18:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, Sep 13, 2018 at 08:46:32PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 13, 2018 at 08:42:30PM +0200, Peter Zijlstra wrote:
> > > > +#define flush_tlb_range(vma, start, end)			\
> > > > +		flush_tlb_mm_range((vma)->vm_mm, start, end,	\
> > > > +				(vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)
> > > 
> > > This is safe.  But, Couldn't this PMD_SHIFT also be PUD_SHIFT for a 1G
> > > hugetlb page?
> > 
> > It could be, but can we tell at that point?
> 
> I had me a look in hugetlb.h, would something like so work?
> 
> #define flush_tlb_range(vma, start, end)			\
> 	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
> 			   huge_page_shift(hstate_vma(vma)))
> 

D'uh

#define flush_tlb_range(vma, start, end)			\
	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
	   (vma)->vm_flags & VM_HUGETLB ? huge_page_shift(hstate_vma(vma)) : PAGE_SHIFT)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range()
  2018-09-13 18:46       ` Peter Zijlstra
  2018-09-13 18:48         ` Peter Zijlstra
@ 2018-09-13 18:49         ` Dave Hansen
  1 sibling, 0 replies; 39+ messages in thread
From: Dave Hansen @ 2018-09-13 18:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On 09/13/2018 11:46 AM, Peter Zijlstra wrote:
> On Thu, Sep 13, 2018 at 08:42:30PM +0200, Peter Zijlstra wrote:
>>>> +#define flush_tlb_range(vma, start, end)			\
>>>> +		flush_tlb_mm_range((vma)->vm_mm, start, end,	\
>>>> +				(vma)->vm_flags & VM_HUGETLB ? PMD_SHIFT : PAGE_SHIFT)
>>> This is safe.  But, Couldn't this PMD_SHIFT also be PUD_SHIFT for a 1G
>>> hugetlb page?
>> It could be, but can we tell at that point?
> I had me a look in hugetlb.h, would something like so work?
> 
> #define flush_tlb_range(vma, start, end)			\
> 	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
> 			   huge_page_shift(hstate_vma(vma)))
> 

I think you still need the VM_HUGETLB check somewhere because
hstate_vma() won't work on non-VM_HUGETLB VMAs.  But, yeah, something
close to that should be OK.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-13 12:39         ` Peter Zijlstra
@ 2018-09-14 10:28           ` Martin Schwidefsky
  2018-09-14 13:02             ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Schwidefsky @ 2018-09-14 10:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens

On Thu, 13 Sep 2018 14:39:37 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 13, 2018 at 02:18:27PM +0200, Martin Schwidefsky wrote:
> > We may get something working with a common code mmu_gather, but I fear the
> > day someone makes a "minor" change that subtly breaks s390. The debugging of
> > TLB related problems is just horrible..  
> 
> Yes it is, not just on s390 :/
> 
> And this is not something that's easy to write sanity checks for either
> AFAIK. I mean we can do a few multi-threaded mmap()/mprotect()/munmap()
> proglets and catch faults, but that doesn't even get close to covering
> all the 'fun' spots.
> 
> Then again, you're more than welcome to the new:
> 
>   MMU GATHER AND TLB INVALIDATION
> 
> section in MAINTAINERS.

I spent some time to get s390 converted to the common mmu_gather code.
There is one thing I would like to request, namely the ability to
disable the page gather part of mmu_gather. My prototype patch below
defines the negative HAVE_RCU_NO_GATHER_PAGES Kconfig symbol which,
if defined, removes some parts from the common code.
Ugly, but good enough for the prototype to convey the idea.
For the final solution we should use a positive Kconfig symbol
and add it to all arch Kconfig files except for s390.

The code itself is less hairy than I feared, it worked on the first
try and survived my fork/munmap/mprotect TLB stress test. But as
this is TLB flushing there probably is some subtle problem left..

Here we go:
--
From f222a7e40427b625700f2ca0919c32f07931c19a Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Date: Fri, 14 Sep 2018 10:50:58 +0200
Subject: [PATCH] s390/tlb: convert s390 to generic mmu_gather

Introduce HAVE_RCU_NO_GATHER_PAGES to allow the arch code to disable
page gathering in the generic mmu_gather code, then enable the generic
mmu_gather code for s390.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/Kconfig                |   3 +
 arch/s390/Kconfig           |   3 +
 arch/s390/include/asm/tlb.h | 131 ++++++++++++++------------------------------
 arch/s390/mm/pgalloc.c      |  63 +--------------------
 include/asm-generic/tlb.h   |   7 +++
 mm/mmu_gather.c             |  18 +++++-
 6 files changed, 72 insertions(+), 153 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 053c44703539..9b257929a7c1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -359,6 +359,9 @@ config HAVE_PERF_USER_STACK_DUMP
 config HAVE_ARCH_JUMP_LABEL
 	bool
 
+config HAVE_RCU_NO_GATHER_PAGES
+	bool
+
 config HAVE_RCU_TABLE_FREE
 	bool
 
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 9a9c7a6fe925..521457e3c5e4 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -161,6 +161,9 @@ config S390
 	select HAVE_NOP_MCOUNT
 	select HAVE_OPROFILE
 	select HAVE_PERF_EVENTS
+	select HAVE_RCU_NO_GATHER_PAGES
+	select HAVE_RCU_TABLE_FREE
+	select HAVE_RCU_TABLE_INVALIDATE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index cf3d64313740..8073ff272b2b 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -22,98 +22,40 @@
  * Pages used for the page tables is a different story. FIXME: more
  */
 
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/swap.h>
-#include <asm/processor.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-
-struct mmu_gather {
-	struct mm_struct *mm;
-	struct mmu_table_batch *batch;
-	unsigned int fullmm;
-	unsigned long start, end;
-};
-
-struct mmu_table_batch {
-	struct rcu_head		rcu;
-	unsigned int		nr;
-	void			*tables[0];
-};
-
-#define MAX_TABLE_BATCH		\
-	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
-
-extern void tlb_table_flush(struct mmu_gather *tlb);
-extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->start = start;
-	tlb->end = end;
-	tlb->fullmm = !(start | (end+1));
-	tlb->batch = NULL;
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	__tlb_flush_mm_lazy(tlb->mm);
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	tlb_table_flush(tlb);
-}
-
+void __tlb_remove_table(void *_table);
+static inline void tlb_flush(struct mmu_gather *tlb);
+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+					  struct page *page, int page_size);
 
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
+#define tlb_start_vma(tlb, vma)			do { } while (0)
+#define tlb_end_vma(tlb, vma)			do { } while (0)
+#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		tlb->start = start;
-		tlb->end = end;
-	}
+#define tlb_flush tlb_flush
+#define pte_free_tlb pte_free_tlb
+#define pmd_free_tlb pmd_free_tlb
+#define p4d_free_tlb p4d_free_tlb
+#define pud_free_tlb pud_free_tlb
 
-	tlb_flush_mmu(tlb);
-}
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm-generic/tlb.h>
 
 /*
  * Release the page cache reference for a pte removed by
  * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
  * has already been freed, so just do free_page_and_swap_cache.
  */
-static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-	return false; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-}
-
 static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
 					  struct page *page, int page_size)
 {
-	return __tlb_remove_page(tlb, page);
+	free_page_and_swap_cache(page);
+	return false;
 }
 
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
+static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	return tlb_remove_page(tlb, page);
+	__tlb_flush_mm_lazy(tlb->mm);
 }
 
 /*
@@ -121,9 +63,18 @@ static inline void tlb_remove_page_size(struct mmu_gather *tlb,
  * page table from the tlb.
  */
 static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-				unsigned long address)
+                                unsigned long address)
 {
-	page_table_free_rcu(tlb, (unsigned long *) pte, address);
+	__tlb_adjust_range(tlb, address, PAGE_SIZE);
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_ptes = 1;
+	/*
+	 * page_table_free_rcu takes care of the allocation bit masks
+	 * of the 2K table fragments in the 4K page table page,
+	 * then calls tlb_remove_table.
+	 */
+        page_table_free_rcu(tlb, (unsigned long *) pte, address);
 }
 
 /*
@@ -139,6 +90,10 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 	if (tlb->mm->context.asce_limit <= _REGION3_SIZE)
 		return;
 	pgtable_pmd_page_dtor(virt_to_page(pmd));
+	__tlb_adjust_range(tlb, address, PAGE_SIZE);
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_puds = 1;
 	tlb_remove_table(tlb, pmd);
 }
 
@@ -154,6 +109,10 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
 {
 	if (tlb->mm->context.asce_limit <= _REGION1_SIZE)
 		return;
+	__tlb_adjust_range(tlb, address, PAGE_SIZE);
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_p4ds = 1;
 	tlb_remove_table(tlb, p4d);
 }
 
@@ -169,19 +128,11 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
 {
 	if (tlb->mm->context.asce_limit <= _REGION2_SIZE)
 		return;
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_puds = 1;
 	tlb_remove_table(tlb, pud);
 }
 
-#define tlb_start_vma(tlb, vma)			do { } while (0)
-#define tlb_end_vma(tlb, vma)			do { } while (0)
-#define tlb_remove_tlb_entry(tlb, ptep, addr)	do { } while (0)
-#define tlb_remove_pmd_tlb_entry(tlb, pmdp, addr)	do { } while (0)
-#define tlb_migrate_finish(mm)			do { } while (0)
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
-{
-}
 
 #endif /* _S390_TLB_H */
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 76d89ee8b428..f7656a0b3a1a 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -288,7 +288,7 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 	tlb_remove_table(tlb, table);
 }
 
-static void __tlb_remove_table(void *_table)
+void __tlb_remove_table(void *_table)
 {
 	unsigned int mask = (unsigned long) _table & 3;
 	void *table = (void *)((unsigned long) _table ^ mask);
@@ -314,67 +314,6 @@ static void __tlb_remove_table(void *_table)
 	}
 }
 
-static void tlb_remove_table_smp_sync(void *arg)
-{
-	/* Simply deliver the interrupt */
-}
-
-static void tlb_remove_table_one(void *table)
-{
-	/*
-	 * This isn't an RCU grace period and hence the page-tables cannot be
-	 * assumed to be actually RCU-freed.
-	 *
-	 * It is however sufficient for software page-table walkers that rely
-	 * on IRQ disabling. See the comment near struct mmu_table_batch.
-	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
-	__tlb_remove_table(table);
-}
-
-static void tlb_remove_table_rcu(struct rcu_head *head)
-{
-	struct mmu_table_batch *batch;
-	int i;
-
-	batch = container_of(head, struct mmu_table_batch, rcu);
-
-	for (i = 0; i < batch->nr; i++)
-		__tlb_remove_table(batch->tables[i]);
-
-	free_page((unsigned long)batch);
-}
-
-void tlb_table_flush(struct mmu_gather *tlb)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch) {
-		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
-		*batch = NULL;
-	}
-}
-
-void tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	tlb->mm->context.flush_mm = 1;
-	if (*batch == NULL) {
-		*batch = (struct mmu_table_batch *)
-			__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-		if (*batch == NULL) {
-			__tlb_flush_mm_lazy(tlb->mm);
-			tlb_remove_table_one(table);
-			return;
-		}
-		(*batch)->nr = 0;
-	}
-	(*batch)->tables[(*batch)->nr++] = table;
-	if ((*batch)->nr == MAX_TABLE_BATCH)
-		tlb_flush_mmu(tlb);
-}
-
 /*
  * Base infrastructure required to generate basic asces, region, segment,
  * and page tables that do not make use of enhanced features like EDAT1.
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 21c751cd751e..930e25abf4de 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -179,6 +179,7 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
 #endif
 
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 /*
  * If we can't allocate a page to make a big batch of page pointers
  * to work on, then just handle a few from the on-stack structure.
@@ -203,6 +204,8 @@ struct mmu_gather_batch {
  */
 #define MAX_GATHER_BATCH_COUNT	(10000UL/MAX_GATHER_BATCH)
 
+#endif
+
 /*
  * struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
@@ -249,6 +252,7 @@ struct mmu_gather {
 
 	unsigned int		batch_count;
 
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
@@ -256,6 +260,7 @@ struct mmu_gather {
 #ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
 	unsigned int page_size;
 #endif
+#endif
 };
 
 void arch_tlb_gather_mmu(struct mmu_gather *tlb,
@@ -264,8 +269,10 @@ void tlb_flush_mmu(struct mmu_gather *tlb);
 void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 			 unsigned long start, unsigned long end, bool force);
 void tlb_flush_mmu_free(struct mmu_gather *tlb);
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
 				   int page_size);
+#endif
 
 static inline void __tlb_adjust_range(struct mmu_gather *tlb,
 				      unsigned long address,
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 2d5e617131f6..d3d2763d91b2 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -13,6 +13,8 @@
 
 #ifdef HAVE_GENERIC_MMU_GATHER
 
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
+
 static bool tlb_next_batch(struct mmu_gather *tlb)
 {
 	struct mmu_gather_batch *batch;
@@ -41,6 +43,8 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
 	return true;
 }
 
+#endif
+
 void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 				unsigned long start, unsigned long end)
 {
@@ -49,12 +53,14 @@ void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 	/* Is it from 0 to ~0? */
 	tlb->fullmm     = !(start | (end+1));
 	tlb->need_flush_all = 0;
+
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 	tlb->local.next = NULL;
 	tlb->local.nr   = 0;
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
 	tlb->active     = &tlb->local;
 	tlb->batch_count = 0;
-
+#endif
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
@@ -67,16 +73,20 @@ void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 
 void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 	struct mmu_gather_batch *batch;
+#endif
 
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
 #endif
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
 		free_pages_and_swap_cache(batch->pages, batch->nr);
 		batch->nr = 0;
 	}
 	tlb->active = &tlb->local;
+#endif
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
@@ -92,7 +102,9 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
 void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 		unsigned long start, unsigned long end, bool force)
 {
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 	struct mmu_gather_batch *batch, *next;
+#endif
 
 	if (force) {
 		__tlb_reset_range(tlb);
@@ -104,13 +116,16 @@ void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 	/* keep the page table cache within bounds */
 	check_pgt_cache();
 
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 	for (batch = tlb->local.next; batch; batch = next) {
 		next = batch->next;
 		free_pages((unsigned long)batch, 0);
 	}
 	tlb->local.next = NULL;
+#endif
 }
 
+#ifndef CONFIG_HAVE_RCU_NO_GATHER_PAGES
 /* __tlb_remove_page
  *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)), while
  *	handling the additional races in SMP caused by other CPUs caching valid
@@ -143,6 +158,7 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
 
 	return false;
 }
+#endif
 
 #endif /* HAVE_GENERIC_MMU_GATHER */
 
-- 
2.16.4


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-14 10:28           ` Martin Schwidefsky
@ 2018-09-14 13:02             ` Peter Zijlstra
  2018-09-14 14:07               ` Martin Schwidefsky
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-14 13:02 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens, Linus Torvalds

On Fri, Sep 14, 2018 at 12:28:24PM +0200, Martin Schwidefsky wrote:

> I spent some time to get s390 converted to the common mmu_gather code.
> There is one thing I would like to request, namely the ability to
> disable the page gather part of mmu_gather. For my prototype patch
> see below, it defines the negative HAVE_RCU_NO_GATHER_PAGES Kconfig
> symbol that if defined will remove some parts from common code.
> Ugly but good enough for the prototype to convey the idea.
> For the final solution we better use a positive Kconfig symbol and
> add that to all arch Kconfig files except for s390.

In a private thread earlier, Linus raised the point that the batching and
freeing of lots of pages at once is probably better for I$.

> +config HAVE_RCU_NO_GATHER_PAGES
> +	bool

I have a problem with the name more than anything else; this name
suggests it is the RCU table freeing that should not batch, which is not
the case: you want the regular page gather gone, but very much require
the RCU table gather to batch.

So I would like to propose calling it:

config HAVE_MMU_GATHER_NO_GATHER

Or something along those lines.
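
To make that concrete, the gating from the prototype above would then read
roughly as below in asm-generic/tlb.h -- illustration only, with the proposed
name substituted for the prototype's HAVE_RCU_NO_GATHER_PAGES:

struct mmu_gather {
	struct mm_struct	*mm;

#ifdef CONFIG_HAVE_RCU_TABLE_FREE
	struct mmu_table_batch	*batch;		/* RCU table batching stays */
#endif

	unsigned long		start;
	unsigned long		end;
	/* fullmm, need_flush_all, freed_tables, ... unchanged */

	unsigned int		batch_count;

#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
	/* the regular page gather; this is the part s390 wants gone */
	struct mmu_gather_batch	*active;
	struct mmu_gather_batch	local;
	struct page		*__pages[MMU_GATHER_BUNDLE];
#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
	unsigned int		page_size;
#endif
#endif
};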


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-14 13:02             ` Peter Zijlstra
@ 2018-09-14 14:07               ` Martin Schwidefsky
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Schwidefsky @ 2018-09-14 14:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens, Linus Torvalds

On Fri, 14 Sep 2018 15:02:07 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Sep 14, 2018 at 12:28:24PM +0200, Martin Schwidefsky wrote:
> 
> > I spent some time to get s390 converted to the common mmu_gather code.
> > There is one thing I would like to request, namely the ability to
> > disable the page gather part of mmu_gather. For my prototype patch
> > see below, it defines the negative HAVE_RCU_NO_GATHER_PAGES Kconfig
> > symbol that if defined will remove some parts from common code.
> > Ugly but good enough for the prototype to convey the idea.
> > For the final solution we better use a positive Kconfig symbol and
> > add that to all arch Kconfig files except for s390.  
> 
> In a private thread earlier, Linus raised the point that the batching and
> freeing of lots of pages at once is probably better for I$.

That would be something to try. For now I would like to do a conversion
that more or less preserves the old behavior. You know these pesky
TLB-related bugs...

> > +config HAVE_RCU_NO_GATHER_PAGES
> > +	bool  
> 
> I have a problem with the name more than anything else; this name
> suggests it is the RCU table freeing that should not batch, which is not
> the case: you want the regular page gather gone, but very much require
> the RCU table gather to batch.
> 
> So I would like to propose calling it:
> 
> config HAVE_MMU_GATHER_NO_GATHER
> 
> Or something along those lines.
 
Imho a positive config option like HAVE_MMU_GATHER_PAGES would make the
most sense. It has the downside that it needs to be added to all
arch/*/Kconfig files except for s390. 

But I am not hung up on a name; whatever does not sound too awful will do
for me. HAVE_MMU_GATHER_NO_GATHER would be ok.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-13  9:21 ` [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment Peter Zijlstra
  2018-09-13 10:30   ` Martin Schwidefsky
@ 2018-09-14 16:48   ` Will Deacon
  2018-09-19 11:33     ` Peter Zijlstra
  2018-09-19 11:51     ` Peter Zijlstra
  1 sibling, 2 replies; 39+ messages in thread
From: Will Deacon @ 2018-09-14 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

Hi Peter,

On Thu, Sep 13, 2018 at 11:21:11AM +0200, Peter Zijlstra wrote:
> Write a comment explaining some of this..

This comment is much-needed, thanks! Some comments inline.

> + * The mmu_gather API consists of:
> + *
> + *  - tlb_gather_mmu() / tlb_finish_mmu(); start and finish a mmu_gather
> + *
> + *    Finish in particular will issue a (final) TLB invalidate and free
> + *    all (remaining) queued pages.
> + *
> + *  - tlb_start_vma() / tlb_end_vma(); marks the start / end of a VMA
> + *
> + *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
> + *    there's large holes between the VMAs.
> + *
> + *  - tlb_remove_page() / __tlb_remove_page()
> + *  - tlb_remove_page_size() / __tlb_remove_page_size()
> + *
> + *    __tlb_remove_page_size() is the basic primitive that queues a page for
> + *    freeing. __tlb_remove_page() assumes PAGE_SIZE. Both will return a
> + *    boolean indicating if the queue is (now) full and a call to
> + *    tlb_flush_mmu() is required.
> + *
> + *    tlb_remove_page() and tlb_remove_page_size() imply the call to
> + *    tlb_flush_mmu() when required and has no return value.
> + *
> + *  - tlb_change_page_size()

This doesn't seem to exist in my tree.
[since realised you rename to it in the next patch]

> + *
> + *    call before __tlb_remove_page*() to set the current page-size; implies a
> + *    possible tlb_flush_mmu() call.
> + *
> + *  - tlb_flush_mmu() / tlb_flush_mmu_tlbonly() / tlb_flush_mmu_free()
> + *
> + *    tlb_flush_mmu_tlbonly() - does the TLB invalidate (and resets
> + *                              related state, like the range)
> + *
> + *    tlb_flush_mmu_free() - frees the queued pages; make absolutely
> + *			     sure no additional tlb_remove_page()
> + *			     calls happen between _tlbonly() and this.
> + *
> + *    tlb_flush_mmu() - the above two calls.
> + *
> + *  - mmu_gather::fullmm
> + *
> + *    A flag set by tlb_gather_mmu() to indicate we're going to free
> + *    the entire mm; this allows a number of optimizations.
> + *
> + *    XXX list optimizations

On arm64, we can elide the invalidation altogether because we won't
re-allocate the ASID. We also have an invalidate-by-ASID (mm) instruction,
which we could use if we needed to.
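
As a rough illustration of the shape of that (not the actual arm64
implementation; flush_tlb_range() is just the generic ranged invalidate):

static inline void tlb_flush(struct mmu_gather *tlb)
{
	struct vm_area_struct vma = { .vm_mm = tlb->mm, };

	/*
	 * Tearing down the whole mm: the old ASID is never handed out
	 * again, so no invalidation is needed here at all.
	 */
	if (tlb->fullmm)
		return;

	flush_tlb_range(&vma, tlb->start, tlb->end);
}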

> + *
> + *  - mmu_gather::need_flush_all
> + *
> + *    A flag that can be set by the arch code if it wants to force
> + *    flush the entire TLB irrespective of the range. For instance
> + *    x86-PAE needs this when changing top-level entries.
> + *
> + * And requires the architecture to provide and implement tlb_flush().
> + *
> + * tlb_flush() may, in addition to the above mentioned mmu_gather fields, make
> + * use of:
> + *
> + *  - mmu_gather::start / mmu_gather::end
> + *
> + *    which (when !need_flush_all; fullmm will have start = end = ~0UL) provides
> + *    the range that needs to be flushed to cover the pages to be freed.

I don't understand the mention of need_flush_all here -- I didn't think it
was used by the core code at all.

> + *
> + *  - mmu_gather::freed_tables
> + *
> + *    set when we freed page table pages
> + *
> + *  - tlb_get_unmap_shift() / tlb_get_unmap_size()
> + *
> + *    returns the smallest TLB entry size unmapped in this range
> + *
> + * Additionally there are a few opt-in features:
> + *
> + *  HAVE_MMU_GATHER_PAGE_SIZE
> + *
> + *  This ensures we call tlb_flush() every time tlb_change_page_size() actually
> + *  changes the size and provides mmu_gather::page_size to tlb_flush().

Ah, you add this later in the series. I think Nick reckoned we could get rid
of this (the page_size field) eventually...
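
For reference, the behaviour behind that option is roughly the following;
shape assumed from the later patch, not quoted verbatim here:

static inline void tlb_change_page_size(struct mmu_gather *tlb,
					unsigned int page_size)
{
#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
	/* flush whenever the page size actually changes mid-gather */
	if (tlb->page_size && tlb->page_size != page_size) {
		if (!tlb->fullmm)
			tlb_flush_mmu(tlb);
	}

	tlb->page_size = page_size;
#endif
}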

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 02/11] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE
  2018-09-13  9:21 ` [RFC][PATCH 02/11] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
@ 2018-09-14 16:56   ` Will Deacon
  0 siblings, 0 replies; 39+ messages in thread
From: Will Deacon @ 2018-09-14 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Thu, Sep 13, 2018 at 11:21:12AM +0200, Peter Zijlstra wrote:
> Move the mmu_gather::page_size things into the generic code instead of
> powerpc specific bits.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/Kconfig                   |    3 +++
>  arch/arm/include/asm/tlb.h     |    3 +--
>  arch/ia64/include/asm/tlb.h    |    3 +--
>  arch/powerpc/Kconfig           |    1 +
>  arch/powerpc/include/asm/tlb.h |   17 -----------------
>  arch/s390/include/asm/tlb.h    |    4 +---
>  arch/sh/include/asm/tlb.h      |    4 +---
>  arch/um/include/asm/tlb.h      |    4 +---
>  include/asm-generic/tlb.h      |   25 +++++++++++++------------
>  mm/huge_memory.c               |    4 ++--
>  mm/hugetlb.c                   |    2 +-
>  mm/madvise.c                   |    2 +-
>  mm/memory.c                    |    4 ++--
>  mm/mmu_gather.c                |    5 +++++
>  14 files changed, 33 insertions(+), 48 deletions(-)

Looks fine to me, but I hope we can remove this option altogether in future:

Acked-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 04/11] asm-generic/tlb: Provide generic VIPT cache flush
  2018-09-13  9:21 ` [RFC][PATCH 04/11] asm-generic/tlb: Provide generic VIPT cache flush Peter Zijlstra
@ 2018-09-14 16:56   ` Will Deacon
  0 siblings, 0 replies; 39+ messages in thread
From: Will Deacon @ 2018-09-14 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, David Miller, Guan Xuetao

On Thu, Sep 13, 2018 at 11:21:14AM +0200, Peter Zijlstra wrote:
> The one obvious thing SH and ARM want is a sensible default for
> tlb_start_vma(). (also: https://lkml.org/lkml/2004/1/15/6 )
> 
> Avoid all VIPT architectures providing their own tlb_start_vma()
> implementation and rely on architectures to provide a no-op
> flush_cache_range() when it is not relevant.
> 
> The below makes tlb_start_vma() default to flush_cache_range(), which
> should be right and sufficient. The only exceptions that I found where
> (oddly):
> 
>   - m68k-mmu
>   - sparc64
>   - unicore
> 
> Those architectures appear to have flush_cache_range(), but their
> current tlb_start_vma() does not call it.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Guan Xuetao <gxt@pku.edu.cn>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arc/include/asm/tlb.h      |    9 ---------
>  arch/mips/include/asm/tlb.h     |    9 ---------
>  arch/nds32/include/asm/tlb.h    |    6 ------
>  arch/nios2/include/asm/tlb.h    |   10 ----------
>  arch/parisc/include/asm/tlb.h   |    5 -----
>  arch/sparc/include/asm/tlb_32.h |    5 -----
>  arch/xtensa/include/asm/tlb.h   |    9 ---------
>  include/asm-generic/tlb.h       |   19 +++++++++++--------
>  8 files changed, 11 insertions(+), 61 deletions(-)

LGTM and makes no difference to arm/arm64:

Acked-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 06/11] asm-generic/tlb: Conditionally provide tlb_migrate_finish()
  2018-09-13  9:21 ` [RFC][PATCH 06/11] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
@ 2018-09-14 16:57   ` Will Deacon
  0 siblings, 0 replies; 39+ messages in thread
From: Will Deacon @ 2018-09-14 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Thu, Sep 13, 2018 at 11:21:16AM +0200, Peter Zijlstra wrote:
> Needed for ia64 -- alternatively we drop the entire hook.

s/hook/architecture/

/me runs away for the weekend

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather
  2018-09-13  9:21 ` [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
@ 2018-09-18 14:10   ` Will Deacon
  2018-09-19 11:28     ` Peter Zijlstra
  2018-09-19 11:30     ` Peter Zijlstra
  0 siblings, 2 replies; 39+ messages in thread
From: Will Deacon @ 2018-09-18 14:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

Hi Peter,

On Thu, Sep 13, 2018 at 11:21:17AM +0200, Peter Zijlstra wrote:
> Generic mmu_gather provides everything that ARM needs:
> 
>  - range tracking
>  - RCU table free
>  - VM_EXEC tracking
>  - VIPT cache flushing
> 
> The one notable curiosity is the 'funny' range tracking for classical
> ARM in __pte_free_tlb().
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm/include/asm/tlb.h |  255 ++-------------------------------------------
>  1 file changed, 14 insertions(+), 241 deletions(-)

So whilst I was reviewing this, I realised that I think we should be
selecting HAVE_RCU_TABLE_INVALIDATE for arch/arm/ if HAVE_RCU_TABLE_FREE.

Whilst we don't distinguish between invalidation of intermediate and leaf
levels on 32-bit, the CPU is still permitted to cache partial translation
table walks even if the leaf entry indicates a fault. That means that
after tearing down the PTEs, we can still get walk cache allocations and
so if the RCU batching of the page tables fails, we need to invalidate
the TLB after clearing the intermediate entries but before freeing them.
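
Concretely, when the batch page cannot be allocated, the fallback path has to
preserve exactly that ordering. A condensed sketch of the generic
HAVE_RCU_TABLE_FREE code -- the real helper only takes the table and the
invalidate sits in its caller; it is folded in here purely for illustration:

static void tlb_remove_table_one(struct mmu_gather *tlb, void *table)
{
	/* 1) intermediate entries are already cleared; zap the walk caches */
	tlb_flush_mmu_tlbonly(tlb);
	/* 2) IPI all CPUs to wait out software walkers relying on IRQs-off */
	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
	/* 3) only now is it safe to actually free the page-table page */
	__tlb_remove_table(table);
}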

> -static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> -	unsigned long addr)
> +__pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
>  {
>  	pgtable_page_dtor(pte);
>  
> -#ifdef CONFIG_ARM_LPAE
> -	tlb_add_flush(tlb, addr);
> -#else
> +#ifndef CONFIG_ARM_LPAE
>  	/*
>  	 * With the classic ARM MMU, a pte page has two corresponding pmd
>  	 * entries, each covering 1MB.
>  	 */
> -	addr &= PMD_MASK;
> -	tlb_add_flush(tlb, addr + SZ_1M - PAGE_SIZE);
> -	tlb_add_flush(tlb, addr + SZ_1M);
> +	addr = (addr & PMD_MASK) + SZ_1M;
> +	__tlb_adjust_range(tlb, addr - PAGE_SIZE, addr + PAGE_SIZE);

Hmm, I don't think you've got the range correct here. Don't we want
something like:

	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE)

to ensure that we flush on both sides of the 1M boundary?

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather
  2018-09-18 14:10   ` Will Deacon
@ 2018-09-19 11:28     ` Peter Zijlstra
  2018-09-19 12:24       ` Will Deacon
  2018-09-19 11:30     ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-19 11:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Tue, Sep 18, 2018 at 03:10:34PM +0100, Will Deacon wrote:

> So whilst I was reviewing this, I realised that I think we should be
> selecting HAVE_RCU_TABLE_INVALIDATE for arch/arm/ if HAVE_RCU_TABLE_FREE.

Yes, very much so. Let me invert that option: you normally want it,
except when you don't natively use the linux page-tables.

---
Subject: asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Sep 19 13:24:41 CEST 2018

Make issuing a TLB invalidate for page-table pages the normal case.

The reason is twofold:

 - too many invalidates is safer than too few,
 - most architectures use the linux page-tables natively
   and would thus require this.

Make it an opt-out, instead of an opt-in.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig              |    2 +-
 arch/arm64/Kconfig        |    1 -
 arch/powerpc/Kconfig      |    1 +
 arch/sparc/Kconfig        |    1 +
 arch/x86/Kconfig          |    1 -
 include/asm-generic/tlb.h |    9 +++++----
 mm/mmu_gather.c           |    2 +-
 7 files changed, 9 insertions(+), 8 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -362,7 +362,7 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
 	bool
 
-config HAVE_RCU_TABLE_INVALIDATE
+config HAVE_RCU_TABLE_NO_INVALIDATE
 	bool
 
 config HAVE_MMU_GATHER_PAGE_SIZE
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,7 +142,6 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RCU_TABLE_FREE
-	select HAVE_RCU_TABLE_INVALIDATE
 	select HAVE_RSEQ
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,6 +64,7 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -181,7 +181,6 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if PARAVIRT
-	select HAVE_RCU_TABLE_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
 	select HAVE_STACKPROTECTOR		if CC_HAS_SANE_STACKPROTECTOR
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -127,11 +127,12 @@
  *  When used, an architecture is expected to provide __tlb_remove_table()
  *  which does the actual freeing of these pages.
  *
- *  HAVE_RCU_TABLE_INVALIDATE
+ *  HAVE_RCU_TABLE_NO_INVALIDATE
  *
- *  This makes HAVE_RCU_TABLE_FREE call tlb_flush_mmu_tlbonly() before freeing
- *  the page-table pages. Required if you use HAVE_RCU_TABLE_FREE and your
- *  architecture uses the Linux page-tables natively.
+ *  This makes HAVE_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
+ *  freeing the page-table pages. This can be avoided if you use
+ *  HAVE_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
+ *  page-tables natively.
  *
  */
 #define HAVE_GENERIC_MMU_GATHER
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -157,7 +157,7 @@ bool __tlb_remove_page_size(struct mmu_g
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
 	/*
 	 * Invalidate page-table caches used by hardware walkers. Then we still
 	 * need to RCU-sched wait while freeing the pages because software

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather
  2018-09-18 14:10   ` Will Deacon
  2018-09-19 11:28     ` Peter Zijlstra
@ 2018-09-19 11:30     ` Peter Zijlstra
  1 sibling, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-19 11:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Tue, Sep 18, 2018 at 03:10:34PM +0100, Will Deacon wrote:
> > +	addr = (addr & PMD_MASK) + SZ_1M;
> > +	__tlb_adjust_range(tlb, addr - PAGE_SIZE, addr + PAGE_SIZE);
> 
> Hmm, I don't think you've got the range correct here. Don't we want
> something like:
> 
> 	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE)
> 
> to ensure that we flush on both sides of the 1M boundary?

Argh indeed. I confused {start,size} with {start,end}. Thanks!
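
For reference, with __tlb_adjust_range() taking {address, size}, the
corrected hunk would then look something like:

	/*
	 * With the classic ARM MMU, a pte page has two corresponding pmd
	 * entries, each covering 1MB; track a range straddling the boundary.
	 */
	addr = (addr & PMD_MASK) + SZ_1M;
	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);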

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-14 16:48   ` Will Deacon
@ 2018-09-19 11:33     ` Peter Zijlstra
  2018-09-19 11:51     ` Peter Zijlstra
  1 sibling, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-19 11:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Fri, Sep 14, 2018 at 05:48:57PM +0100, Will Deacon wrote:

> > + *  - tlb_change_page_size()
> 
> This doesn't seem to exist in my tree.
> [since realised you rename to it in the next patch]
> 

> > + * Additionally there are a few opt-in features:
> > + *
> > + *  HAVE_MMU_GATHER_PAGE_SIZE
> > + *
> > + *  This ensures we call tlb_flush() every time tlb_change_page_size() actually
> > + *  changes the size and provides mmu_gather::page_size to tlb_flush().
> 
> Ah, you add this later in the series. I think Nick reckoned we could get rid
> of this (the page_size field) eventually...

Right; let me fix that ordering..

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-14 16:48   ` Will Deacon
  2018-09-19 11:33     ` Peter Zijlstra
@ 2018-09-19 11:51     ` Peter Zijlstra
  2018-09-19 12:23       ` Will Deacon
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-19 11:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Fri, Sep 14, 2018 at 05:48:57PM +0100, Will Deacon wrote:

> > + *  - mmu_gather::fullmm
> > + *
> > + *    A flag set by tlb_gather_mmu() to indicate we're going to free
> > + *    the entire mm; this allows a number of optimizations.
> > + *
> > + *    XXX list optimizations
> 
> On arm64, we can elide the invalidation altogether because we won't
> re-allocate the ASID. We also have an invalidate-by-ASID (mm) instruction,
> which we could use if we needed to.

Right, but I was also struggling to put into words the normal fullmm
case.

I now ended up with:

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -82,7 +82,11 @@
  *    A flag set by tlb_gather_mmu() to indicate we're going to free
  *    the entire mm; this allows a number of optimizations.
  *
- *    XXX list optimizations
+ *    - We can ignore tlb_{start,end}_vma(); because we don't
+ *      care about ranges. Everything will be shot down.
+ *
+ *    - (RISC) architectures that use ASIDs can cycle to a new ASID
+ *      and delay the invalidation until ASID space runs out.
  *
  *  - mmu_gather::need_flush_all
  *

Does that about cover things; or do we need more?

> > + *
> > + *  - mmu_gather::need_flush_all
> > + *
> > + *    A flag that can be set by the arch code if it wants to force
> > + *    flush the entire TLB irrespective of the range. For instance
> > + *    x86-PAE needs this when changing top-level entries.
> > + *
> > + * And requires the architecture to provide and implement tlb_flush().
> > + *
> > + * tlb_flush() may, in addition to the above mentioned mmu_gather fields, make
> > + * use of:
> > + *
> > + *  - mmu_gather::start / mmu_gather::end
> > + *
> > + *    which (when !need_flush_all; fullmm will have start = end = ~0UL) provides
> > + *    the range that needs to be flushed to cover the pages to be freed.
> 
> I don't understand the mention of need_flush_all here -- I didn't think it
> was used by the core code at all.

The core indeed does not use that flag; but if the architecture sets
it, the range is still ignored.

Can you suggest clearer wording?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-19 11:51     ` Peter Zijlstra
@ 2018-09-19 12:23       ` Will Deacon
  2018-09-19 13:12         ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2018-09-19 12:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Wed, Sep 19, 2018 at 01:51:58PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 14, 2018 at 05:48:57PM +0100, Will Deacon wrote:
> 
> > > + *  - mmu_gather::fullmm
> > > + *
> > > + *    A flag set by tlb_gather_mmu() to indicate we're going to free
> > > + *    the entire mm; this allows a number of optimizations.
> > > + *
> > > + *    XXX list optimizations
> > 
> > On arm64, we can elide the invalidation altogether because we won't
> > re-allocate the ASID. We also have an invalidate-by-ASID (mm) instruction,
> > which we could use if we needed to.
> 
> Right, but I was also struggling to put into words the normal fullmm
> case.
> 
> I now ended up with:
> 
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -82,7 +82,11 @@
>   *    A flag set by tlb_gather_mmu() to indicate we're going to free
>   *    the entire mm; this allows a number of optimizations.
>   *
> - *    XXX list optimizations
> + *    - We can ignore tlb_{start,end}_vma(); because we don't
> + *      care about ranges. Everything will be shot down.
> + *
> + *    - (RISC) architectures that use ASIDs can cycle to a new ASID
> + *      and delay the invalidation until ASID space runs out.
>   *
>   *  - mmu_gather::need_flush_all
>   *
> 
> Does that about cover things; or do we need more?

I think that's fine as a starting point. People can always add more.

> > > + *
> > > + *  - mmu_gather::need_flush_all
> > > + *
> > > + *    A flag that can be set by the arch code if it wants to force
> > > + *    flush the entire TLB irrespective of the range. For instance
> > > + *    x86-PAE needs this when changing top-level entries.
> > > + *
> > > + * And requires the architecture to provide and implement tlb_flush().
> > > + *
> > > + * tlb_flush() may, in addition to the above mentioned mmu_gather fields, make
> > > + * use of:
> > > + *
> > > + *  - mmu_gather::start / mmu_gather::end
> > > + *
> > > + *    which (when !need_flush_all; fullmm will have start = end = ~0UL) provides
> > > + *    the range that needs to be flushed to cover the pages to be freed.
> > 
> > I don't understand the mention of need_flush_all here -- I didn't think it
> > was used by the core code at all.
> 
> The core indeed does not use that flag; but if the architecture sets
> it, the range is still ignored.
> 
> Can you suggest clearer wording?

The range is only ignored if the default tlb_flush() implementation is used
though, right? Since this text is about the fields that tlb_flush() can use,
I think we can just delete the part in brackets.

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather
  2018-09-19 11:28     ` Peter Zijlstra
@ 2018-09-19 12:24       ` Will Deacon
  0 siblings, 0 replies; 39+ messages in thread
From: Will Deacon @ 2018-09-19 12:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Wed, Sep 19, 2018 at 01:28:29PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 18, 2018 at 03:10:34PM +0100, Will Deacon wrote:
> 
> > So whilst I was reviewing this, I realised that I think we should be
> > selecting HAVE_RCU_TABLE_INVALIDATE for arch/arm/ if HAVE_RCU_TABLE_FREE.
> 
> Yes, very much so. Let me invert that option: you normally want it,
> except when you don't natively use the linux page-tables.

Yeah, inverting this to be opt-out is definitely the safe thing to do.
Patch below looks good:

Acked-by: Will Deacon <will.deacon@arm.com>

Will

> ---
> Subject: asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Sep 19 13:24:41 CEST 2018
> 
> Make issuing a TLB invalidate for page-table pages the normal case.
> 
> The reason is twofold:
> 
>  - too many invalidates is safer than too few,
>  - most architectures use the linux page-tables natively
>    and would thus require this.
> 
> Make it an opt-out, instead of an opt-in.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
  2018-09-19 12:23       ` Will Deacon
@ 2018-09-19 13:12         ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2018-09-19 13:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens

On Wed, Sep 19, 2018 at 01:23:29PM +0100, Will Deacon wrote:

> > > > + *    which (when !need_flush_all; fullmm will have start = end = ~0UL) provides
> > > > + *    the range that needs to be flushed to cover the pages to be freed.
> > > 
> > > I don't understand the mention of need_flush_all here -- I didn't think it
> > > was used by the core code at all.
> > 
> > The core does indeed not use that flag; but if the architecture set
> > that, the range is still ignored.
> > 
> > Can you suggest clearer wording?
> 
> The range is only ignored if the default tlb_flush() implementation is used
> though, right? Since this text is about the fields that tlb_flush() can use,
> I think we can just delete the part in brackets.

Well, any architecture that actually uses need_flush_all will obviously
require a tlb_flush implementation that looks at it.

But OK, I'll remove the note.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2018-09-19 13:13 UTC | newest]

Thread overview: 39+ messages
2018-09-13  9:21 [RFC][PATCH 00/11] my generic mmu_gather patches Peter Zijlstra
2018-09-13  9:21 ` [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment Peter Zijlstra
2018-09-13 10:30   ` Martin Schwidefsky
2018-09-13 10:57     ` Peter Zijlstra
2018-09-13 12:18       ` Martin Schwidefsky
2018-09-13 12:39         ` Peter Zijlstra
2018-09-14 10:28           ` Martin Schwidefsky
2018-09-14 13:02             ` Peter Zijlstra
2018-09-14 14:07               ` Martin Schwidefsky
2018-09-14 16:48   ` Will Deacon
2018-09-19 11:33     ` Peter Zijlstra
2018-09-19 11:51     ` Peter Zijlstra
2018-09-19 12:23       ` Will Deacon
2018-09-19 13:12         ` Peter Zijlstra
2018-09-13  9:21 ` [RFC][PATCH 02/11] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
2018-09-14 16:56   ` Will Deacon
2018-09-13  9:21 ` [RFC][PATCH 03/11] x86/mm: Page size aware flush_tlb_mm_range() Peter Zijlstra
2018-09-13 17:22   ` Dave Hansen
2018-09-13 18:42     ` Peter Zijlstra
2018-09-13 18:46       ` Peter Zijlstra
2018-09-13 18:48         ` Peter Zijlstra
2018-09-13 18:49         ` Dave Hansen
2018-09-13 18:47       ` Dave Hansen
2018-09-13  9:21 ` [RFC][PATCH 04/11] asm-generic/tlb: Provide generic VIPT cache flush Peter Zijlstra
2018-09-14 16:56   ` Will Deacon
2018-09-13  9:21 ` [RFC][PATCH 05/11] asm-generic/tlb: Provide generic tlb_flush Peter Zijlstra
2018-09-13 13:09   ` Jann Horn
2018-09-13 14:06     ` Peter Zijlstra
2018-09-13  9:21 ` [RFC][PATCH 06/11] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
2018-09-14 16:57   ` Will Deacon
2018-09-13  9:21 ` [RFC][PATCH 07/11] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
2018-09-18 14:10   ` Will Deacon
2018-09-19 11:28     ` Peter Zijlstra
2018-09-19 12:24       ` Will Deacon
2018-09-19 11:30     ` Peter Zijlstra
2018-09-13  9:21 ` [RFC][PATCH 08/11] ia64/tlb: Conver " Peter Zijlstra
2018-09-13  9:21 ` [RFC][PATCH 09/11] sh/tlb: Convert SH " Peter Zijlstra
2018-09-13  9:21 ` [RFC][PATCH 10/11] um/tlb: Convert " Peter Zijlstra
2018-09-13  9:21 ` [RFC][PATCH 11/11] arch/tlb: Clean up simple architectures Peter Zijlstra
