* [PATCH 00/18] my generic mmu_gather patches
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, fengguang.wu

Hi,

Here is my current stash of generic mmu_gather patches, which go on top of
Will's tlb patches:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tlb/asm-generic

The series also includes the s390 patches done by Heiko. By the end of it,
not a single arch is left with a custom mmu_gather.

I've been slow to post these because the 0-day bot seems to be having
trouble, and I've not been getting the regular cross-build green-light
emails that I otherwise rely upon.

I hope to have addressed all the feedback from the last round, and I've
added a number of Cc's that were missing last time.

Please review with care.

---
 arch/Kconfig                      |   8 +-
 arch/alpha/include/asm/tlb.h      |   2 -
 arch/arc/include/asm/tlb.h        |  32 -----
 arch/arm/include/asm/tlb.h        | 256 +++----------------------------------
 arch/arm64/Kconfig                |   1 -
 arch/arm64/include/asm/tlb.h      |   1 +
 arch/c6x/include/asm/tlb.h        |   1 +
 arch/h8300/include/asm/tlb.h      |   2 -
 arch/hexagon/include/asm/tlb.h    |  12 --
 arch/ia64/include/asm/tlb.h       | 257 +-------------------------------------
 arch/ia64/include/asm/tlbflush.h  |  25 ++++
 arch/ia64/mm/tlb.c                |  23 +++-
 arch/m68k/include/asm/tlb.h       |   1 -
 arch/microblaze/include/asm/tlb.h |   4 +-
 arch/mips/include/asm/tlb.h       |  17 ---
 arch/nds32/include/asm/tlb.h      |  16 ---
 arch/nios2/include/asm/tlb.h      |  14 +--
 arch/openrisc/include/asm/tlb.h   |   6 +-
 arch/parisc/include/asm/tlb.h     |  18 ---
 arch/powerpc/Kconfig              |   2 +
 arch/powerpc/include/asm/tlb.h    |  18 +--
 arch/riscv/include/asm/tlb.h      |   1 +
 arch/s390/Kconfig                 |   2 +
 arch/s390/include/asm/tlb.h       | 130 ++++++-------------
 arch/s390/mm/pgalloc.c            |  63 +---------
 arch/sh/include/asm/pgalloc.h     |   9 ++
 arch/sh/include/asm/tlb.h         | 132 +-------------------
 arch/sparc/Kconfig                |   1 +
 arch/sparc/include/asm/tlb_32.h   |  18 ---
 arch/um/include/asm/tlb.h         | 158 +----------------------
 arch/unicore32/include/asm/tlb.h  |  10 +-
 arch/x86/Kconfig                  |   1 -
 arch/x86/include/asm/tlb.h        |  22 ++--
 arch/x86/include/asm/tlbflush.h   |  12 +-
 arch/x86/mm/tlb.c                 |  17 ++-
 arch/xtensa/include/asm/tlb.h     |  26 ----
 include/asm-generic/tlb.h         | 238 +++++++++++++++++++++++++++++++----
 mm/huge_memory.c                  |   4 +-
 mm/hugetlb.c                      |   2 +-
 mm/madvise.c                      |   2 +-
 mm/memory.c                       |   6 +-
 mm/mmu_gather.c                   | 129 ++++++++++---------
 mm/pgtable-generic.c              |   1 +
 43 files changed, 460 insertions(+), 1240 deletions(-)

* [PATCH 01/18] asm-generic/tlb: Provide a comment
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Write a comment explaining some of this.

Cc: Nick Piggin <npiggin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |  119 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 116 insertions(+), 3 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -22,6 +22,118 @@
 
 #ifdef CONFIG_MMU
 
+/*
+ * Generic MMU-gather implementation.
+ *
+ * The mmu_gather data structure is used by the mm code to implement the
+ * correct and efficient ordering of freeing pages and TLB invalidations.
+ *
+ * This correct ordering is:
+ *
+ *  1) unhook page
+ *  2) TLB invalidate page
+ *  3) free page
+ *
+ * That is, we must never free a page before we have ensured there are no live
+ * translations left to it. Otherwise it might be possible to observe (or
+ * worse, change) the page content after it has been reused.
+ *
+ * The mmu_gather API consists of:
+ *
+ *  - tlb_gather_mmu() / tlb_finish_mmu(); start and finish a mmu_gather
+ *
+ *    Finish in particular will issue a (final) TLB invalidate and free
+ *    all (remaining) queued pages.
+ *
+ *  - tlb_start_vma() / tlb_end_vma(); mark the start / end of a VMA
+ *
+ *    Defaults to flushing at tlb_end_vma() to reset the range; helps when
+ *    there are large holes between the VMAs.
+ *
+ *  - tlb_remove_page() / __tlb_remove_page()
+ *  - tlb_remove_page_size() / __tlb_remove_page_size()
+ *
+ *    __tlb_remove_page_size() is the basic primitive that queues a page for
+ *    freeing. __tlb_remove_page() assumes PAGE_SIZE. Both will return a
+ *    boolean indicating if the queue is (now) full and a call to
+ *    tlb_flush_mmu() is required.
+ *
+ *    tlb_remove_page() and tlb_remove_page_size() imply the call to
+ *    tlb_flush_mmu() when required and have no return value.
+ *
+ *  - tlb_remove_check_page_size_change()
+ *
+ *    call before __tlb_remove_page*() to set the current page-size; implies a
+ *    possible tlb_flush_mmu() call.
+ *
+ *  - tlb_flush_mmu() / tlb_flush_mmu_tlbonly() / tlb_flush_mmu_free()
+ *
+ *    tlb_flush_mmu_tlbonly() - does the TLB invalidate (and resets
+ *                              related state, like the range)
+ *
+ *    tlb_flush_mmu_free() - frees the queued pages; make absolutely
+ *			     sure no additional tlb_remove_page()
+ *			     calls happen between _tlbonly() and this.
+ *
+ *    tlb_flush_mmu() - the above two calls.
+ *
+ *  - mmu_gather::fullmm
+ *
+ *    A flag set by tlb_gather_mmu() to indicate we're going to free
+ *    the entire mm; this allows a number of optimizations.
+ *
+ *    - We can ignore tlb_{start,end}_vma(); because we don't
+ *      care about ranges. Everything will be shot down.
+ *
+ *    - (RISC) architectures that use ASIDs can cycle to a new ASID
+ *      and delay the invalidation until ASID space runs out.
+ *
+ *  - mmu_gather::need_flush_all
+ *
+ *    A flag that can be set by the arch code if it wants to force
+ *    flush the entire TLB irrespective of the range. For instance
+ *    x86-PAE needs this when changing top-level entries.
+ *
+ * Finally, the architecture is required to provide and implement tlb_flush().
+ *
+ * tlb_flush() may, in addition to the above mentioned mmu_gather fields, make
+ * use of:
+ *
+ *  - mmu_gather::start / mmu_gather::end
+ *
+ *    which provides the range that needs to be flushed to cover the pages to
+ *    be freed.
+ *
+ *  - mmu_gather::freed_tables
+ *
+ *    set when we freed page table pages
+ *
+ *  - tlb_get_unmap_shift() / tlb_get_unmap_size()
+ *
+ *    returns the smallest TLB entry size unmapped in this range
+ *
+ * Additionally there are a few opt-in features:
+ *
+ *  HAVE_RCU_TABLE_FREE
+ *
+ *  This provides tlb_remove_table(), to be used instead of tlb_remove_page()
+ *  for page directories (__p*_free_tlb()). This provides separate freeing of
+ *  the page-table pages themselves in a semi-RCU fashion (see comment below).
+ *  Useful if your architecture doesn't use IPIs for remote TLB invalidates
+ *  and therefore doesn't naturally serialize with software page-table walkers.
+ *
+ *  When used, an architecture is expected to provide __tlb_remove_table()
+ *  which does the actual freeing of these pages.
+ *
+ *  HAVE_RCU_TABLE_INVALIDATE
+ *
+ *  This makes HAVE_RCU_TABLE_FREE call tlb_flush_mmu_tlbonly() before freeing
+ *  the page-table pages. Required if you use HAVE_RCU_TABLE_FREE and your
+ *  architecture uses the Linux page-tables natively.
+ *
+ */
+#define HAVE_GENERIC_MMU_GATHER
+
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 /*
  * Semi RCU freeing of the page directories.
@@ -89,14 +201,17 @@ struct mmu_gather_batch {
  */
 #define MAX_GATHER_BATCH_COUNT	(10000UL/MAX_GATHER_BATCH)
 
-/* struct mmu_gather is an opaque type used by the mm code for passing around
+/*
+ * struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
  */
 struct mmu_gather {
 	struct mm_struct	*mm;
+
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	struct mmu_table_batch	*batch;
 #endif
+
 	unsigned long		start;
 	unsigned long		end;
 	/*
@@ -131,8 +246,6 @@ struct mmu_gather {
 	int page_size;
 };
 
-#define HAVE_GENERIC_MMU_GATHER
-
 void arch_tlb_gather_mmu(struct mmu_gather *tlb,
 	struct mm_struct *mm, unsigned long start, unsigned long end);
 void tlb_flush_mmu(struct mmu_gather *tlb);

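To make the ordering documented above concrete, here is a minimal sketch of
the intended calling pattern (illustrative only, not part of the patch;
for_each_pte() is a hypothetical stand-in for the usual
pte_offset_map_lock() walk):

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, start, end);
	tlb_start_vma(&tlb, vma);
	for_each_pte(ptep, addr) {				/* hypothetical walk helper */
		pte_t pte = ptep_get_and_clear(mm, addr, ptep);	/* 1) unhook page */
		tlb_remove_tlb_entry(&tlb, ptep, addr);		/* record range for 2) */
		tlb_remove_page(&tlb, pte_page(pte));		/* 3) queue page; flushes when full */
	}
	tlb_end_vma(&tlb, vma);
	tlb_finish_mmu(&tlb, start, end);	/* final 2) TLB invalidate + 3) free */

Similarly, under HAVE_RCU_TABLE_FREE an architecture routes its page-table
pages through tlb_remove_table() from the __p*_free_tlb() hooks instead of
tlb_remove_page(); roughly (again just a sketch, real implementations also
run the page-table destructors):

	#define __pte_free_tlb(tlb, ptepage, addr)	tlb_remove_table(tlb, ptepage)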

* [PATCH 02/18] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Move the mmu_gather::page_size handling into the generic code instead of
keeping it as powerpc-specific bits.

Cc: Nick Piggin <npiggin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig                   |    3 +++
 arch/arm/include/asm/tlb.h     |    3 +--
 arch/ia64/include/asm/tlb.h    |    3 +--
 arch/powerpc/Kconfig           |    1 +
 arch/powerpc/include/asm/tlb.h |   17 -----------------
 arch/s390/include/asm/tlb.h    |    4 +---
 arch/sh/include/asm/tlb.h      |    4 +---
 arch/um/include/asm/tlb.h      |    4 +---
 include/asm-generic/tlb.h      |   32 +++++++++++++++++++-------------
 mm/huge_memory.c               |    4 ++--
 mm/hugetlb.c                   |    2 +-
 mm/madvise.c                   |    2 +-
 mm/memory.c                    |    4 ++--
 mm/mmu_gather.c                |    5 +++++
 14 files changed, 39 insertions(+), 49 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -365,6 +365,9 @@ config HAVE_RCU_TABLE_FREE
 config HAVE_RCU_TABLE_INVALIDATE
 	bool
 
+config HAVE_MMU_GATHER_PAGE_SIZE
+	bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -286,8 +286,7 @@ tlb_remove_pmd_tlb_entry(struct mmu_gath
 
 #define tlb_migrate_finish(mm)		do { } while (0)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
+static inline void tlb_change_page_size(struct mmu_gather *tlb,
 						     unsigned int page_size)
 {
 }
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -282,8 +282,7 @@ do {							\
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	tlb_remove_tlb_entry(tlb, ptep, address)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
+static inline void tlb_change_page_size(struct mmu_gather *tlb,
 						     unsigned int page_size)
 {
 }
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
+	select HAVE_MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -27,7 +27,6 @@
 #define tlb_start_vma(tlb, vma)	do { } while (0)
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
 
 extern void tlb_flush(struct mmu_gather *tlb);
 
@@ -46,22 +45,6 @@ static inline void __tlb_remove_tlb_entr
 #endif
 }
 
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
-{
-	if (!tlb->page_size)
-		tlb->page_size = page_size;
-	else if (tlb->page_size != page_size) {
-		if (!tlb->fullmm)
-			tlb_flush_mmu(tlb);
-		/*
-		 * update the page size after flush for the new
-		 * mmu_gather.
-		 */
-		tlb->page_size = page_size;
-	}
-}
-
 #ifdef CONFIG_SMP
 static inline int mm_is_core_local(struct mm_struct *mm)
 {
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -180,9 +180,7 @@ static inline void pud_free_tlb(struct m
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	tlb_remove_tlb_entry(tlb, ptep, address)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
+static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
 {
 }
 
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -127,9 +127,7 @@ static inline void tlb_remove_page_size(
 	return tlb_remove_page(tlb, page);
 }
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
+static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
 {
 }
 
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -146,9 +146,7 @@ static inline void tlb_remove_page_size(
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	tlb_remove_tlb_entry(tlb, ptep, address)
 
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
-						     unsigned int page_size)
+static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
 {
 }
 
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -61,7 +61,7 @@
  *    tlb_remove_page() and tlb_remove_page_size() imply the call to
  *    tlb_flush_mmu() when required and have no return value.
  *
- *  - tlb_remove_check_page_size_change()
+ *  - tlb_change_page_size()
  *
  *    call before __tlb_remove_page*() to set the current page-size; implies a
  *    possible tlb_flush_mmu() call.
@@ -110,6 +110,11 @@
  *
  * Additionally there are a few opt-in features:
  *
+ *  HAVE_MMU_GATHER_PAGE_SIZE
+ *
+ *  This ensures we call tlb_flush() every time tlb_change_page_size() actually
+ *  changes the size and provides mmu_gather::page_size to tlb_flush().
+ *
  *  HAVE_RCU_TABLE_FREE
  *
  *  This provides tlb_remove_table(), to be used instead of tlb_remove_page()
@@ -235,11 +240,15 @@ struct mmu_gather {
 	unsigned int		cleared_puds : 1;
 	unsigned int		cleared_p4ds : 1;
 
+	unsigned int		batch_count;
+
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
-	unsigned int		batch_count;
-	int page_size;
+
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+	unsigned int page_size;
+#endif
 };
 
 void arch_tlb_gather_mmu(struct mmu_gather *tlb,
@@ -305,21 +314,18 @@ static inline void tlb_remove_page(struc
 	return tlb_remove_page_size(tlb, page, PAGE_SIZE);
 }
 
-#ifndef tlb_remove_check_page_size_change
-#define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
-static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
+static inline void tlb_change_page_size(struct mmu_gather *tlb,
 						     unsigned int page_size)
 {
-	/*
-	 * We don't care about page size change, just update
-	 * mmu_gather page size here so that debug checks
-	 * doesn't throw false warning.
-	 */
-#ifdef CONFIG_DEBUG_VM
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+	if (tlb->page_size && tlb->page_size != page_size) {
+		if (!tlb->fullmm)
+			tlb_flush_mmu(tlb);
+	}
+
 	tlb->page_size = page_size;
 #endif
 }
-#endif
 
 static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb)
 {
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1617,7 +1617,7 @@ bool madvise_free_huge_pmd(struct mmu_ga
 	struct mm_struct *mm = tlb->mm;
 	bool ret = false;
 
-	tlb_remove_check_page_size_change(tlb, HPAGE_PMD_SIZE);
+	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (!ptl)
@@ -1693,7 +1693,7 @@ int zap_huge_pmd(struct mmu_gather *tlb,
 	pmd_t orig_pmd;
 	spinlock_t *ptl;
 
-	tlb_remove_check_page_size_change(tlb, HPAGE_PMD_SIZE);
+	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 
 	ptl = __pmd_trans_huge_lock(pmd, vma);
 	if (!ptl)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3337,7 +3337,7 @@ void __unmap_hugepage_range(struct mmu_g
 	 * This is a hugetlb vma, all the pte entries should point
 	 * to huge page.
 	 */
-	tlb_remove_check_page_size_change(tlb, sz);
+	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	address = start;
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -328,7 +328,7 @@ static int madvise_free_pte_range(pmd_t
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
-	tlb_remove_check_page_size_change(tlb, PAGE_SIZE);
+	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
 	arch_enter_lazy_mmu_mode();
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -355,7 +355,7 @@ void free_pgd_range(struct mmu_gather *t
 	 * We add page table cache pages with PAGE_SIZE,
 	 * (see pte_free_tlb()), flush the tlb if we need
 	 */
-	tlb_remove_check_page_size_change(tlb, PAGE_SIZE);
+	tlb_change_page_size(tlb, PAGE_SIZE);
 	pgd = pgd_offset(tlb->mm, addr);
 	do {
 		next = pgd_addr_end(addr, end);
@@ -1046,7 +1046,7 @@ static unsigned long zap_pte_range(struc
 	pte_t *pte;
 	swp_entry_t entry;
 
-	tlb_remove_check_page_size_change(tlb, PAGE_SIZE);
+	tlb_change_page_size(tlb, PAGE_SIZE);
 again:
 	init_rss_vec(rss);
 	start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -58,7 +58,9 @@ void arch_tlb_gather_mmu(struct mmu_gath
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
 	tlb->page_size = 0;
+#endif
 
 	__tlb_reset_range(tlb);
 }
@@ -121,7 +123,10 @@ bool __tlb_remove_page_size(struct mmu_g
 	struct mmu_gather_batch *batch;
 
 	VM_BUG_ON(!tlb->end);
+
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
 	VM_WARN_ON(tlb->page_size != page_size);
+#endif
 
 	batch = tlb->active;
 	/*

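For illustration (not part of the patch), the semantics when
HAVE_MMU_GATHER_PAGE_SIZE is selected: changing the size in the middle of a
gather flushes the already-queued work first (skipped for fullmm gathers), so
tlb_flush() only ever sees batches of a single page_size:

	tlb_change_page_size(&tlb, PAGE_SIZE);
	__tlb_remove_page_size(&tlb, page, PAGE_SIZE);		/* queued at PAGE_SIZE;
								 * returns true when the
								 * batch is full */

	/* the size change triggers tlb_flush_mmu() for the queued work */
	tlb_change_page_size(&tlb, HPAGE_PMD_SIZE);
	__tlb_remove_page_size(&tlb, hpage, HPAGE_PMD_SIZE);	/* queued at 2M (on x86) */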

* [PATCH 03/18] x86/mm: Page size aware flush_tlb_mm_range()
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Dave Hansen

Use the new tlb_get_unmap_shift() to determine the stride of the
INVLPG loop.

Cc: Nick Piggin <npiggin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/tlb.h      |   21 ++++++++++++++-------
 arch/x86/include/asm/tlbflush.h |   12 ++++++++----
 arch/x86/mm/tlb.c               |   17 ++++++++---------
 mm/pgtable-generic.c            |    1 +
 4 files changed, 31 insertions(+), 20 deletions(-)

--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -6,16 +6,23 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
-#define tlb_flush(tlb)							\
-{									\
-	if (!tlb->fullmm && !tlb->need_flush_all) 			\
-		flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end, 0UL);	\
-	else								\
-		flush_tlb_mm_range(tlb->mm, 0UL, TLB_FLUSH_ALL, 0UL);	\
-}
+static inline void tlb_flush(struct mmu_gather *tlb);
 
 #include <asm-generic/tlb.h>
 
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+	unsigned long start = 0UL, end = TLB_FLUSH_ALL;
+	unsigned int stride_shift = tlb_get_unmap_shift(tlb);
+
+	if (!tlb->fullmm && !tlb->need_flush_all) {
+		start = tlb->start;
+		end = tlb->end;
+	}
+
+	flush_tlb_mm_range(tlb->mm, start, end, stride_shift);
+}
+
 /*
  * While x86 architecture in general requires an IPI to perform TLB
  * shootdown, enablement code for several hypervisors overrides
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -547,23 +547,27 @@ struct flush_tlb_info {
 	unsigned long		start;
 	unsigned long		end;
 	u64			new_tlb_gen;
+	unsigned int		stride_shift;
 };
 
 #define local_flush_tlb() __flush_tlb()
 
 #define flush_tlb_mm(mm)	flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL)
 
-#define flush_tlb_range(vma, start, end)	\
-		flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
+#define flush_tlb_range(vma, start, end)				\
+	flush_tlb_mm_range((vma)->vm_mm, start, end,			\
+			   ((vma)->vm_flags & VM_HUGETLB)		\
+				? huge_page_shift(hstate_vma(vma))	\
+				: PAGE_SHIFT)
 
 extern void flush_tlb_all(void);
 extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, unsigned long vmflag);
+				unsigned long end, unsigned int stride_shift);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 {
-	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, VM_NONE);
+	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT);
 }
 
 void native_flush_tlb_others(const struct cpumask *cpumask,
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -528,17 +528,16 @@ static void flush_tlb_func_common(const
 	    f->new_tlb_gen == local_tlb_gen + 1 &&
 	    f->new_tlb_gen == mm_tlb_gen) {
 		/* Partial flush */
-		unsigned long addr;
-		unsigned long nr_pages = (f->end - f->start) >> PAGE_SHIFT;
+		unsigned long nr_invalidate = (f->end - f->start) >> f->stride_shift;
+		unsigned long addr = f->start;
 
-		addr = f->start;
 		while (addr < f->end) {
 			__flush_tlb_one_user(addr);
-			addr += PAGE_SIZE;
+			addr += 1UL << f->stride_shift;
 		}
 		if (local)
-			count_vm_tlb_events(NR_TLB_LOCAL_FLUSH_ONE, nr_pages);
-		trace_tlb_flush(reason, nr_pages);
+			count_vm_tlb_events(NR_TLB_LOCAL_FLUSH_ONE, nr_invalidate);
+		trace_tlb_flush(reason, nr_invalidate);
 	} else {
 		/* Full flush. */
 		local_flush_tlb();
@@ -623,12 +622,13 @@ void native_flush_tlb_others(const struc
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 
 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, unsigned long vmflag)
+				unsigned long end, unsigned int stride_shift)
 {
 	int cpu;
 
 	struct flush_tlb_info info __aligned(SMP_CACHE_BYTES) = {
 		.mm = mm,
+		.stride_shift = stride_shift,
 	};
 
 	cpu = get_cpu();
@@ -638,8 +638,7 @@ void flush_tlb_mm_range(struct mm_struct
 
 	/* Should we flush just the requested range? */
 	if ((end != TLB_FLUSH_ALL) &&
-	    !(vmflag & VM_HUGETLB) &&
-	    ((end - start) >> PAGE_SHIFT) <= tlb_single_page_flush_ceiling) {
+	    ((end - start) >> stride_shift) <= tlb_single_page_flush_ceiling) {
 		info.start = start;
 		info.end = end;
 	} else {
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/pagemap.h>
+#include <linux/hugetlb.h>
 #include <asm/tlb.h>
 #include <asm-generic/pgtable.h>
 

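A quick worked example of what the stride buys us (numbers assume x86-64
defaults; not from the patch itself). Zapping a single 2M huge page covers
0x200000 bytes:

	/* old: stride fixed at PAGE_SHIFT (12), VM_HUGETLB forced a full flush */
	nr_invalidate = 0x200000UL >> 12;	/* 512 > tlb_single_page_flush_ceiling (33) */

	/* new: tlb_get_unmap_shift() reports PMD_SHIFT (21) for this gather */
	nr_invalidate = 0x200000UL >> 21;	/* 1 INVLPG, well under the ceiling */

So the partial-flush path is now taken, where the old code always fell back
to flushing the entire TLB for huge-page ranges.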

* [PATCH 04/18] asm-generic/tlb: Provide generic VIPT cache flush
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, David Miller, Guan Xuetao

The one obvious thing SH and ARM want is a sensible default for
tlb_start_vma(). (see also: https://lkml.org/lkml/2004/1/15/6)

Avoid all VIPT architectures providing their own tlb_start_vma()
implementation and rely on architectures to provide a no-op
flush_cache_range() when it is not relevant.

The below makes tlb_start_vma() default to flush_cache_range(), which
should be right and sufficient. The only exceptions that I found were
(oddly):

  - m68k-mmu
  - sparc64
  - unicore

Those architectures appear to have flush_cache_range(), but their
current tlb_start_vma() does not call it.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arc/include/asm/tlb.h      |    9 ---------
 arch/mips/include/asm/tlb.h     |    9 ---------
 arch/nds32/include/asm/tlb.h    |    6 ------
 arch/nios2/include/asm/tlb.h    |   10 ----------
 arch/parisc/include/asm/tlb.h   |    5 -----
 arch/sparc/include/asm/tlb_32.h |    5 -----
 arch/xtensa/include/asm/tlb.h   |    9 ---------
 include/asm-generic/tlb.h       |   19 +++++++++++--------
 8 files changed, 11 insertions(+), 61 deletions(-)

--- a/arch/arc/include/asm/tlb.h
+++ b/arch/arc/include/asm/tlb.h
@@ -23,15 +23,6 @@ do {						\
  *
  * Note, read http://lkml.org/lkml/2004/1/15/6
  */
-#ifndef CONFIG_ARC_CACHE_VIPT_ALIASING
-#define tlb_start_vma(tlb, vma)
-#else
-#define tlb_start_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
-} while(0)
-#endif
 
 #define tlb_end_vma(tlb, vma)						\
 do {									\
--- a/arch/mips/include/asm/tlb.h
+++ b/arch/mips/include/asm/tlb.h
@@ -5,15 +5,6 @@
 #include <asm/cpu-features.h>
 #include <asm/mipsregs.h>
 
-/*
- * MIPS doesn't need any special per-pte or per-vma handling, except
- * we need to flush cache for area to be unmapped.
- */
-#define tlb_start_vma(tlb, vma)					\
-	do {							\
-		if (!tlb->fullmm)				\
-			flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-	}  while (0)
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
--- a/arch/nds32/include/asm/tlb.h
+++ b/arch/nds32/include/asm/tlb.h
@@ -4,12 +4,6 @@
 #ifndef __ASMNDS32_TLB_H
 #define __ASMNDS32_TLB_H
 
-#define tlb_start_vma(tlb,vma)						\
-	do {								\
-		if (!tlb->fullmm)					\
-			flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-	} while (0)
-
 #define tlb_end_vma(tlb,vma)				\
 	do { 						\
 		if(!tlb->fullmm)			\
--- a/arch/nios2/include/asm/tlb.h
+++ b/arch/nios2/include/asm/tlb.h
@@ -15,16 +15,6 @@
 
 extern void set_mmu_pid(unsigned long pid);
 
-/*
- * NiosII doesn't need any special per-pte or per-vma handling, except
- * we need to flush cache for the area to be unmapped.
- */
-#define tlb_start_vma(tlb, vma)					\
-	do {							\
-		if (!tlb->fullmm)				\
-			flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-	}  while (0)
-
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -7,11 +7,6 @@ do {	if ((tlb)->fullmm)		\
 		flush_tlb_mm((tlb)->mm);\
 } while (0)
 
-#define tlb_start_vma(tlb, vma) \
-do {	if (!(tlb)->fullmm)	\
-		flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-} while (0)
-
 #define tlb_end_vma(tlb, vma)	\
 do {	if (!(tlb)->fullmm)	\
 		flush_tlb_range(vma, vma->vm_start, vma->vm_end); \
--- a/arch/sparc/include/asm/tlb_32.h
+++ b/arch/sparc/include/asm/tlb_32.h
@@ -2,11 +2,6 @@
 #ifndef _SPARC_TLB_H
 #define _SPARC_TLB_H
 
-#define tlb_start_vma(tlb, vma) \
-do {								\
-	flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
 #define tlb_end_vma(tlb, vma) \
 do {								\
 	flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
--- a/arch/xtensa/include/asm/tlb.h
+++ b/arch/xtensa/include/asm/tlb.h
@@ -16,19 +16,10 @@
 
 #if (DCACHE_WAY_SIZE <= PAGE_SIZE)
 
-/* Note, read http://lkml.org/lkml/2004/1/15/6 */
-
-# define tlb_start_vma(tlb,vma)			do { } while (0)
 # define tlb_end_vma(tlb,vma)			do { } while (0)
 
 #else
 
-# define tlb_start_vma(tlb, vma)					      \
-	do {								      \
-		if (!tlb->fullmm)					      \
-			flush_cache_range(vma, vma->vm_start, vma->vm_end);   \
-	} while(0)
-
 # define tlb_end_vma(tlb, vma)						      \
 	do {								      \
 		if (!tlb->fullmm)					      \
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -19,6 +19,7 @@
 #include <linux/swap.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 
 #ifdef CONFIG_MMU
 
@@ -351,17 +352,19 @@ static inline unsigned long tlb_get_unma
  * the vmas are adjusted to only cover the region to be torn down.
  */
 #ifndef tlb_start_vma
-#define tlb_start_vma(tlb, vma) do { } while (0)
+#define tlb_start_vma(tlb, vma)						\
+do {									\
+	if (!tlb->fullmm)						\
+		flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
+} while (0)
 #endif
 
-#define __tlb_end_vma(tlb, vma)					\
-	do {							\
-		if (!tlb->fullmm)				\
-			tlb_flush_mmu_tlbonly(tlb);		\
-	} while (0)
-
 #ifndef tlb_end_vma
-#define tlb_end_vma	__tlb_end_vma
+#define tlb_end_vma(tlb, vma)						\
+do {									\
+	if (!tlb->fullmm)						\
+		tlb_flush_mmu_tlbonly(tlb);				\
+} while (0)
 #endif
 
 #ifndef __tlb_remove_tlb_entry

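On architectures without cache aliasing the new default costs nothing,
because flush_cache_range() is already a no-op there; typically something
like the asm-generic version (shown for illustration, not part of this
patch):

	/* cache-coherent architectures: this compiles away entirely */
	#define flush_cache_range(vma, start, end)	do { } while (0)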
^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (4 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 04/18] asm-generic/tlb: Provide generic VIPT cache flush Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 12:53   ` Will Deacon
  2018-09-26 11:36 ` [PATCH 06/18] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Provide a generic tlb_flush() implementation that relies on
flush_tlb_range(). This is a little awkward because flush_tlb_range()
assumes a VMA for range invalidation, but we no longer have one.

An audit of all flush_tlb_range() implementations shows that only
vma->vm_mm and vma->vm_flags are used, and of the latter only VM_EXEC
(I-TLB invalidates) and VM_HUGETLB (huge-page TLB invalidates).

Therefore, track VM_EXEC and VM_HUGETLB in two more bits, and create a
'fake' VMA.

This allows architectures that have a reasonably efficient
flush_tlb_range() to not require any additional effort.
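
Concretely, every audited implementation reduces to something of the
following shape; a sketch only, with the local_flush_*() helpers as
hypothetical stand-ins for the per-arch primitives:

void flush_tlb_range(struct vm_area_struct *vma,
		     unsigned long start, unsigned long end)
{
	struct mm_struct *mm = vma->vm_mm;	/* the only field read... */

	if (vma->vm_flags & VM_HUGETLB)		/* ...besides vm_flags */
		local_flush_huge_tlb_range(mm, start, end);
	else
		local_flush_tlb_range(mm, start, end);

	if (vma->vm_flags & VM_EXEC)		/* I-TLB invalidate */
		local_flush_itlb_range(mm, start, end);
}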

Cc: Nick Piggin <npiggin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/tlb.h   |    1 
 arch/powerpc/include/asm/tlb.h |    1 
 arch/riscv/include/asm/tlb.h   |    1 
 arch/x86/include/asm/tlb.h     |    1 
 include/asm-generic/tlb.h      |   80 +++++++++++++++++++++++++++++++++++------
 5 files changed, 74 insertions(+), 10 deletions(-)

--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -27,6 +27,7 @@ static inline void __tlb_remove_table(vo
 	free_page_and_swap_cache((struct page *)_table);
 }
 
+#define tlb_flush tlb_flush
 static void tlb_flush(struct mmu_gather *tlb);
 
 #include <asm-generic/tlb.h>
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -28,6 +28,7 @@
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
 
+#define tlb_flush tlb_flush
 extern void tlb_flush(struct mmu_gather *tlb);
 
 /* Get the generic bits... */
--- a/arch/riscv/include/asm/tlb.h
+++ b/arch/riscv/include/asm/tlb.h
@@ -18,6 +18,7 @@ struct mmu_gather;
 
 static void tlb_flush(struct mmu_gather *tlb);
 
+#define tlb_flush tlb_flush
 #include <asm-generic/tlb.h>
 
 static inline void tlb_flush(struct mmu_gather *tlb)
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -6,6 +6,7 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
+#define tlb_flush tlb_flush
 static inline void tlb_flush(struct mmu_gather *tlb);
 
 #include <asm-generic/tlb.h>
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -241,6 +241,12 @@ struct mmu_gather {
 	unsigned int		cleared_puds : 1;
 	unsigned int		cleared_p4ds : 1;
 
+	/*
+	 * tracks VM_EXEC | VM_HUGETLB in tlb_start_vma
+	 */
+	unsigned int		vma_exec : 1;
+	unsigned int		vma_huge : 1;
+
 	unsigned int		batch_count;
 
 	struct mmu_gather_batch *active;
@@ -282,7 +288,35 @@ static inline void __tlb_reset_range(str
 	tlb->cleared_pmds = 0;
 	tlb->cleared_puds = 0;
 	tlb->cleared_p4ds = 0;
+	/*
+	 * Do not reset the mmu_gather::vma_* fields here; we do not
+	 * call into tlb_start_vma() again to set them after an
+	 * intermediate flush.
+	 */
+}
+
+#ifndef tlb_flush
+
+#if defined(tlb_start_vma) || defined(tlb_end_vma)
+#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
+#endif
+
+#define tlb_flush tlb_flush
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+	if (tlb->fullmm || tlb->need_flush_all) {
+		flush_tlb_mm(tlb->mm);
+	} else {
+		struct vm_area_struct vma = {
+			.vm_mm = tlb->mm,
+			.vm_flags = (tlb->vma_exec ? VM_EXEC    : 0) |
+				    (tlb->vma_huge ? VM_HUGETLB : 0),
+		};
+
+		flush_tlb_range(&vma, tlb->start, tlb->end);
+	}
 }
+#endif
 
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
@@ -353,19 +387,45 @@ static inline unsigned long tlb_get_unma
  * the vmas are adjusted to only cover the region to be torn down.
  */
 #ifndef tlb_start_vma
-#define tlb_start_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
+#define tlb_start_vma tlb_start_vma
+static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+	if (tlb->fullmm)
+		return;
+
+	/*
+	 * flush_tlb_range() implementations that look at VM_HUGETLB (tile,
+	 * mips-4k) flush only large pages.
+	 *
+	 * flush_tlb_range() implementations that flush I-TLB also flush D-TLB
+	 * (tile, xtensa, arm), so it's ok to just add VM_EXEC to an existing
+	 * range.
+	 *
+	 * We rely on tlb_end_vma() to issue a flush, such that when we reset
+	 * these values the batch is empty.
+	 */
+	tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
+	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
+
+	flush_cache_range(vma, vma->vm_start, vma->vm_end);
+}
 #endif
 
 #ifndef tlb_end_vma
-#define tlb_end_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		tlb_flush_mmu_tlbonly(tlb);				\
-} while (0)
+#define tlb_end_vma tlb_end_vma
+static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+	if (tlb->fullmm)
+		return;
+
+	/*
+	 * Do a TLB flush and reset the range at VMA boundaries; this avoids
+	 * the ranges growing with the unused space between consecutive VMAs,
+	 * but also the mmu_gather::vma_* flags from tlb_start_vma() rely on
+	 * this.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+}
 #endif
 
 #ifndef __tlb_remove_tlb_entry

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 06/18] asm-generic/tlb: Conditionally provide tlb_migrate_finish()
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (5 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 12:53   ` Will Deacon
  2018-09-26 11:36 ` [PATCH 07/18] asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE Peter Zijlstra
                   ` (13 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Needed for ia64 -- the alternative is to drop the hook entirely.
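
With the #ifndef guard in place, an architecture keeps its hook simply by
defining the macro before including the generic header; ia64 (see patch
09) ends up with:

#define tlb_migrate_finish(mm)	platform_tlb_migrate_finish(mm)

#include <asm-generic/tlb.h>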

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -539,6 +539,8 @@ static inline void tlb_end_vma(struct mm
 
 #endif /* CONFIG_MMU */
 
+#ifndef tlb_migrate_finish
 #define tlb_migrate_finish(mm) do {} while (0)
+#endif
 
 #endif /* _ASM_GENERIC__TLB_H */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 07/18] asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (6 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 06/18] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 08/18] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Make issuing a TLB invalidate for page-table pages the normal case.

The reason is twofold:

 - too many invalidates is safer than too few,
 - most architectures use the Linux page-tables natively
   and would thus require this.

Make it an opt-out, instead of an opt-in.
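
With the inverted Kconfig symbol, the table-invalidate helper in
mm/mmu_gather.c comes out approximately as below (see the hunk at the end
of this patch):

static inline void tlb_table_invalidate(struct mmu_gather *tlb)
{
#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
	/*
	 * Invalidate page-table caches used by hardware walkers; the
	 * RCU-sched wait before actually freeing the pages still covers
	 * software walkers.
	 */
	tlb_flush_mmu_tlbonly(tlb);
#endif
}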

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig              |    2 +-
 arch/arm64/Kconfig        |    1 -
 arch/powerpc/Kconfig      |    1 +
 arch/sparc/Kconfig        |    1 +
 arch/x86/Kconfig          |    1 -
 include/asm-generic/tlb.h |    9 +++++----
 mm/mmu_gather.c           |    2 +-
 7 files changed, 9 insertions(+), 8 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -362,7 +362,7 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
 	bool
 
-config HAVE_RCU_TABLE_INVALIDATE
+config HAVE_RCU_TABLE_NO_INVALIDATE
 	bool
 
 config HAVE_MMU_GATHER_PAGE_SIZE
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,7 +142,6 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RCU_TABLE_FREE
-	select HAVE_RCU_TABLE_INVALIDATE
 	select HAVE_RSEQ
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,6 +64,7 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -181,7 +181,6 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if PARAVIRT
-	select HAVE_RCU_TABLE_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
 	select HAVE_STACKPROTECTOR		if CC_HAS_SANE_STACKPROTECTOR
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -127,11 +127,12 @@
  *  When used, an architecture is expected to provide __tlb_remove_table()
  *  which does the actual freeing of these pages.
  *
- *  HAVE_RCU_TABLE_INVALIDATE
+ *  HAVE_RCU_TABLE_NO_INVALIDATE
  *
- *  This makes HAVE_RCU_TABLE_FREE call tlb_flush_mmu_tlbonly() before freeing
- *  the page-table pages. Required if you use HAVE_RCU_TABLE_FREE and your
- *  architecture uses the Linux page-tables natively.
+ *  This makes HAVE_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
+ *  freeing the page-table pages. Skipping the invalidate is only safe if you
+ *  use HAVE_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
+ *  page-tables natively.
  *
  */
 #define HAVE_GENERIC_MMU_GATHER
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -157,7 +157,7 @@ bool __tlb_remove_page_size(struct mmu_g
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
 	/*
 	 * Invalidate page-table caches used by hardware walkers. Then we still
 	 * need to RCU-sched wait while freeing the pages because software

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 08/18] arm/tlb: Convert to generic mmu_gather
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (7 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 07/18] asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 12:54   ` Will Deacon
  2018-09-26 11:36 ` [PATCH 09/18] ia64/tlb: Convert " Peter Zijlstra
                   ` (11 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Generic mmu_gather provides everything that ARM needs:

 - range tracking
 - RCU table free
 - VM_EXEC tracking
 - VIPT cache flushing

The one notable curiosity is the 'funny' range tracking for classical
ARM in __pte_free_tlb().
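
That range tracking translates to the generic __tlb_adjust_range() as
below; a worked example, assuming 4K pages and the classic
2MB-per-pte-page layout:

	/*
	 * One pte page backs two 1MB pmd entries, so freeing it must
	 * invalidate both; a two-page range straddling the 1MB boundary
	 * is enough for the range flush to hit both entries.
	 *
	 * E.g. addr = 0x00234000, PMD_MASK = ~(SZ_2M - 1):
	 *   addr & PMD_MASK  == 0x00200000  (base of the 2MB region)
	 *   + SZ_1M          == 0x00300000  (the 1MB boundary)
	 *   tracked range:      [0x002ff000, 0x00301000)
	 */
	addr = (addr & PMD_MASK) + SZ_1M;
	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);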

Cc: Nick Piggin <npiggin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm/include/asm/tlb.h |  255 ++-------------------------------------------
 1 file changed, 14 insertions(+), 241 deletions(-)

--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -33,270 +33,43 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-#define MMU_GATHER_BUNDLE	8
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 static inline void __tlb_remove_table(void *_table)
 {
 	free_page_and_swap_cache((struct page *)_table);
 }
 
-struct mmu_table_batch {
-	struct rcu_head		rcu;
-	unsigned int		nr;
-	void			*tables[0];
-};
-
-#define MAX_TABLE_BATCH		\
-	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
-
-extern void tlb_table_flush(struct mmu_gather *tlb);
-extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
-
-#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
-#else
-#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
-#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
-
-/*
- * TLB handling.  This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	struct mmu_table_batch	*batch;
-	unsigned int		need_flush;
-#endif
-	unsigned int		fullmm;
-	struct vm_area_struct	*vma;
-	unsigned long		start, end;
-	unsigned long		range_start;
-	unsigned long		range_end;
-	unsigned int		nr;
-	unsigned int		max;
-	struct page		**pages;
-	struct page		*local[MMU_GATHER_BUNDLE];
-};
-
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-/*
- * This is unnecessarily complex.  There's three ways the TLB shootdown
- * code is used:
- *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().
- *     tlb->fullmm = 0, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.
- *  2. Unmapping all vmas.  See exit_mmap().
- *     tlb->fullmm = 1, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.  Additionally, page tables will be freed.
- *  3. Unmapping argument pages.  See shift_arg_pages().
- *     tlb->fullmm = 0, but tlb_start_vma/tlb_end_vma will not be called.
- *     tlb->vma will be NULL.
- */
-static inline void tlb_flush(struct mmu_gather *tlb)
-{
-	if (tlb->fullmm || !tlb->vma)
-		flush_tlb_mm(tlb->mm);
-	else if (tlb->range_end > 0) {
-		flush_tlb_range(tlb->vma, tlb->range_start, tlb->range_end);
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
-
-static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
-{
-	if (!tlb->fullmm) {
-		if (addr < tlb->range_start)
-			tlb->range_start = addr;
-		if (addr + PAGE_SIZE > tlb->range_end)
-			tlb->range_end = addr + PAGE_SIZE;
-	}
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
-	unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
-	if (addr) {
-		tlb->pages = (void *)addr;
-		tlb->max = PAGE_SIZE / sizeof(struct page *);
-	}
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	tlb_flush(tlb);
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	free_pages_and_swap_cache(tlb->pages, tlb->nr);
-	tlb->nr = 0;
-	if (tlb->pages == tlb->local)
-		__tlb_alloc_page(tlb);
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->fullmm = !(start | (end+1));
-	tlb->start = start;
-	tlb->end = end;
-	tlb->vma = NULL;
-	tlb->max = ARRAY_SIZE(tlb->local);
-	tlb->pages = tlb->local;
-	tlb->nr = 0;
-	__tlb_alloc_page(tlb);
+#include <asm-generic/tlb.h>
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
+#ifndef CONFIG_HAVE_RCU_TABLE_FREE
+#define tlb_remove_table(tlb, entry) tlb_remove_page(tlb, entry)
 #endif
-}
-
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-			unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		tlb->range_start = start;
-		tlb->range_end = end;
-	}
-
-	tlb_flush_mmu(tlb);
 
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	if (tlb->pages != tlb->local)
-		free_pages((unsigned long)tlb->pages, 0);
-}
-
-/*
- * Memorize the range for the TLB flush.
- */
 static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long addr)
-{
-	tlb_add_flush(tlb, addr);
-}
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush.  When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm) {
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);
-		tlb->vma = vma;
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm)
-		tlb_flush(tlb);
-}
-
-static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->pages[tlb->nr++] = page;
-	VM_WARN_ON(tlb->nr > tlb->max);
-	if (tlb->nr == tlb->max)
-		return true;
-	return false;
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (__tlb_remove_page(tlb, page))
-		tlb_flush_mmu(tlb);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-	unsigned long addr)
+__pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
 	pgtable_page_dtor(pte);
 
-#ifdef CONFIG_ARM_LPAE
-	tlb_add_flush(tlb, addr);
-#else
+#ifndef CONFIG_ARM_LPAE
 	/*
 	 * With the classic ARM MMU, a pte page has two corresponding pmd
 	 * entries, each covering 1MB.
 	 */
-	addr &= PMD_MASK;
-	tlb_add_flush(tlb, addr + SZ_1M - PAGE_SIZE);
-	tlb_add_flush(tlb, addr + SZ_1M);
+	addr = (addr & PMD_MASK) + SZ_1M;
+	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
 #endif
 
-	tlb_remove_entry(tlb, pte);
-}
-
-static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
-				  unsigned long addr)
-{
-#ifdef CONFIG_ARM_LPAE
-	tlb_add_flush(tlb, addr);
-	tlb_remove_entry(tlb, virt_to_page(pmdp));
-#endif
+	tlb_remove_table(tlb, pte);
 }
 
 static inline void
-tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
+__pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
 {
-	tlb_add_flush(tlb, addr);
-}
-
-#define pte_free_tlb(tlb, ptep, addr)	__pte_free_tlb(tlb, ptep, addr)
-#define pmd_free_tlb(tlb, pmdp, addr)	__pmd_free_tlb(tlb, pmdp, addr)
-#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm)		do { } while (0)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb,
-						     unsigned int page_size)
-{
-}
-
-static inline void tlb_flush_remove_tables(struct mm_struct *mm)
-{
-}
+#ifdef CONFIG_ARM_LPAE
+	struct page *page = virt_to_page(pmdp);
 
-static inline void tlb_flush_remove_tables_local(void *arg)
-{
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
+#endif
 }
 
 #endif /* CONFIG_MMU */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 09/18] ia64/tlb: Convert to generic mmu_gather
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (8 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 08/18] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 10/18] sh/tlb: Convert SH " Peter Zijlstra
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Tony Luck

Generic mmu_gather provides everything ia64 needs (range tracking).
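
After the conversion, the whole of arch/ia64/include/asm/tlb.h reduces,
in outline, to:

#include <asm/tlbflush.h>
#include <asm/machvec.h>

#define tlb_migrate_finish(mm)	platform_tlb_migrate_finish(mm)

#include <asm-generic/tlb.h>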

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/ia64/include/asm/tlb.h      |  256 ---------------------------------------
 arch/ia64/include/asm/tlbflush.h |   25 +++
 arch/ia64/mm/tlb.c               |   23 +++
 3 files changed, 47 insertions(+), 257 deletions(-)

--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -47,262 +47,8 @@
 #include <asm/tlbflush.h>
 #include <asm/machvec.h>
 
-/*
- * If we can't allocate a page to make a big batch of page pointers
- * to work on, then just handle a few from the on-stack structure.
- */
-#define	IA64_GATHER_BUNDLE	8
-
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		nr;
-	unsigned int		max;
-	unsigned char		fullmm;		/* non-zero means full mm flush */
-	unsigned char		need_flush;	/* really unmapped some PTEs? */
-	unsigned long		start, end;
-	unsigned long		start_addr;
-	unsigned long		end_addr;
-	struct page		**pages;
-	struct page		*local[IA64_GATHER_BUNDLE];
-};
-
-struct ia64_tr_entry {
-	u64 ifa;
-	u64 itir;
-	u64 pte;
-	u64 rr;
-}; /*Record for tr entry!*/
-
-extern int ia64_itr_entry(u64 target_mask, u64 va, u64 pte, u64 log_size);
-extern void ia64_ptr_entry(u64 target_mask, int slot);
-
-extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
-
-/*
- region register macros
-*/
-#define RR_TO_VE(val)   (((val) >> 0) & 0x0000000000000001)
-#define RR_VE(val)	(((val) & 0x0000000000000001) << 0)
-#define RR_VE_MASK	0x0000000000000001L
-#define RR_VE_SHIFT	0
-#define RR_TO_PS(val)	(((val) >> 2) & 0x000000000000003f)
-#define RR_PS(val)	(((val) & 0x000000000000003f) << 2)
-#define RR_PS_MASK	0x00000000000000fcL
-#define RR_PS_SHIFT	2
-#define RR_RID_MASK	0x00000000ffffff00L
-#define RR_TO_RID(val) 	((val >> 8) & 0xffffff)
-
-static inline void
-ia64_tlb_flush_mmu_tlbonly(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	tlb->need_flush = 0;
-
-	if (tlb->fullmm) {
-		/*
-		 * Tearing down the entire address space.  This happens both as a result
-		 * of exit() and execve().  The latter case necessitates the call to
-		 * flush_tlb_mm() here.
-		 */
-		flush_tlb_mm(tlb->mm);
-	} else if (unlikely (end - start >= 1024*1024*1024*1024UL
-			     || REGION_NUMBER(start) != REGION_NUMBER(end - 1)))
-	{
-		/*
-		 * If we flush more than a tera-byte or across regions, we're probably
-		 * better off just flushing the entire TLB(s).  This should be very rare
-		 * and is not worth optimizing for.
-		 */
-		flush_tlb_all();
-	} else {
-		/*
-		 * flush_tlb_range() takes a vma instead of a mm pointer because
-		 * some architectures want the vm_flags for ITLB/DTLB flush.
-		 */
-		struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
-
-		/* flush the address range from the tlb: */
-		flush_tlb_range(&vma, start, end);
-		/* now flush the virt. page-table area mapping the address range: */
-		flush_tlb_range(&vma, ia64_thash(start), ia64_thash(end));
-	}
-
-}
-
-static inline void
-ia64_tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	unsigned long i;
-	unsigned int nr;
-
-	/* lastly, release the freed pages */
-	nr = tlb->nr;
-
-	tlb->nr = 0;
-	tlb->start_addr = ~0UL;
-	for (i = 0; i < nr; ++i)
-		free_page_and_swap_cache(tlb->pages[i]);
-}
-
-/*
- * Flush the TLB for address range START to END and, if not in fast mode, release the
- * freed pages that where gathered up to this point.
- */
-static inline void
-ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	if (!tlb->need_flush)
-		return;
-	ia64_tlb_flush_mmu_tlbonly(tlb, start, end);
-	ia64_tlb_flush_mmu_free(tlb);
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
-	unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
-	if (addr) {
-		tlb->pages = (void *)addr;
-		tlb->max = PAGE_SIZE / sizeof(void *);
-	}
-}
-
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->max = ARRAY_SIZE(tlb->local);
-	tlb->pages = tlb->local;
-	tlb->nr = 0;
-	tlb->fullmm = !(start | (end+1));
-	tlb->start = start;
-	tlb->end = end;
-	tlb->start_addr = ~0UL;
-}
-
-/*
- * Called at the end of the shootdown operation to free up any resources that were
- * collected.
- */
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-			unsigned long start, unsigned long end, bool force)
-{
-	if (force)
-		tlb->need_flush = 1;
-	/*
-	 * Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
-	 * tlb->end_addr.
-	 */
-	ia64_tlb_flush_mmu(tlb, start, end);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	if (tlb->pages != tlb->local)
-		free_pages((unsigned long)tlb->pages, 0);
-}
-
-/*
- * Logically, this routine frees PAGE.  On MP machines, the actual freeing of the page
- * must be delayed until after the TLB has been flushed (see comments at the beginning of
- * this file).
- */
-static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->need_flush = 1;
-
-	if (!tlb->nr && tlb->pages == tlb->local)
-		__tlb_alloc_page(tlb);
-
-	tlb->pages[tlb->nr++] = page;
-	VM_WARN_ON(tlb->nr > tlb->max);
-	if (tlb->nr == tlb->max)
-		return true;
-	return false;
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	ia64_tlb_flush_mmu_tlbonly(tlb, tlb->start_addr, tlb->end_addr);
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	ia64_tlb_flush_mmu_free(tlb);
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (__tlb_remove_page(tlb, page))
-		tlb_flush_mmu(tlb);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-/*
- * Remove TLB entry for PTE mapped at virtual address ADDRESS.  This is called for any
- * PTE, not just those pointing to (normal) physical memory.
- */
-static inline void
-__tlb_remove_tlb_entry (struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
-	if (tlb->start_addr == ~0UL)
-		tlb->start_addr = address;
-	tlb->end_addr = address + PAGE_SIZE;
-}
-
 #define tlb_migrate_finish(mm)	platform_tlb_migrate_finish(mm)
 
-#define tlb_start_vma(tlb, vma)			do { } while (0)
-#define tlb_end_vma(tlb, vma)			do { } while (0)
-
-#define tlb_remove_tlb_entry(tlb, ptep, addr)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__tlb_remove_tlb_entry(tlb, ptep, addr);	\
-} while (0)
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb,
-						     unsigned int page_size)
-{
-}
-
-#define pte_free_tlb(tlb, ptep, address)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pte_free_tlb(tlb, ptep, address);		\
-} while (0)
-
-#define pmd_free_tlb(tlb, ptep, address)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pmd_free_tlb(tlb, ptep, address);		\
-} while (0)
-
-#define pud_free_tlb(tlb, pudp, address)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pud_free_tlb(tlb, pudp, address);		\
-} while (0)
+#include <asm-generic/tlb.h>
 
 #endif /* _ASM_IA64_TLB_H */
--- a/arch/ia64/include/asm/tlbflush.h
+++ b/arch/ia64/include/asm/tlbflush.h
@@ -14,6 +14,31 @@
 #include <asm/mmu_context.h>
 #include <asm/page.h>
 
+struct ia64_tr_entry {
+	u64 ifa;
+	u64 itir;
+	u64 pte;
+	u64 rr;
+}; /*Record for tr entry!*/
+
+extern int ia64_itr_entry(u64 target_mask, u64 va, u64 pte, u64 log_size);
+extern void ia64_ptr_entry(u64 target_mask, int slot);
+extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
+
+/*
+ region register macros
+*/
+#define RR_TO_VE(val)   (((val) >> 0) & 0x0000000000000001)
+#define RR_VE(val)     (((val) & 0x0000000000000001) << 0)
+#define RR_VE_MASK     0x0000000000000001L
+#define RR_VE_SHIFT    0
+#define RR_TO_PS(val)  (((val) >> 2) & 0x000000000000003f)
+#define RR_PS(val)     (((val) & 0x000000000000003f) << 2)
+#define RR_PS_MASK     0x00000000000000fcL
+#define RR_PS_SHIFT    2
+#define RR_RID_MASK    0x00000000ffffff00L
+#define RR_TO_RID(val)         ((val >> 8) & 0xffffff)
+
 /*
  * Now for some TLB flushing routines.  This is the kind of stuff that
  * can be very expensive, so try to avoid them whenever possible.
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -297,8 +297,8 @@ local_flush_tlb_all (void)
 	ia64_srlz_i();			/* srlz.i implies srlz.d */
 }
 
-void
-flush_tlb_range (struct vm_area_struct *vma, unsigned long start,
+static void
+__flush_tlb_range (struct vm_area_struct *vma, unsigned long start,
 		 unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
@@ -335,6 +335,25 @@ flush_tlb_range (struct vm_area_struct *
 	preempt_enable();
 	ia64_srlz_i();			/* srlz.i implies srlz.d */
 }
+
+void flush_tlb_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+	if (unlikely(end - start >= 1024*1024*1024*1024UL
+			|| REGION_NUMBER(start) != REGION_NUMBER(end - 1))) {
+		/*
+		 * If we flush more than a tera-byte or across regions, we're
+		 * probably better off just flushing the entire TLB(s).  This
+		 * should be very rare and is not worth optimizing for.
+		 */
+		flush_tlb_all();
+	} else {
+		/* flush the address range from the tlb */
+		__flush_tlb_range(vma, start, end);
+		/* flush the virt. page-table area mapping the addr range */
+		__flush_tlb_range(vma, ia64_thash(start), ia64_thash(end));
+	}
+}
 EXPORT_SYMBOL(flush_tlb_range);
 
 void ia64_tlb_init(void)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 10/18] sh/tlb: Convert SH to generic mmu_gather
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (9 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 09/18] ia64/tlb: Convert " Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 11/18] um/tlb: Convert " Peter Zijlstra
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Yoshinori Sato, Rich Felker

Generic mmu_gather provides everything SH needs (range tracking and
cache coherency).
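
For reference, this is roughly what the generic code does at vma
boundaries once SH's private copies are gone; a simplified sketch of
the asm-generic/tlb.h defaults this series ends up with, not SH code:

static inline void
tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
	if (tlb->fullmm)
		return;
	/* the cache coherency SH's old tlb_start_vma() provided */
	flush_cache_range(vma, vma->vm_start, vma->vm_end);
}

static inline void
tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
	if (tlb->fullmm)
		return;
	/* flush the range tracked by tlb_remove_tlb_entry() et al. */
	tlb_flush_mmu_tlbonly(tlb);
}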

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/sh/include/asm/pgalloc.h |    7 ++
 arch/sh/include/asm/tlb.h     |  130 ------------------------------------------
 2 files changed, 8 insertions(+), 129 deletions(-)

--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -72,6 +72,15 @@ do {							\
 	tlb_remove_page((tlb), (pte));			\
 } while (0)
 
+#if CONFIG_PGTABLE_LEVELS > 2
+#define __pmd_free_tlb(tlb, pmdp, addr)			\
+do {							\
+	struct page *page = virt_to_page(pmdp);		\
+	pgtable_pmd_page_dtor(page);			\
+	tlb_remove_page((tlb), page);			\
+} while (0)
+#endif
+
 static inline void check_pgt_cache(void)
 {
 	quicklist_trim(QUICK_PT, NULL, 25, 16);
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -11,131 +11,8 @@
 
 #ifdef CONFIG_MMU
 #include <linux/swap.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-#include <asm/mmu_context.h>
-
-/*
- * TLB handling.  This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		fullmm;
-	unsigned long		start, end;
-};
 
-static inline void init_tlb_gather(struct mmu_gather *tlb)
-{
-	tlb->start = TASK_SIZE;
-	tlb->end = 0;
-
-	if (tlb->fullmm) {
-		tlb->start = 0;
-		tlb->end = TASK_SIZE;
-	}
-}
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-		unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->start = start;
-	tlb->end = end;
-	tlb->fullmm = !(start | (end+1));
-
-	init_tlb_gather(tlb);
-}
-
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	if (tlb->fullmm || force)
-		flush_tlb_mm(tlb->mm);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-}
-
-static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
-	if (tlb->start > address)
-		tlb->start = address;
-	if (tlb->end < address + PAGE_SIZE)
-		tlb->end = address + PAGE_SIZE;
-}
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush.  When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm)
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm && tlb->end) {
-		flush_tlb_range(vma, tlb->start, tlb->end);
-		init_tlb_gather(tlb);
-	}
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-}
-
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-	return false; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	__tlb_remove_page(tlb, page);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
-{
-}
-
-#define pte_free_tlb(tlb, ptep, addr)	pte_free((tlb)->mm, ptep)
-#define pmd_free_tlb(tlb, pmdp, addr)	pmd_free((tlb)->mm, pmdp)
-#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm)		do { } while (0)
+#include <asm-generic/tlb.h>
 
 #if defined(CONFIG_CPU_SH4) || defined(CONFIG_SUPERH64)
 extern void tlb_wire_entry(struct vm_area_struct *, unsigned long, pte_t);
@@ -155,11 +32,6 @@ static inline void tlb_unwire_entry(void
 
 #else /* CONFIG_MMU */
 
-#define tlb_start_vma(tlb, vma)				do { } while (0)
-#define tlb_end_vma(tlb, vma)				do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, pte, address)	do { } while (0)
-#define tlb_flush(tlb)					do { } while (0)
-
 #include <asm-generic/tlb.h>
 
 #endif /* CONFIG_MMU */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 11/18] um/tlb: Convert to generic mmu_gather
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (10 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 10/18] sh/tlb: Convert SH " Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 12/18] arch/tlb: Clean up simple architectures Peter Zijlstra
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Richard Weinberger

Generic mmu_gather provides the simple flush_tlb_range()-based range
tracking that UM needs.
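
For illustration, the pattern UM now inherits from the core mm code; a
minimal sketch, with the helper name zap_one_page() made up for the
example (a real caller clears the pte first, e.g. with
ptep_get_and_clear()):

static void zap_one_page(struct mm_struct *mm, struct vm_area_struct *vma,
			 pte_t *ptep, struct page *page, unsigned long addr)
{
	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, addr, addr + PAGE_SIZE);
	tlb_start_vma(&tlb, vma);
	tlb_remove_tlb_entry(&tlb, ptep, addr);	/* grows tlb.start/tlb.end */
	tlb_remove_page(&tlb, page);	/* queue the page for delayed free */
	tlb_end_vma(&tlb, vma);
	tlb_finish_mmu(&tlb, addr, addr + PAGE_SIZE); /* flush range, free pages */
}

The accumulated range ends up in flush_tlb_range(), which is all the
tracking the old UM header did by hand.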

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/um/include/asm/tlb.h |  156 ----------------------------------------------
 1 file changed, 2 insertions(+), 154 deletions(-)

--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -2,160 +2,8 @@
 #ifndef __UM_TLB_H
 #define __UM_TLB_H
 
-#include <linux/pagemap.h>
-#include <linux/swap.h>
-#include <asm/percpu.h>
-#include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
-
-#define tlb_start_vma(tlb, vma) do { } while (0)
-#define tlb_end_vma(tlb, vma) do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
-/* struct mmu_gather is an opaque type used by the mm code for passing around
- * any data needed by arch specific code for tlb_remove_page.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		need_flush; /* Really unmapped some ptes? */
-	unsigned long		start;
-	unsigned long		end;
-	unsigned int		fullmm; /* non-zero means full mm flush */
-};
-
-static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
-					  unsigned long address)
-{
-	if (tlb->start > address)
-		tlb->start = address;
-	if (tlb->end < address + PAGE_SIZE)
-		tlb->end = address + PAGE_SIZE;
-}
-
-static inline void init_tlb_gather(struct mmu_gather *tlb)
-{
-	tlb->need_flush = 0;
-
-	tlb->start = TASK_SIZE;
-	tlb->end = 0;
-
-	if (tlb->fullmm) {
-		tlb->start = 0;
-		tlb->end = TASK_SIZE;
-	}
-}
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-		unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->start = start;
-	tlb->end = end;
-	tlb->fullmm = !(start | (end+1));
-
-	init_tlb_gather(tlb);
-}
-
-extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-			       unsigned long end);
-
-static inline void
-tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end);
-}
-
-static inline void
-tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	init_tlb_gather(tlb);
-}
-
-static inline void
-tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	if (!tlb->need_flush)
-		return;
-
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
-/* arch_tlb_finish_mmu
- *	Called at the end of the shootdown operation to free up any resources
- *	that were required.
- */
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		tlb->start = start;
-		tlb->end = end;
-		tlb->need_flush = 1;
-	}
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-}
-
-/* tlb_remove_page
- *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)),
- *	while handling the additional races in SMP caused by other CPUs
- *	caching valid mappings in their TLBs.
- */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->need_flush = 1;
-	free_page_and_swap_cache(page);
-	return false; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	__tlb_remove_page(tlb, page);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-					  struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-/**
- * tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation.
- *
- * Record the fact that pte's were really umapped in ->need_flush, so we can
- * later optimise away the tlb invalidate.   This helps when userspace is
- * unmapping already-unmapped pages, which happens quite a lot.
- */
-#define tlb_remove_tlb_entry(tlb, ptep, address)		\
-	do {							\
-		tlb->need_flush = 1;				\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
-	} while (0)
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
-{
-}
-
-#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
-
-#define pud_free_tlb(tlb, pudp, addr) __pud_free_tlb(tlb, pudp, addr)
-
-#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
-
-#define tlb_migrate_finish(mm) do {} while (0)
+#include <asm-generic/cacheflush.h>
+#include <asm-generic/tlb.h>
 
 #endif

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 12/18] arch/tlb: Clean up simple architectures
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (11 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 11/18] um/tlb: Convert " Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-10-03 17:03   ` Vineet Gupta
  2018-09-26 11:36 ` [PATCH 13/18] asm-generic/tlb: Introduce HAVE_MMU_GATHER_NO_GATHER Peter Zijlstra
                   ` (7 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Richard Henderson, Vineet Gupta,
	Mark Salter, Richard Kuo, Michal Simek, Paul Burton,
	Greentime Hu, Ley Foon Tan, Jonas Bonn, Helge Deller,
	David S. Miller, Guan Xuetao, Max Filippov

There are generally two cases:

 1) either the platform has an efficient flush_tlb_range() and
    asm-generic/tlb.h doesn't need any overrides at all.

 2) or an architecture lacks an efficient flush_tlb_range() and
    we override tlb_end_vma() and tlb_flush().

Convert all 'simple' architectures to one of these two forms (see the
condensed sketch after the list below).

alpha:	    has no range invalidate -> 2
arc:	    already uses flush_tlb_range() -> 1
c6x:	    has no range invalidate -> 2
h8300:	    has no mmu
hexagon:    has an efficient flush_tlb_range() -> 1
            (flush_tlb_mm() is in fact a full range invalidate,
	     so no need to shoot down everything)
m68k:	    has inefficient flush_tlb_range() -> 2
microblaze: has no flush_tlb_range() -> 2
mips:	    has efficient flush_tlb_range() -> 1
	    (even though it currently seems to use flush_tlb_mm())
nds32:	    already uses flush_tlb_range() -> 1
nios2:	    has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
openrisc:   has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
parisc:	    already uses flush_tlb_range() -> 1
sparc32:    already uses flush_tlb_range() -> 1
unicore32:  has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
xtensa:	    has efficient flush_tlb_range() -> 1
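
Condensed, the two forms look like this in an arch tlb.h; a sketch
mirroring the c6x hunk below rather than any one arch's final file:

/* 1) efficient flush_tlb_range(): no overrides needed at all */
#include <asm-generic/tlb.h>

/* 2) no efficient range invalidate: flush the whole mm instead */
#define tlb_end_vma(tlb, vma)	do { } while (0)
#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)
#include <asm-generic/tlb.h>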

Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Helge Deller <deller@gmx.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/include/asm/tlb.h      |    2 --
 arch/arc/include/asm/tlb.h        |   23 -----------------------
 arch/c6x/include/asm/tlb.h        |    1 +
 arch/h8300/include/asm/tlb.h      |    2 --
 arch/hexagon/include/asm/tlb.h    |   12 ------------
 arch/m68k/include/asm/tlb.h       |    1 -
 arch/microblaze/include/asm/tlb.h |    4 +---
 arch/mips/include/asm/tlb.h       |    8 --------
 arch/nds32/include/asm/tlb.h      |   10 ----------
 arch/nios2/include/asm/tlb.h      |    8 +++++---
 arch/openrisc/include/asm/tlb.h   |    6 ++++--
 arch/parisc/include/asm/tlb.h     |   13 -------------
 arch/powerpc/include/asm/tlb.h    |    1 -
 arch/sparc/include/asm/tlb_32.h   |   13 -------------
 arch/unicore32/include/asm/tlb.h  |   10 ++++++----
 arch/xtensa/include/asm/tlb.h     |   17 -----------------
 16 files changed, 17 insertions(+), 114 deletions(-)

--- a/arch/alpha/include/asm/tlb.h
+++ b/arch/alpha/include/asm/tlb.h
@@ -4,8 +4,6 @@
 
 #define tlb_start_vma(tlb, vma)			do { } while (0)
 #define tlb_end_vma(tlb, vma)			do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, pte, addr)	do { } while (0)
-
 #define tlb_flush(tlb)				flush_tlb_mm((tlb)->mm)
 
 #include <asm-generic/tlb.h>
--- a/arch/arc/include/asm/tlb.h
+++ b/arch/arc/include/asm/tlb.h
@@ -9,29 +9,6 @@
 #ifndef _ASM_ARC_TLB_H
 #define _ASM_ARC_TLB_H
 
-#define tlb_flush(tlb)				\
-do {						\
-	if (tlb->fullmm)			\
-		flush_tlb_mm((tlb)->mm);	\
-} while (0)
-
-/*
- * This pair is called at time of munmap/exit to flush cache and TLB entries
- * for mappings being torn down.
- * 1) cache-flush part -implemented via tlb_start_vma( ) for VIPT aliasing D$
- * 2) tlb-flush part - implemted via tlb_end_vma( ) flushes the TLB range
- *
- * Note, read http://lkml.org/lkml/2004/1/15/6
- */
-
-#define tlb_end_vma(tlb, vma)						\
-do {									\
-	if (!tlb->fullmm)						\
-		flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, ptep, address)
-
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
 
--- a/arch/c6x/include/asm/tlb.h
+++ b/arch/c6x/include/asm/tlb.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_C6X_TLB_H
 #define _ASM_C6X_TLB_H
 
+#define tlb_end_vma(tlb, vma) do { } while (0)
 #define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
 
 #include <asm-generic/tlb.h>
--- a/arch/h8300/include/asm/tlb.h
+++ b/arch/h8300/include/asm/tlb.h
@@ -2,8 +2,6 @@
 #ifndef __H8300_TLB_H__
 #define __H8300_TLB_H__
 
-#define tlb_flush(tlb)	do { } while (0)
-
 #include <asm-generic/tlb.h>
 
 #endif
--- a/arch/hexagon/include/asm/tlb.h
+++ b/arch/hexagon/include/asm/tlb.h
@@ -22,18 +22,6 @@
 #include <linux/pagemap.h>
 #include <asm/tlbflush.h>
 
-/*
- * We don't need any special per-pte or per-vma handling...
- */
-#define tlb_start_vma(tlb, vma)				do { } while (0)
-#define tlb_end_vma(tlb, vma)				do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
-
-/*
- * .. because we flush the whole mm when it fills up
- */
-#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif
--- a/arch/m68k/include/asm/tlb.h
+++ b/arch/m68k/include/asm/tlb.h
@@ -8,7 +8,6 @@
  */
 #define tlb_start_vma(tlb, vma)	do { } while (0)
 #define tlb_end_vma(tlb, vma)	do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
 /*
  * .. because we flush the whole mm when it
--- a/arch/microblaze/include/asm/tlb.h
+++ b/arch/microblaze/include/asm/tlb.h
@@ -11,14 +11,12 @@
 #ifndef _ASM_MICROBLAZE_TLB_H
 #define _ASM_MICROBLAZE_TLB_H
 
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 #include <linux/pagemap.h>
 
 #ifdef CONFIG_MMU
 #define tlb_start_vma(tlb, vma)		do { } while (0)
 #define tlb_end_vma(tlb, vma)		do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, pte, address) do { } while (0)
+#define tlb_flush(tlb)			flush_tlb_mm((tlb)->mm)
 #endif
 
 #include <asm-generic/tlb.h>
--- a/arch/mips/include/asm/tlb.h
+++ b/arch/mips/include/asm/tlb.h
@@ -5,14 +5,6 @@
 #include <asm/cpu-features.h>
 #include <asm/mipsregs.h>
 
-#define tlb_end_vma(tlb, vma) do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-
-/*
- * .. because we flush the whole mm when it fills up.
- */
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
 #define _UNIQUE_ENTRYHI(base, idx)					\
 		(((base) + ((idx) << (PAGE_SHIFT + 1))) |		\
 		 (cpu_has_tlbinv ? MIPS_ENTRYHI_EHINV : 0))
--- a/arch/nds32/include/asm/tlb.h
+++ b/arch/nds32/include/asm/tlb.h
@@ -4,16 +4,6 @@
 #ifndef __ASMNDS32_TLB_H
 #define __ASMNDS32_TLB_H
 
-#define tlb_end_vma(tlb,vma)				\
-	do { 						\
-		if(!tlb->fullmm)			\
-			flush_tlb_range(vma, vma->vm_start, vma->vm_end); \
-	} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, pte, addr) do { } while (0)
-
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #define __pte_free_tlb(tlb, pte, addr)	pte_free((tlb)->mm, pte)
--- a/arch/nios2/include/asm/tlb.h
+++ b/arch/nios2/include/asm/tlb.h
@@ -11,12 +11,14 @@
 #ifndef _ASM_NIOS2_TLB_H
 #define _ASM_NIOS2_TLB_H
 
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 extern void set_mmu_pid(unsigned long pid);
 
+/*
+ * nios2 does have flush_tlb_range(), but it lacks a limit on the range
+ * iteration and a fallback to full mm invalidation, so use flush_tlb_mm().
+ */
 #define tlb_end_vma(tlb, vma)	do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
+#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
 
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
--- a/arch/openrisc/include/asm/tlb.h
+++ b/arch/openrisc/include/asm/tlb.h
@@ -22,12 +22,14 @@
 /*
  * or32 doesn't need any special per-pte or
  * per-vma handling..
+ *
+ * OpenRISC doesn't have an efficient flush_tlb_range(), so use flush_tlb_mm()
+ * for everything.
  */
 #define tlb_start_vma(tlb, vma) do { } while (0)
 #define tlb_end_vma(tlb, vma) do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-
 #define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
+
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
 
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -2,19 +2,6 @@
 #ifndef _PARISC_TLB_H
 #define _PARISC_TLB_H
 
-#define tlb_flush(tlb)			\
-do {	if ((tlb)->fullmm)		\
-		flush_tlb_mm((tlb)->mm);\
-} while (0)
-
-#define tlb_end_vma(tlb, vma)	\
-do {	if (!(tlb)->fullmm)	\
-		flush_tlb_range(vma, vma->vm_start, vma->vm_end); \
-} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, pte, address) \
-	do { } while (0)
-
 #include <asm-generic/tlb.h>
 
 #define __pmd_free_tlb(tlb, pmd, addr)	pmd_free((tlb)->mm, pmd)
--- a/arch/sparc/include/asm/tlb_32.h
+++ b/arch/sparc/include/asm/tlb_32.h
@@ -2,19 +2,6 @@
 #ifndef _SPARC_TLB_H
 #define _SPARC_TLB_H
 
-#define tlb_end_vma(tlb, vma) \
-do {								\
-	flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
-#define __tlb_remove_tlb_entry(tlb, pte, address) \
-	do { } while (0)
-
-#define tlb_flush(tlb) \
-do {								\
-	flush_tlb_mm((tlb)->mm);				\
-} while (0)
-
 #include <asm-generic/tlb.h>
 
 #endif /* _SPARC_TLB_H */
--- a/arch/unicore32/include/asm/tlb.h
+++ b/arch/unicore32/include/asm/tlb.h
@@ -12,10 +12,12 @@
 #ifndef __UNICORE_TLB_H__
 #define __UNICORE_TLB_H__
 
-#define tlb_start_vma(tlb, vma)				do { } while (0)
-#define tlb_end_vma(tlb, vma)				do { } while (0)
-#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
+/*
+ * unicore32 lacks an efficient flush_tlb_range(), use flush_tlb_mm().
+ */
+#define tlb_start_vma(tlb, vma)		do { } while (0)
+#define tlb_end_vma(tlb, vma)		do { } while (0)
+#define tlb_flush(tlb)			flush_tlb_mm((tlb)->mm)
 
 #define __pte_free_tlb(tlb, pte, addr)				\
 	do {							\
--- a/arch/xtensa/include/asm/tlb.h
+++ b/arch/xtensa/include/asm/tlb.h
@@ -14,23 +14,6 @@
 #include <asm/cache.h>
 #include <asm/page.h>
 
-#if (DCACHE_WAY_SIZE <= PAGE_SIZE)
-
-# define tlb_end_vma(tlb,vma)			do { } while (0)
-
-#else
-
-# define tlb_end_vma(tlb, vma)						      \
-	do {								      \
-		if (!tlb->fullmm)					      \
-			flush_tlb_range(vma, vma->vm_start, vma->vm_end);     \
-	} while(0)
-
-#endif
-
-#define __tlb_remove_tlb_entry(tlb,pte,addr)	do { } while (0)
-#define tlb_flush(tlb)				flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #define __pte_free_tlb(tlb, pte, address)	pte_free((tlb)->mm, pte)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 13/18] asm-generic/tlb: Introduce HAVE_MMU_GATHER_NO_GATHER
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (12 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 12/18] arch/tlb: Clean up simple architectures Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-12-11  5:43   ` Aneesh Kumar K.V
  2018-09-26 11:36 ` [PATCH 14/18] s390/tlb: convert to generic mmu_gather Peter Zijlstra
                   ` (6 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Linus Torvalds, Martin Schwidefsky

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Add the Kconfig option HAVE_MMU_GATHER_NO_GATHER to the generic
mmu_gather code. If the option is set, the mmu_gather no longer
tracks individual pages for delayed freeing. A platform that
enables the option must provide its own implementation of
__tlb_remove_page_size() to free pages.
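
To illustrate the contract, a hypothetical sketch of such an
arch-provided hook (not from this series; the real user is the s390
conversion in the next patch):

bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
			    int page_size)
{
	/* free immediately instead of batching page pointers */
	free_page_and_swap_cache(page);
	return false;	/* never ask the core to force a flush */
}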

Cc: npiggin@gmail.com
Cc: heiko.carstens@de.ibm.com
Cc: will.deacon@arm.com
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: akpm@linux-foundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux@armlinux.org.uk
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180918125151.31744-2-schwidefsky@de.ibm.com
---
 arch/Kconfig              |    3 +
 include/asm-generic/tlb.h |    9 +++
 mm/mmu_gather.c           |  107 +++++++++++++++++++++++++---------------------
 3 files changed, 70 insertions(+), 49 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -368,6 +368,9 @@ config HAVE_RCU_TABLE_NO_INVALIDATE
 config HAVE_MMU_GATHER_PAGE_SIZE
 	bool
 
+config HAVE_MMU_GATHER_NO_GATHER
+	bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -184,6 +184,7 @@ extern void tlb_remove_table(struct mmu_
 
 #endif
 
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
 /*
  * If we can't allocate a page to make a big batch of page pointers
  * to work on, then just handle a few from the on-stack structure.
@@ -208,6 +209,10 @@ struct mmu_gather_batch {
  */
 #define MAX_GATHER_BATCH_COUNT	(10000UL/MAX_GATHER_BATCH)
 
+extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
+				   int page_size);
+#endif
+
 /*
  * struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
@@ -254,6 +259,7 @@ struct mmu_gather {
 
 	unsigned int		batch_count;
 
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
 	struct mmu_gather_batch *active;
 	struct mmu_gather_batch	local;
 	struct page		*__pages[MMU_GATHER_BUNDLE];
@@ -261,6 +267,7 @@ struct mmu_gather {
 #ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
 	unsigned int page_size;
 #endif
+#endif
 };
 
 void arch_tlb_gather_mmu(struct mmu_gather *tlb,
@@ -269,8 +276,6 @@ void tlb_flush_mmu(struct mmu_gather *tl
 void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 			 unsigned long start, unsigned long end, bool force);
 void tlb_flush_mmu_free(struct mmu_gather *tlb);
-extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
-				   int page_size);
 
 static inline void __tlb_adjust_range(struct mmu_gather *tlb,
 				      unsigned long address,
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -13,6 +13,8 @@
 
 #ifdef HAVE_GENERIC_MMU_GATHER
 
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+
 static bool tlb_next_batch(struct mmu_gather *tlb)
 {
 	struct mmu_gather_batch *batch;
@@ -41,6 +43,56 @@ static bool tlb_next_batch(struct mmu_ga
 	return true;
 }
 
+static void tlb_batch_pages_flush(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch;
+
+	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
+		free_pages_and_swap_cache(batch->pages, batch->nr);
+		batch->nr = 0;
+	}
+	tlb->active = &tlb->local;
+}
+
+static void tlb_batch_list_free(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch, *next;
+
+	for (batch = tlb->local.next; batch; batch = next) {
+		next = batch->next;
+		free_pages((unsigned long)batch, 0);
+	}
+	tlb->local.next = NULL;
+}
+
+bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size)
+{
+	struct mmu_gather_batch *batch;
+
+	VM_BUG_ON(!tlb->end);
+
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+	VM_WARN_ON(tlb->page_size != page_size);
+#endif
+
+	batch = tlb->active;
+	/*
+	 * Add the page and check if we are full. If so
+	 * force a flush.
+	 */
+	batch->pages[batch->nr++] = page;
+	if (batch->nr == batch->max) {
+		if (!tlb_next_batch(tlb))
+			return true;
+		batch = tlb->active;
+	}
+	VM_BUG_ON_PAGE(batch->nr > batch->max, page);
+
+	return false;
+}
+
+#endif /* HAVE_MMU_GATHER_NO_GATHER */
+
 void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 				unsigned long start, unsigned long end)
 {
@@ -48,12 +100,15 @@ void arch_tlb_gather_mmu(struct mmu_gath
 
 	/* Is it from 0 to ~0? */
 	tlb->fullmm     = !(start | (end+1));
+
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
 	tlb->need_flush_all = 0;
 	tlb->local.next = NULL;
 	tlb->local.nr   = 0;
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
 	tlb->active     = &tlb->local;
 	tlb->batch_count = 0;
+#endif
 
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
@@ -67,16 +122,12 @@ void arch_tlb_gather_mmu(struct mmu_gath
 
 void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
-	struct mmu_gather_batch *batch;
-
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
 #endif
-	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
-		free_pages_and_swap_cache(batch->pages, batch->nr);
-		batch->nr = 0;
-	}
-	tlb->active = &tlb->local;
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+	tlb_batch_pages_flush(tlb);
+#endif
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
@@ -92,8 +143,6 @@ void tlb_flush_mmu(struct mmu_gather *tl
 void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 		unsigned long start, unsigned long end, bool force)
 {
-	struct mmu_gather_batch *batch, *next;
-
 	if (force) {
 		__tlb_reset_range(tlb);
 		__tlb_adjust_range(tlb, start, end - start);
@@ -103,45 +152,9 @@ void arch_tlb_finish_mmu(struct mmu_gath
 
 	/* keep the page table cache within bounds */
 	check_pgt_cache();
-
-	for (batch = tlb->local.next; batch; batch = next) {
-		next = batch->next;
-		free_pages((unsigned long)batch, 0);
-	}
-	tlb->local.next = NULL;
-}
-
-/* __tlb_remove_page
- *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)), while
- *	handling the additional races in SMP caused by other CPUs caching valid
- *	mappings in their TLBs. Returns the number of free page slots left.
- *	When out of page slots we must call tlb_flush_mmu().
- *returns true if the caller should flush.
- */
-bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size)
-{
-	struct mmu_gather_batch *batch;
-
-	VM_BUG_ON(!tlb->end);
-
-#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
-	VM_WARN_ON(tlb->page_size != page_size);
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+	tlb_batch_list_free(tlb);
 #endif
-
-	batch = tlb->active;
-	/*
-	 * Add the page and check if we are full. If so
-	 * force a flush.
-	 */
-	batch->pages[batch->nr++] = page;
-	if (batch->nr == batch->max) {
-		if (!tlb_next_batch(tlb))
-			return true;
-		batch = tlb->active;
-	}
-	VM_BUG_ON_PAGE(batch->nr > batch->max, page);
-
-	return false;
 }
 
 #endif /* HAVE_GENERIC_MMU_GATHER */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 14/18] s390/tlb: convert to generic mmu_gather
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (13 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 13/18] asm-generic/tlb: Introduce HAVE_MMU_GATHER_NO_GATHER Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 15/18] asm-generic/tlb: Remove arch_tlb*_mmu() Peter Zijlstra
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Linus Torvalds, Martin Schwidefsky

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Cc: npiggin@gmail.com
Cc: heiko.carstens@de.ibm.com
Cc: will.deacon@arm.com
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: akpm@linux-foundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux@armlinux.org.uk
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180918125151.31744-3-schwidefsky@de.ibm.com
---
 arch/s390/Kconfig           |    2 
 arch/s390/include/asm/tlb.h |  128 +++++++++++++-------------------------------
 arch/s390/mm/pgalloc.c      |   63 ---------------------
 3 files changed, 42 insertions(+), 151 deletions(-)

--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -157,10 +157,12 @@ config S390
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_MEMBLOCK_PHYS_MAP
+	select HAVE_MMU_GATHER_NO_GATHER
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_NOP_MCOUNT
 	select HAVE_OPROFILE
 	select HAVE_PERF_EVENTS
+	select HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -22,98 +22,39 @@
  * Pages used for the page tables are a different story. FIXME: more
  */
 
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/swap.h>
-#include <asm/processor.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-
-struct mmu_gather {
-	struct mm_struct *mm;
-	struct mmu_table_batch *batch;
-	unsigned int fullmm;
-	unsigned long start, end;
-};
-
-struct mmu_table_batch {
-	struct rcu_head		rcu;
-	unsigned int		nr;
-	void			*tables[0];
-};
-
-#define MAX_TABLE_BATCH		\
-	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
-
-extern void tlb_table_flush(struct mmu_gather *tlb);
-extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->start = start;
-	tlb->end = end;
-	tlb->fullmm = !(start | (end+1));
-	tlb->batch = NULL;
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	__tlb_flush_mm_lazy(tlb->mm);
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	tlb_table_flush(tlb);
-}
-
+void __tlb_remove_table(void *_table);
+static inline void tlb_flush(struct mmu_gather *tlb);
+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+					  struct page *page, int page_size);
 
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
+#define tlb_start_vma(tlb, vma)			do { } while (0)
+#define tlb_end_vma(tlb, vma)			do { } while (0)
 
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		tlb->start = start;
-		tlb->end = end;
-	}
+#define tlb_flush tlb_flush
+#define pte_free_tlb pte_free_tlb
+#define pmd_free_tlb pmd_free_tlb
+#define p4d_free_tlb p4d_free_tlb
+#define pud_free_tlb pud_free_tlb
 
-	tlb_flush_mmu(tlb);
-}
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm-generic/tlb.h>
 
 /*
  * Release the page cache reference for a pte removed by
  * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
  * has already been freed, so just do free_page_and_swap_cache.
  */
-static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-	return false; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-}
-
 static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
 					  struct page *page, int page_size)
 {
-	return __tlb_remove_page(tlb, page);
+	free_page_and_swap_cache(page);
+	return false;
 }
 
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
+static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	return tlb_remove_page(tlb, page);
+	__tlb_flush_mm_lazy(tlb->mm);
 }
 
 /*
@@ -121,8 +62,17 @@ static inline void tlb_remove_page_size(
  * page table from the tlb.
  */
 static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-				unsigned long address)
+                                unsigned long address)
 {
+	__tlb_adjust_range(tlb, address, PAGE_SIZE);
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_ptes = 1;
+	/*
+	 * page_table_free_rcu takes care of the allocation bit masks
+	 * of the 2K table fragments in the 4K page table page,
+	 * then calls tlb_remove_table.
+	 */
 	page_table_free_rcu(tlb, (unsigned long *) pte, address);
 }
 
@@ -139,6 +89,10 @@ static inline void pmd_free_tlb(struct m
 	if (tlb->mm->context.asce_limit <= _REGION3_SIZE)
 		return;
 	pgtable_pmd_page_dtor(virt_to_page(pmd));
+	__tlb_adjust_range(tlb, address, PAGE_SIZE);
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_puds = 1;
 	tlb_remove_table(tlb, pmd);
 }
 
@@ -154,6 +108,10 @@ static inline void p4d_free_tlb(struct m
 {
 	if (tlb->mm->context.asce_limit <= _REGION1_SIZE)
 		return;
+	__tlb_adjust_range(tlb, address, PAGE_SIZE);
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_p4ds = 1;
 	tlb_remove_table(tlb, p4d);
 }
 
@@ -169,19 +127,11 @@ static inline void pud_free_tlb(struct m
 {
 	if (tlb->mm->context.asce_limit <= _REGION2_SIZE)
 		return;
+	tlb->mm->context.flush_mm = 1;
+	tlb->freed_tables = 1;
+	tlb->cleared_puds = 1;
 	tlb_remove_table(tlb, pud);
 }
 
-#define tlb_start_vma(tlb, vma)			do { } while (0)
-#define tlb_end_vma(tlb, vma)			do { } while (0)
-#define tlb_remove_tlb_entry(tlb, ptep, addr)	do { } while (0)
-#define tlb_remove_pmd_tlb_entry(tlb, pmdp, addr)	do { } while (0)
-#define tlb_migrate_finish(mm)			do { } while (0)
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb, unsigned int page_size)
-{
-}
 
 #endif /* _S390_TLB_H */
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -288,7 +288,7 @@ void page_table_free_rcu(struct mmu_gath
 	tlb_remove_table(tlb, table);
 }
 
-static void __tlb_remove_table(void *_table)
+void __tlb_remove_table(void *_table)
 {
 	unsigned int mask = (unsigned long) _table & 3;
 	void *table = (void *)((unsigned long) _table ^ mask);
@@ -314,67 +314,6 @@ static void __tlb_remove_table(void *_ta
 	}
 }
 
-static void tlb_remove_table_smp_sync(void *arg)
-{
-	/* Simply deliver the interrupt */
-}
-
-static void tlb_remove_table_one(void *table)
-{
-	/*
-	 * This isn't an RCU grace period and hence the page-tables cannot be
-	 * assumed to be actually RCU-freed.
-	 *
-	 * It is however sufficient for software page-table walkers that rely
-	 * on IRQ disabling. See the comment near struct mmu_table_batch.
-	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
-	__tlb_remove_table(table);
-}
-
-static void tlb_remove_table_rcu(struct rcu_head *head)
-{
-	struct mmu_table_batch *batch;
-	int i;
-
-	batch = container_of(head, struct mmu_table_batch, rcu);
-
-	for (i = 0; i < batch->nr; i++)
-		__tlb_remove_table(batch->tables[i]);
-
-	free_page((unsigned long)batch);
-}
-
-void tlb_table_flush(struct mmu_gather *tlb)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch) {
-		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
-		*batch = NULL;
-	}
-}
-
-void tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	tlb->mm->context.flush_mm = 1;
-	if (*batch == NULL) {
-		*batch = (struct mmu_table_batch *)
-			__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-		if (*batch == NULL) {
-			__tlb_flush_mm_lazy(tlb->mm);
-			tlb_remove_table_one(table);
-			return;
-		}
-		(*batch)->nr = 0;
-	}
-	(*batch)->tables[(*batch)->nr++] = table;
-	if ((*batch)->nr == MAX_TABLE_BATCH)
-		tlb_flush_mmu(tlb);
-}
-
 /*
  * Base infrastructure required to generate basic asces, region, segment,
  * and page tables that do not make use of enhanced features like EDAT1.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 15/18] asm-generic/tlb: Remove arch_tlb*_mmu()
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (14 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 14/18] s390/tlb: convert to generic mmu_gather Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 16/18] asm-generic/tlb: Remove HAVE_GENERIC_MMU_GATHER Peter Zijlstra
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Now that all architectures have been converted to the generic code,
remove the arch_tlb_gather_mmu() and arch_tlb_finish_mmu() hooks.
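
With the hooks gone, tlb_gather_mmu() and tlb_finish_mmu() are the only
entry points. A typical caller, sketched against the signatures in the
diff below, looks like:

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, start, end);
	/* ... zap ptes, queueing pages and page tables on @tlb ... */
	tlb_finish_mmu(&tlb, start, end);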

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 mm/mmu_gather.c |   93 +++++++++++++++++++++++++-------------------------------
 1 file changed, 42 insertions(+), 51 deletions(-)

--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -93,33 +93,6 @@ bool __tlb_remove_page_size(struct mmu_g
 
 #endif /* HAVE_MMU_GATHER_NO_GATHER */
 
-void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-				unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-
-	/* Is it from 0 to ~0? */
-	tlb->fullmm     = !(start | (end+1));
-
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
-	tlb->need_flush_all = 0;
-	tlb->local.next = NULL;
-	tlb->local.nr   = 0;
-	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
-	tlb->active     = &tlb->local;
-	tlb->batch_count = 0;
-#endif
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
-#endif
-#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
-	tlb->page_size = 0;
-#endif
-
-	__tlb_reset_range(tlb);
-}
-
 void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
@@ -136,27 +109,6 @@ void tlb_flush_mmu(struct mmu_gather *tl
 	tlb_flush_mmu_free(tlb);
 }
 
-/* tlb_finish_mmu
- *	Called at the end of the shootdown operation to free up any resources
- *	that were required.
- */
-void arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		__tlb_reset_range(tlb);
-		__tlb_adjust_range(tlb, start, end - start);
-	}
-
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
-	tlb_batch_list_free(tlb);
-#endif
-}
-
 #endif /* HAVE_GENERIC_MMU_GATHER */
 
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
@@ -258,10 +210,40 @@ void tlb_remove_table(struct mmu_gather
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 			unsigned long start, unsigned long end)
 {
-	arch_tlb_gather_mmu(tlb, mm, start, end);
+	tlb->mm = mm;
+
+	/* Is it from 0 to ~0? */
+	tlb->fullmm     = !(start | (end+1));
+
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+	tlb->need_flush_all = 0;
+	tlb->local.next = NULL;
+	tlb->local.nr   = 0;
+	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
+	tlb->active     = &tlb->local;
+	tlb->batch_count = 0;
+#endif
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb->batch = NULL;
+#endif
+#ifdef CONFIG_HAVE_MMU_GATHER_PAGE_SIZE
+	tlb->page_size = 0;
+#endif
+
+	__tlb_reset_range(tlb);
 	inc_tlb_flush_pending(tlb->mm);
 }
 
+/**
+ * tlb_finish_mmu - finish an mmu_gather structure
+ * @tlb: the mmu_gather structure to finish
+ * @start: start of the region that will be removed from the page-table
+ * @end: end of the region that will be removed from the page-table
+ *
+ * Called at the end of the shootdown operation to free up any resources that
+ * were required.
+ */
 void tlb_finish_mmu(struct mmu_gather *tlb,
 		unsigned long start, unsigned long end)
 {
@@ -272,8 +254,17 @@ void tlb_finish_mmu(struct mmu_gather *t
 	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
 	 * forcefully if we detect parallel PTE batching threads.
 	 */
-	bool force = mm_tlb_flush_nested(tlb->mm);
+	if (mm_tlb_flush_nested(tlb->mm)) {
+		__tlb_reset_range(tlb);
+		__tlb_adjust_range(tlb, start, end - start);
+	}
 
-	arch_tlb_finish_mmu(tlb, start, end, force);
+	tlb_flush_mmu(tlb);
+
+	/* keep the page table cache within bounds */
+	check_pgt_cache();
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+	tlb_batch_list_free(tlb);
+#endif
 	dec_tlb_flush_pending(tlb->mm);
 }

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 16/18] asm-generic/tlb: Remove HAVE_GENERIC_MMU_GATHER
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (15 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 15/18] asm-generic/tlb: Remove arch_tlb*_mmu() Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 17/18] asm-generic/tlb: Remove tlb_flush_mmu_free() Peter Zijlstra
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

Since all architectures now use the generic mmu_gather code, the
HAVE_GENERIC_MMU_GATHER define is redundant; remove it.
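
Concretely, asm-generic/tlb.h did an unconditional

	#define HAVE_GENERIC_MMU_GATHER

so now that every architecture includes that header, the corresponding
#ifdef/#endif pair in mm/mmu_gather.c can never be compiled out; both
sides of the define are removed below.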

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |    1 -
 mm/mmu_gather.c           |    4 ----
 2 files changed, 5 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -139,7 +139,6 @@
  *  page-tables natively.
  *
  */
-#define HAVE_GENERIC_MMU_GATHER
 
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 /*
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -11,8 +11,6 @@
 #include <asm/pgalloc.h>
 #include <asm/tlb.h>
 
-#ifdef HAVE_GENERIC_MMU_GATHER
-
 #ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
 
 static bool tlb_next_batch(struct mmu_gather *tlb)
@@ -109,8 +107,6 @@ void tlb_flush_mmu(struct mmu_gather *tl
 	tlb_flush_mmu_free(tlb);
 }
 
-#endif /* HAVE_GENERIC_MMU_GATHER */
-
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 
 /*

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 17/18] asm-generic/tlb: Remove tlb_flush_mmu_free()
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (16 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 16/18] asm-generic/tlb: Remove HAVE_GENERIC_MMU_GATHER Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 11:36 ` [PATCH 18/18] asm-generic/tlb: Remove tlb_table_flush() Peter Zijlstra
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

As the comment notes, tlb_flush_mmu_free() is a potentially dangerous
operation. Just use tlb_flush_mmu() instead, which will skip the
(double) TLB invalidate if it really isn't needed anyway.
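
The skip works because tlb_flush_mmu_tlbonly() bails out when no range
has been accumulated since the last flush. Roughly, as a sketch of the
generic helper this series builds on (notifier details elided):

	static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
	{
		if (!tlb->end)	/* nothing accumulated: no double invalidate */
			return;

		tlb_flush(tlb);
		/* ... mmu notifier invalidation elided ... */
		__tlb_reset_range(tlb);
	}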

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |   10 +++-------
 mm/memory.c               |    2 +-
 mm/mmu_gather.c           |    2 +-
 3 files changed, 5 insertions(+), 9 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -67,16 +67,13 @@
  *    call before __tlb_remove_page*() to set the current page-size; implies a
  *    possible tlb_flush_mmu() call.
  *
- *  - tlb_flush_mmu() / tlb_flush_mmu_tlbonly() / tlb_flush_mmu_free()
+ *  - tlb_flush_mmu() / tlb_flush_mmu_tlbonly()
  *
  *    tlb_flush_mmu_tlbonly() - does the TLB invalidate (and resets
  *                              related state, like the range)
  *
- *    tlb_flush_mmu_free() - frees the queued pages; make absolutely
- *			     sure no additional tlb_remove_page()
- *			     calls happen between _tlbonly() and this.
- *
- *    tlb_flush_mmu() - the above two calls.
+ *    tlb_flush_mmu() - in addition to the above TLB invalidate, also frees
+ *			whatever pages are still batched.
  *
  *  - mmu_gather::fullmm
  *
@@ -274,7 +271,6 @@ void arch_tlb_gather_mmu(struct mmu_gath
 void tlb_flush_mmu(struct mmu_gather *tlb);
 void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 			 unsigned long start, unsigned long end, bool force);
-void tlb_flush_mmu_free(struct mmu_gather *tlb);
 
 static inline void __tlb_adjust_range(struct mmu_gather *tlb,
 				      unsigned long address,
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1155,7 +1155,7 @@ static unsigned long zap_pte_range(struc
 	 */
 	if (force_flush) {
 		force_flush = 0;
-		tlb_flush_mmu_free(tlb);
+		tlb_flush_mmu(tlb);
 		if (addr != end)
 			goto again;
 	}
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -91,7 +91,7 @@ bool __tlb_remove_page_size(struct mmu_g
 
 #endif /* HAVE_MMU_GATHER_NO_GATHER */
 
-void tlb_flush_mmu_free(struct mmu_gather *tlb)
+static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb_table_flush(tlb);

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 18/18] asm-generic/tlb: Remove tlb_table_flush()
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (17 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 17/18] asm-generic/tlb: Remove tlb_flush_mmu_free() Peter Zijlstra
@ 2018-09-26 11:36 ` Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
  2018-09-26 12:47 ` [PATCH 00/18] my generic mmu_gather patches Will Deacon
  2018-12-11  5:50 ` Aneesh Kumar K.V
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 11:36 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux, heiko.carstens, riel

There are no external users of this API (nor should there be); remove it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/tlb.h |    1 -
 mm/mmu_gather.c           |   34 +++++++++++++++++-----------------
 2 files changed, 17 insertions(+), 18 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -174,7 +174,6 @@ struct mmu_table_batch {
 #define MAX_TABLE_BATCH		\
 	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
 
-extern void tlb_table_flush(struct mmu_gather *tlb);
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
 #endif
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -91,22 +91,6 @@ bool __tlb_remove_page_size(struct mmu_g
 
 #endif /* HAVE_MMU_GATHER_NO_GATHER */
 
-static void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
-#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
-	tlb_batch_pages_flush(tlb);
-#endif
-}
-
-void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 
 /*
@@ -159,7 +143,7 @@ static void tlb_remove_table_rcu(struct
 	free_page((unsigned long)batch);
 }
 
-void tlb_table_flush(struct mmu_gather *tlb)
+static void tlb_table_flush(struct mmu_gather *tlb)
 {
 	struct mmu_table_batch **batch = &tlb->batch;
 
@@ -191,6 +175,22 @@ void tlb_remove_table(struct mmu_gather
 
 #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
 
+static void tlb_flush_mmu_free(struct mmu_gather *tlb)
+{
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
+#ifndef CONFIG_HAVE_MMU_GATHER_NO_GATHER
+	tlb_batch_pages_flush(tlb);
+#endif
+}
+
+void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+	tlb_flush_mmu_tlbonly(tlb);
+	tlb_flush_mmu_free(tlb);
+}
+
 /**
  * tlb_gather_mmu - initialize an mmu_gather structure for page-table tear-down
  * @tlb: the mmu_gather structure to initialize

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 00/18] my generic mmu_gather patches
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (18 preceding siblings ...)
  2018-09-26 11:36 ` [PATCH 18/18] asm-generic/tlb: Remove tlb_table_flush() Peter Zijlstra
@ 2018-09-26 12:47 ` Will Deacon
  2018-09-26 12:47   ` Will Deacon
  2018-12-11  5:50 ` Aneesh Kumar K.V
  20 siblings, 1 reply; 64+ messages in thread
From: Will Deacon @ 2018-09-26 12:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, riel, fengguang.wu

Hi Peter,

On Wed, Sep 26, 2018 at 01:36:23PM +0200, Peter Zijlstra wrote:
> Here is my current stash of generic mmu_gather patches that goes on top of Will's
> tlb patches:

FWIW, patches 1, 2, 15, 16, 17 and 18 look fine to me, so:

Acked-by: Will Deacon <will.deacon@arm.com>

for those.

I'll leave some minor comments on a few of the others.

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 06/18] asm-generic/tlb: Conditionally provide tlb_migrate_finish()
  2018-09-26 11:36 ` [PATCH 06/18] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
@ 2018-09-26 12:53   ` Will Deacon
  2018-09-26 12:53     ` Will Deacon
  1 sibling, 1 reply; 64+ messages in thread
From: Will Deacon @ 2018-09-26 12:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, riel

On Wed, Sep 26, 2018 at 01:36:29PM +0200, Peter Zijlstra wrote:
> Needed for ia64 -- alternatively we drop the entire hook.

Ack for dropping the hook.

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush
  2018-09-26 11:36 ` [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
@ 2018-09-26 12:53   ` Will Deacon
  2018-09-26 12:53     ` Will Deacon
  2018-09-26 13:11     ` Peter Zijlstra
  1 sibling, 2 replies; 64+ messages in thread
From: Will Deacon @ 2018-09-26 12:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, riel

On Wed, Sep 26, 2018 at 01:36:28PM +0200, Peter Zijlstra wrote:
> Provide a generic tlb_flush() implementation that relies on
> flush_tlb_range(). This is a little awkward because flush_tlb_range()
> assumes a VMA for range invalidation, but we no longer have one.
> 
> Audit of all flush_tlb_range() implementations shows only vma->vm_mm
> and vma->vm_flags are used, and of the latter only VM_EXEC (I-TLB
> invalidates) and VM_HUGETLB (large TLB invalidate) are used.
> 
> Therefore, track VM_EXEC and VM_HUGETLB in two more bits, and create a
> 'fake' VMA.
> 
> This allows architectures that have a reasonably efficient
> flush_tlb_range() to not require any additional effort.
> 
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/include/asm/tlb.h   |    1 
>  arch/powerpc/include/asm/tlb.h |    1 
>  arch/riscv/include/asm/tlb.h   |    1 
>  arch/x86/include/asm/tlb.h     |    1 
>  include/asm-generic/tlb.h      |   80 +++++++++++++++++++++++++++++++++++------
>  5 files changed, 74 insertions(+), 10 deletions(-)
> 
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -27,6 +27,7 @@ static inline void __tlb_remove_table(vo
>  	free_page_and_swap_cache((struct page *)_table);
>  }
>  
> +#define tlb_flush tlb_flush
>  static void tlb_flush(struct mmu_gather *tlb);
>  
>  #include <asm-generic/tlb.h>
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -28,6 +28,7 @@
>  #define tlb_end_vma(tlb, vma)	do { } while (0)
>  #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
>  
> +#define tlb_flush tlb_flush
>  extern void tlb_flush(struct mmu_gather *tlb);
>  
>  /* Get the generic bits... */
> --- a/arch/riscv/include/asm/tlb.h
> +++ b/arch/riscv/include/asm/tlb.h
> @@ -18,6 +18,7 @@ struct mmu_gather;
>  
>  static void tlb_flush(struct mmu_gather *tlb);
>  
> +#define tlb_flush tlb_flush
>  #include <asm-generic/tlb.h>
>  
>  static inline void tlb_flush(struct mmu_gather *tlb)
> --- a/arch/x86/include/asm/tlb.h
> +++ b/arch/x86/include/asm/tlb.h
> @@ -6,6 +6,7 @@
>  #define tlb_end_vma(tlb, vma) do { } while (0)
>  #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
>  
> +#define tlb_flush tlb_flush
>  static inline void tlb_flush(struct mmu_gather *tlb);
>  
>  #include <asm-generic/tlb.h>
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -241,6 +241,12 @@ struct mmu_gather {
>  	unsigned int		cleared_puds : 1;
>  	unsigned int		cleared_p4ds : 1;
>  
> +	/*
> +	 * tracks VM_EXEC | VM_HUGETLB in tlb_start_vma
> +	 */
> +	unsigned int		vma_exec : 1;
> +	unsigned int		vma_huge : 1;
> +
>  	unsigned int		batch_count;
>  
>  	struct mmu_gather_batch *active;
> @@ -282,7 +288,35 @@ static inline void __tlb_reset_range(str
>  	tlb->cleared_pmds = 0;
>  	tlb->cleared_puds = 0;
>  	tlb->cleared_p4ds = 0;
> +	/*
> +	 * Do not reset mmu_gather::vma_* fields here, we do not
> +	 * call into tlb_start_vma() again to set them if there is an
> +	 * intermediate flush.
> +	 */
> +}
> +
> +#ifndef tlb_flush
> +
> +#if defined(tlb_start_vma) || defined(tlb_end_vma)
> +#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
> +#endif
> +
> +#define tlb_flush tlb_flush

Do we need this #define?

> @@ -353,19 +387,45 @@ static inline unsigned long tlb_get_unma
>   * the vmas are adjusted to only cover the region to be torn down.
>   */
>  #ifndef tlb_start_vma
> -#define tlb_start_vma(tlb, vma)						\
> -do {									\
> -	if (!tlb->fullmm)						\
> -		flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
> -} while (0)
> +#define tlb_start_vma tlb_start_vma

Or this one?

> +static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
> +{
> +	if (tlb->fullmm)
> +		return;
> +
> +	/*
> +	 * flush_tlb_range() implementations that look at VM_HUGETLB (tile,
> +	 * mips-4k) flush only large pages.
> +	 *
> +	 * flush_tlb_range() implementations that flush I-TLB also flush D-TLB
> +	 * (tile, xtensa, arm), so it's ok to just add VM_EXEC to an existing
> +	 * range.
> +	 *
> +	 * We rely on tlb_end_vma() to issue a flush, such that when we reset
> +	 * these values the batch is empty.
> +	 */
> +	tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
> +	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);

Hmm, does this result in code generation for archs that don't care about the
vm_flags?

> +	flush_cache_range(vma, vma->vm_start, vma->vm_end);
> +}
>  #endif
>  
>  #ifndef tlb_end_vma
> -#define tlb_end_vma(tlb, vma)						\
> -do {									\
> -	if (!tlb->fullmm)						\
> -		tlb_flush_mmu_tlbonly(tlb);				\
> -} while (0)
> +#define tlb_end_vma tlb_end_vma

Another #define we can drop?

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 08/18] arm/tlb: Convert to generic mmu_gather
  2018-09-26 11:36 ` [PATCH 08/18] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
@ 2018-09-26 12:54   ` Will Deacon
  2018-09-26 12:54     ` Will Deacon
  1 sibling, 1 reply; 64+ messages in thread
From: Will Deacon @ 2018-09-26 12:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, riel

On Wed, Sep 26, 2018 at 01:36:31PM +0200, Peter Zijlstra wrote:
> Generic mmu_gather provides everything that ARM needs:
> 
>  - range tracking
>  - RCU table free
>  - VM_EXEC tracking
>  - VIPT cache flushing
> 
> The one notable curiosity is the 'funny' range tracking for classical
> ARM in __pte_free_tlb().
> 
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm/include/asm/tlb.h |  255 ++-------------------------------------------
>  1 file changed, 14 insertions(+), 241 deletions(-)

[...]

>  static inline void
> -tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
> +__pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
>  {
> -	tlb_add_flush(tlb, addr);
> -}
> -
> -#define pte_free_tlb(tlb, ptep, addr)	__pte_free_tlb(tlb, ptep, addr)
> -#define pmd_free_tlb(tlb, pmdp, addr)	__pmd_free_tlb(tlb, pmdp, addr)
> -#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
> -
> -#define tlb_migrate_finish(mm)		do { } while (0)
> -
> -static inline void tlb_change_page_size(struct mmu_gather *tlb,
> -						     unsigned int page_size)
> -{
> -}
> -
> -static inline void tlb_flush_remove_tables(struct mm_struct *mm)
> -{
> -}
> +#ifdef CONFIG_ARM_LPAE
> +	struct page *page = virt_to_page(pmdp);
>  
> -static inline void tlb_flush_remove_tables_local(void *arg)
> -{
> +	pgtable_pmd_page_dtor(page);

The dtor() is a NOP for Arm, so I don't think you need to call it (and we
never call the ctor() afaict). I wonder if we should be caring about this on
arm64...
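
(If I remember the include/linux/mm.h plumbing right -- double-check
before relying on this -- the empty variant is what an arch gets
without ARCH_ENABLE_SPLIT_PMD_PTLOCK, roughly:)

	#if USE_SPLIT_PMD_PTLOCKS
	static inline void pgtable_pmd_page_dtor(struct page *page)
	{
		ptlock_free(page);	/* tear down the split PMD ptlock */
	}
	#else
	static inline void pgtable_pmd_page_dtor(struct page *page) { }
	#endif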

Other than that:

Acked-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush
  2018-09-26 12:53   ` Will Deacon
  2018-09-26 12:53     ` Will Deacon
@ 2018-09-26 13:11     ` Peter Zijlstra
  2018-09-26 13:11       ` Peter Zijlstra
  2018-09-26 18:07       ` Peter Zijlstra
  1 sibling, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 13:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, riel

On Wed, Sep 26, 2018 at 01:53:35PM +0100, Will Deacon wrote:
> On Wed, Sep 26, 2018 at 01:36:28PM +0200, Peter Zijlstra wrote:
> > +#ifndef tlb_flush
> > +
> > +#if defined(tlb_start_vma) || defined(tlb_end_vma)
> > +#error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
> > +#endif
> > +
> > +#define tlb_flush tlb_flush
> 
> Do we need this #define?

Probably not, that was just my fingers doing the normal #ifndef #define
pattern. I'll take em out back for a 'hug' :-)
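
(For anyone reading along, the idiom as a standalone toy: the
self-#define is what lets a later #ifndef detect an override; inside
the generic fallback itself nothing tests it again, hence it can go.)

	/* arch header: the override, plus a marker asm-generic can test */
	#define tlb_flush tlb_flush
	static inline void tlb_flush(void) { /* arch-specific flush */ }

	/* asm-generic header: provide the default only when not overridden */
	#ifndef tlb_flush
	static inline void tlb_flush(void) { /* generic flush */ }
	#endif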

> > +static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
> > +{
> > +	if (tlb->fullmm)
> > +		return;
> > +
> > +	/*
> > +	 * flush_tlb_range() implementations that look at VM_HUGETLB (tile,
> > +	 * mips-4k) flush only large pages.
> > +	 *
> > +	 * flush_tlb_range() implementations that flush I-TLB also flush D-TLB
> > +	 * (tile, xtensa, arm), so it's ok to just add VM_EXEC to an existing
> > +	 * range.
> > +	 *
> > +	 * We rely on tlb_end_vma() to issue a flush, such that when we reset
> > +	 * these values the batch is empty.
> > +	 */
> > +	tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
> > +	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
> 
> Hmm, does this result in code generation for archs that don't care about the
> vm_flags?

Yes. It's not much code, but if you deeply care we could frob things to
get rid of it.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush
  2018-09-26 13:11     ` Peter Zijlstra
  2018-09-26 13:11       ` Peter Zijlstra
@ 2018-09-26 18:07       ` Peter Zijlstra
  2018-09-26 18:07         ` Peter Zijlstra
  2018-09-27 12:14         ` Will Deacon
  1 sibling, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-09-26 18:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, riel

On Wed, Sep 26, 2018 at 03:11:41PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 26, 2018 at 01:53:35PM +0100, Will Deacon wrote:

> > > +static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
> > > +{
> > > +	if (tlb->fullmm)
> > > +		return;
> > > +
> > > +	/*
> > > +	 * flush_tlb_range() implementations that look at VM_HUGETLB (tile,
> > > +	 * mips-4k) flush only large pages.
> > > +	 *
> > > +	 * flush_tlb_range() implementations that flush I-TLB also flush D-TLB
> > > +	 * (tile, xtensa, arm), so it's ok to just add VM_EXEC to an existing
> > > +	 * range.
> > > +	 *
> > > +	 * We rely on tlb_end_vma() to issue a flush, such that when we reset
> > > +	 * these values the batch is empty.
> > > +	 */
> > > +	tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
> > > +	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
> > 
> > Hmm, does this result in code generation for archs that don't care about the
> > vm_flags?
> 
> Yes. It's not much code, but if you deeply care we could frob things to
> get rid of it.

Something a little like the below... not particularly pretty but should
work.

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -305,7 +305,8 @@ static inline void __tlb_reset_range(str
 #error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
 #endif
 
-#define tlb_flush tlb_flush
+#define generic_tlb_flush
+
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
 	if (tlb->fullmm || tlb->need_flush_all) {
@@ -391,12 +392,12 @@ static inline unsigned long tlb_get_unma
  * the vmas are adjusted to only cover the region to be torn down.
  */
 #ifndef tlb_start_vma
-#define tlb_start_vma tlb_start_vma
 static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
 {
 	if (tlb->fullmm)
 		return;
 
+#ifdef generic_tlb_flush
 	/*
 	 * flush_tlb_range() implementations that look at VM_HUGETLB (tile,
 	 * mips-4k) flush only large pages.
@@ -410,13 +411,13 @@ static inline void tlb_start_vma(struct
 	 */
 	tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
 	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
+#endif
 
 	flush_cache_range(vma, vma->vm_start, vma->vm_end);
 }
 #endif
 
 #ifndef tlb_end_vma
-#define tlb_end_vma tlb_end_vma
 static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
 {
 	if (tlb->fullmm)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush
  2018-09-26 18:07       ` Peter Zijlstra
  2018-09-26 18:07         ` Peter Zijlstra
@ 2018-09-27 12:14         ` Will Deacon
  2018-09-27 12:14           ` Will Deacon
  1 sibling, 1 reply; 64+ messages in thread
From: Will Deacon @ 2018-09-27 12:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: aneesh.kumar, akpm, npiggin, linux-arch, linux-mm, linux-kernel,
	linux, heiko.carstens, riel

On Wed, Sep 26, 2018 at 08:07:27PM +0200, Peter Zijlstra wrote:
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -305,7 +305,8 @@ static inline void __tlb_reset_range(str
>  #error Default tlb_flush() relies on default tlb_start_vma() and tlb_end_vma()
>  #endif
>  
> -#define tlb_flush tlb_flush
> +#define generic_tlb_flush
> +
>  static inline void tlb_flush(struct mmu_gather *tlb)
>  {
>  	if (tlb->fullmm || tlb->need_flush_all) {
> @@ -391,12 +392,12 @@ static inline unsigned long tlb_get_unma
>   * the vmas are adjusted to only cover the region to be torn down.
>   */
>  #ifndef tlb_start_vma
> -#define tlb_start_vma tlb_start_vma
>  static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
>  {
>  	if (tlb->fullmm)
>  		return;
>  
> +#ifdef generic_tlb_flush
>  	/*
>  	 * flush_tlb_range() implementations that look at VM_HUGETLB (tile,
>  	 * mips-4k) flush only large pages.
> @@ -410,13 +411,13 @@ static inline void tlb_start_vma(struct
>  	 */
>  	tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
>  	tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
> +#endif

Alternatively, we could wrap the two assignments above in a macro like:

	tlb_update_vma_flags(tlb, vma)

which could be empty if the generic tlb_flush isn't in use?
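
Something like this, perhaps (untested sketch, keying off the
generic_tlb_flush marker from your diff):

	#ifdef generic_tlb_flush
	#define tlb_update_vma_flags(tlb, vma)				\
	do {								\
		(tlb)->vma_huge = !!((vma)->vm_flags & VM_HUGETLB);	\
		(tlb)->vma_exec = !!((vma)->vm_flags & VM_EXEC);	\
	} while (0)
	#else
	#define tlb_update_vma_flags(tlb, vma) do { } while (0)
	#endif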

Anyway, as long as we resolve this one way or the other, you can add my Ack:

Acked-by: Will Deacon <will.deacon@arm.com>

Cheers,

Will

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 12/18] arch/tlb: Clean up simple architectures
  2018-09-26 11:36 ` [PATCH 12/18] arch/tlb: Clean up simple architectures Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
@ 2018-10-03 17:03   ` Vineet Gupta
  2018-10-03 17:03     ` Vineet Gupta
  2018-10-11 15:04     ` Peter Zijlstra
  1 sibling, 2 replies; 64+ messages in thread
From: Vineet Gupta @ 2018-10-03 17:03 UTC (permalink / raw)
  To: Peter Zijlstra, will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, linux, heiko.carstens, riel,
	Richard Henderson, Vineet Gupta, Mark Salter, Richard Kuo,
	Michal Simek, Paul Burton, Greentime Hu, Ley Foon Tan,
	Jonas Bonn, Helge Deller, David S. Miller, Guan Xuetao,
	Max Filippov

On 09/26/2018 04:56 AM, Peter Zijlstra wrote:
> There are generally two cases:
>
>  1) either the platform has an efficient flush_tlb_range() and
>     asm-generic/tlb.h doesn't need any overrides at all.
>
>  2) or an architecture lacks an efficient flush_tlb_range() and
>     we override tlb_end_vma() and tlb_flush().
>
> Convert all 'simple' architectures to one of these two forms.
>
> alpha:	    has no range invalidate -> 2
> arc:	    already used flush_tlb_range() -> 1
> c6x:	    has no range invalidate -> 2
> h8300:	    has no mmu
> hexagon:    has an efficient flush_tlb_range() -> 1
>             (flush_tlb_mm() is in fact a full range invalidate,
> 	     so no need to shoot down everything)
> m68k:	    has inefficient flush_tlb_range() -> 2
> microblaze: has no flush_tlb_range() -> 2
> mips:	    has efficient flush_tlb_range() -> 1
> 	    (even though it currently seems to use flush_tlb_mm())
> nds32:	    already uses flush_tlb_range() -> 1
> nios2:	    has inefficient flush_tlb_range() -> 2
> 	    (no limit on range iteration)
> openrisc:   has inefficient flush_tlb_range() -> 2
> 	    (no limit on range iteration)
> parisc:	    already uses flush_tlb_range() -> 1
> sparc32:    already uses flush_tlb_range() -> 1
> unicore32:  has inefficient flush_tlb_range() -> 2
> 	    (no limit on range iteration)
> xtensa:	    has efficient flush_tlb_range() -> 1
>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Mark Salter <msalter@redhat.com>
> Cc: Richard Kuo <rkuo@codeaurora.org>
> Cc: Michal Simek <monstr@monstr.eu>
> Cc: Paul Burton <paul.burton@mips.com>
> Cc: Greentime Hu <green.hu@gmail.com>
> Cc: Ley Foon Tan <lftan@altera.com>
> Cc: Jonas Bonn <jonas@southpole.se>
> Cc: Helge Deller <deller@gmx.de>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Guan Xuetao <gxt@pku.edu.cn>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/alpha/include/asm/tlb.h      |    2 --
>  arch/arc/include/asm/tlb.h        |   23 -----------------------
>  arch/c6x/include/asm/tlb.h        |    1 +
>  arch/h8300/include/asm/tlb.h      |    2 --
>  arch/hexagon/include/asm/tlb.h    |   12 ------------
>  arch/m68k/include/asm/tlb.h       |    1 -
>  arch/microblaze/include/asm/tlb.h |    4 +---
>  arch/mips/include/asm/tlb.h       |    8 --------
>  arch/nds32/include/asm/tlb.h      |   10 ----------
>  arch/nios2/include/asm/tlb.h      |    8 +++++---
>  arch/openrisc/include/asm/tlb.h   |    6 ++++--
>  arch/parisc/include/asm/tlb.h     |   13 -------------
>  arch/powerpc/include/asm/tlb.h    |    1 -
>  arch/sparc/include/asm/tlb_32.h   |   13 -------------
>  arch/unicore32/include/asm/tlb.h  |   10 ++++++----
>  arch/xtensa/include/asm/tlb.h     |   17 -----------------
>  16 files changed, 17 insertions(+), 114 deletions(-)
>
> --- a/arch/alpha/include/asm/tlb.h
> +++ b/arch/alpha/include/asm/tlb.h
> @@ -4,8 +4,6 @@
>  
>  #define tlb_start_vma(tlb, vma)			do { } while (0)
>  #define tlb_end_vma(tlb, vma)			do { } while (0)
> -#define __tlb_remove_tlb_entry(tlb, pte, addr)	do { } while (0)
> -
>  #define tlb_flush(tlb)				flush_tlb_mm((tlb)->mm)
>  
>  #include <asm-generic/tlb.h>
> --- a/arch/arc/include/asm/tlb.h
> +++ b/arch/arc/include/asm/tlb.h
> @@ -9,29 +9,6 @@
>  #ifndef _ASM_ARC_TLB_H
>  #define _ASM_ARC_TLB_H
>  
> -#define tlb_flush(tlb)				\
> -do {						\
> -	if (tlb->fullmm)			\
> -		flush_tlb_mm((tlb)->mm);	\
> -} while (0)
> -
> -/*
> - * This pair is called at time of munmap/exit to flush cache and TLB entries
> - * for mappings being torn down.
> - * 1) cache-flush part -implemented via tlb_start_vma( ) for VIPT aliasing D$
> - * 2) tlb-flush part - implemted via tlb_end_vma( ) flushes the TLB range
> - *
> - * Note, read http://lkml.org/lkml/2004/1/15/6
> - */
> -
> -#define tlb_end_vma(tlb, vma)						\
> -do {									\
> -	if (!tlb->fullmm)						\
> -		flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
> -} while (0)
> -
> -#define __tlb_remove_tlb_entry(tlb, ptep, address)
> -
>  #include <linux/pagemap.h>
>  #include <asm-generic/tlb.h>

LGTM per discussion in an earlier thread. However, given that the whole series
doesn't apply for "simpler" arches, can you please beef up the changelog so I
don't go scratching my head 2 years down the line? It currently describes the
hows of things but not exactly the whys: shift_arg_pages() missing
tlb_start_vma(), move_page_tables() looking dodgy, yada yadda?

Thx,
-Vineet


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 12/18] arch/tlb: Clean up simple architectures
  2018-10-03 17:03   ` Vineet Gupta
  2018-10-03 17:03     ` Vineet Gupta
@ 2018-10-11 15:04     ` Peter Zijlstra
  2018-10-11 15:04       ` Peter Zijlstra
  2018-10-12 19:40       ` Vineet Gupta
  1 sibling, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2018-10-11 15:04 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens, riel, Richard Henderson,
	Mark Salter, Richard Kuo, Michal Simek, Paul Burton,
	Greentime Hu, Ley Foon Tan

On Wed, Oct 03, 2018 at 05:03:50PM +0000, Vineet Gupta wrote:
> On 09/26/2018 04:56 AM, Peter Zijlstra wrote:
> > There are generally two cases:
> >
> >  1) either the platform has an efficient flush_tlb_range() and
> >     asm-generic/tlb.h doesn't need any overrides at all.
> >
> >  2) or an architecture lacks an efficient flush_tlb_range() and
> >     we override tlb_end_vma() and tlb_flush().
> >
> > Convert all 'simple' architectures to one of these two forms.
> >

> > --- a/arch/arc/include/asm/tlb.h
> > +++ b/arch/arc/include/asm/tlb.h
> > @@ -9,29 +9,6 @@
> >  #ifndef _ASM_ARC_TLB_H
> >  #define _ASM_ARC_TLB_H
> >  
> > -#define tlb_flush(tlb)				\
> > -do {						\
> > -	if (tlb->fullmm)			\
> > -		flush_tlb_mm((tlb)->mm);	\
> > -} while (0)
> > -
> > -/*
> > - * This pair is called at time of munmap/exit to flush cache and TLB entries
> > - * for mappings being torn down.
> > - * 1) cache-flush part -implemented via tlb_start_vma( ) for VIPT aliasing D$
> > - * 2) tlb-flush part - implemted via tlb_end_vma( ) flushes the TLB range
> > - *
> > - * Note, read http://lkml.org/lkml/2004/1/15/6
> > - */
> > -
> > -#define tlb_end_vma(tlb, vma)						\
> > -do {									\
> > -	if (!tlb->fullmm)						\
> > -		flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
> > -} while (0)
> > -
> > -#define __tlb_remove_tlb_entry(tlb, ptep, address)
> > -
> >  #include <linux/pagemap.h>
> >  #include <asm-generic/tlb.h>
> 
> LGTM per discussion in an earlier thread. However, given that the whole series
> doesn't apply for "simpler" arches, can you please beef up the changelog so I
> don't go scratching my head 2 years down the line? It currently describes the
> hows of things but not exactly the whys: shift_arg_pages() missing
> tlb_start_vma(), move_page_tables() looking dodgy, yada yadda?

Right you are. Thanks for pointing out the somewhat sparse Changelog;
typically I end up kicking myself a few years down the line.

I think I will in fact change the implementation a little and provide a
symbol/Kconfig to switch the default implementation between
flush_tlb_range() and flush_tlb_mm().

That avoids some of the repetition. But see here a preview of the new
Changelog; does that clarify things enough?

---
Subject: arch/tlb: Clean up simple architectures
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Sep 4 17:04:07 CEST 2018

The generic mmu_gather implementation is geared towards range tracking;
provided the architecture supplies a fairly efficient flush_tlb_range()
implementation (or a custom tlb_flush() implementation), things will
work well.

The one case this doesn't cover well is where there is no (efficient)
range invalidate at all. In this case we can select
MMU_GATHER_NO_RANGE.

So this reduces to two cases:

 1) either the platform has an efficient flush_tlb_range() and
    asm-generic/tlb.h doesn't need any overrides at all.

 2) or an architecture lacks an efficient flush_tlb_range() and
    we need to select MMU_GATHER_NO_RANGE.

Convert all 'simple' architectures to one of these two forms.

alpha:	    has no range invalidate -> 2
arc:	    already used flush_tlb_range() -> 1
c6x:	    has no range invalidate -> 2
hexagon:    has an efficient flush_tlb_range() -> 1
            (flush_tlb_mm() is in fact a full range invalidate,
	     so no need to shoot down everything)
m68k:	    has inefficient flush_tlb_range() -> 2
microblaze: has no flush_tlb_range() -> 2
mips:	    has efficient flush_tlb_range() -> 1
	    (even though it currently seems to use flush_tlb_mm())
nds32:	    already uses flush_tlb_range() -> 1
nios2:	    has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
openrisc:   has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
parisc:	    already uses flush_tlb_range() -> 1
sparc32:    already uses flush_tlb_range() -> 1
unicore32:  has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
xtensa:	    has efficient flush_tlb_range() -> 1

Note this also fixes a bug in the existing code for a number of
platforms. Those platforms that did:

  tlb_end_vma() -> if (!fullmm) flush_tlb_*()
  tlb_flush() -> if (fullmm) flush_tlb_mm()

missed the case of shift_arg_pages(), which doesn't have @fullmm set,
nor calls into tlb_*vma(), but still frees page-tables and thus needs
an invalidate. The new code handles this by detecting a non-empty
range, and either issuing the matching range invalidate or a full
invalidate, depending on the capabilities.

Cc: Nick Piggin <npiggin@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Helge Deller <deller@gmx.de>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Mark Salter <msalter@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
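
(Not part of the changelog: the MMU_GATHER_NO_RANGE knob itself would
be roughly the below -- a sketch only, the final form may differ:)

	config MMU_GATHER_NO_RANGE
		bool

selected by architectures without a usable flush_tlb_range(), with
asm-generic/tlb.h then providing:

	#ifdef CONFIG_MMU_GATHER_NO_RANGE
	static inline void tlb_flush(struct mmu_gather *tlb)
	{
		if (tlb->end)			/* anything accumulated? */
			flush_tlb_mm(tlb->mm);	/* no range op; flush it all */
	}
	#endif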

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 12/18] arch/tlb: Clean up simple architectures
  2018-10-11 15:04     ` Peter Zijlstra
  2018-10-11 15:04       ` Peter Zijlstra
@ 2018-10-12 19:40       ` Vineet Gupta
  2018-10-12 19:40         ` Vineet Gupta
  2018-10-15 14:14         ` Peter Zijlstra
  1 sibling, 2 replies; 64+ messages in thread
From: Vineet Gupta @ 2018-10-12 19:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens, riel, Richard Henderson,
	Mark Salter, Richard Kuo, Michal Simek, Paul Burton,
	Greentime Hu, Ley Foon Tan

On 10/11/2018 08:06 AM, Peter Zijlstra wrote:
> On Wed, Oct 03, 2018 at 05:03:50PM +0000, Vineet Gupta wrote:
>> On 09/26/2018 04:56 AM, Peter Zijlstra wrote:
>>> There are generally two cases:
>>>
>>>  1) either the platform has an efficient flush_tlb_range() and
>>>     asm-generic/tlb.h doesn't need any overrides at all.
>>>
>>>  2) or an architecture lacks an efficient flush_tlb_range() and
>>>     we override tlb_end_vma() and tlb_flush().
>>>
>>> Convert all 'simple' architectures to one of these two forms.
>>>
>>> --- a/arch/arc/include/asm/tlb.h
>>> +++ b/arch/arc/include/asm/tlb.h
>>> @@ -9,29 +9,6 @@
>>>  #ifndef _ASM_ARC_TLB_H
>>>  #define _ASM_ARC_TLB_H
>>>  
>>> -#define tlb_flush(tlb)				\
>>> -do {						\
>>> -	if (tlb->fullmm)			\
>>> -		flush_tlb_mm((tlb)->mm);	\
>>> -} while (0)
>>> -
>>> -/*
>>> - * This pair is called at time of munmap/exit to flush cache and TLB entries
>>> - * for mappings being torn down.
>>> - * 1) cache-flush part -implemented via tlb_start_vma( ) for VIPT aliasing D$
>>> - * 2) tlb-flush part - implemted via tlb_end_vma( ) flushes the TLB range
>>> - *
>>> - * Note, read http://lkml.org/lkml/2004/1/15/6
>>> - */
>>> -
>>> -#define tlb_end_vma(tlb, vma)						\
>>> -do {									\
>>> -	if (!tlb->fullmm)						\
>>> -		flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
>>> -} while (0)
>>> -
>>> -#define __tlb_remove_tlb_entry(tlb, ptep, address)
>>> -
>>>  #include <linux/pagemap.h>
>>>  #include <asm-generic/tlb.h>
>> LGTM per discussion in an earlier thread. However, given that the whole series
>> doesn't apply for "simpler" arches, can you please beef up the changelog so I
>> don't go scratching my head 2 years down the line? It currently describes the
>> hows of things but not exactly the whys: shift_arg_pages() missing
>> tlb_start_vma(), move_page_tables() looking dodgy, yada yadda?
> Right you are. Thanks for pointing out the somewhat sparse Changelog;
> typically I end up kicking myself a few years down the line.
>
> I think I will in fact change the implementation a little and provide a
> symbol/Kconfig to switch the default implementation between
> flush_tlb_range() and flush_tlb_mm().
>
> That avoids some of the repetition. But see here a preview of the new
> Changelog; does that clarify things enough?
>
> ---
> Subject: arch/tlb: Clean up simple architectures
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue Sep 4 17:04:07 CEST 2018
>
> The generic mmu_gather implementation is geared towards range tracking;
> provided the architecture supplies a fairly efficient flush_tlb_range()
> implementation (or a custom tlb_flush() implementation), things will
> work well.
>
> The one case this doesn't cover well is where there is no (efficient)
> range invalidate at all. In this case we can select
> MMU_GATHER_NO_RANGE.
>
> So this reduces to two cases:
>
>  1) either the platform has an efficient flush_tlb_range() and
>     asm-generic/tlb.h doesn't need any overrides at all.
>
>  2) or an architecture lacks an efficient flush_tlb_range() and
>     we need to select MMU_GATHER_NO_RANGE.
>
> Convert all 'simple' architectures to one of these two forms.
>
> alpha:	    has no range invalidate -> 2
> arc:	    already used flush_tlb_range() -> 1
> c6x:	    has no range invalidate -> 2
> hexagon:    has an efficient flush_tlb_range() -> 1
>             (flush_tlb_mm() is in fact a full range invalidate,
> 	     so no need to shoot down everything)
> m68k:	    has inefficient flush_tlb_range() -> 2
> microblaze: has no flush_tlb_range() -> 2
> mips:	    has efficient flush_tlb_range() -> 1
> 	    (even though it currently seems to use flush_tlb_mm())
> nds32:	    already uses flush_tlb_range() -> 1
> nios2:	    has inefficient flush_tlb_range() -> 2
> 	    (no limit on range iteration)
> openrisc:   has inefficient flush_tlb_range() -> 2
> 	    (no limit on range iteration)
> parisc:	    already uses flush_tlb_range() -> 1
> sparc32:    already uses flush_tlb_range() -> 1
> unicore32:  has inefficient flush_tlb_range() -> 2
> 	    (no limit on range iteration)
> xtensa:	    has efficient flush_tlb_range() -> 1
>
> Note this also fixes a bug in the existing code for a number of
> platforms. Those platforms that did:
>
>   tlb_end_vma() -> if (!fullmm) flush_tlb_*()
>   tlb_flush() -> if (fullmm) flush_tlb_mm()
>
> missed the case of shift_arg_pages(), which doesn't have @fullmm set,
> nor calls into tlb_*vma(), but still frees page-tables and thus needs
> an invalidate. The new code handles this by detecting a non-empty
> range, and either issuing the matching range invalidate or a full
> invalidate, depending on the capabilities.
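>
> For the range-tracking case, the generic tlb_flush() then looks
> roughly like so (a sketch; the real code also carries the vma
> flags):
>
>	static inline void tlb_flush(struct mmu_gather *tlb)
>	{
>		if (tlb->fullmm || tlb->need_flush_all) {
>			flush_tlb_mm(tlb->mm);
>		} else if (tlb->end) {
>			/* non-empty range; shift_arg_pages() and friends
>			 * end up here even without tlb_*vma() calls */
>			struct vm_area_struct vma = { .vm_mm = tlb->mm, };
>
>			flush_tlb_range(&vma, tlb->start, tlb->end);
>		}
>	}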
>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Michal Simek <monstr@monstr.eu>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Greentime Hu <green.hu@gmail.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ley Foon Tan <lftan@altera.com>
> Cc: Jonas Bonn <jonas@southpole.se>
> Cc: Mark Salter <msalter@redhat.com>
> Cc: Richard Kuo <rkuo@codeaurora.org>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Paul Burton <paul.burton@mips.com>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: Guan Xuetao <gxt@pku.edu.cn>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Very nice. Thx for doing this.

Once you have redone this, please point me to a branch so I can give this a spin.
I've always been interested in tracking down / optimizing the full TLB flushes,
which ARC implements by simply moving the MMU/process to a new ASID (TLB entries
are tagged with an 8-bit value, unique per process). When I started looking into
this, a simple ls (fork+execve) would increment the ASID by 13, which I'd
optimized down to a reasonable 4. I haven't checked that in recent times though,
so it would be fun to revive that measurement.
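
FWIW the full flush on ARC is essentially just an ASID bump; as a
sketch of the idea (not the exact arch/arc code):

	void local_flush_tlb_mm(struct mm_struct *mm)
	{
		/* Retiring the mm's current ASID orphans every TLB entry
		 * tagged with it; those entries can simply never match again. */
		if (current->mm == mm)
			get_new_mmu_context(mm);	/* fresh 8-bit ASID */
	}

so a "full" flush is nearly free, until the 256 ASIDs wrap and a real
TLB flush is needed.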

-Vineet

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 12/18] arch/tlb: Clean up simple architectures
  2018-10-12 19:40       ` Vineet Gupta
  2018-10-12 19:40         ` Vineet Gupta
@ 2018-10-15 14:14         ` Peter Zijlstra
  2018-10-15 14:14           ` Peter Zijlstra
  1 sibling, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2018-10-15 14:14 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: will.deacon, aneesh.kumar, akpm, npiggin, linux-arch, linux-mm,
	linux-kernel, linux, heiko.carstens, riel, Richard Henderson,
	Mark Salter, Richard Kuo, Michal Simek, Paul Burton,
	Greentime Hu, Ley Foon Tan

On Fri, Oct 12, 2018 at 07:40:04PM +0000, Vineet Gupta wrote:
> Very nice. Thx for doing this.
> 
> Once you have redone this, please point me to a branch so I can give this a spin.
> I've always been interested in tracking down / optimizing the full TLB flushes,
> which ARC implements by simply moving the MMU/process to a new ASID (TLB entries
> are tagged with an 8-bit value, unique per process). When I started looking into
> this, a simple ls (fork+execve) would increment the ASID by 13, which I'd
> optimized down to a reasonable 4. I haven't checked that in recent times though,
> so it would be fun to revive that measurement.

I just pushed out the latest version to:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git mm/tlb

(mandatory caution: that tree is unstable / throw-away)

I'll wait a few days to see what, if anything, comes back from 0day
before posting again.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 13/18] asm-generic/tlb: Introduce HAVE_MMU_GATHER_NO_GATHER
  2018-09-26 11:36 ` [PATCH 13/18] asm-generic/tlb: Introduce HAVE_MMU_GATHER_NO_GATHER Peter Zijlstra
  2018-09-26 11:36   ` Peter Zijlstra
@ 2018-12-11  5:43   ` Aneesh Kumar K.V
  2018-12-11  5:43     ` Aneesh Kumar K.V
  1 sibling, 1 reply; 64+ messages in thread
From: Aneesh Kumar K.V @ 2018-12-11  5:43 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, Linus Torvalds, Martin Schwidefsky

Peter Zijlstra <peterz@infradead.org> writes:

> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
>
> Add the Kconfig option HAVE_MMU_GATHER_NO_GATHER to the generic
> mmu_gather code. If the option is set, the mmu_gather will no
> longer track individual pages for delayed freeing. A platform
> that enables the option needs to provide its own implementation
> of the __tlb_remove_page_size() function to free pages.

Can we rename this to HAVE_NO_BATCH_MMU_GATHER? 
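
FWIW, with no gather batching the arch override can be as simple as
the s390 one; roughly (a sketch, modulo the arch's own page-table
freeing machinery):

	static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
						  struct page *page, int page_size)
	{
		free_page_and_swap_cache(page);
		return false;	/* never reports "batch full" to the core */
	}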

>
> Cc: npiggin@gmail.com
> Cc: heiko.carstens@de.ibm.com
> Cc: will.deacon@arm.com
> Cc: aneesh.kumar@linux.vnet.ibm.com
> Cc: akpm@linux-foundation.org
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: linux@armlinux.org.uk
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20180918125151.31744-2-schwidefsky@de.ibm.com

-aneesh

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 00/18] my generic mmu_gather patches
  2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
                   ` (19 preceding siblings ...)
  2018-09-26 12:47 ` [PATCH 00/18] my generic mmu_gather patches Will Deacon
@ 2018-12-11  5:50 ` Aneesh Kumar K.V
  2018-12-11  5:50   ` Aneesh Kumar K.V
  20 siblings, 1 reply; 64+ messages in thread
From: Aneesh Kumar K.V @ 2018-12-11  5:50 UTC (permalink / raw)
  To: will.deacon, aneesh.kumar, akpm, npiggin
  Cc: linux-arch, linux-mm, linux-kernel, peterz, linux,
	heiko.carstens, riel, fengguang.wu

Peter Zijlstra <peterz@infradead.org> writes:

> Hi,
>
> Here is my current stash of generic mmu_gather patches that goes on top of Will's
> tlb patches:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tlb/asm-generic
>
> And they include the s390 patches done by Heiko. At the end of this, there is
> not a single arch left with a custom mmu_gather.
>
> I've been slow posting these, because the 0-day bot seems to be having trouble
> and I've not been getting the regular cross-build green light emails that I
> otherwise rely upon.
>
> I hope to have addressed all the feedback from the last time, and I've added a
> bunch of missing Cc's from last time.
>
> Please review with care.

What is the status of this patch series? It looks good to be merged
upstream.

You can also add

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

to the series.

-aneesh

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2018-12-11  5:50 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-26 11:36 [PATCH 00/18] my generic mmu_gather patches Peter Zijlstra
2018-09-26 11:36 ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 01/18] asm-generic/tlb: Provide a comment Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 02/18] asm-generic/tlb: Provide HAVE_MMU_GATHER_PAGE_SIZE Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 03/18] x86/mm: Page size aware flush_tlb_mm_range() Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 04/18] asm-generic/tlb: Provide generic VIPT cache flush Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 05/18] asm-generic/tlb: Provide generic tlb_flush Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 12:53   ` Will Deacon
2018-09-26 12:53     ` Will Deacon
2018-09-26 13:11     ` Peter Zijlstra
2018-09-26 13:11       ` Peter Zijlstra
2018-09-26 18:07       ` Peter Zijlstra
2018-09-26 18:07         ` Peter Zijlstra
2018-09-27 12:14         ` Will Deacon
2018-09-27 12:14           ` Will Deacon
2018-09-26 11:36 ` [PATCH 06/18] asm-generic/tlb: Conditionally provide tlb_migrate_finish() Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 12:53   ` Will Deacon
2018-09-26 12:53     ` Will Deacon
2018-09-26 11:36 ` [PATCH 07/18] asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 08/18] arm/tlb: Convert to generic mmu_gather Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 12:54   ` Will Deacon
2018-09-26 12:54     ` Will Deacon
2018-09-26 11:36 ` [PATCH 09/18] ia64/tlb: Convert " Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 10/18] sh/tlb: Convert SH " Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 11/18] um/tlb: Convert " Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 12/18] arch/tlb: Clean up simple architectures Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-10-03 17:03   ` Vineet Gupta
2018-10-03 17:03     ` Vineet Gupta
2018-10-11 15:04     ` Peter Zijlstra
2018-10-11 15:04       ` Peter Zijlstra
2018-10-12 19:40       ` Vineet Gupta
2018-10-12 19:40         ` Vineet Gupta
2018-10-15 14:14         ` Peter Zijlstra
2018-10-15 14:14           ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 13/18] asm-generic/tlb: Introduce HAVE_MMU_GATHER_NO_GATHER Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-12-11  5:43   ` Aneesh Kumar K.V
2018-12-11  5:43     ` Aneesh Kumar K.V
2018-09-26 11:36 ` [PATCH 14/18] s390/tlb: convert to generic mmu_gather Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 15/18] asm-generic/tlb: Remove arch_tlb*_mmu() Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 16/18] asm-generic/tlb: Remove HAVE_GENERIC_MMU_GATHER Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 17/18] asm-generic/tlb: Remove tlb_flush_mmu_free() Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 11:36 ` [PATCH 18/18] asm-generic/tlb: Remove tlb_table_flush() Peter Zijlstra
2018-09-26 11:36   ` Peter Zijlstra
2018-09-26 12:47 ` [PATCH 00/18] my generic mmu_gather patches Will Deacon
2018-09-26 12:47   ` Will Deacon
2018-12-11  5:50 ` Aneesh Kumar K.V
2018-12-11  5:50   ` Aneesh Kumar K.V
