* [PATCH 00/20] Unify TLB gather implementations -v3
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

It's been a while since I last sent this out, but here goes..

There's no arch left over; I finally got s390 converted too.
The series is compile tested on:

 arm, powerpc64, sparc64, sparc32, s390x, ia64, xtensa

I lack a working toolchain for: sh, avr32
Simply wouldn't build:          mips, parisc 
 
---
 arch/Kconfig                         |   16 ++
 arch/alpha/include/asm/tlb.h         |    2 -
 arch/arm/Kconfig                     |    1 +
 arch/arm/include/asm/tlb.h           |  183 ++--------------------
 arch/avr32/Kconfig                   |    1 +
 arch/avr32/include/asm/tlb.h         |   11 --
 arch/blackfin/include/asm/tlb.h      |    6 -
 arch/c6x/include/asm/tlb.h           |    2 -
 arch/cris/include/asm/tlb.h          |    1 -
 arch/frv/include/asm/tlb.h           |    5 -
 arch/h8300/include/asm/tlb.h         |   13 --
 arch/hexagon/include/asm/tlb.h       |    5 -
 arch/ia64/Kconfig                    |    1 +
 arch/ia64/include/asm/tlb.h          |  233 +---------------------------
 arch/ia64/include/asm/tlbflush.h     |   25 +++
 arch/ia64/mm/tlb.c                   |   24 +++-
 arch/m32r/include/asm/tlb.h          |    6 -
 arch/m68k/include/asm/tlb.h          |    6 -
 arch/microblaze/include/asm/tlb.h    |    2 -
 arch/mips/Kconfig                    |    1 +
 arch/mips/include/asm/tlb.h          |   15 --
 arch/mn10300/include/asm/tlb.h       |    5 -
 arch/openrisc/include/asm/tlb.h      |    1 -
 arch/parisc/Kconfig                  |    1 +
 arch/parisc/include/asm/tlb.h        |   15 --
 arch/powerpc/include/asm/tlb.h       |    2 -
 arch/powerpc/mm/hugetlbpage.c        |    4 +-
 arch/powerpc/mm/tlb_hash32.c         |   15 --
 arch/powerpc/mm/tlb_hash64.c         |   14 --
 arch/powerpc/mm/tlb_nohash.c         |    5 -
 arch/s390/Kconfig                    |    1 +
 arch/s390/include/asm/pgalloc.h      |    3 +
 arch/s390/include/asm/pgtable.h      |    1 +
 arch/s390/include/asm/tlb.h          |   71 ++-------
 arch/s390/mm/pgtable.c               |   63 +-------
 arch/score/include/asm/tlb.h         |    1 -
 arch/sh/Kconfig                      |    1 +
 arch/sh/include/asm/tlb.h            |   99 +-----------
 arch/sparc/Kconfig                   |    1 +
 arch/sparc/Makefile                  |    1 +
 arch/sparc/include/asm/tlb_32.h      |   15 --
 arch/sparc/include/asm/tlb_64.h      |    1 -
 arch/sparc/include/asm/tlbflush_64.h |   11 ++
 arch/tile/include/asm/tlb.h          |    1 -
 arch/um/Kconfig.common               |    1 +
 arch/um/include/asm/tlb.h            |  111 +-------------
 arch/um/kernel/tlb.c                 |   13 --
 arch/unicore32/include/asm/tlb.h     |    1 -
 arch/x86/include/asm/tlb.h           |    2 +-
 arch/x86/mm/pgtable.c                |    6 +-
 arch/xtensa/Kconfig                  |    1 +
 arch/xtensa/include/asm/tlb.h        |   24 ---
 arch/xtensa/mm/tlb.c                 |    2 +-
 include/asm-generic/4level-fixup.h   |    2 +-
 include/asm-generic/tlb.h            |  284 +++++++++++++++++++++++++++++-----
 mm/memory.c                          |   54 +++++--
 56 files changed, 415 insertions(+), 977 deletions(-)


* [PATCH 01/20] mm, x86: Add HAVE_RCU_TABLE_FREE support
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mmu-generic-rcu.patch --]
[-- Type: text/plain, Size: 2566 bytes --]

Implements optional HAVE_RCU_TABLE_FREE support for x86.

This is useful for things like Xen and KVM, where a paravirt TLB
flush means software page-table walkers like GUP-fast cannot rely on
IRQ disabling the way regular x86 code can.
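
For context, a minimal sketch of the invariant at stake (illustrative
only, not code from this patch): on native x86 the TLB shootdown IPI
cannot be serviced while IRQs are off, so a GUP-fast style walker pins
the page-table pages simply by disabling IRQs. A paravirt flush
hypercall sends no IPI, which is what the RCU-deferred freeing repairs.

	static int gup_fast_sketch(struct mm_struct *mm, unsigned long addr)
	{
		unsigned long flags;

		/*
		 * With IRQs off, no TLB-shootdown IPI can complete, so
		 * the table pages this walk dereferences cannot be freed
		 * underneath us on native x86.  Under a paravirt flush
		 * this stays safe only if table pages are freed after an
		 * RCU-sched grace period -- which HAVE_RCU_TABLE_FREE
		 * provides.
		 */
		local_irq_save(flags);
		/* ... walk pgd/pud/pmd/pte and take page references ... */
		local_irq_restore(flags);
		return 0;
	}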

Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/include/asm/tlb.h |    1 +
 arch/x86/mm/pgtable.c      |    6 +++---
 include/asm-generic/tlb.h  |    9 +++++++++
 3 files changed, 13 insertions(+), 3 deletions(-)
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_TLB_H
 #define _ASM_X86_TLB_H
 
+#define __tlb_remove_table(table) free_page_and_swap_cache(table)
 #define tlb_start_vma(tlb, vma) do { } while (0)
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -51,21 +51,21 @@ void ___pte_free_tlb(struct mmu_gather *
 {
 	pgtable_page_dtor(pte);
 	paravirt_release_pte(page_to_pfn(pte));
-	tlb_remove_page(tlb, pte);
+	tlb_remove_table(tlb, pte);
 }
 
 #if PAGETABLE_LEVELS > 2
 void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
 	paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
-	tlb_remove_page(tlb, virt_to_page(pmd));
+	tlb_remove_table(tlb, virt_to_page(pmd));
 }
 
 #if PAGETABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	tlb_remove_page(tlb, virt_to_page(pud));
+	tlb_remove_table(tlb, virt_to_page(pud));
 }
 #endif	/* PAGETABLE_LEVELS > 3 */
 #endif	/* PAGETABLE_LEVELS > 2 */
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -19,6 +19,8 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page);
+
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 /*
  * Semi RCU freeing of the page directories.
@@ -60,6 +62,13 @@ struct mmu_table_batch {
 extern void tlb_table_flush(struct mmu_gather *tlb);
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
+#else
+
+static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_page(tlb, table);
+}
+
 #endif
 
 /*


* [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: nikunj_a__dadhania-flush_page-table_pages_before_freeing_them.patch --]
[-- Type: text/plain, Size: 3500 bytes --]

From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>

Certain architectures (viz. x86, arm, s390) have hardware page-table
walkers (#PF). During the RCU page-table teardown, make sure we flush
the TLB entries for page-table pages on all relevant CPUs, to
synchronize against the hardware walkers, and only then free the pages.

Moreover, the (mm_users < 2) condition does not hold for these
architectures, since the hardware walker is itself one of the users.

This patch should also make the generic RCU page-table freeing code
suitable for s390 again since it fixes the issues raised in
cd94154cc6a ("[S390] fix tlb flushing for page table pages").
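
With the #ifdefs resolved for an architecture that has hardware
walkers (CONFIG_HAVE_HW_PAGE_TABLE_WALKS=y), the patched flush path in
mm/memory.c reduces to the following (a sketch, shown for clarity):

	void tlb_table_flush(struct mmu_gather *tlb)
	{
		struct mmu_table_batch **batch = &tlb->batch;

		if (*batch) {
			/* shoot down the hardware TLBs before the RCU
			 * callback can free the table pages */
			tlb_flush_mmu(tlb);
			call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
			*batch = NULL;
		}
	}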

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
[ Edited Kconfig bit ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/Kconfig |   13 +++++++++++++
 mm/memory.c  |   23 +++++++++++++++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -231,6 +231,19 @@ config HAVE_ARCH_MUTEX_CPU_RELAX
 config HAVE_RCU_TABLE_FREE
 	bool
 
+config HAVE_HW_PAGE_TABLE_WALKS
+	def_bool y
+	depends on HAVE_RCU_TABLE_FREE && !(SPARC64 || PPC)
+	help
+	  An arch should be excluded if it doesn't have hardware page-table
+	  walkers that can (re)populate TLB caches concurrently with us
+	  tearing down page-tables.
+
+	  Both SPARC and PPC are excluded because they have 'external'
+	  hash-table based MMUs which are cleared before we take down the
+	  linux page-table structure. Therefore we don't need to emit
+	  hardware TLB flush instructions before freeing page-table pages.
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -329,11 +329,26 @@ static void tlb_remove_table_rcu(struct 
 	free_page((unsigned long)batch);
 }
 
+#ifdef CONFIG_HAVE_HW_PAGE_TABLE_WALKS
+/*
+ * Some architectures (x86, arm, s390) can walk the page tables when
+ * the page-table tear down might be happening. So make sure we flush
+ * the TLBs before freeing the page-table pages.
+ */
+static inline void tlb_table_flush_mmu(struct mmu_gather *tlb)
+{
+	tlb_flush_mmu(tlb);
+}
+#else
+static inline void tlb_table_flush_mmu(struct mmu_gather *tlb) { }
+#endif /* CONFIG_HAVE_HW_PAGE_TABLE_WALKS */
+
 void tlb_table_flush(struct mmu_gather *tlb)
 {
 	struct mmu_table_batch **batch = &tlb->batch;
 
 	if (*batch) {
+		tlb_table_flush_mmu(tlb);
 		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
 		*batch = NULL;
 	}
@@ -345,18 +360,22 @@ void tlb_remove_table(struct mmu_gather 
 
 	tlb->need_flush = 1;
 
+#ifndef CONFIG_HAVE_HW_PAGE_TABLE_WALKS
 	/*
-	 * When there's less then two users of this mm there cannot be a
-	 * concurrent page-table walk.
+	 * When there's less then two users of this mm there cannot be
+	 * a concurrent page-table walk for architectures that do not
+	 * have hardware page-table walkers.
 	 */
 	if (atomic_read(&tlb->mm->mm_users) < 2) {
 		__tlb_remove_table(table);
 		return;
 	}
+#endif
 
 	if (*batch == NULL) {
 		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		if (*batch == NULL) {
+			tlb_table_flush_mmu(tlb);
 			tlb_remove_table_one(table);
 			return;
 		}


* [PATCH 03/20] mm, tlb: Remove a few #ifdefs
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mm-cleanup-generic-rcu-ifdeffery.patch --]
[-- Type: text/plain, Size: 3908 bytes --]


Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/asm-generic/tlb.h |   85 ++++++++++++++++++++++++++--------------------
 mm/memory.c               |    6 ---
 2 files changed, 50 insertions(+), 41 deletions(-)
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -21,6 +21,40 @@
 
 static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page);
 
+/*
+ * If we can't allocate a page to make a big batch of page pointers
+ * to work on, then just handle a few from the on-stack structure.
+ */
+#define MMU_GATHER_BUNDLE	8
+
+struct mmu_gather_batch {
+	struct mmu_gather_batch	*next;
+	unsigned int		nr;
+	unsigned int		max;
+	struct page		*pages[0];
+};
+
+#define MAX_GATHER_BATCH	\
+	((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
+
+/* struct mmu_gather is an opaque type used by the mm code for passing around
+ * any data needed by arch specific code for tlb_remove_page.
+ */
+struct mmu_gather {
+	struct mm_struct	*mm;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	struct mmu_table_batch	*batch;
+#endif
+	unsigned int		need_flush : 1,	/* Did free PTEs */
+				fast_mode  : 1; /* No batching   */
+
+	unsigned int		fullmm;
+
+	struct mmu_gather_batch *active;
+	struct mmu_gather_batch	local;
+	struct page		*__pages[MMU_GATHER_BUNDLE];
+};
+
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 /*
  * Semi RCU freeing of the page directories.
@@ -59,51 +93,30 @@ struct mmu_table_batch {
 #define MAX_TABLE_BATCH		\
 	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
 
+static inline void tlb_table_init(struct mmu_gather *tlb)
+{
+	tlb->batch = NULL;
+}
+
 extern void tlb_table_flush(struct mmu_gather *tlb);
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
-#else
+#else /* CONFIG_HAVE_RCU_TABLE_FREE */
 
-static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
+static inline void tlb_table_init(struct mmu_gather *tlb)
 {
-	tlb_remove_page(tlb, table);
 }
 
-#endif
-
-/*
- * If we can't allocate a page to make a big batch of page pointers
- * to work on, then just handle a few from the on-stack structure.
- */
-#define MMU_GATHER_BUNDLE	8
-
-struct mmu_gather_batch {
-	struct mmu_gather_batch	*next;
-	unsigned int		nr;
-	unsigned int		max;
-	struct page		*pages[0];
-};
-
-#define MAX_GATHER_BATCH	\
-	((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
-
-/* struct mmu_gather is an opaque type used by the mm code for passing around
- * any data needed by arch specific code for tlb_remove_page.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	struct mmu_table_batch	*batch;
-#endif
-	unsigned int		need_flush : 1,	/* Did free PTEs */
-				fast_mode  : 1; /* No batching   */
+static inline void tlb_table_flush(struct mmu_gather *tlb)
+{
+}
 
-	unsigned int		fullmm;
+static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_page(tlb, table);
+}
 
-	struct mmu_gather_batch *active;
-	struct mmu_gather_batch	local;
-	struct page		*__pages[MMU_GATHER_BUNDLE];
-};
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
 
 #define HAVE_GENERIC_MMU_GATHER
 
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -214,9 +214,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
 	tlb->active     = &tlb->local;
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
-#endif
+	tlb_table_init(tlb);
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
@@ -227,9 +225,7 @@ void tlb_flush_mmu(struct mmu_gather *tl
 		return;
 	tlb->need_flush = 0;
 	tlb_flush(tlb);
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
-#endif
 
 	if (tlb_fast_mode(tlb))
 		return;


* [PATCH 04/20] mm, s390: use generic RCU page-table freeing code
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: s390-use-generic-rcu-page-table-free.patch --]
[-- Type: text/plain, Size: 5790 bytes --]

Now that we have fixed the problem that caused commit cd94154cc6a
("[S390] fix tlb flushing for page table pages") to revert the
original 36409f6353fc2 ("[S390] use generic RCU page-table freeing
code"), we can revert the revert.

Original-patch-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/s390/Kconfig               |    1 
 arch/s390/include/asm/pgalloc.h |    3 +
 arch/s390/include/asm/tlb.h     |   22 +++++++++++++
 arch/s390/mm/pgtable.c          |   63 +---------------------------------------
 4 files changed, 28 insertions(+), 61 deletions(-)

--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -84,6 +84,7 @@ config S390
 	select HAVE_KERNEL_XZ
 	select HAVE_ARCH_MUTEX_CPU_RELAX
 	select HAVE_ARCH_JUMP_LABEL if !MARCH_G5
+	select HAVE_RCU_TABLE_FREE if SMP
 	select ARCH_SAVE_PAGE_KEYS if HIBERNATION
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -22,7 +22,10 @@ void crst_table_free(struct mm_struct *,
 
 unsigned long *page_table_alloc(struct mm_struct *, unsigned long);
 void page_table_free(struct mm_struct *, unsigned long *);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 void page_table_free_rcu(struct mmu_gather *, unsigned long *);
+void __tlb_remove_table(void *_table);
+#endif
 
 static inline void clear_table(unsigned long *s, unsigned long val, size_t n)
 {
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -30,10 +30,14 @@
 
 struct mmu_gather {
 	struct mm_struct *mm;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	struct mmu_table_batch *batch;
+#endif
 	unsigned int fullmm;
+	unsigned int need_flush;
 };
 
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 struct mmu_table_batch {
 	struct rcu_head		rcu;
 	unsigned int		nr;
@@ -45,6 +49,7 @@ struct mmu_table_batch {
 
 extern void tlb_table_flush(struct mmu_gather *tlb);
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+#endif
 
 static inline void tlb_gather_mmu(struct mmu_gather *tlb,
 				  struct mm_struct *mm,
@@ -52,20 +57,29 @@ static inline void tlb_gather_mmu(struct
 {
 	tlb->mm = mm;
 	tlb->fullmm = full_mm_flush;
+	tlb->need_flush = 0;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
+#endif
 	if (tlb->fullmm)
 		__tlb_flush_mm(mm);
 }
 
 static inline void tlb_flush_mmu(struct mmu_gather *tlb)
 {
+	if (!tlb->need_flush)
+		return;
+	tlb->need_flush = 0;
+	__tlb_flush_mm(tlb->mm);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
+#endif
 }
 
 static inline void tlb_finish_mmu(struct mmu_gather *tlb,
 				  unsigned long start, unsigned long end)
 {
-	tlb_table_flush(tlb);
+	tlb_flush_mmu(tlb);
 }
 
 /*
@@ -91,8 +105,10 @@ static inline void tlb_remove_page(struc
 static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 				unsigned long address)
 {
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	if (!tlb->fullmm)
 		return page_table_free_rcu(tlb, (unsigned long *) pte);
+#endif
 	page_table_free(tlb->mm, (unsigned long *) pte);
 }
 
@@ -109,8 +125,10 @@ static inline void pmd_free_tlb(struct m
 #ifdef CONFIG_64BIT
 	if (tlb->mm->context.asce_limit <= (1UL << 31))
 		return;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	if (!tlb->fullmm)
 		return tlb_remove_table(tlb, pmd);
+#endif
 	crst_table_free(tlb->mm, (unsigned long *) pmd);
 #endif
 }
@@ -128,8 +146,10 @@ static inline void pud_free_tlb(struct m
 #ifdef CONFIG_64BIT
 	if (tlb->mm->context.asce_limit <= (1UL << 42))
 		return;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	if (!tlb->fullmm)
 		return tlb_remove_table(tlb, pud);
+#endif
 	crst_table_free(tlb->mm, (unsigned long *) pud);
 #endif
 }
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -678,6 +678,8 @@ void page_table_free(struct mm_struct *m
 	}
 }
 
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
 static void __page_table_free_rcu(void *table, unsigned bit)
 {
 	struct page *page;
@@ -731,66 +733,7 @@ void __tlb_remove_table(void *_table)
 		free_pages((unsigned long) table, ALLOC_ORDER);
 }
 
-static void tlb_remove_table_smp_sync(void *arg)
-{
-	/* Simply deliver the interrupt */
-}
-
-static void tlb_remove_table_one(void *table)
-{
-	/*
-	 * This isn't an RCU grace period and hence the page-tables cannot be
-	 * assumed to be actually RCU-freed.
-	 *
-	 * It is however sufficient for software page-table walkers that rely
-	 * on IRQ disabling. See the comment near struct mmu_table_batch.
-	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
-	__tlb_remove_table(table);
-}
-
-static void tlb_remove_table_rcu(struct rcu_head *head)
-{
-	struct mmu_table_batch *batch;
-	int i;
-
-	batch = container_of(head, struct mmu_table_batch, rcu);
-
-	for (i = 0; i < batch->nr; i++)
-		__tlb_remove_table(batch->tables[i]);
-
-	free_page((unsigned long)batch);
-}
-
-void tlb_table_flush(struct mmu_gather *tlb)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch) {
-		__tlb_flush_mm(tlb->mm);
-		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
-		*batch = NULL;
-	}
-}
-
-void tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch == NULL) {
-		*batch = (struct mmu_table_batch *)
-			__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-		if (*batch == NULL) {
-			__tlb_flush_mm(tlb->mm);
-			tlb_remove_table_one(table);
-			return;
-		}
-		(*batch)->nr = 0;
-	}
-	(*batch)->tables[(*batch)->nr++] = table;
-	if ((*batch)->nr == MAX_TABLE_BATCH)
-		tlb_table_flush(tlb);
-}
+#endif
 
 /*
  * switch on pgstes for its userspace process (for kvm)


* [PATCH 05/20] mm, powerpc: Dont use tlb_flush for external tlb flushes
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: powerpc64-tlb_flush.patch --]
[-- Type: text/plain, Size: 1314 bytes --]

Both sparc64 and powerpc64 use tlb_flush() to flush their respective
hash-tables, which is entirely different from what
flush_tlb_range()/flush_tlb_mm() would do.

Powerpc64 already uses arch_*_lazy_mmu_mode() to batch and flush
these, so any tlb_flush() caller should already find an empty batch;
remove this functionality from tlb_flush().
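
For reference, a sketch of the lazy-MMU batching pattern referred to
above (zap_range_sketch() is a hypothetical stand-in for the zap loop
in mm/memory.c, not code from this patch):

	static void zap_range_sketch(struct mm_struct *mm, pte_t *pte,
				     unsigned long addr, unsigned long end)
	{
		arch_enter_lazy_mmu_mode();
		for (; addr != end; addr += PAGE_SIZE, pte++) {
			pte_t ptent = ptep_get_and_clear(mm, addr, pte);
			/* powerpc64 queues a hash-table invalidate in
			 * the per-cpu ppc64_tlb_batch here instead of
			 * flushing immediately */
			(void)ptent;
		}
		arch_leave_lazy_mmu_mode();	/* drains the pending batch */
	}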

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/powerpc/mm/tlb_hash64.c |   10 ----------
 1 file changed, 10 deletions(-)
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -155,16 +155,6 @@ void __flush_tlb_pending(struct ppc64_tl
 
 void tlb_flush(struct mmu_gather *tlb)
 {
-	struct ppc64_tlb_batch *tlbbatch = &get_cpu_var(ppc64_tlb_batch);
-
-	/* If there's a TLB batch pending, then we must flush it because the
-	 * pages are going to be freed and we really don't want to have a CPU
-	 * access a freed page because it has a stale TLB
-	 */
-	if (tlbbatch->index)
-		__flush_tlb_pending(tlbbatch);
-
-	put_cpu_var(ppc64_tlb_batch);
 }
 
 /**


* [PATCH 06/20] mm, sparc64: Dont use tlb_flush for external tlb flushes
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: sparc64-tlb_flush.patch --]
[-- Type: text/plain, Size: 2281 bytes --]

Both sparc64 and powerpc64 use tlb_flush() to flush their respective
hash-tables, which is entirely different from what
flush_tlb_range()/flush_tlb_mm() would do.

Powerpc64 already uses arch_*_lazy_mmu_mode() to batch and flush
these, so any tlb_flush() caller should already find an empty batch;
make sparc64 do the same.

This ensures all platforms now have a tlb_flush() implementation that
is either flush_tlb_mm() or flush_tlb_range().

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/sparc/Makefile                  |    1 +
 arch/sparc/include/asm/tlb_64.h      |    2 +-
 arch/sparc/include/asm/tlbflush_64.h |   11 +++++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -37,6 +37,7 @@ LDFLAGS       := -m elf64_sparc
 export BITS   := 64
 UTS_MACHINE   := sparc64
 
+KBUILD_CPPFLAGS += -D__HAVE_ARCH_ENTER_LAZY_MMU_MODE
 KBUILD_CFLAGS += -m64 -pipe -mno-fpu -mcpu=ultrasparc -mcmodel=medlow
 KBUILD_CFLAGS += -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wno-sign-compare
 KBUILD_CFLAGS += -Wa,--undeclared-regs
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -25,7 +25,7 @@ extern void flush_tlb_pending(void);
 #define tlb_start_vma(tlb, vma) do { } while (0)
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-#define tlb_flush(tlb)	flush_tlb_pending()
+#define tlb_flush(tlb)	do { } while (0)
 
 #include <asm-generic/tlb.h>
 
--- a/arch/sparc/include/asm/tlbflush_64.h
+++ b/arch/sparc/include/asm/tlbflush_64.h
@@ -26,6 +26,17 @@ extern void flush_tlb_pending(void);
 #define flush_tlb_page(vma,addr)	flush_tlb_pending()
 #define flush_tlb_mm(mm)		flush_tlb_pending()
 
+static inline void arch_enter_lazy_mmu_mode(void)
+{
+}
+
+static inline void arch_leave_lazy_mmu_mode(void)
+{
+	flush_tlb_pending();
+}
+
+#define arch_flush_lazy_mmu_mode()      do {} while (0)
+
 /* Local cpu only.  */
 extern void __flush_tlb_all(void);
 


* [PATCH 07/20] mm, arch: Remove tlb_flush()
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: generic_tlb_flush.patch --]
[-- Type: text/plain, Size: 11831 bytes --]

Since every asm-generic/tlb.h user's tlb_flush() implementation is
now either a nop or flush_tlb_mm(), remove the hook and make the
generic code call flush_tlb_mm() directly.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/alpha/include/asm/tlb.h      |    2 --
 arch/arm/include/asm/tlb.h        |    2 --
 arch/avr32/include/asm/tlb.h      |    5 -----
 arch/blackfin/include/asm/tlb.h   |    6 ------
 arch/c6x/include/asm/tlb.h        |    2 --
 arch/cris/include/asm/tlb.h       |    1 -
 arch/frv/include/asm/tlb.h        |    5 -----
 arch/h8300/include/asm/tlb.h      |   13 -------------
 arch/hexagon/include/asm/tlb.h    |    5 -----
 arch/m32r/include/asm/tlb.h       |    6 ------
 arch/m68k/include/asm/tlb.h       |    6 ------
 arch/microblaze/include/asm/tlb.h |    2 --
 arch/mips/include/asm/tlb.h       |    5 -----
 arch/mn10300/include/asm/tlb.h    |    5 -----
 arch/openrisc/include/asm/tlb.h   |    1 -
 arch/parisc/include/asm/tlb.h     |    5 -----
 arch/powerpc/include/asm/tlb.h    |    2 --
 arch/powerpc/mm/tlb_hash32.c      |   15 ---------------
 arch/powerpc/mm/tlb_hash64.c      |    4 ----
 arch/powerpc/mm/tlb_nohash.c      |    5 -----
 arch/score/include/asm/tlb.h      |    1 -
 arch/sh/include/asm/tlb.h         |    1 -
 arch/sparc/include/asm/tlb_32.h   |    5 -----
 arch/sparc/include/asm/tlb_64.h   |    1 -
 arch/tile/include/asm/tlb.h       |    1 -
 arch/unicore32/include/asm/tlb.h  |    1 -
 arch/x86/include/asm/tlb.h        |    1 -
 arch/xtensa/include/asm/tlb.h     |    1 -
 mm/memory.c                       |    2 +-
 29 files changed, 1 insertion(+), 110 deletions(-)
--- a/arch/alpha/include/asm/tlb.h
+++ b/arch/alpha/include/asm/tlb.h
@@ -5,8 +5,6 @@
 #define tlb_end_vma(tlb, vma)			do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, pte, addr)	do { } while (0)
 
-#define tlb_flush(tlb)				flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #define __pte_free_tlb(tlb, pte, address)		pte_free((tlb)->mm, pte)
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -23,8 +23,6 @@
 
 #include <linux/pagemap.h>
 
-#define tlb_flush(tlb)	((void) tlb)
-
 #include <asm-generic/tlb.h>
 
 #else /* !CONFIG_MMU */
--- a/arch/avr32/include/asm/tlb.h
+++ b/arch/avr32/include/asm/tlb.h
@@ -16,11 +16,6 @@
 
 #define __tlb_remove_tlb_entry(tlb, pte, address) do { } while(0)
 
-/*
- * Flush whole TLB for MM
- */
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 /*
--- a/arch/blackfin/include/asm/tlb.h
+++ b/arch/blackfin/include/asm/tlb.h
@@ -11,12 +11,6 @@
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
-/*
- * .. because we flush the whole mm when it
- * fills up.
- */
-#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif				/* _BLACKFIN_TLB_H */
--- a/arch/c6x/include/asm/tlb.h
+++ b/arch/c6x/include/asm/tlb.h
@@ -1,8 +1,6 @@
 #ifndef _ASM_C6X_TLB_H
 #define _ASM_C6X_TLB_H
 
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif /* _ASM_C6X_TLB_H */
--- a/arch/cris/include/asm/tlb.h
+++ b/arch/cris/include/asm/tlb.h
@@ -13,7 +13,6 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
 #include <asm-generic/tlb.h>
 
 #endif
--- a/arch/frv/include/asm/tlb.h
+++ b/arch/frv/include/asm/tlb.h
@@ -16,11 +16,6 @@ extern void check_pgt_cache(void);
 #define tlb_end_vma(tlb, vma)				do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
-/*
- * .. because we flush the whole mm when it fills up
- */
-#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif /* _ASM_TLB_H */
--- a/arch/h8300/include/asm/tlb.h
+++ b/arch/h8300/include/asm/tlb.h
@@ -5,19 +5,6 @@
 #ifndef __H8300_TLB_H__
 #define __H8300_TLB_H__
 
-#define tlb_flush(tlb)	do { } while(0)
-
-/* 
-  include/asm-h8300/tlb.h 
-*/
-
-#ifndef __H8300_TLB_H__
-#define __H8300_TLB_H__
-
-#define tlb_flush(tlb)	do { } while(0)
-
 #include <asm-generic/tlb.h>
 
 #endif
-
-#endif
--- a/arch/hexagon/include/asm/tlb.h
+++ b/arch/hexagon/include/asm/tlb.h
@@ -29,11 +29,6 @@
 #define tlb_end_vma(tlb, vma)				do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
-/*
- * .. because we flush the whole mm when it fills up
- */
-#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif
--- a/arch/m32r/include/asm/tlb.h
+++ b/arch/m32r/include/asm/tlb.h
@@ -9,12 +9,6 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, pte, address) do { } while (0)
 
-/*
- * .. because we flush the whole mm when it
- * fills up.
- */
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif /* _M32R_TLB_H */
--- a/arch/m68k/include/asm/tlb.h
+++ b/arch/m68k/include/asm/tlb.h
@@ -9,12 +9,6 @@
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
-/*
- * .. because we flush the whole mm when it
- * fills up.
- */
-#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif /* _M68K_TLB_H */
--- a/arch/microblaze/include/asm/tlb.h
+++ b/arch/microblaze/include/asm/tlb.h
@@ -11,8 +11,6 @@
 #ifndef _ASM_MICROBLAZE_TLB_H
 #define _ASM_MICROBLAZE_TLB_H
 
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
 
--- a/arch/mips/include/asm/tlb.h
+++ b/arch/mips/include/asm/tlb.h
@@ -13,11 +13,6 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
-/*
- * .. because we flush the whole mm when it fills up.
- */
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
 #include <asm-generic/tlb.h>
 
 #endif /* __ASM_TLB_H */
--- a/arch/mn10300/include/asm/tlb.h
+++ b/arch/mn10300/include/asm/tlb.h
@@ -23,11 +23,6 @@ extern void check_pgt_cache(void);
 #define tlb_end_vma(tlb, vma)				do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
 
-/*
- * .. because we flush the whole mm when it fills up
- */
-#define tlb_flush(tlb)	flush_tlb_mm((tlb)->mm)
-
 /* for now, just use the generic stuff */
 #include <asm-generic/tlb.h>
 
--- a/arch/openrisc/include/asm/tlb.h
+++ b/arch/openrisc/include/asm/tlb.h
@@ -27,7 +27,6 @@
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
 #include <linux/pagemap.h>
 #include <asm-generic/tlb.h>
 
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -1,11 +1,6 @@
 #ifndef _PARISC_TLB_H
 #define _PARISC_TLB_H
 
-#define tlb_flush(tlb)			\
-do {	if ((tlb)->fullmm)		\
-		flush_tlb_mm((tlb)->mm);\
-} while (0)
-
 #define tlb_start_vma(tlb, vma) \
 do {	if (!(tlb)->fullmm)	\
 		flush_cache_range(vma, vma->vm_start, vma->vm_end); \
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -28,8 +28,6 @@
 #define tlb_start_vma(tlb, vma)	do { } while (0)
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 
-extern void tlb_flush(struct mmu_gather *tlb);
-
 /* Get the generic bits... */
 #include <asm-generic/tlb.h>
 
--- a/arch/powerpc/mm/tlb_hash32.c
+++ b/arch/powerpc/mm/tlb_hash32.c
@@ -60,21 +60,6 @@ void flush_tlb_page_nohash(struct vm_are
 }
 
 /*
- * Called at the end of a mmu_gather operation to make sure the
- * TLB flush is completely done.
- */
-void tlb_flush(struct mmu_gather *tlb)
-{
-	if (Hash == 0) {
-		/*
-		 * 603 needs to flush the whole TLB here since
-		 * it doesn't use a hash table.
-		 */
-		_tlbia();
-	}
-}
-
-/*
  * TLB flushing:
  *
  *  - flush_tlb_mm(mm) flushes the specified mm context TLB's
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -153,10 +153,6 @@ void __flush_tlb_pending(struct ppc64_tl
 	batch->index = 0;
 }
 
-void tlb_flush(struct mmu_gather *tlb)
-{
-}
-
 /**
  * __flush_hash_table_range - Flush all HPTEs for a given address range
  *                            from the hash table (and the TLB). But keeps
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -357,11 +357,6 @@ void flush_tlb_range(struct vm_area_stru
 }
 EXPORT_SYMBOL(flush_tlb_range);
 
-void tlb_flush(struct mmu_gather *tlb)
-{
-	flush_tlb_mm(tlb->mm);
-}
-
 /*
  * Below are functions specific to the 64-bit variant of Book3E though that
  * may change in the future
--- a/arch/score/include/asm/tlb.h
+++ b/arch/score/include/asm/tlb.h
@@ -8,7 +8,6 @@
 #define tlb_start_vma(tlb, vma)		do {} while (0)
 #define tlb_end_vma(tlb, vma)		do {} while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do {} while (0)
-#define tlb_flush(tlb)			flush_tlb_mm((tlb)->mm)
 
 extern void score7_FTLB_refill_Handler(void);
 
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -126,7 +126,6 @@ static inline void tlb_unwire_entry(void
 #define tlb_start_vma(tlb, vma)				do { } while (0)
 #define tlb_end_vma(tlb, vma)				do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, pte, address)	do { } while (0)
-#define tlb_flush(tlb)					do { } while (0)
 
 #include <asm-generic/tlb.h>
 
--- a/arch/sparc/include/asm/tlb_32.h
+++ b/arch/sparc/include/asm/tlb_32.h
@@ -14,11 +14,6 @@ do {								\
 #define __tlb_remove_tlb_entry(tlb, pte, address) \
 	do { } while (0)
 
-#define tlb_flush(tlb) \
-do {								\
-	flush_tlb_mm((tlb)->mm);				\
-} while (0)
-
 #include <asm-generic/tlb.h>
 
 #endif /* _SPARC_TLB_H */
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -25,7 +25,6 @@ extern void flush_tlb_pending(void);
 #define tlb_start_vma(tlb, vma) do { } while (0)
 #define tlb_end_vma(tlb, vma)	do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-#define tlb_flush(tlb)	do { } while (0)
 
 #include <asm-generic/tlb.h>
 
--- a/arch/tile/include/asm/tlb.h
+++ b/arch/tile/include/asm/tlb.h
@@ -18,7 +18,6 @@
 #define tlb_start_vma(tlb, vma) do { } while (0)
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
 
 #include <asm-generic/tlb.h>
 
--- a/arch/unicore32/include/asm/tlb.h
+++ b/arch/unicore32/include/asm/tlb.h
@@ -15,7 +15,6 @@
 #define tlb_start_vma(tlb, vma)				do { } while (0)
 #define tlb_end_vma(tlb, vma)				do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
 
 #define __pte_free_tlb(tlb, pte, addr)				\
 	do {							\
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -5,7 +5,6 @@
 #define tlb_start_vma(tlb, vma) do { } while (0)
 #define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
 
 #include <asm-generic/tlb.h>
 
--- a/arch/xtensa/include/asm/tlb.h
+++ b/arch/xtensa/include/asm/tlb.h
@@ -38,7 +38,6 @@
 #endif
 
 #define __tlb_remove_tlb_entry(tlb,pte,addr)	do { } while (0)
-#define tlb_flush(tlb)				flush_tlb_mm((tlb)->mm)
 
 #include <asm-generic/tlb.h>
 
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -226,7 +226,7 @@ void tlb_flush_mmu(struct mmu_gather *tl
 	if (!tlb->need_flush)
 		return;
 	tlb->need_flush = 0;
-	tlb_flush(tlb);
+	flush_tlb_mm(tlb->mm);
 	tlb_table_flush(tlb);
 
 	if (tlb_fast_mode(tlb))


* [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mmu_gather_fullmm.patch --]
[-- Type: text/plain, Size: 1391 bytes --]

This originated from s390, which does something similar, and would
allow s390 to use the generic TLB flushing code.

The idea is to flush the mm-wide cache and TLB a priori and not
bother with multiple flushes if the batching isn't large enough.

This can be done safely since there cannot be any concurrency on this
mm: it's either after the process died (exit) or in the middle of
execve where the thread has switched to the new mm.
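
A rough sketch of the effect (not part of the patch; the exit path is
the canonical fullmm user, and the call signatures follow the ones
used elsewhere in this series):

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, true);	/* fullmm: one up-front
					   flush_cache_mm() + flush_tlb_mm() */
	/* ... unmap_vmas() and free_pgtables() now run without any
	   further mm-wide flushes ... */
	tlb_finish_mmu(&tlb, 0, -1);	/* just frees the batched pages */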

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 mm/memory.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -215,16 +215,22 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->active     = &tlb->local;
 
 	tlb_table_init(tlb);
+
+	if (fullmm) {
+		flush_cache_mm(mm);
+		flush_tlb_mm(mm);
+	}
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
 {
 	struct mmu_gather_batch *batch;
 
-	if (!tlb->need_flush)
-		return;
-	tlb->need_flush = 0;
-	flush_tlb_mm(tlb->mm);
+	if (!tlb->fullmm && tlb->need_flush) {
+		tlb->need_flush = 0;
+		flush_tlb_mm(tlb->mm);
+	}
+
 	tlb_table_flush(tlb);
 
 	if (tlb_fast_mode(tlb))


* [PATCH 09/20] mm, arch: Add end argument to p??_free_tlb()
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: peter_zijlstra-p_free_tlb.patch --]
[-- Type: text/plain, Size: 7594 bytes --]

In order to facilitate range tracking we need the end address of the
object we're freeing. The callsites already compute this address, so
simply pass it along.
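
The new argument is unused until the next patch wires it into range
tracking; as a preview, patch 10's asm-generic/tlb.h turns, e.g.,
pte_free_tlb() into:

#define pte_free_tlb(tlb, ptep, addr, end)			\
	do {							\
		tlb->need_flush = 1;				\
		tlb_track_range(tlb, addr, end);		\
		__pte_free_tlb(tlb, ptep, addr);		\
	} while (0)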

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/include/asm/tlb.h         |    6 +++---
 arch/ia64/include/asm/tlb.h        |    6 +++---
 arch/powerpc/mm/hugetlbpage.c      |    4 ++--
 arch/s390/include/asm/tlb.h        |    6 +++---
 arch/sh/include/asm/tlb.h          |    6 +++---
 arch/um/include/asm/tlb.h          |    6 +++---
 include/asm-generic/4level-fixup.h |    2 +-
 include/asm-generic/tlb.h          |    6 +++---
 mm/memory.c                        |   10 +++++-----
 9 files changed, 26 insertions(+), 26 deletions(-)
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -217,9 +217,9 @@ static inline void __pmd_free_tlb(struct
 #endif
 }
 
-#define pte_free_tlb(tlb, ptep, addr)	__pte_free_tlb(tlb, ptep, addr)
-#define pmd_free_tlb(tlb, pmdp, addr)	__pmd_free_tlb(tlb, pmdp, addr)
-#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
+#define pte_free_tlb(tlb, ptep, addr, end)	__pte_free_tlb(tlb, ptep, addr)
+#define pmd_free_tlb(tlb, pmdp, addr, end)	__pmd_free_tlb(tlb, pmdp, addr)
+#define pud_free_tlb(tlb, pudp, addr, end)	pud_free((tlb)->mm, pudp)
 
 #define tlb_migrate_finish(mm)		do { } while (0)
 
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -262,19 +262,19 @@ do {							\
 	__tlb_remove_tlb_entry(tlb, ptep, addr);	\
 } while (0)
 
-#define pte_free_tlb(tlb, ptep, address)		\
+#define pte_free_tlb(tlb, ptep, address, end)		\
 do {							\
 	tlb->need_flush = 1;				\
 	__pte_free_tlb(tlb, ptep, address);		\
 } while (0)
 
-#define pmd_free_tlb(tlb, ptep, address)		\
+#define pmd_free_tlb(tlb, ptep, address, end)		\
 do {							\
 	tlb->need_flush = 1;				\
 	__pmd_free_tlb(tlb, ptep, address);		\
 } while (0)
 
-#define pud_free_tlb(tlb, pudp, address)		\
+#define pud_free_tlb(tlb, pudp, address, end)		\
 do {							\
 	tlb->need_flush = 1;				\
 	__pud_free_tlb(tlb, pudp, address);		\
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -503,7 +503,7 @@ static void hugetlb_free_pmd_range(struc
 
 	pmd = pmd_offset(pud, start);
 	pud_clear(pud);
-	pmd_free_tlb(tlb, pmd, start);
+	pmd_free_tlb(tlb, pmd, start, end);
 }
 
 static void hugetlb_free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
@@ -551,7 +551,7 @@ static void hugetlb_free_pud_range(struc
 
 	pud = pud_offset(pgd, start);
 	pgd_clear(pgd);
-	pud_free_tlb(tlb, pud, start);
+	pud_free_tlb(tlb, pud, start, end);
 }
 
 /*
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -103,7 +103,7 @@ static inline void tlb_remove_page(struc
  * page table from the tlb.
  */
 static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-				unsigned long address)
+				unsigned long address, unsigned long end)
 {
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	if (!tlb->fullmm)
@@ -120,7 +120,7 @@ static inline void pte_free_tlb(struct m
  * to avoid the double free of the pmd in this case.
  */
 static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
-				unsigned long address)
+				unsigned long address, unsigned long end)
 {
 #ifdef CONFIG_64BIT
 	if (tlb->mm->context.asce_limit <= (1UL << 31))
@@ -141,7 +141,7 @@ static inline void pmd_free_tlb(struct m
  * to avoid the double free of the pud in this case.
  */
 static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
-				unsigned long address)
+				unsigned long address, unsigned long end)
 {
 #ifdef CONFIG_64BIT
 	if (tlb->mm->context.asce_limit <= (1UL << 42))
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -99,9 +99,9 @@ static inline void tlb_remove_page(struc
 	__tlb_remove_page(tlb, page);
 }
 
-#define pte_free_tlb(tlb, ptep, addr)	pte_free((tlb)->mm, ptep)
-#define pmd_free_tlb(tlb, pmdp, addr)	pmd_free((tlb)->mm, pmdp)
-#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
+#define pte_free_tlb(tlb, ptep, addr, end)	pte_free((tlb)->mm, ptep)
+#define pmd_free_tlb(tlb, pmdp, addr, end)	pmd_free((tlb)->mm, pmdp)
+#define pud_free_tlb(tlb, pudp, addr, end)	pud_free((tlb)->mm, pudp)
 
 #define tlb_migrate_finish(mm)		do { } while (0)
 
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -109,11 +109,11 @@ static inline void tlb_remove_page(struc
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
+#define pte_free_tlb(tlb, ptep, addr, end) __pte_free_tlb(tlb, ptep, addr)
 
-#define pud_free_tlb(tlb, pudp, addr) __pud_free_tlb(tlb, pudp, addr)
+#define pud_free_tlb(tlb, pudp, addr, end) __pud_free_tlb(tlb, pudp, addr)
 
-#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
+#define pmd_free_tlb(tlb, pmdp, addr, end) __pmd_free_tlb(tlb, pmdp, addr)
 
 #define tlb_migrate_finish(mm) do {} while (0)
 
--- a/include/asm-generic/4level-fixup.h
+++ b/include/asm-generic/4level-fixup.h
@@ -27,7 +27,7 @@
 #define pud_page_vaddr(pud)		pgd_page_vaddr(pud)
 
 #undef pud_free_tlb
-#define pud_free_tlb(tlb, x, addr)	do { } while (0)
+#define pud_free_tlb(tlb, x, addr, end)	do { } while (0)
 #define pud_free(mm, x)			do { } while (0)
 #define __pud_free_tlb(tlb, x, addr)	do { } while (0)
 
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -166,21 +166,21 @@ static inline void tlb_remove_page(struc
 		__tlb_remove_pmd_tlb_entry(tlb, pmdp, address);	\
 	} while (0)
 
-#define pte_free_tlb(tlb, ptep, address)			\
+#define pte_free_tlb(tlb, ptep, address, end)			\
 	do {							\
 		tlb->need_flush = 1;				\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
 
 #ifndef __ARCH_HAS_4LEVEL_HACK
-#define pud_free_tlb(tlb, pudp, address)			\
+#define pud_free_tlb(tlb, pudp, address, end)			\
 	do {							\
 		tlb->need_flush = 1;				\
 		__pud_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
 
-#define pmd_free_tlb(tlb, pmdp, address)			\
+#define pmd_free_tlb(tlb, pmdp, address, end)			\
 	do {							\
 		tlb->need_flush = 1;				\
 		__pmd_free_tlb(tlb, pmdp, address);		\
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -421,11 +421,11 @@ void pmd_clear_bad(pmd_t *pmd)
  * has been handled earlier when unmapping all the memory regions.
  */
 static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
-			   unsigned long addr)
+			   unsigned long addr, unsigned long end)
 {
 	pgtable_t token = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
-	pte_free_tlb(tlb, token, addr);
+	pte_free_tlb(tlb, token, addr, end);
 	tlb->mm->nr_ptes--;
 }
 
@@ -443,7 +443,7 @@ static inline void free_pmd_range(struct
 		next = pmd_addr_end(addr, end);
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
-		free_pte_range(tlb, pmd, addr);
+		free_pte_range(tlb, pmd, addr, next);
 	} while (pmd++, addr = next, addr != end);
 
 	start &= PUD_MASK;
@@ -459,7 +459,7 @@ static inline void free_pmd_range(struct
 
 	pmd = pmd_offset(pud, start);
 	pud_clear(pud);
-	pmd_free_tlb(tlb, pmd, start);
+	pmd_free_tlb(tlb, pmd, start, end);
 }
 
 static inline void free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
@@ -492,7 +492,7 @@ static inline void free_pud_range(struct
 
 	pud = pud_offset(pgd, start);
 	pgd_clear(pgd);
-	pud_free_tlb(tlb, pud, start);
+	pud_free_tlb(tlb, pud, start, end);
 }
 
 /*


* [PATCH 10/20] mm: Provide generic range tracking and flushing
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mm-generic-tlb-range.patch --]
[-- Type: text/plain, Size: 12136 bytes --]

In order to convert various architectures to the generic mmu_gather
code we need to provide some extra infrastructure to track the range
of the flushed page tables.

There are two mmu_gather cases to consider:

  unmap_region()
    tlb_gather_mmu()
    unmap_vmas()
      for (; vma; vma = vma->vm_next)
        unmap_page_range()
          tlb_start_vma() -> flush cache range/track vm_flags
          zap_*_range()
            arch_enter_lazy_mmu_mode()
            ptep_get_and_clear_full() -> batch/track external tlbs
            tlb_remove_tlb_entry() -> track range/external tlbs
            tlb_remove_page() -> batch page
            arch_leave_lazy_mmu_mode() -> flush external tlbs
          tlb_end_vma()
    free_pgtables()
      while (vma)
        unlink_*_vma()
        free_*_range()
          *_free_tlb() -> track range/batch page
    tlb_finish_mmu() -> flush TLBs and flush everything
  free vmas

and:

  shift_arg_pages()
    tlb_gather_mmu()
    free_*_range()
      *_free_tlb() -> track tlb range
    tlb_finish_mmu() -> flush things

There are various reasons why we need to flush TLBs _after_ tearing
down the page-tables themselves. For some architectures (x86 among
others) this serializes against (both hardware and software) page
table walkers like gup_fast().

For others (ARM) this is (also) needed to evict stale page-table
caches: ARM LPAE mode apparently caches page tables, and concurrent
hardware walkers could re-populate these caches if the final TLB
flush were to be from tlb_end_vma(), since a concurrent walk could
still be in progress.

So implement generic range tracking over both clearing the PTEs and
tearing down the page-tables.
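
As a toy illustration (the addresses are invented), tlb_track_range()
below just grows a single [start, end) interval, so any mix of PTE
zaps and page-table frees ends in one ranged invalidate:

	tlb_track_range(tlb, 0x00400000, 0x00401000);	/* a zapped PTE */
	tlb_track_range(tlb, 0x00600000, 0x00800000);	/* a freed pte page */
	/*
	 * tlb->start == 0x00400000, tlb->end == 0x00800000;
	 * tlb_flush() then issues a single
	 * flush_tlb_range(&vma, tlb->start, tlb->end).
	 */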

Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/Kconfig              |    3 
 include/asm-generic/tlb.h |  193 ++++++++++++++++++++++++++++++++++++++++++----
 mm/memory.c               |    3 
 3 files changed, 185 insertions(+), 14 deletions(-)
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -244,6 +244,9 @@ config HAVE_HW_PAGE_TABLE_WALKS
 	  linux page-table structure. Therefore we don't need to emit
 	  hardware TLB flush instructions before freeing page-table pages.
 
+config HAVE_MMU_GATHER_RANGE
+	bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -5,12 +5,77 @@
  * Copyright 2001 Red Hat, Inc.
  * Based on code from mm/memory.c Copyright Linus Torvalds and others.
  *
- * Copyright 2011 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
+ * Copyright 2011-2012 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
+ *
+ * This generic implementation tries to cover all TLB invalidate needs
+ * across our architecture spectrum; please ask before adding a new arch
+ * specific mmu_gather implementation.
+ *
+ * The TLB shootdown code deals with all the fun races an SMP system brings
+ * to the otherwise simple task of unmapping and freeing pages.
+ *
+ * There are two mmu_gather cases to consider, the below shows the various
+ * hooks and how this implementation employs them:
+ *
+ *   unmap_region()
+ *     tlb_gather_mmu()
+ *     unmap_vmas()
+ *       for (; vma; vma = vma->vm_next)
+ *         unmap_page_range()
+ *           tlb_start_vma() -> flush cache range/track vm_flags
+ *           zap_*_range()
+ *             arch_enter_lazy_mmu_mode()
+ *             ptep_get_and_clear_full() -> batch/track external tlbs
+ *             tlb_remove_tlb_entry() -> track range/external tlbs
+ *             tlb_remove_page() -> batch page
+ *             arch_leave_lazy_mmu_mode() -> flush external tlbs
+ *         tlb_end_vma()
+ *     free_pgtables()
+ *       while (vma)
+ *         unlink_*_vma()
+ *         free_*_range()
+ *           *_free_tlb() -> track range/batch page
+ *     tlb_finish_mmu() -> flush TLBs and pages
+ *   free vmas
+ *
+ * and:
+ *
+ *   shift_arg_pages()
+ *     tlb_gather_mmu()
+ *     free_*_range()
+ *       *_free_tlb() -> track range/batch page
+ *     tlb_finish_mmu() -> flush TLBs and pages
+ *
+ * This code has 3 relevant Kconfig knobs:
+ *
+ *  CONFIG_HAVE_MMU_GATHER_RANGE -- In case the architecture has an efficient
+ *    flush_tlb_range() implementation this adds range tracking to the
+ *    mmu_gather and avoids full mm invalidation where possible.
+ *
+ *    There's a number of curious details wrt passing a vm_area_struct, see
+ *    our tlb_start_vma() implementation.
+ *
+ *  CONFIG_HAVE_RCU_TABLE_FREE -- In case flush_tlb_*() doesn't
+ *    serialize software walkers against page-table tear-down. This option
+ *    enables a semi-RCU freeing of page-tables such that disabling IRQs
+ *    will still provide the required serialization. See the big comment
+ *    a page or so down.
+ *
+ *  CONFIG_HAVE_HW_PAGE_TABLE_WALKS -- Optimization for architectures with
+ *    'external' hash-table MMUs and similar which don't require a TLB
+ *    invalidate before freeing page-tables, always used in conjunction
+ *    with CONFIG_HAVE_RCU_TABLE_FREE to provide proper serialization for
+ *    software page-table walkers.
+ *
+ *    For instance SPARC64 and PPC use arch_{enter,leave}_lazy_mmu_mode()
+ *    together with ptep_get_and_clear_full() to wipe their hash-table.
+ *
+ *    See arch/Kconfig for more details.
  */
 #ifndef _ASM_GENERIC__TLB_H
 #define _ASM_GENERIC__TLB_H
@@ -37,7 +102,8 @@ struct mmu_gather_batch {
 #define MAX_GATHER_BATCH	\
 	((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
 
-/* struct mmu_gather is an opaque type used by the mm code for passing around
+/*
+ * struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
  */
 struct mmu_gather {
@@ -45,6 +111,10 @@ struct mmu_gather {
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	struct mmu_table_batch	*batch;
 #endif
+#ifdef CONFIG_HAVE_MMU_GATHER_RANGE
+	unsigned long		start, end;
+	unsigned long		vm_flags;
+#endif
 	unsigned int		need_flush : 1,	/* Did free PTEs */
 				fast_mode  : 1; /* No batching   */
 
@@ -83,6 +153,16 @@ struct mmu_gather {
  * pressure. To guarantee progress we fall back to single table freeing, see
  * the implementation of tlb_remove_table_one().
  *
+ * When this option is selected, the arch is expected to use:
+ *
+ *  void tlb_remove_table(struct mmu_gather *tlb, void *table)
+ *
+ * to 'free' page-tables from their respective __{pte,pmd,pud}_free_tlb()
+ * implementations and has to provide an implementation of:
+ *
+ *   void  __tlb_remove_table(void *);
+ *
+ * that actually does the free.
  */
 struct mmu_table_batch {
 	struct rcu_head		rcu;
@@ -118,8 +198,90 @@ static inline void tlb_remove_table(stru
 
 #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
 
+void tlb_flush_mmu(struct mmu_gather *tlb);
+
 #define HAVE_GENERIC_MMU_GATHER
 
+#ifdef CONFIG_HAVE_MMU_GATHER_RANGE
+
+static inline void tlb_range_init(struct mmu_gather *tlb)
+{
+	tlb->start = TASK_SIZE;
+	tlb->end = 0;
+	tlb->vm_flags = 0;
+}
+
+static inline void
+tlb_track_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end)
+{
+	if (!tlb->fullmm) {
+		tlb->start = min(tlb->start, addr);
+		tlb->end = max(tlb->end, end);
+	}
+}
+
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+	/*
+	 * Fake VMA, some architectures use VM_EXEC to flush I-TLB/I$,
+	 * and some use VM_HUGETLB since they have separate HPAGE TLBs.
+	 */
+	struct vm_area_struct vma = {
+		.vm_mm = tlb->mm,
+		.vm_flags = tlb->vm_flags,
+	};
+
+	flush_tlb_range(&vma, tlb->start, tlb->end);
+	tlb_range_init(tlb);
+}
+
+static inline void
+tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+	if (tlb->fullmm)
+		return;
+
+	/*
+	 * flush_tlb_range() implementations that look at VM_HUGETLB
+	 * (tile, mips-r4k) flush only large pages, so force flush on
+	 * VM_HUGETLB vma boundaries.
+	 */
+	if ((tlb->vm_flags & VM_HUGETLB) != (vma->vm_flags & VM_HUGETLB))
+		tlb_flush_mmu(tlb);
+
+	/*
+	 * flush_tlb_range() implementations that flush I-TLB also flush
+	 * D-TLB (tile, xtensa, arm), so it's ok to just add VM_EXEC to
+	 * an existing range.
+	 */
+	tlb->vm_flags |= vma->vm_flags & (VM_EXEC|VM_HUGETLB);
+
+	flush_cache_range(vma, vma->vm_start, vma->vm_end);
+}
+
+static inline void
+tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+}
+
+#else /* CONFIG_HAVE_MMU_GATHER_RANGE */
+
+static inline void tlb_range_init(struct mmu_gather *tlb)
+{
+}
+
+/*
+ * Macro avoids argument evaluation.
+ */
+#define tlb_track_range(tlb, addr, end) do { } while (0)
+
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+	flush_tlb_mm(tlb->mm);
+}
+
+#endif /* CONFIG_HAVE_MMU_GATHER_RANGE */
+
 static inline int tlb_fast_mode(struct mmu_gather *tlb)
 {
 #ifdef CONFIG_SMP
@@ -134,7 +296,6 @@ static inline int tlb_fast_mode(struct m
 }
 
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
-void tlb_flush_mmu(struct mmu_gather *tlb);
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end);
 int __tlb_remove_page(struct mmu_gather *tlb, struct page *page);
 
@@ -155,10 +316,11 @@ static inline void tlb_remove_page(struc
  * later optimise away the tlb invalidate.   This helps when userspace is
  * unmapping already-unmapped pages, which happens quite a lot.
  */
-#define tlb_remove_tlb_entry(tlb, ptep, address)		\
+#define tlb_remove_tlb_entry(tlb, ptep, addr)			\
 	do {							\
 		tlb->need_flush = 1;				\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		tlb_track_range(tlb, addr, addr + PAGE_SIZE);	\
+		__tlb_remove_tlb_entry(tlb, ptep, addr);	\
 	} while (0)
 
 /**
@@ -175,26 +337,31 @@ static inline void tlb_remove_page(struc
 		__tlb_remove_pmd_tlb_entry(tlb, pmdp, address);	\
 	} while (0)
 
-#define pte_free_tlb(tlb, ptep, address, end)			\
+#define pte_free_tlb(tlb, ptep, addr, end)			\
 	do {							\
 		tlb->need_flush = 1;				\
-		__pte_free_tlb(tlb, ptep, address);		\
+		tlb_track_range(tlb, addr, end);		\
+		__pte_free_tlb(tlb, ptep, addr);		\
 	} while (0)
 
-#ifndef __ARCH_HAS_4LEVEL_HACK
-#define pud_free_tlb(tlb, pudp, address, end)			\
+#define pmd_free_tlb(tlb, pmdp, addr, end)			\
 	do {							\
 		tlb->need_flush = 1;				\
-		__pud_free_tlb(tlb, pudp, address);		\
+		tlb_track_range(tlb, addr, end);		\
+		__pmd_free_tlb(tlb, pmdp, addr);		\
 	} while (0)
-#endif
 
-#define pmd_free_tlb(tlb, pmdp, address, end)			\
+#ifndef __ARCH_HAS_4LEVEL_HACK
+#define pud_free_tlb(tlb, pudp, addr, end)			\
 	do {							\
 		tlb->need_flush = 1;				\
-		__pmd_free_tlb(tlb, pmdp, address);		\
+		tlb_track_range(tlb, addr, end);		\
+		__pud_free_tlb(tlb, pudp, addr);		\
 	} while (0)
+#endif
 
+#ifndef tlb_migrate_finish
 #define tlb_migrate_finish(mm) do {} while (0)
+#endif
 
 #endif /* _ASM_GENERIC__TLB_H */
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -214,6 +214,7 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
 	tlb->active     = &tlb->local;
 
+	tlb_range_init(tlb);
 	tlb_table_init(tlb);
 
 	if (fullmm) {
@@ -228,7 +229,7 @@ void tlb_flush_mmu(struct mmu_gather *tl
 
 	if (!tlb->fullmm && tlb->need_flush) {
 		tlb->need_flush = 0;
-		flush_tlb_mm(tlb->mm);
+		tlb_flush(tlb);
 	}
 
 	tlb_table_flush(tlb);
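
The start and end fields above are just a running min/max over every
address the gather saw, so any number of single-page invalidates
collapses into one ranged flush. A stand-alone model of that
accumulation (illustrative only, not part of the patch; a 4 KiB page
size and an arbitrary TASK_SIZE are assumed):

	#include <stdio.h>

	#define PAGE_SIZE	4096UL
	#define TASK_SIZE	(1UL << 47)	/* placeholder, arch dependent */

	struct gather { unsigned long start, end; };

	/* same min/max bookkeeping as tlb_track_range() */
	static void track(struct gather *g, unsigned long addr, unsigned long end)
	{
		if (addr < g->start)
			g->start = addr;
		if (end > g->end)
			g->end = end;
	}

	int main(void)
	{
		struct gather g = { .start = TASK_SIZE, .end = 0 };

		track(&g, 0x2000, 0x2000 + PAGE_SIZE);
		track(&g, 0x5000, 0x5000 + PAGE_SIZE);

		/* one flush of [0x2000,0x6000) instead of two page flushes */
		printf("flush [%#lx, %#lx)\n", g.start, g.end);
		return 0;
	}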



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 11/20] mm, s390: Convert to use generic mmu_gather
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: s390-mmu_range.patch --]
[-- Type: text/plain, Size: 5641 bytes --]

Now that s390 uses the generic RCU freeing of page-table pages, the
only remaining difference with respect to the generic mmu_gather code
is the lack of mmu_gather-based TLB flushing for regular entries.

S390 doesn't need a TLB flush after ptep_get_and_clear_full() and
before __tlb_remove_page() because its ptep_get_and_clear*() family
already does a full TLB invalidate. Therefore force it to use
tlb_fast_mode.
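
For reference, the override amounts to defining tlb_fast_mode as a
constant before pulling in the generic header; stripped to its core,
the pattern the diff below adds is:

	/* arch/s390/include/asm/tlb.h, in essence */
	#define tlb_fast_mode(tlb)	(1)	/* ptep_get_and_clear*() already flushed */
	#include <asm-generic/tlb.h>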

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/s390/include/asm/pgtable.h |    1 
 arch/s390/include/asm/tlb.h     |   85 ++++------------------------------------
 include/asm-generic/tlb.h       |    7 +++
 3 files changed, 17 insertions(+), 76 deletions(-)
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1242,6 +1242,7 @@ extern int s390_enable_sie(void);
  * No page table caches to initialise
  */
 #define pgtable_cache_init()	do { } while (0)
+#define check_pgt_cache()	do { } while (0)
 
 #include <asm-generic/pgtable.h>
 
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -28,82 +28,16 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-struct mmu_gather {
-	struct mm_struct *mm;
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	struct mmu_table_batch *batch;
-#endif
-	unsigned int fullmm;
-	unsigned int need_flush;
-};
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-struct mmu_table_batch {
-	struct rcu_head		rcu;
-	unsigned int		nr;
-	void			*tables[0];
-};
-
-#define MAX_TABLE_BATCH		\
-	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
-
-extern void tlb_table_flush(struct mmu_gather *tlb);
-extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
-#endif
-
-static inline void tlb_gather_mmu(struct mmu_gather *tlb,
-				  struct mm_struct *mm,
-				  unsigned int full_mm_flush)
-{
-	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
-	tlb->need_flush = 0;
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
-#endif
-	if (tlb->fullmm)
-		__tlb_flush_mm(mm);
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	if (!tlb->need_flush)
-		return;
-	tlb->need_flush = 0;
-	__tlb_flush_mm(tlb->mm);
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
-}
-
-static inline void tlb_finish_mmu(struct mmu_gather *tlb,
-				  unsigned long start, unsigned long end)
-{
-	tlb_flush_mmu(tlb);
-}
+#define tlb_fast_mode(tlb)	(1)
 
-/*
- * Release the page cache reference for a pte removed by
- * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
- * has already been freed, so just do free_page_and_swap_cache.
- */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-	return 1; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-}
+#include <asm-generic/tlb.h>
 
 /*
  * pte_free_tlb frees a pte table and clears the CRSTE for the
  * page table from the tlb.
  */
-static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-				unsigned long address, unsigned long end)
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
+				unsigned long address)
 {
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	if (!tlb->fullmm)
@@ -119,8 +53,8 @@ static inline void pte_free_tlb(struct m
  * as the pgd. pmd_free_tlb checks the asce_limit against 2GB
  * to avoid the double free of the pmd in this case.
  */
-static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
-				unsigned long address, unsigned long end)
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
+				unsigned long address)
 {
 #ifdef CONFIG_64BIT
 	if (tlb->mm->context.asce_limit <= (1UL << 31))
@@ -140,8 +74,8 @@ static inline void pmd_free_tlb(struct m
  * as the pgd. pud_free_tlb checks the asce_limit against 4TB
  * to avoid the double free of the pud in this case.
  */
-static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
-				unsigned long address, unsigned long end)
+static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
+				unsigned long address)
 {
 #ifdef CONFIG_64BIT
 	if (tlb->mm->context.asce_limit <= (1UL << 42))
@@ -156,7 +90,6 @@ static inline void pud_free_tlb(struct m
 
 #define tlb_start_vma(tlb, vma)			do { } while (0)
 #define tlb_end_vma(tlb, vma)			do { } while (0)
-#define tlb_remove_tlb_entry(tlb, ptep, addr)	do { } while (0)
-#define tlb_migrate_finish(mm)			do { } while (0)
+#define __tlb_remove_tlb_entry(tlb, ptep, addr)	do { } while (0)
 
 #endif /* _S390_TLB_H */
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -207,6 +207,12 @@ static inline void tlb_flush(struct mmu_
 
 #endif /* CONFIG_HAVE_MMU_GATHER_RANGE */
 
+/*
+ * Some architectures (s390) do a TLB flush from their ptep_get_and_clear*()
+ * functions; these archs don't need another TLB invalidate and can free their
+ * pages immediately. They override tlb_fast_mode with a constant enable.
+ */
+#ifndef tlb_fast_mode
 static inline int tlb_fast_mode(struct mmu_gather *tlb)
 {
 #ifdef CONFIG_SMP
@@ -219,6 +225,7 @@ static inline int tlb_fast_mode(struct m
 	return 1;
 #endif
 }
+#endif
 
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end);
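
The old s390 __tlb_remove_page() above returned 1 with the comment
"avoid calling tlb_flush_mmu"; that return-value contract (space left
in the batch, 0 meaning "flush now") is exactly what a constant
tlb_fast_mode short-circuits. A stand-alone model of the contract
(illustrative only; MAX_BATCH is an invented size):

	#include <stdio.h>

	#define MAX_BATCH	4	/* invented for the example */

	struct gather {
		int fast_mode;
		int nr;
		void *pages[MAX_BATCH];
	};

	static int remove_page(struct gather *tlb, void *page)
	{
		if (tlb->fast_mode)
			return 1;	/* freed immediately; never force a flush */
		tlb->pages[tlb->nr++] = page;
		return MAX_BATCH - tlb->nr;	/* 0 => caller must flush */
	}

	int main(void)
	{
		struct gather g = { .fast_mode = 0 };
		int i;

		for (i = 0; i < 6; i++) {
			if (!remove_page(&g, NULL)) {
				printf("batch full after %d pages: flush\n", g.nr);
				g.nr = 0;	/* tlb_flush_mmu() would drain here */
			}
		}
		return 0;
	}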



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 12/20] mm, arm: Convert arm to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mm-arm-tlb-range.patch --]
[-- Type: text/plain, Size: 7779 bytes --]

We might want to optimize the tlb_flush() function to do a full-mm
flush when the range is 'large'; IA64 does this too.
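
Concretely, such an optimization could look like the sketch below; the
SZ_1M cutoff is an invented placeholder, and the fake vma and
tlb_range_init() follow the generic tlb_flush() from the previous
patch:

	static inline void tlb_flush(struct mmu_gather *tlb)
	{
		struct vm_area_struct vma = {
			.vm_mm = tlb->mm,
			.vm_flags = tlb->vm_flags,
		};

		if (tlb->end - tlb->start >= SZ_1M)	/* invented threshold */
			flush_tlb_mm(tlb->mm);
		else
			flush_tlb_range(&vma, tlb->start, tlb->end);

		tlb_range_init(tlb);
	}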

Cc: Russell King <rmk@arm.linux.org.uk>
Fixes-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/Kconfig           |    1 
 arch/arm/include/asm/tlb.h |  181 +++------------------------------------------
 include/asm-generic/tlb.h  |    4 
 3 files changed, 19 insertions(+), 167 deletions(-)
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -45,6 +45,7 @@ config ARM
 	select GENERIC_SMP_IDLE_THREAD
 	select KTIME_SCALAR
 	select GENERIC_CLOCKEVENTS_BROADCAST if SMP
+	select HAVE_MMU_GATHER_RANGE if MMU
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -27,183 +27,37 @@
 
 #else /* !CONFIG_MMU */
 
-#include <linux/swap.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-
-/*
- * We need to delay page freeing for SMP as other CPUs can access pages
- * which have been removed but not yet had their TLB entries invalidated.
- * Also, as ARMv7 speculative prefetch can drag new entries into the TLB,
- * we need to apply this same delaying tactic to ensure correct operation.
- */
-#if defined(CONFIG_SMP) || defined(CONFIG_CPU_32v7)
-#define tlb_fast_mode(tlb)	0
-#else
-#define tlb_fast_mode(tlb)	1
-#endif
-
-#define MMU_GATHER_BUNDLE	8
-
-/*
- * TLB handling.  This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		fullmm;
-	struct vm_area_struct	*vma;
-	unsigned long		range_start;
-	unsigned long		range_end;
-	unsigned int		nr;
-	unsigned int		max;
-	struct page		**pages;
-	struct page		*local[MMU_GATHER_BUNDLE];
-};
-
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-/*
- * This is unnecessarily complex.  There's three ways the TLB shootdown
- * code is used:
- *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().
- *     tlb->fullmm = 0, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.
- *  2. Unmapping all vmas.  See exit_mmap().
- *     tlb->fullmm = 1, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.  Additionally, page tables will be freed.
- *  3. Unmapping argument pages.  See shift_arg_pages().
- *     tlb->fullmm = 0, but tlb_start_vma/tlb_end_vma will not be called.
- *     tlb->vma will be NULL.
- */
-static inline void tlb_flush(struct mmu_gather *tlb)
-{
-	if (tlb->fullmm || !tlb->vma)
-		flush_tlb_mm(tlb->mm);
-	else if (tlb->range_end > 0) {
-		flush_tlb_range(tlb->vma, tlb->range_start, tlb->range_end);
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
-
-static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
-{
-	if (!tlb->fullmm) {
-		if (addr < tlb->range_start)
-			tlb->range_start = addr;
-		if (addr + PAGE_SIZE > tlb->range_end)
-			tlb->range_end = addr + PAGE_SIZE;
-	}
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
-	unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
-	if (addr) {
-		tlb->pages = (void *)addr;
-		tlb->max = PAGE_SIZE / sizeof(struct page *);
-	}
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush(tlb);
-	if (!tlb_fast_mode(tlb)) {
-		free_pages_and_swap_cache(tlb->pages, tlb->nr);
-		tlb->nr = 0;
-		if (tlb->pages == tlb->local)
-			__tlb_alloc_page(tlb);
-	}
-}
+#define __tlb_remove_tlb_entry(tlb, ptep, addr) do { } while (0)
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
-{
-	tlb->mm = mm;
-	tlb->fullmm = fullmm;
-	tlb->vma = NULL;
-	tlb->max = ARRAY_SIZE(tlb->local);
-	tlb->pages = tlb->local;
-	tlb->nr = 0;
-	__tlb_alloc_page(tlb);
-}
+__pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr);
 
 static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	if (tlb->pages != tlb->local)
-		free_pages((unsigned long)tlb->pages, 0);
-}
+__pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr);
 
 /*
- * Memorize the range for the TLB flush.
+ * ARMv7 speculative prefetch can drag new entries into the TLB at any time,
+ * so we have to unconditionally disable tlb_fast_mode, even on UP.
  */
-static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long addr)
-{
-	tlb_add_flush(tlb, addr);
-}
+#ifdef CONFIG_CPU_32v7
+#define tlb_fast_mode(tlb)	(0)
+#endif
 
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush.  When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm) {
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);
-		tlb->vma = vma;
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
+#include <asm-generic/tlb.h>
 
 static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm)
-		tlb_flush(tlb);
-}
-
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu */
-	}
-
-	tlb->pages[tlb->nr++] = page;
-	VM_BUG_ON(tlb->nr > tlb->max);
-	return tlb->max - tlb->nr;
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (!__tlb_remove_page(tlb, page))
-		tlb_flush_mmu(tlb);
-}
-
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-	unsigned long addr)
+__pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
 	pgtable_page_dtor(pte);
 
+#ifndef CONFIG_ARM_LPAE
 	/*
 	 * With the classic ARM MMU, a pte page has two corresponding pmd
 	 * entries, each covering 1MB.
 	 */
-	addr &= PMD_MASK;
-	tlb_add_flush(tlb, addr + SZ_1M - PAGE_SIZE);
-	tlb_add_flush(tlb, addr + SZ_1M);
+	addr = (addr & PMD_MASK) + SZ_1M;
+	tlb_track_range(tlb, addr - PAGE_SIZE, addr + PAGE_SIZE);
+#endif
 
 	tlb_remove_page(tlb, pte);
 }
@@ -212,16 +66,9 @@ static inline void __pmd_free_tlb(struct
 				  unsigned long addr)
 {
 #ifdef CONFIG_ARM_LPAE
-	tlb_add_flush(tlb, addr);
 	tlb_remove_page(tlb, virt_to_page(pmdp));
 #endif
 }
 
-#define pte_free_tlb(tlb, ptep, addr, end)	__pte_free_tlb(tlb, ptep, addr)
-#define pmd_free_tlb(tlb, pmdp, addr, end)	__pmd_free_tlb(tlb, pmdp, addr)
-#define pud_free_tlb(tlb, pudp, addr, end)	pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm)		do { } while (0)
-
 #endif /* CONFIG_MMU */
 #endif
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -286,6 +286,10 @@ static inline void tlb_flush(struct mmu_
  * Some architectures (s390) do a TLB flush from their ptep_get_and_clear*()
  * functions, these archs don't need another TLB invalidate and can free their
  * pages immediately. They'll over-ride tlb_fast_mode with a constant enable.
+ *
+ * Other archs (ARMv7) can have speculative TLB loaders, which means there is
+ * concurrency even on UP; they have to override tlb_fast_mode with a constant
+ * disable.
  */
 #ifndef tlb_fast_mode
 static inline int tlb_fast_mode(struct mmu_gather *tlb)



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 13/20] mm, ia64: Convert ia64 to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mm-ia64-tlb-range.patch --]
[-- Type: text/plain, Size: 10331 bytes --]

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/ia64/Kconfig                |    1 
 arch/ia64/include/asm/tlb.h      |  233 ---------------------------------------
 arch/ia64/include/asm/tlbflush.h |   25 ++++
 arch/ia64/mm/tlb.c               |   24 +++-
 4 files changed, 49 insertions(+), 234 deletions(-)
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -28,6 +28,7 @@ config IA64
 	select ARCH_DISCARD_MEMBLOCK
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PENDING_IRQ if SMP
+	select HAVE_MMU_GATHER_RANGE
 	select IRQ_PER_CPU
 	select GENERIC_IRQ_SHOW
 	select ARCH_WANT_OPTIONAL_GPIOLIB
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -46,238 +46,9 @@
 #include <asm/tlbflush.h>
 #include <asm/machvec.h>
 
-#ifdef CONFIG_SMP
-# define tlb_fast_mode(tlb)	((tlb)->nr == ~0U)
-#else
-# define tlb_fast_mode(tlb)	(1)
-#endif
-
-/*
- * If we can't allocate a page to make a big batch of page pointers
- * to work on, then just handle a few from the on-stack structure.
- */
-#define	IA64_GATHER_BUNDLE	8
-
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		nr;		/* == ~0U => fast mode */
-	unsigned int		max;
-	unsigned char		fullmm;		/* non-zero means full mm flush */
-	unsigned char		need_flush;	/* really unmapped some PTEs? */
-	unsigned long		start_addr;
-	unsigned long		end_addr;
-	struct page		**pages;
-	struct page		*local[IA64_GATHER_BUNDLE];
-};
-
-struct ia64_tr_entry {
-	u64 ifa;
-	u64 itir;
-	u64 pte;
-	u64 rr;
-}; /*Record for tr entry!*/
-
-extern int ia64_itr_entry(u64 target_mask, u64 va, u64 pte, u64 log_size);
-extern void ia64_ptr_entry(u64 target_mask, int slot);
-
-extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
-
-/*
- region register macros
-*/
-#define RR_TO_VE(val)   (((val) >> 0) & 0x0000000000000001)
-#define RR_VE(val)	(((val) & 0x0000000000000001) << 0)
-#define RR_VE_MASK	0x0000000000000001L
-#define RR_VE_SHIFT	0
-#define RR_TO_PS(val)	(((val) >> 2) & 0x000000000000003f)
-#define RR_PS(val)	(((val) & 0x000000000000003f) << 2)
-#define RR_PS_MASK	0x00000000000000fcL
-#define RR_PS_SHIFT	2
-#define RR_RID_MASK	0x00000000ffffff00L
-#define RR_TO_RID(val) 	((val >> 8) & 0xffffff)
-
-/*
- * Flush the TLB for address range START to END and, if not in fast mode, release the
- * freed pages that where gathered up to this point.
- */
-static inline void
-ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	unsigned int nr;
-
-	if (!tlb->need_flush)
-		return;
-	tlb->need_flush = 0;
-
-	if (tlb->fullmm) {
-		/*
-		 * Tearing down the entire address space.  This happens both as a result
-		 * of exit() and execve().  The latter case necessitates the call to
-		 * flush_tlb_mm() here.
-		 */
-		flush_tlb_mm(tlb->mm);
-	} else if (unlikely (end - start >= 1024*1024*1024*1024UL
-			     || REGION_NUMBER(start) != REGION_NUMBER(end - 1)))
-	{
-		/*
-		 * If we flush more than a tera-byte or across regions, we're probably
-		 * better off just flushing the entire TLB(s).  This should be very rare
-		 * and is not worth optimizing for.
-		 */
-		flush_tlb_all();
-	} else {
-		/*
-		 * XXX fix me: flush_tlb_range() should take an mm pointer instead of a
-		 * vma pointer.
-		 */
-		struct vm_area_struct vma;
-
-		vma.vm_mm = tlb->mm;
-		/* flush the address range from the tlb: */
-		flush_tlb_range(&vma, start, end);
-		/* now flush the virt. page-table area mapping the address range: */
-		flush_tlb_range(&vma, ia64_thash(start), ia64_thash(end));
-	}
-
-	/* lastly, release the freed pages */
-	nr = tlb->nr;
-	if (!tlb_fast_mode(tlb)) {
-		unsigned long i;
-		tlb->nr = 0;
-		tlb->start_addr = ~0UL;
-		for (i = 0; i < nr; ++i)
-			free_page_and_swap_cache(tlb->pages[i]);
-	}
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
-	unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
-	if (addr) {
-		tlb->pages = (void *)addr;
-		tlb->max = PAGE_SIZE / sizeof(void *);
-	}
-}
-
-
-static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
-{
-	tlb->mm = mm;
-	tlb->max = ARRAY_SIZE(tlb->local);
-	tlb->pages = tlb->local;
-	/*
-	 * Use fast mode if only 1 CPU is online.
-	 *
-	 * It would be tempting to turn on fast-mode for full_mm_flush as well.  But this
-	 * doesn't work because of speculative accesses and software prefetching: the page
-	 * table of "mm" may (and usually is) the currently active page table and even
-	 * though the kernel won't do any user-space accesses during the TLB shoot down, a
-	 * compiler might use speculation or lfetch.fault on what happens to be a valid
-	 * user-space address.  This in turn could trigger a TLB miss fault (or a VHPT
-	 * walk) and re-insert a TLB entry we just removed.  Slow mode avoids such
-	 * problems.  (We could make fast-mode work by switching the current task to a
-	 * different "mm" during the shootdown.) --davidm 08/02/2002
-	 */
-	tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
-	tlb->fullmm = full_mm_flush;
-	tlb->start_addr = ~0UL;
-}
-
-/*
- * Called at the end of the shootdown operation to free up any resources that were
- * collected.
- */
-static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	/*
-	 * Note: tlb->nr may be 0 at this point, so we can't rely on tlb->start_addr and
-	 * tlb->end_addr.
-	 */
-	ia64_tlb_flush_mmu(tlb, start, end);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	if (tlb->pages != tlb->local)
-		free_pages((unsigned long)tlb->pages, 0);
-}
-
-/*
- * Logically, this routine frees PAGE.  On MP machines, the actual freeing of the page
- * must be delayed until after the TLB has been flushed (see comments at the beginning of
- * this file).
- */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->need_flush = 1;
-
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu */
-	}
-
-	if (!tlb->nr && tlb->pages == tlb->local)
-		__tlb_alloc_page(tlb);
-
-	tlb->pages[tlb->nr++] = page;
-	VM_BUG_ON(tlb->nr > tlb->max);
-
-	return tlb->max - tlb->nr;
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	ia64_tlb_flush_mmu(tlb, tlb->start_addr, tlb->end_addr);
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (!__tlb_remove_page(tlb, page))
-		tlb_flush_mmu(tlb);
-}
-
-/*
- * Remove TLB entry for PTE mapped at virtual address ADDRESS.  This is called for any
- * PTE, not just those pointing to (normal) physical memory.
- */
-static inline void
-__tlb_remove_tlb_entry (struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
-	if (tlb->start_addr == ~0UL)
-		tlb->start_addr = address;
-	tlb->end_addr = address + PAGE_SIZE;
-}
-
+#define __tlb_remove_tlb_entry(tlb, ptep, addr) do { } while (0)
 #define tlb_migrate_finish(mm)	platform_tlb_migrate_finish(mm)
 
-#define tlb_start_vma(tlb, vma)			do { } while (0)
-#define tlb_end_vma(tlb, vma)			do { } while (0)
-
-#define tlb_remove_tlb_entry(tlb, ptep, addr)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__tlb_remove_tlb_entry(tlb, ptep, addr);	\
-} while (0)
-
-#define pte_free_tlb(tlb, ptep, address, end)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pte_free_tlb(tlb, ptep, address);		\
-} while (0)
-
-#define pmd_free_tlb(tlb, ptep, address, end)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pmd_free_tlb(tlb, ptep, address);		\
-} while (0)
-
-#define pud_free_tlb(tlb, pudp, address, end)		\
-do {							\
-	tlb->need_flush = 1;				\
-	__pud_free_tlb(tlb, pudp, address);		\
-} while (0)
+#include <asm-generic/tlb.h>
 
 #endif /* _ASM_IA64_TLB_H */
--- a/arch/ia64/include/asm/tlbflush.h
+++ b/arch/ia64/include/asm/tlbflush.h
@@ -13,6 +13,31 @@
 #include <asm/mmu_context.h>
 #include <asm/page.h>
 
+struct ia64_tr_entry {
+	u64 ifa;
+	u64 itir;
+	u64 pte;
+	u64 rr;
+}; /*Record for tr entry!*/
+
+extern int ia64_itr_entry(u64 target_mask, u64 va, u64 pte, u64 log_size);
+extern void ia64_ptr_entry(u64 target_mask, int slot);
+extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
+
+/*
+ region register macros
+*/
+#define RR_TO_VE(val)   (((val) >> 0) & 0x0000000000000001)
+#define RR_VE(val)     (((val) & 0x0000000000000001) << 0)
+#define RR_VE_MASK     0x0000000000000001L
+#define RR_VE_SHIFT    0
+#define RR_TO_PS(val)  (((val) >> 2) & 0x000000000000003f)
+#define RR_PS(val)     (((val) & 0x000000000000003f) << 2)
+#define RR_PS_MASK     0x00000000000000fcL
+#define RR_PS_SHIFT    2
+#define RR_RID_MASK    0x00000000ffffff00L
+#define RR_TO_RID(val)         ((val >> 8) & 0xffffff)
+
 /*
  * Now for some TLB flushing routines.  This is the kind of stuff that
  * can be very expensive, so try to avoid them whenever possible.
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -297,9 +297,8 @@ local_flush_tlb_all (void)
 	ia64_srlz_i();			/* srlz.i implies srlz.d */
 }
 
-void
-flush_tlb_range (struct vm_area_struct *vma, unsigned long start,
-		 unsigned long end)
+void __flush_tlb_range(struct vm_area_struct *vma,
+		  unsigned long start, unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long size = end - start;
@@ -335,6 +334,25 @@ flush_tlb_range (struct vm_area_struct *
 	preempt_enable();
 	ia64_srlz_i();			/* srlz.i implies srlz.d */
 }
+
+void flush_tlb_range(struct vm_area_struct *vma,
+		     unsigned long start, unsigned long end)
+{
+	if (unlikely(end - start >= 1024*1024*1024*1024UL
+			|| REGION_NUMBER(start) != REGION_NUMBER(end - 1))) {
+		/*
+		 * If we flush more than a tera-byte or across regions, we're
+		 * probably better off just flushing the entire TLB(s).  This
+		 * should be very rare and is not worth optimizing for.
+		 */
+		flush_tlb_all();
+	} else {
+		/* flush the address range from the tlb */
+		__flush_tlb_range(vma, start, end);
+		/* flush the virt. page-table area mapping the addr range */
+		__flush_tlb_range(vma, ia64_thash(start), ia64_thash(end));
+	}
+}
 EXPORT_SYMBOL(flush_tlb_range);
 
 void __devinit
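
The split keeps the old heuristic in the flush_tlb_range() wrapper:
ranges of a terabyte or more, or ranges crossing an ia64 region, get a
full flush_tlb_all(). A stand-alone model of that check (illustrative
only; REGION_NUMBER() is approximated as the top three address bits,
and 64-bit longs are assumed):

	#include <stdio.h>

	/* assumption for illustration; the real macro lives in asm/pgtable.h */
	#define REGION_NUMBER(addr)	((unsigned long)(addr) >> 61)

	static int needs_full_flush(unsigned long start, unsigned long end)
	{
		return end - start >= 1024UL*1024*1024*1024 ||
		       REGION_NUMBER(start) != REGION_NUMBER(end - 1);
	}

	int main(void)
	{
		/* a tiny range straddling a region boundary still goes full */
		unsigned long boundary = 1UL << 61;

		printf("%d\n", needs_full_flush(boundary - 4096,
						boundary + 4096));
		return 0;
	}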



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 14/20] mm, sh: Convert sh to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mm-sh-tlb-range.patch --]
[-- Type: text/plain, Size: 3973 bytes --]

Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/sh/Kconfig           |    1 
 arch/sh/include/asm/tlb.h |   98 ++--------------------------------------------
 2 files changed, 6 insertions(+), 93 deletions(-)
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -28,6 +28,7 @@ config SUPERH
 	select IRQ_FORCED_THREADING
 	select RTC_LIB
 	select GENERIC_ATOMIC64
+	select HAVE_MMU_GATHER_RANGE if MMU
 	select GENERIC_IRQ_SHOW
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_CLOCKEVENTS
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -10,100 +10,14 @@
 
 #ifdef CONFIG_MMU
 #include <linux/swap.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-#include <asm/mmu_context.h>
-
-/*
- * TLB handling.  This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		fullmm;
-	unsigned long		start, end;
-};
 
-static inline void init_tlb_gather(struct mmu_gather *tlb)
-{
-	tlb->start = TASK_SIZE;
-	tlb->end = 0;
-
-	if (tlb->fullmm) {
-		tlb->start = 0;
-		tlb->end = TASK_SIZE;
-	}
-}
-
-static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
-{
-	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
-
-	init_tlb_gather(tlb);
-}
-
-static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	if (tlb->fullmm)
-		flush_tlb_mm(tlb->mm);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-}
-
-static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long address)
-{
-	if (tlb->start > address)
-		tlb->start = address;
-	if (tlb->end < address + PAGE_SIZE)
-		tlb->end = address + PAGE_SIZE;
-}
-
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush.  When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm)
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm && tlb->end) {
-		flush_tlb_range(vma, tlb->start, tlb->end);
-		init_tlb_gather(tlb);
-	}
-}
+#define __tlb_remove_tlb_entry(tlb, ptep, addr) do { } while (0)
 
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-}
+#define __pte_free_tlb(tlb, ptep, addr)	pte_free((tlb)->mm, ptep)
+#define __pmd_free_tlb(tlb, pmdp, addr)	pmd_free((tlb)->mm, pmdp)
+#define __pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
 
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	free_page_and_swap_cache(page);
-	return 1; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	__tlb_remove_page(tlb, page);
-}
-
-#define pte_free_tlb(tlb, ptep, addr, end)	pte_free((tlb)->mm, ptep)
-#define pmd_free_tlb(tlb, pmdp, addr, end)	pmd_free((tlb)->mm, pmdp)
-#define pud_free_tlb(tlb, pudp, addr, end)	pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm)		do { } while (0)
+#include <asm-generic/tlb.h>
 
 #if defined(CONFIG_CPU_SH4) || defined(CONFIG_SUPERH64)
 extern void tlb_wire_entry(struct vm_area_struct *, unsigned long, pte_t);
@@ -123,8 +37,6 @@ static inline void tlb_unwire_entry(void
 
 #else /* CONFIG_MMU */
 
-#define tlb_start_vma(tlb, vma)				do { } while (0)
-#define tlb_end_vma(tlb, vma)				do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, pte, address)	do { } while (0)
 
 #include <asm-generic/tlb.h>



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 15/20] mm, um: Convert um to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: um-tlb-range.patch --]
[-- Type: text/plain, Size: 4582 bytes --]

Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/um/Kconfig.common    |    1 
 arch/um/include/asm/tlb.h |  111 +---------------------------------------------
 arch/um/kernel/tlb.c      |   13 -----
 3 files changed, 4 insertions(+), 121 deletions(-)
--- a/arch/um/Kconfig.common
+++ b/arch/um/Kconfig.common
@@ -11,6 +11,7 @@ config UML
 	select GENERIC_CPU_DEVICES
 	select GENERIC_IO
 	select GENERIC_CLOCKEVENTS
+	select HAVE_MMU_GATHER_RANGE
 
 config MMU
 	bool
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -7,114 +7,9 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-#define tlb_start_vma(tlb, vma) do { } while (0)
-#define tlb_end_vma(tlb, vma) do { } while (0)
-#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)
-
-/* struct mmu_gather is an opaque type used by the mm code for passing around
- * any data needed by arch specific code for tlb_remove_page.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		need_flush; /* Really unmapped some ptes? */
-	unsigned long		start;
-	unsigned long		end;
-	unsigned int		fullmm; /* non-zero means full mm flush */
-};
-
-static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
-					  unsigned long address)
-{
-	if (tlb->start > address)
-		tlb->start = address;
-	if (tlb->end < address + PAGE_SIZE)
-		tlb->end = address + PAGE_SIZE;
-}
-
-static inline void init_tlb_gather(struct mmu_gather *tlb)
-{
-	tlb->need_flush = 0;
-
-	tlb->start = TASK_SIZE;
-	tlb->end = 0;
-
-	if (tlb->fullmm) {
-		tlb->start = 0;
-		tlb->end = TASK_SIZE;
-	}
-}
-
-static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
-{
-	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
-
-	init_tlb_gather(tlb);
-}
-
-extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-			       unsigned long end);
-
-static inline void
-tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	if (!tlb->need_flush)
-		return;
-
-	flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end);
-	init_tlb_gather(tlb);
-}
-
-/* tlb_finish_mmu
- *	Called at the end of the shootdown operation to free up any resources
- *	that were required.
- */
-static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
-{
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-}
-
-/* tlb_remove_page
- *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)),
- *	while handling the additional races in SMP caused by other CPUs
- *	caching valid mappings in their TLBs.
- */
-static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->need_flush = 1;
-	free_page_and_swap_cache(page);
-	return 1; /* avoid calling tlb_flush_mmu */
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	__tlb_remove_page(tlb, page);
-}
-
-/**
- * tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation.
- *
- * Record the fact that pte's were really umapped in ->need_flush, so we can
- * later optimise away the tlb invalidate.   This helps when userspace is
- * unmapping already-unmapped pages, which happens quite a lot.
- */
-#define tlb_remove_tlb_entry(tlb, ptep, address)		\
-	do {							\
-		tlb->need_flush = 1;				\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
-	} while (0)
-
-#define pte_free_tlb(tlb, ptep, addr, end) __pte_free_tlb(tlb, ptep, addr)
-
-#define pud_free_tlb(tlb, pudp, addr, end) __pud_free_tlb(tlb, pudp, addr)
-
-#define pmd_free_tlb(tlb, pmdp, addr, end) __pmd_free_tlb(tlb, pmdp, addr)
-
+#define __tlb_remove_tlb_entry(tlb, ptep, addr) do { } while (0)
 #define tlb_migrate_finish(mm) do {} while (0)
 
+#include <asm-generic/tlb.h>
+
 #endif
--- a/arch/um/kernel/tlb.c
+++ b/arch/um/kernel/tlb.c
@@ -502,19 +502,6 @@ void flush_tlb_range(struct vm_area_stru
 }
 EXPORT_SYMBOL(flush_tlb_range);
 
-void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-			unsigned long end)
-{
-	/*
-	 * Don't bother flushing if this address space is about to be
-	 * destroyed.
-	 */
-	if (atomic_read(&mm->mm_users) == 0)
-		return;
-
-	fix_range(mm, start, end, 0);
-}
-
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma = mm->mmap;



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 16/20] mm, avr32: Convert avr32 to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: avr32-mmu_range.patch --]
[-- Type: text/plain, Size: 1251 bytes --]

Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/avr32/Kconfig           |    1 +
 arch/avr32/include/asm/tlb.h |    6 ------
 2 files changed, 1 insertion(+), 6 deletions(-)
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -14,6 +14,7 @@ config AVR32
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_CLOCKEVENTS
+	select HAVE_MMU_GATHER_RANGE
 	help
 	  AVR32 is a high-performance 32-bit RISC microprocessor core,
 	  designed for cost-sensitive embedded applications, with particular
--- a/arch/avr32/include/asm/tlb.h
+++ b/arch/avr32/include/asm/tlb.h
@@ -8,12 +8,6 @@
 #ifndef __ASM_AVR32_TLB_H
 #define __ASM_AVR32_TLB_H
 
-#define tlb_start_vma(tlb, vma) \
-	flush_cache_range(vma, vma->vm_start, vma->vm_end)
-
-#define tlb_end_vma(tlb, vma) \
-	flush_tlb_range(vma, vma->vm_start, vma->vm_end)
-
 #define __tlb_remove_tlb_entry(tlb, pte, address) do { } while(0)
 
 #include <asm-generic/tlb.h>



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 17/20] mm, mips: Convert mips to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: mips-mmu_range.patch --]
[-- Type: text/plain, Size: 1251 bytes --]

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/mips/Kconfig           |    1 +
 arch/mips/include/asm/tlb.h |   10 ----------
 2 files changed, 1 insertion(+), 10 deletions(-)
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -34,6 +34,7 @@ config MIPS
 	select BUILDTIME_EXTABLE_SORT
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_CMOS_UPDATE
+	select HAVE_MMU_GATHER_RANGE
 
 menu "Machine selection"
 
--- a/arch/mips/include/asm/tlb.h
+++ b/arch/mips/include/asm/tlb.h
@@ -1,16 +1,6 @@
 #ifndef __ASM_TLB_H
 #define __ASM_TLB_H
 
-/*
- * MIPS doesn't need any special per-pte or per-vma handling, except
- * we need to flush cache for area to be unmapped.
- */
-#define tlb_start_vma(tlb, vma) 				\
-	do {							\
-		if (!tlb->fullmm)				\
-			flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-	}  while (0)
-#define tlb_end_vma(tlb, vma) do { } while (0)
 #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
 
 #include <asm-generic/tlb.h>



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 18/20] mm, parisc: Convert parisc to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: parisc-mmu_range.patch --]
[-- Type: text/plain, Size: 1282 bytes --]

Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: James Bottomley <jejb@parisc-linux.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/parisc/Kconfig           |    1 +
 arch/parisc/include/asm/tlb.h |   10 ----------
 2 files changed, 1 insertion(+), 10 deletions(-)
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -19,6 +19,7 @@ config PARISC
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_STRNCPY_FROM_USER
+	select HAVE_MMU_GATHER_RANGE
 
 	help
 	  The PA-RISC microprocessor is designed by Hewlett-Packard and used
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -1,16 +1,6 @@
 #ifndef _PARISC_TLB_H
 #define _PARISC_TLB_H
 
-#define tlb_start_vma(tlb, vma) \
-do {	if (!(tlb)->fullmm)	\
-		flush_cache_range(vma, vma->vm_start, vma->vm_end); \
-} while (0)
-
-#define tlb_end_vma(tlb, vma)	\
-do {	if (!(tlb)->fullmm)	\
-		flush_tlb_range(vma, vma->vm_start, vma->vm_end); \
-} while (0)
-
 #define __tlb_remove_tlb_entry(tlb, pte, address) \
 	do { } while (0)
 



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 19/20] mm, sparc32: Convert sparc32 to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:15   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:15 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: sparc32-mmu_range.patch --]
[-- Type: text/plain, Size: 1129 bytes --]

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/sparc/Kconfig              |    1 +
 arch/sparc/include/asm/tlb_32.h |   10 ----------
 2 files changed, 1 insertion(+), 10 deletions(-)
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -41,6 +41,7 @@ config SPARC32
 	def_bool !64BIT
 	select GENERIC_ATOMIC64
 	select CLZ_TAB
+	select HAVE_MMU_GATHER_RANGE
 
 config SPARC64
 	def_bool 64BIT
--- a/arch/sparc/include/asm/tlb_32.h
+++ b/arch/sparc/include/asm/tlb_32.h
@@ -1,16 +1,6 @@
 #ifndef _SPARC_TLB_H
 #define _SPARC_TLB_H
 
-#define tlb_start_vma(tlb, vma) \
-do {								\
-	flush_cache_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
-#define tlb_end_vma(tlb, vma) \
-do {								\
-	flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
-} while (0)
-
 #define __tlb_remove_tlb_entry(tlb, pte, address) \
 	do { } while (0)
 



^ permalink raw reply	[flat|nested] 120+ messages in thread

* [PATCH 20/20] mm, xtensa: Convert xtensa to generic tlb
  2012-06-27 21:15 ` Peter Zijlstra
@ 2012-06-27 21:16   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 21:16 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, akpm, Linus Torvalds, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, Peter Zijlstra,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

[-- Attachment #1: xtensa-mmu_range.patch --]
[-- Type: text/plain, Size: 1955 bytes --]

Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/xtensa/Kconfig           |    1 +
 arch/xtensa/include/asm/tlb.h |   23 -----------------------
 arch/xtensa/mm/tlb.c          |    2 +-
 3 files changed, 2 insertions(+), 24 deletions(-)
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -10,6 +10,7 @@ config XTENSA
 	select HAVE_GENERIC_HARDIRQS
 	select GENERIC_IRQ_SHOW
 	select GENERIC_CPU_DEVICES
+	select HAVE_MMU_GATHER_RANGE
 	help
 	  Xtensa processors are 32-bit RISC machines designed by Tensilica
 	  primarily for embedded systems.  These processors are both
--- a/arch/xtensa/include/asm/tlb.h
+++ b/arch/xtensa/include/asm/tlb.h
@@ -14,29 +14,6 @@
 #include <asm/cache.h>
 #include <asm/page.h>
 
-#if (DCACHE_WAY_SIZE <= PAGE_SIZE)
-
-/* Note, read http://lkml.org/lkml/2004/1/15/6 */
-
-# define tlb_start_vma(tlb,vma)			do { } while (0)
-# define tlb_end_vma(tlb,vma)			do { } while (0)
-
-#else
-
-# define tlb_start_vma(tlb, vma)					      \
-	do {								      \
-		if (!tlb->fullmm)					      \
-			flush_cache_range(vma, vma->vm_start, vma->vm_end);   \
-	} while(0)
-
-# define tlb_end_vma(tlb, vma)						      \
-	do {								      \
-		if (!tlb->fullmm)					      \
-			flush_tlb_range(vma, vma->vm_start, vma->vm_end);     \
-	} while(0)
-
-#endif
-
 #define __tlb_remove_tlb_entry(tlb,pte,addr)	do { } while (0)
 
 #include <asm-generic/tlb.h>
--- a/arch/xtensa/mm/tlb.c
+++ b/arch/xtensa/mm/tlb.c
@@ -63,7 +63,7 @@ void flush_tlb_all (void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (mm == current->active_mm) {
-		int flags;
+		unsigned long flags;
 		local_save_flags(flags);
 		__get_new_mmu_context(mm);
 		__load_mmu_context(mm);



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 11/20] mm, s390: Convert to use generic mmu_gather
  2012-06-27 21:15   ` Peter Zijlstra
@ 2012-06-27 22:13     ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar, akpm,
	Linus Torvalds, Rik van Riel, Hugh Dickins, Mel Gorman,
	Nick Piggin, Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

On Wed, 2012-06-27 at 23:15 +0200, Peter Zijlstra wrote:
> 
> S390 doesn't need a TLB flush after ptep_get_and_clear_full() and
> before __tlb_remove_page() because its ptep_get_and_clear*() family
> already does a full TLB invalidate. Therefore force it to use
> tlb_fast_mode. 

On that.. ptep_get_and_clear() says:

/*
 * This is hard to understand. ptep_get_and_clear and ptep_clear_flush
 * both clear the TLB for the unmapped pte. The reason is that
 * ptep_get_and_clear is used in common code (e.g. change_pte_range)
 * to modify an active pte. The sequence is
 *   1) ptep_get_and_clear
 *   2) set_pte_at
 *   3) flush_tlb_range
 * On s390 the tlb needs to get flushed with the modification of the pte
 * if the pte is active. The only way how this can be implemented is to
 * have ptep_get_and_clear do the tlb flush. In exchange flush_tlb_range
 * is a nop.
 */

I think there is another way: arch_{enter,leave}_lazy_mmu_mode() seems
to wrap these sites, so you can do as SPARC64 and PPC do and batch
through there.

That should save a number of TLB invalidates..
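
[Editor's sketch of the batching idea above, for illustration only. It
is loosely modelled on the SPARC64 tlb_batch approach; none of this is
real s390/SPARC64/PPC code, and tlb_batch, arch_tlb_batch_add() and
flush_tlb_page_mm() are hypothetical names. It assumes the lazy MMU
section runs with preemption disabled, as it does at the
change_pte_range()-style call sites.]

#include <linux/mm_types.h>
#include <linux/percpu.h>

#define TLB_BATCH_NR	64			/* illustrative batch size */

struct tlb_batch {
	struct mm_struct	*mm;		/* mm the batched vaddrs belong to */
	unsigned int		nr;
	unsigned long		vaddrs[TLB_BATCH_NR];
};

static DEFINE_PER_CPU(struct tlb_batch, tlb_batch);

void arch_enter_lazy_mmu_mode(void)
{
	struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);

	tb->nr = 0;				/* start with an empty batch */
}

void arch_leave_lazy_mmu_mode(void)
{
	struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
	unsigned int i;

	/* one invalidate pass for the whole section, not one per pte */
	for (i = 0; i < tb->nr; i++)
		flush_tlb_page_mm(tb->mm, tb->vaddrs[i]);
	tb->nr = 0;
}

/* Would be called from ptep_get_and_clear() instead of flushing there. */
void arch_tlb_batch_add(struct mm_struct *mm, unsigned long vaddr)
{
	struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);

	/* a real implementation would flush here if tb->mm changed */
	tb->mm = mm;
	tb->vaddrs[tb->nr++] = vaddr;
	if (tb->nr == TLB_BATCH_NR)		/* batch full: flush early */
		arch_leave_lazy_mmu_mode();
}

[With something along these lines, ptep_get_and_clear() no longer has
to flush per pte, and the comment's reason for flush_tlb_range() being
a nop goes away.]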



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
  2012-06-27 21:15   ` Peter Zijlstra
@ 2012-06-27 22:23     ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2012-06-27 22:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

On Wed, Jun 27, 2012 at 2:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Certain architectures (viz. x86, arm, s390) have hardware page-table
> walkers (#PF). So during the RCU page-table teardown process make sure
> we do a tlb flush of page-table pages on all relevant CPUs to
> synchronize against hardware walkers, and then free the pages.

NACK.

Why would hw page table walkers be that special? Plus your config
option is horribly done anyway, where you do it as some kind of
"default y" and then have complex conditionals on it.

Plus it really isn't about hardware page table walkers at all. It's
more about the possibility of speculative TLB fills; it has nothing to
do with *how* they are done. Sure, it's likely that a software
pagetable walker wouldn't be something that gets called speculatively,
but it's not out of the question.

So I think your config option is totally mis-designed and actively
misleading. It's also horrible from a design standpoint, since it's
entirely possible that some day POWERPC will actually see the light
and do speculative TLB fills etc.

So *if* this needs to be done, it needs to be done right. That means:

 - don't talk about HW walking, since it's not about that

 - don't say "if you have speculative walkers", and use an ifndef. Say
"If you can *guarantee* that nothing else walks page tables
speculatively, and we have only one thread that owns the mmu, and that
one thread is us, *then* we can do this optimization". So switch the
config option around.

 - make it a per-architecture thing to say "I guarantee that I never
fill the TLB speculatively". Don't do that "default y" with complex
conditionals crap.

IOW, if Sparc/PPC really want to guarantee that they never fill TLB
entries speculatively, and that if we are in a kernel thread they will
*never* fill the TLB with anything else, then make them enable
CONFIG_STRICT_TLB_FILL or something in their architecture Kconfig
files.

Not like this patch. And not with the misleading names and comments.

                Linus

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 21:15   ` Peter Zijlstra
@ 2012-06-27 22:26     ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2012-06-27 22:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt

On Wed, Jun 27, 2012 at 2:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> This originated from s390 which does something similar and would allow
> s390 to use the generic TLB flushing code.
>
> The idea is to flush the mm-wide cache and tlb a priori and not bother
> with multiple flushes if the batching isn't large enough.
>
> This can be safely done since there cannot be any concurrency on this
> mm; it's either after the process died (exit) or in the middle of
> execve where the thread switched to the new mm.

I think we actually *used* to do the final TLB flush from within the
context of the process that died. That doesn't seem to ever be the
case any more, but it does worry me a bit. Maybe a

   VM_BUG_ON(current->active_mm == mm);

or something for the fullmm case?
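
Say, something like this at the top of tlb_gather_mmu() (untested, and
assuming the signature this series gives it):

void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
{
	tlb->mm = mm;
	tlb->fullmm = fullmm;

	/*
	 * fullmm means the whole address space is going away, so no
	 * other thread can be using this mm concurrently -- and we
	 * had better not still be running on it ourselves.
	 */
	if (fullmm)
		VM_BUG_ON(current->active_mm == mm);

	/* ... rest of the existing setup ... */
}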

              Linus

* Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
  2012-06-27 22:23     ` Linus Torvalds
@ 2012-06-27 23:01       ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 23:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Wed, 2012-06-27 at 15:23 -0700, Linus Torvalds wrote:

> Plus it really isn't about hardware page table walkers at all. It's
> more about the possibility of speculative TLB fills; it has nothing to
> do with *how* they are done. Sure, it's likely that a software
> pagetable walker wouldn't be something that gets called speculatively,
> but it's not out of the question.
> 

Hmm, I would call gup_fast() about as speculative as we can get in
software. It does a lock-less walk of the page-tables. That's what the
RCU-freed page-table stuff is for to begin with.
> 
> IOW, if Sparc/PPC really want to guarantee that they never fill TLB
> entries speculatively, and that if we are in a kernel thread they will
> *never* fill the TLB with anything else, then make them enable
> CONFIG_STRICT_TLB_FILL or something in their architecture Kconfig
> files. 

Since we've dealt with the speculative software side by using RCU-ish
stuff, the only thing that's left is hardware. Now, from what I
understood, neither sparc64 nor ppc hardware actually knows about the
linux page-tables; they only look at their hash-table thing.

So even if the hardware did do speculative tlb fills, it would do them
from the hash-table, but that's already cleared out.


How about something like this

---
Subject: mm: Add missing TLB invalidate to RCU page-table freeing
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Thu Jun 28 00:49:33 CEST 2012

For normal systems we need a TLB invalidate before freeing the
page-tables; the generic RCU-based page-table freeing code lacked
this.

This is because this code originally came from ppc where the hardware
never walks the linux page-tables and thus this invalidate is not
required.

Others, notably s390 which ran into this problem in cd94154cc6a
("[S390] fix tlb flushing for page table pages"), do very much need
this TLB invalidation.

Therefore add it, with a Kconfig option to disable it so as to not
unduly slow down PPC and SPARC64, neither of which needs it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/Kconfig         |    3 +++
 arch/powerpc/Kconfig |    1 +
 arch/sparc/Kconfig   |    1 +
 mm/memory.c          |   18 ++++++++++++++++++
 4 files changed, 23 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -231,6 +231,9 @@ config HAVE_ARCH_MUTEX_CPU_RELAX
 config HAVE_RCU_TABLE_FREE
 	bool
 
+config STRICT_TLB_FILL
+	bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -127,6 +127,7 @@ config PPC
 	select GENERIC_IRQ_SHOW_LEVEL
 	select IRQ_FORCED_THREADING
 	select HAVE_RCU_TABLE_FREE if SMP
+	select STRICT_TLB_FILL
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_BPF_JIT if PPC64
 	select HAVE_ARCH_JUMP_LABEL
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -52,6 +52,7 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
+	select STRICT_TLB_FILL
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_SYSCALL_WRAPPERS
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -329,11 +329,27 @@ static void tlb_remove_table_rcu(struct 
 	free_page((unsigned long)batch);
 }
 
+#ifdef CONFIG_STRICT_TLB_FILL
+/*
+ * Some architectures (sparc64, ppc) cannot refill TLBs after they've removed
+ * the PTE entries from their hash-table. Their hardware never looks at the
+ * linux page-table structures, so they don't need a hardware TLB invalidate
+ * when tearing down the page-table structure itself.
+ */
+static inline void tlb_table_flush_mmu(struct mmu_gather *tlb) { }
+#else
+static inline void tlb_table_flush_mmu(struct mmu_gather *tlb)
+{
+	tlb_flush_mmu(tlb);
+}
+#endif
+
 void tlb_table_flush(struct mmu_gather *tlb)
 {
 	struct mmu_table_batch **batch = &tlb->batch;
 
 	if (*batch) {
+		tlb_table_flush_mmu(tlb);
 		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
 		*batch = NULL;
 	}
@@ -345,6 +361,7 @@ void tlb_remove_table(struct mmu_gather 
 
 	tlb->need_flush = 1;
 
+#ifdef CONFIG_STRICT_TLB_FILL
 	/*
 	 * When there's less then two users of this mm there cannot be a
 	 * concurrent page-table walk.
@@ -353,6 +370,7 @@ void tlb_remove_table(struct mmu_gather 
 		__tlb_remove_table(table);
 		return;
 	}
+#endif
 
 	if (*batch == NULL) {
 		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 22:26     ` Linus Torvalds
@ 2012-06-27 23:02       ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 23:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger,
	Hans-Christian Egtvedt, Ralf Baechle, Kyle McMartin,
	James Bottomley, Chris Zankel

On Wed, 2012-06-27 at 15:26 -0700, Linus Torvalds wrote:
> On Wed, Jun 27, 2012 at 2:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > This originated from s390 which does something similar and would allow
> > s390 to use the generic TLB flushing code.
> >
> > The idea is to flush the mm-wide cache and tlb a priori and not bother
> > with multiple flushes if the batching isn't large enough.
> >
> > This can be safely done since there cannot be any concurrency on this
> > mm; it's either after the process died (exit) or in the middle of
> > execve where the thread switched to the new mm.
> 
> I think we actually *used* to do the final TLB flush from within the
> context of the process that died. That doesn't seem to ever be the
> case any more, but it does worry me a bit. Maybe a
> 
>    VM_BUG_ON(current->active_mm == mm);
> 
> or something for the fullmm case?

OK, added it and am rebooting the test box..

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 23:02       ` Peter Zijlstra
@ 2012-06-27 23:13         ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-27 23:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, 2012-06-28 at 01:02 +0200, Peter Zijlstra wrote:
> On Wed, 2012-06-27 at 15:26 -0700, Linus Torvalds wrote:
> > On Wed, Jun 27, 2012 at 2:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > This originated from s390 which does something similar and would allow
> > > s390 to use the generic TLB flushing code.
> > >
> > > The idea is to flush the mm-wide cache and tlb a priori and not bother
> > > with multiple flushes if the batching isn't large enough.
> > >
> > > This can be safely done since there cannot be any concurrency on this
> > > mm; it's either after the process died (exit) or in the middle of
> > > execve where the thread switched to the new mm.
> > 
> > I think we actually *used* to do the final TLB flush from within the
> > context of the process that died. That doesn't seem to ever be the
> > case any more, but it does worry me a bit. Maybe a
> > 
> >    VM_BUG_ON(current->active_mm == mm);
> > 
> > or something for the fullmm case?
> 
> OK, added it and am rebooting the test box..

That triggered.. is this a problem though? At this point userspace is
very dead, so it shouldn't matter, right?

Will have to think about it properly tomorrow; it's 1am and my brain is
mostly sleeping already.

------------[ cut here ]------------
kernel BUG at /home/root/src/linux-2.6/mm/memory.c:221!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU 13 
Pid: 132, comm: modprobe Not tainted 3.5.0-rc4-01507-g912ca15-dirty #180 Supermicro X8DTN/X8DTN
RIP: 0010:[<ffffffff811511bf>]  [<ffffffff811511bf>] tlb_gather_mmu+0x9f/0xb0
RSP: 0018:ffff880235b2bd78  EFLAGS: 00010246
RAX: ffff880235b18000 RBX: ffff880235b2bdc0 RCX: ffff880235b18000
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
RBP: ffff880235b2bd98 R08: 0000000000000018 R09: 0000000000000004
R10: ffffffff81eedfc0 R11: 0000000000000084 R12: ffff8804356b8000
R13: 0000000000000001 R14: ffff880235b185f0 R15: ffff880235b18000
FS:  0000000000000000(0000) GS:ffff880237ce0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000038ce8ae150 CR3: 0000000436ad6000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 132, threadinfo ffff880235b2a000, task ffff880235b18000)
Stack:
 ffff880235b2bd98 0000000000000000 ffff8804356b8000 ffff8804356b8060
 ffff880235b2be38 ffffffff8115ad38 ffff880235b2be38 ffff880235b4e000
 ffff880235b4e630 ffff8804356b8000 0000000100000000 ffff880235b2bdd8
Call Trace:
 [<ffffffff8115ad38>] exit_mmap+0x98/0x150
 [<ffffffff810bf98e>] ? exit_numa+0xae/0xe0
 [<ffffffff81078b74>] mmput+0x84/0x120
 [<ffffffff81080ce8>] exit_mm+0x108/0x130
 [<ffffffff81081388>] do_exit+0x678/0x950
 [<ffffffff811a3ad6>] ? alloc_fd+0xd6/0x120
 [<ffffffff811791c0>] ? kmem_cache_free+0x20/0x130
 [<ffffffff810819af>] do_group_exit+0x3f/0xa0
 [<ffffffff81081a27>] sys_exit_group+0x17/0x20
 [<ffffffff81980ed2>] system_call_fastpath+0x16/0x1b
Code: 10 74 1a 65 48 8b 04 25 80 ba 00 00 4c 3b a0 90 02 00 00 74 16 4c 89 e7 e8 5f 39 f2 ff 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 
RIP  [<ffffffff811511bf>] tlb_gather_mmu+0x9f/0xb0
 RSP <ffff880235b2bd78>
---[ end trace f99f121b09c974f8 ]---


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 23:13         ` Peter Zijlstra
@ 2012-06-27 23:23           ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2012-06-27 23:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Wed, Jun 27, 2012 at 4:13 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> That triggered.. is this a problem though, at this point userspace is
> very dead so it shouldn't matter, right?

It still matters. Even if user space is dead, kernel space accesses
can result in TLB fills in user space. Exactly because of things like
speculative fills etc.

So what can happen - for example - is that the kernel does an indirect
jump, and the CPU predicts the destination of the jump using the
branch prediction tables.

But the branch prediction tables are obviously just predictions, and
they easily contain user addresses etc in them. So the kernel may well
end up speculatively doing a TLB fill on a user access.

And your whole optimization depends on this not happening, unless I
read the logic wrong. The whole "invalidate the TLB just once
up-front" approach is *only* valid if you know that nothing is going
to ever fill that TLB again. But see above - if we're still running
within that TLB context, we have no idea what speculative execution
may or may not end up filling.

That said, maybe I misread your patch?

                   Linus

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 23:23           ` Linus Torvalds
@ 2012-06-27 23:33             ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2012-06-27 23:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Wed, Jun 27, 2012 at 4:23 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But the branch prediction tables are obviously just predictions, and
> they easily contain user addresses etc in them. So the kernel may well
> end up speculatively doing a TLB fill on a user access.

That should be ".. on a user *address*", hopefully that was clear from
the context, if not from the text.

IOW, the point I'm trying to make is that even if there are zero
*actual* accesses of user space (because user space is dead, and the
kernel hopefully does no "get_user()/put_user()" stuff at this point
any more), the CPU may still speculatively use user addresses while
executing bog-standard kernel code.

Taking a user address from the BTB is just one example. Speculative
memory accesses might happen after a mis-predicted branch, where we
test a pointer against NULL, and after the branch we access it. So
doing a speculative TLB walk of the NULL address would not necessarily
even be unusual. Obviously normally nothing is actually mapped there,
but these kinds of things can *easily* result in the page tables
themselves being cached, even if the final page doesn't exist.
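
Something as ordinary as this is enough to trigger it (a made-up
example, not from any real code):

struct thing { int count; };		/* made-up type */

int get_count(struct thing *t)
{
	if (!t)
		return 0;		/* predicted not-taken... */
	return t->count;		/*
					 * ...so the CPU can speculatively
					 * issue this load, and thus a
					 * page-table walk near address 0,
					 * even when t really is NULL.
					 */
}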

Also, all of this obviously depends on how aggressive the speculation
is. It's entirely possible that effects like these are really hard to
see in practice, and you'll almost never hit it. But stale TLB
contents (or stale page directory caches) are *really* nasty when they
do happen, and almost impossible to debug. So we want to be insanely
anal in this area.

               Linus

* Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
  2012-06-27 23:01       ` Peter Zijlstra
@ 2012-06-27 23:42         ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2012-06-27 23:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Wed, Jun 27, 2012 at 4:01 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> How about something like this

Looks better.

I'd be even happier if you made the whole

  "When there's less then two users.."

(There's a misspelling there, btw, I didn't notice until I
cut-and-pasted that) logic be a helper function, and have that helper
function be inside that same #ifdef CONFIG_STRICT_TLB_FILL block
together with the tlb_table_flush_mmu() function.

IOW, something like

  static int tlb_remove_table_quick(struct mmu_gather *tlb, void *table)
  {
        if (atomic_read(&tlb->mm->mm_users) < 2) {
            __tlb_remove_table(table);
            return 1;
        }
        return 0;
  }

for the CONFIG_STRICT_TLB_FILL case, and then the default case just
does an unconditional "return 0".

So that the actual code can avoid having #ifdef's in the middle of a
function, and could just do

    if (tlb_remove_table_quick(tlb, table))
        return;

instead.

Maybe it's just me, but I detest seeing #ifdef's in the middle of
code. I'd much rather have the #ifdef's *outside* the code and have
these kinds of helper functions that sometimes end up becoming empty.
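
Putting the two halves together, the shape I mean is roughly this
(sketch only, reusing the CONFIG_STRICT_TLB_FILL name from Peter's
patch):

  #ifdef CONFIG_STRICT_TLB_FILL
  static int tlb_remove_table_quick(struct mmu_gather *tlb, void *table)
  {
        /* With fewer than two users of this mm there can be no
         * concurrent page-table walk, so free the table directly. */
        if (atomic_read(&tlb->mm->mm_users) < 2) {
            __tlb_remove_table(table);
            return 1;
        }
        return 0;
  }
  #else
  static int tlb_remove_table_quick(struct mmu_gather *tlb, void *table)
  {
        return 0;   /* always take the batched RCU path */
  }
  #endif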

                   Linus

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
  2012-06-27 23:01       ` Peter Zijlstra
@ 2012-06-28  7:09         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-28  7:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, linux-kernel, linux-arch, linux-mm,
	Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel, Hugh Dickins,
	Mel Gorman, Nick Piggin, Alex Shi, Nikunj A. Dadhania,
	Konrad Rzeszutek Wilk, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, 2012-06-28 at 01:01 +0200, Peter Zijlstra wrote:
> On Wed, 2012-06-27 at 15:23 -0700, Linus Torvalds wrote:
> 
> > Plus it really isn't about hardware page table walkers at all. It's
> > more about the possibility of speculative TLB fils, it has nothing to
> > do with *how* they are done. Sure, it's likely that a software
> > pagetable walker wouldn't be something that gets called speculatively,
> > but it's not out of the question.
> > 
> Hmm, I would call gup_fast() as speculative as we can get in software.
> It does a lock-less walk of the page-tables. That's what the RCU free'd
> page-table stuff is for to begin with.

Strictly speaking it's not :-) To *begin with* (as in the origin of that
code) it comes from powerpc hash table code which walks the linux page
tables locklessly :-) It then came in handy with gup_fast :-)

> > IOW, if Sparc/PPC really want to guarantee that they never fill TLB
> > entries speculatively, and that if we are in a kernel thread they will
> > *never* fill the TLB with anything else, then make them enable
> > CONFIG_STRICT_TLB_FILL or something in their architecture Kconfig
> > files. 
> 
> Since we've dealt with the speculative software side by using RCU-ish
> stuff, the only thing that's left is hardware, now neither sparc64 nor
> ppc actually know about the linux page-tables from what I understood,
> they only look at their hash-table thing.

Some embedded ppc's know about the lowest level (SW loaded PMD) but
that's not an issue here. We flush these special TLB entries
specifically and synchronously in __pte_free_tlb().

> So even if the hardware did do speculative tlb fills, it would do them
> from the hash-table, but that's already cleared out.

Right,

Cheers,
Ben.

> 
> How about something like this
> 
> ---
> Subject: mm: Add missing TLB invalidate to RCU page-table freeing
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Thu Jun 28 00:49:33 CEST 2012
> 
> For normal systems we need a TLB invalidate before freeing the
> page-tables; the generic RCU based page-table freeing code lacked
> this.
> 
> This is because this code originally came from ppc where the hardware
> never walks the linux page-tables and thus this invalidate is not
> required.
> 
> Others, notably s390 which ran into this problem in cd94154cc6a
> ("[S390] fix tlb flushing for page table pages"), do very much need
> this TLB invalidation.
> 
> Therefore add it, with a Kconfig option to disable it so as to not
> unduly slow down PPC and SPARC64, neither of which needs it.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/Kconfig         |    3 +++
>  arch/powerpc/Kconfig |    1 +
>  arch/sparc/Kconfig   |    1 +
>  mm/memory.c          |   18 ++++++++++++++++++
>  4 files changed, 23 insertions(+)
> 
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -231,6 +231,9 @@ config HAVE_ARCH_MUTEX_CPU_RELAX
>  config HAVE_RCU_TABLE_FREE
>  	bool
>  
> +config STRICT_TLB_FILL
> +	bool
> +
>  config ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	bool
>  
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -127,6 +127,7 @@ config PPC
>  	select GENERIC_IRQ_SHOW_LEVEL
>  	select IRQ_FORCED_THREADING
>  	select HAVE_RCU_TABLE_FREE if SMP
> +	select STRICT_TLB_FILL
>  	select HAVE_SYSCALL_TRACEPOINTS
>  	select HAVE_BPF_JIT if PPC64
>  	select HAVE_ARCH_JUMP_LABEL
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -52,6 +52,7 @@ config SPARC64
>  	select HAVE_KRETPROBES
>  	select HAVE_KPROBES
>  	select HAVE_RCU_TABLE_FREE if SMP
> +	select STRICT_TLB_FILL
>  	select HAVE_MEMBLOCK
>  	select HAVE_MEMBLOCK_NODE_MAP
>  	select HAVE_SYSCALL_WRAPPERS
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -329,11 +329,27 @@ static void tlb_remove_table_rcu(struct 
>  	free_page((unsigned long)batch);
>  }
>  
> +#ifdef CONFIG_STRICT_TLB_FILL
> +/*
> + * Some architectures (sparc64, ppc) cannot refill TLBs after they've removed
> + * the PTE entries from their hash-table. Their hardware never looks at the
> + * linux page-table structures, so they don't need a hardware TLB invalidate
> + * when tearing down the page-table structure itself.
> + */
> +static inline void tlb_table_flush_mmu(struct mmu_gather *tlb) { }
> +#else
> +static inline void tlb_table_flush_mmu(struct mmu_gather *tlb)
> +{
> +	tlb_flush_mmu(tlb);
> +}
> +#endif
> +
>  void tlb_table_flush(struct mmu_gather *tlb)
>  {
>  	struct mmu_table_batch **batch = &tlb->batch;
>  
>  	if (*batch) {
> +		tlb_table_flush_mmu(tlb);
>  		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
>  		*batch = NULL;
>  	}
> @@ -345,6 +361,7 @@ void tlb_remove_table(struct mmu_gather 
>  
>  	tlb->need_flush = 1;
>  
> +#ifdef CONFIG_STRICT_TLB_FILL
>  	/*
>  	 * When there's less then two users of this mm there cannot be a
>  	 * concurrent page-table walk.
> @@ -353,6 +370,7 @@ void tlb_remove_table(struct mmu_gather 
>  		__tlb_remove_table(table);
>  		return;
>  	}
> +#endif
>  
>  	if (*batch == NULL) {
>  		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> 



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 11/20] mm, s390: Convert to use generic mmu_gather
  2012-06-27 22:13     ` Peter Zijlstra
@ 2012-06-28  7:13       ` Martin Schwidefsky
  -1 siblings, 0 replies; 120+ messages in thread
From: Martin Schwidefsky @ 2012-06-28  7:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Linus Torvalds, Rik van Riel, Hugh Dickins, Mel Gorman,
	Nick Piggin, Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Tony Luck, Paul Mundt, Jeff Dike

On Thu, 28 Jun 2012 00:13:19 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2012-06-27 at 23:15 +0200, Peter Zijlstra wrote:
> > 
> > S390 doesn't need a TLB flush after ptep_get_and_clear_full() and
> > before __tlb_remove_page() because its ptep_get_and_clear*() family
> > already does a full TLB invalidate. Therefore force it to use
> > tlb_fast_mode. 
> 
> On that.. ptep_get_and_clear() says:
> 
> /*                                                                                             
>  * This is hard to understand. ptep_get_and_clear and ptep_clear_flush                         
>  * both clear the TLB for the unmapped pte. The reason is that                                 
>  * ptep_get_and_clear is used in common code (e.g. change_pte_range)                           
>  * to modify an active pte. The sequence is                                                    
>  *   1) ptep_get_and_clear                                                                     
>  *   2) set_pte_at                                                                             
>  *   3) flush_tlb_range                                                                        
>  * On s390 the tlb needs to get flushed with the modification of the pte                       
>  * if the pte is active. The only way how this can be implemented is to                        
>  * have ptep_get_and_clear do the tlb flush. In exchange flush_tlb_range                       
>  * is a nop.                                                                                   
>  */ 
> 
> I think there is another way, arch_{enter,leave}_lazy_mmu_mode() seems
> to wrap these sites so you can do as SPARC64 and PPC do and batch
> through there.
> 
> That should save a number of TLB invalidates..

Unfortunately that is not good enough. The point is that a pte that can
be referenced by another cpu may not be modified without using one of
the special instructions that flush the TLBs on all cpus at the same
time. It really is one pte at a time if more than one cpu has attached
a particular mm.
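
Concretely, the common-code sequence that comment refers to looks
roughly like this (sketch, cf. change_pte_range()):

  /* On s390 step 1 must flush the TLB itself, because the pte may
   * still be attached on another cpu; that is why step 3 can be a
   * nop there. */
  pte = ptep_get_and_clear(mm, addr, ptep);             /* 1) clear (+flush) */
  set_pte_at(mm, addr, ptep, pte_modify(pte, newprot)); /* 2) new pte */
  flush_tlb_range(vma, addr, end);                      /* 3) nop on s390 */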

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 23:33             ` Linus Torvalds
@ 2012-06-28  9:16               ` Catalin Marinas
  -1 siblings, 0 replies; 120+ messages in thread
From: Catalin Marinas @ 2012-06-28  9:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, linux-kernel, linux-arch, linux-mm,
	Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel, Hugh Dickins,
	Mel Gorman, Nick Piggin, Alex Shi, Nikunj A. Dadhania,
	Konrad Rzeszutek Wilk, Benjamin Herrenschmidt, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt

On Thu, Jun 28, 2012 at 12:33:44AM +0100, Linus Torvalds wrote:
> On Wed, Jun 27, 2012 at 4:23 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > But the branch prediction tables are obviously just predictions, and
> > they easily contain user addresses etc in them. So the kernel may well
> > end up speculatively doing a TLB fill on a user access.
> 
> That should be ".. on a user *address*", hopefully that was clear from
> the context, if not from the text.
> 
> IOW, the point I'm trying to make is that even if there are zero
> *actual* accesses of user space (because user space is dead, and the
> kernel hopefully does no "get_user()/put_user()" stuff at this point
> any more), the CPU may speculatively use user addresses for the
> bog-standard kernel addresses that happen.

That's definitely an issue on ARM and it was hit on older kernels.
Basically ARM processors can cache any page translation level in the
TLB. We need to make sure that no page entry at any level (whether
cached in the TLB or not) points to an invalid next-level table (hence
the TLB shootdown). For example, in cases like free_pgd_range(), if the
cached pgd entry points to an already freed pud/pmd table (pgd_clear is
not enough), the processor may walk the page tables speculatively and
cache another entry in the TLB. Depending on the random data it reads
from an old table page, it may find a global entry (it's just a bit in
the pte) which is not tagged with an ASID (address space id). A later
flush_tlb_mm() only flushes the current ASID and doesn't touch global
entries (used only by kernel mappings). So we end up with a global TLB
entry in user space that overrides any other application mapping.
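
So the teardown has to keep the shootdown between clearing the entry
and freeing the table, roughly (sketch using the generic helpers):

  /* Freeing a pmd table safely: the flush must sit between unhooking
   * the table and freeing its page, otherwise a speculative walk
   * through a stale cached pud entry can read the freed (now random)
   * page and install a bogus global TLB entry. */
  pmd_t *pmd = pmd_offset(pud, addr);
  pud_clear(pud);                 /* 1) unhook the pmd table */
  flush_tlb_mm(mm);               /* 2) shoot down cached walks */
  pmd_free(mm, pmd);              /* 3) only now reuse the page */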

-- 
Catalin


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28  9:16               ` Catalin Marinas
@ 2012-06-28 10:39                 ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-28 10:39 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linus Torvalds, Peter Zijlstra, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf

On Thu, 2012-06-28 at 10:16 +0100, Catalin Marinas wrote:
> That's definitely an issue on ARM and it was hit on older kernels.
> Basically ARM processors can cache any page translation level in the
> TLB. We need to make sure that no page entry at any level (either cached
> in the TLB or not) points to an invalid next level table (hence the TLB
> shootdown). For example, in cases like free_pgd_range(), if the cached
> pgd entry points to an already freed pud/pmd table (pgd_clear is not
> enough) it may walk the page tables speculatively cache another entry in
> the TLB. Depending on the random data it reads from an old table page,
> it may find a global entry (it's just a bit in the pte) which is not
> tagged with an ASID (application specific id). A latter flush_tlb_mm()
> only flushes the current ASID and doesn't touch global entries (used
> only by kernel mappings). So we end up with global TLB entry in user
> space that overrides any other application mapping.

Right, that's the typical scenario. I haven't looked at your flush
implementation though, but surely you can defer the actual freeing so
you can batch them & limit the number of TLB flushes, right?

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-27 23:33             ` Linus Torvalds
@ 2012-06-28 10:55               ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 10:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Wed, 2012-06-27 at 16:33 -0700, Linus Torvalds wrote:
> IOW, the point I'm trying to make is that even if there are zero
> *actual* accesses of user space (because user space is dead, and the
> kernel hopefully does no "get_user()/put_user()" stuff at this point
> any more), the CPU may speculatively use user addresses for the
> bog-standard kernel addresses that happen. 

Right.. and s390 having done this only says that s390 appears to be ok
with it. Martin, does s390 hardware guarantee no speculative stuff like
Linus explained, or might there even be a latent issue on s390?

But it looks like we cannot do this in general, and esp. ARM (as already
noted by Catalin) has very aggressive speculative behaviour.

The alternative is that we do a switch_mm() to init_mm instead of the
TLB flush. On x86 that should be about the same cost, but I've not
looked at other architectures yet.
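
Something like this (sketch only; placement and details hypothetical,
switch_mm() alone would need the usual active_mm bookkeeping around it):

  /* Rather than invalidating up front for fullmm, detach this cpu
   * from the dying mm so nothing can speculate through its page
   * tables any more. */
  if (tlb->fullmm)
          switch_mm(tlb->mm, &init_mm, current);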

The second and least favourite alternative is of course special-casing
this for s390 if it turns out it's a safe thing to do for them.

/me goes look through arch code.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 10:39                 ` Benjamin Herrenschmidt
@ 2012-06-28 10:59                   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 10:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Catalin Marinas, Linus Torvalds, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike

On Thu, 2012-06-28 at 20:39 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2012-06-28 at 10:16 +0100, Catalin Marinas wrote:
> > That's definitely an issue on ARM and it was hit on older kernels.
> > Basically ARM processors can cache any page translation level in the
> > TLB. We need to make sure that no page entry at any level (either cached
> > in the TLB or not) points to an invalid next level table (hence the TLB
> > shootdown). For example, in cases like free_pgd_range(), if the cached
> > pgd entry points to an already freed pud/pmd table (pgd_clear is not
> > enough) it may walk the page tables speculatively cache another entry in
> > the TLB. Depending on the random data it reads from an old table page,
> > it may find a global entry (it's just a bit in the pte) which is not
> > tagged with an ASID (application specific id). A latter flush_tlb_mm()
> > only flushes the current ASID and doesn't touch global entries (used
> > only by kernel mappings). So we end up with global TLB entry in user
> > space that overrides any other application mapping.
> 
> Right, that's the typical scenario. I haven't looked at your flush
> implementation though, but surely you can defer the actual freeing so
> you can batch them & limit the number of TLB flushes right ?

Yes they do.. it's just the up-front TLB invalidate for fullmm that's a
problem.

s390 really wants this so it can avoid the per pte invalidate otherwise
required by ptep_get_and_clear_full().
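
I.e. something along these lines on the s390 side (sketch, not the
actual s390 implementation):

  static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
                  unsigned long addr, pte_t *ptep, int full)
  {
          pte_t pte = *ptep;

          if (full)
                  pte_clear(mm, addr, ptep);  /* no per-pte IPTE; one
                                               * bulk flush at the end */
          else
                  pte = ptep_get_and_clear(mm, addr, ptep); /* flushes */
          return pte;
  }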


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
  2012-06-28  7:09         ` Benjamin Herrenschmidt
@ 2012-06-28 11:05           ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 11:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, linux-kernel, linux-arch, linux-mm,
	Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel, Hugh Dickins,
	Mel Gorman, Nick Piggin, Alex Shi, Nikunj A. Dadhania,
	Konrad Rzeszutek Wilk, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, 2012-06-28 at 17:09 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2012-06-28 at 01:01 +0200, Peter Zijlstra wrote:
> > On Wed, 2012-06-27 at 15:23 -0700, Linus Torvalds wrote:
> > 
> > > Plus it really isn't about hardware page table walkers at all. It's
> > > more about the possibility of speculative TLB fils, it has nothing to
> > > do with *how* they are done. Sure, it's likely that a software
> > > pagetable walker wouldn't be something that gets called speculatively,
> > > but it's not out of the question.
> > > 
> > Hmm, I would call gup_fast() as speculative as we can get in software.
> > It does a lock-less walk of the page-tables. That's what the RCU free'd
> > page-table stuff is for to begin with.
> 
> Strictly speaking it's not :-) To *begin with* (as in the origin of that
> code) it comes from powerpc hash table code which walks the linux page
> tables locklessly :-) It then came in handy with gup_fast :-)

Ah, ok my bad.

> > > IOW, if Sparc/PPC really want to guarantee that they never fill TLB
> > > entries speculatively, and that if we are in a kernel thread they will
> > > *never* fill the TLB with anything else, then make them enable
> > > CONFIG_STRICT_TLB_FILL or something in their architecture Kconfig
> > > files. 
> > 
> > Since we've dealt with the speculative software side by using RCU-ish
> > stuff, the only thing that's left is hardware, now neither sparc64 nor
> > ppc actually know about the linux page-tables from what I understood,
> > they only look at their hash-table thing.
> 
> Some embedded ppc's know about the lowest level (SW loaded PMD) but
> that's not an issue here. We flush these special TLB entries
> specifically and synchronously in __pte_free_tlb().

OK, I missed that.. is that
arch/powerpc/mm/tlb_nohash.c:tlb_flush_pgtable() ?

> > So even if the hardware did do speculative tlb fills, it would do them
> > from the hash-table, but that's already cleared out.
> 
> Right,

Phew, at least I got the important thing right ;-)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 10:55               ` Peter Zijlstra
@ 2012-06-28 11:19                 ` Martin Schwidefsky
  -1 siblings, 0 replies; 120+ messages in thread
From: Martin Schwidefsky @ 2012-06-28 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, linux-kernel, linux-arch, linux-mm,
	Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel, Hugh Dickins,
	Mel Gorman, Nick Piggin, Alex Shi, Nikunj A. Dadhania,
	Konrad Rzeszutek Wilk, Benjamin Herrenschmidt, David Miller,
	Russell King, Catalin Marinas, Chris Metcalf, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, 28 Jun 2012 12:55:04 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2012-06-27 at 16:33 -0700, Linus Torvalds wrote:
> > IOW, the point I'm trying to make is that even if there are zero
> > *actual* accesses of user space (because user space is dead, and the
> > kernel hopefully does no "get_user()/put_user()" stuff at this point
> > any more), the CPU may speculatively use user addresses for the
> > bog-standard kernel addresses that happen. 
> 
> Right.. and s390 having done this only says that s390 appears to be ok
> with it. Martin, does s390 hardware guarantee no speculative stuff like
> Linus explained, or might there even be a latent issue on s390?

The cpu can create speculative TLB entries, but only if it runs in the
mode that uses the respective mm. We have two mm's active at the same
time, the kernel mm (init_mm) and the user mm. While the cpu runs only
in kernel mode it is not allowed to create TLBs for the user mm.
While running in user mode it is allowed to speculatively create TLBs.
 
> But it looks like we cannot do this in general, and esp. ARM (as already
> noted by Catalin) has very aggressive speculative behaviour.
> 
> The alternative is that we do a switch_mm() to init_mm instead of the
> TLB flush. On x86 that should be about the same cost, but I've not
> looked at other architectures yet.
> 
> The second and least favourite alternative is of course special casing
> this for s390 if it turns out its a safe thing to do for them.
> 
> /me goes look through arch code.

Basically we have two special requirements on s390:
1) do not modify ptes while attached to another cpu except with the
   special IPTE / IDTE instructions
2) do a TLB flush before freeing any kind of page table page, s390
   needs a flush for pud, pmd & pte tables.
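
As an illustration of rule 2, a minimal sketch (hypothetical helper,
not s390's actual code): the flush has to happen before the table page
goes back to the allocator:

	static void s390_style_free_pmd(struct mm_struct *mm, pmd_t *pmd)
	{
		/* rule 2: no TLB may still reach the table once freed */
		flush_tlb_mm(mm);
		free_page((unsigned long)pmd);
	}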

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 11:19                 ` Martin Schwidefsky
@ 2012-06-28 11:30                   ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 11:30 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linus Torvalds, linux-kernel, linux-arch, linux-mm,
	Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel, Hugh Dickins,
	Mel Gorman, Nick Piggin, Alex Shi, Nikunj A. Dadhania,
	Konrad Rzeszutek Wilk, Benjamin Herrenschmidt, David Miller,
	Russell King, Catalin Marinas, Chris Metcalf, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, 2012-06-28 at 13:19 +0200, Martin Schwidefsky wrote:

> The cpu can create speculative TLB entries, but only if it runs in the
> mode that uses the respective mm. We have two mm's active at the same
> time, the kernel mm (init_mm) and the user mm. While the cpu runs only
> in kernel mode it is not allowed to create TLBs for the user mm.
> While running in user mode it is allowed to speculatively create TLBs.

OK, that's neat.

> Basically we have two special requirements on s390:
> 1) do not modify ptes while attached to another cpu except with the
>    special IPTE / IDTE instructions

Right, and your fullmm case works by doing a global invalidate after all
threads have ceased userspace execution; this allows you to do away with
the IPTE/IDTE instructions since there are no other active cpus on the
userspace mm anymore.


> 2) do a TLB flush before freeing any kind of page table page, s390
>    needs a flush for pud, pmd & pte tables. 

Right, we do that (now)..

^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
  2012-06-28 11:05           ` Peter Zijlstra
@ 2012-06-28 12:00             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-28 12:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, linux-kernel, linux-arch, linux-mm,
	Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel, Hugh Dickins,
	Mel Gorman, Nick Piggin, Alex Shi, Nikunj A. Dadhania,
	Konrad Rzeszutek Wilk, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, 2012-06-28 at 13:05 +0200, Peter Zijlstra wrote:
> 
> > Some embedded ppc's know about the lowest level (SW loaded PMD) but
> > that's not an issue here. We flush these special TLB entries
> > specifically and synchronously in __pte_free_tlb().
> 
> OK, I missed that.. is that
> arch/powerpc/mm/tlb_nohash.c:tlb_flush_pgtable() ?

Yup.

> > > So even if the hardware did do speculative tlb fills, it would do
> them
> > > from the hash-table, but that's already cleared out.
> > 
> > Right,
> 
> Phew at least I got the important thing right ;-)

Yeah as long as we have that hash :-) The day we move on (if ever) it
will be as bad as ARM :-)

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 10:59                   ` Peter Zijlstra
@ 2012-06-28 14:53                     ` Catalin Marinas
  -1 siblings, 0 replies; 120+ messages in thread
From: Catalin Marinas @ 2012-06-28 14:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Benjamin Herrenschmidt, Linus Torvalds, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt

On Thu, Jun 28, 2012 at 11:59:56AM +0100, Peter Zijlstra wrote:
> On Thu, 2012-06-28 at 20:39 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2012-06-28 at 10:16 +0100, Catalin Marinas wrote:
> > > That's definitely an issue on ARM and it was hit on older kernels.
> > > Basically ARM processors can cache any page translation level in the
> > > TLB. We need to make sure that no page entry at any level (either cached
> > > in the TLB or not) points to an invalid next level table (hence the TLB
> > > shootdown). For example, in cases like free_pgd_range(), if the cached
> > > pgd entry points to an already freed pud/pmd table (pgd_clear is not
> > > enough) it may walk the page tables speculatively and cache another
> > > entry in the TLB. Depending on the random data it reads from an old
> > > table page, it may find a global entry (it's just a bit in the pte)
> > > which is not tagged with an ASID (address space id). A later
> > > flush_tlb_mm() only flushes the current ASID and doesn't touch global
> > > entries (used only by kernel mappings). So we end up with a global TLB
> > > entry in user space that overrides any other application mapping.
> > 
> > Right, that's the typical scenario. I haven't looked at your flush
> > implementation though, but surely you can defer the actual freeing so
> > you can batch them & limit the number of TLB flushes right ?
> 
> Yes they do.. it's just the up-front TLB invalidate for fullmm that's a
> problem.

The upfront invalidate is fine (i.e. harmless), it's the tlb_flush_mmu()
change to check for !tlb->fullmm that's not helpful on ARM.
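
The freed-table hazard quoted above, as an ordering sketch (generic
helper names, heavily simplified):

	/* broken: the TLB can still hold the old pgd entry and
	 * speculatively walk into the freed (and reused) pmd page,
	 * possibly latching a bogus global entry */
	pgd_clear(pgd);
	pmd_free(mm, pmd);

	/* safe: shoot down any cached walk before the free */
	pgd_clear(pgd);
	flush_tlb_mm(mm);
	pmd_free(mm, pmd);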

-- 
Catalin


^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 11:30                   ` Peter Zijlstra
@ 2012-06-28 16:00                     ` Avi Kivity
  -1 siblings, 0 replies; 120+ messages in thread
From: Avi Kivity @ 2012-06-28 16:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Martin Schwidefsky, Linus Torvalds, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Tony Luck

On 06/28/2012 02:30 PM, Peter Zijlstra wrote:
> On Thu, 2012-06-28 at 13:19 +0200, Martin Schwidefsky wrote:
> 
>> The cpu can create speculative TLB entries, but only if it runs in the
>> mode that uses the respective mm. We have two mm's active at the same
>> time, the kernel mm (init_mm) and the user mm. While the cpu runs only
>> in kernel mode it is not allowed to create TLBs for the user mm.
>> While running in user mode it is allowed to speculatively create TLBs.
> 
> OK, that's neat.

Note that we can do that for x86 now using the new PCID feature.
Basically you get a tagged TLB, so you can switch between the
kernel-only address space and the kernel+user address space quickly.

It's still going to be slower than what we do now, but it might please
some security people if the kernel can't accidentally access user data.
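
As a sketch of the mechanism (this is the architectural x86 CR3
layout, not kernel code, which had no PCID support at the time):

	#define CR3_PCID_MASK	0xfffUL
	#define CR3_NOFLUSH	(1UL << 63)

	/* with CR4.PCIDE set, CR3 bits 0-11 select a PCID and bit 63
	 * asks the CPU not to flush that PCID's TLB entries on the
	 * write, making the kernel<->user switch cheap */
	static void switch_to_pcid(unsigned long pgd_pa, unsigned long pcid)
	{
		write_cr3(pgd_pa | (pcid & CR3_PCID_MASK) | CR3_NOFLUSH);
	}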

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 14:53                     ` Catalin Marinas
@ 2012-06-28 16:20                       ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 16:20 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Benjamin Herrenschmidt, Linus Torvalds, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff

On Thu, 2012-06-28 at 15:53 +0100, Catalin Marinas wrote:

> > Yes they do.. it's just the up-front TLB invalidate for fullmm that's a
> > problem.
> 
> The upfront invalidate is fine (i.e. harmless), it's the tlb_flush_mmu()
> change to check for !tlb->fullmm that's not helpful on ARM.

I think we're saying the same but differently. The point is that the
flush up front isn't sufficient for most of us.

Also, we'd very much want to avoid superfluous flushes since they are
somewhat expensive.

How horrid is something like the below. It detaches the mm so that
hardware speculation simply doesn't matter.

Now the switch_mm should imply the same cache+TLB flush we'd otherwise
do, and I'd think that that would be the majority of the cost. Am I
wrong there?

Also, the below seems to leak mm_structs so I did mess up the
ref-counting; it's too bloody hot here.



---
 mm/memory.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 4 deletions(-)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -65,6 +65,7 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 #include <asm/pgtable.h>
+#include <asm/mmu_context.h>
 
 #include "internal.h"
 
@@ -197,6 +198,33 @@ static int tlb_next_batch(struct mmu_gat
 	return 1;
 }
 
+/*
+ * Anonymize the task by detaching the mm and attaching it
+ * to the init_mm.
+ */
+static void detach_mm(struct mm_struct *mm, struct task_struct *tsk)
+{
+	/*
+	 * We should only be called when there's no users left and we're
+	 * destroying the mm.
+	 */
+	VM_BUG_ON(atomic_read(&mm->mm_users));
+	VM_BUG_ON(tsk->mm != mm);
+	VM_BUG_ON(mm == &init_mm);
+
+	task_lock(tsk);
+	tsk->mm = NULL;
+	tsk->active_mm = &init_mm;
+	switch_mm(mm, &init_mm, tsk);
+	/*
+	 * We have to take an extra ref on init_mm for TASK_DEAD in
+	 * finish_task_switch(), we don't drop our mm->mm_count reference
+	 * since mmput() will do this.
+	 */
+	atomic_inc(&init_mm.mm_count);
+	task_unlock(tsk);
+}
+
 /* tlb_gather_mmu
  *	Called to initialize an (on-stack) mmu_gather structure for page-table
  *	tear-down from @mm. The @fullmm argument is used when @mm is without
@@ -215,16 +243,31 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->active     = &tlb->local;
 
 	tlb_table_init(tlb);
+
+	if (fullmm && current->mm == mm) {
+		/*
+		 * Instead of doing:
+		 *
+		 *  flush_cache_mm(mm);
+		 *  flush_tlb_mm(mm);
+		 *
+		 * We switch to init_mm, this context switch should imply both
+		 * the cache and TLB flush as well as guarantee that hardware
+		 * speculation cannot load TLBs on this mm anymore.
+		 */
+		detach_mm(mm, current);
+	}
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
 {
 	struct mmu_gather_batch *batch;
 
-	if (!tlb->need_flush)
-		return;
-	tlb->need_flush = 0;
-	flush_tlb_mm(tlb->mm);
+	if (!tlb->fullmm && tlb->need_flush) {
+		tlb->need_flush = 0;
+		flush_tlb_mm(tlb->mm);
+	}
+
 	tlb_table_flush(tlb);
 
 	if (tlb_fast_mode(tlb))


^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 16:20                       ` Peter Zijlstra
@ 2012-06-28 16:38                         ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 16:38 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Benjamin Herrenschmidt, Linus Torvalds, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff

On Thu, 2012-06-28 at 18:20 +0200, Peter Zijlstra wrote:
> Now the switch_mm should imply the same cache+TLB flush we'd otherwise
> do, and I'd think that that would be the majority of the cost. Am I
> wrong there? 

The advantage of doing this is that you don't need any of the batching
and possibly multiple invalidate nonsense you otherwise need. So it
might still be an over-all win, even if the switch is slightly more
expensive than a regular flush. Simply because you can avoid most (if
not all) the usual complexities.


^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 16:20                       ` Peter Zijlstra
@ 2012-06-28 16:45                         ` Linus Torvalds
  -1 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2012-06-28 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Benjamin Herrenschmidt, linux-kernel,
	linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar, akpm,
	Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff

On Thu, Jun 28, 2012 at 9:20 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> How horrid is something like the below. It detaches the mm so that
> hardware speculation simply doesn't matter.

Actually, that's wrong. Even when detached, kernel threads may still
use that mm lazily. Now, that only happens on other CPU's (if any
scheduling happens on *this* CPU, they will lazily take the mm of the
thread it scheduled away from), but even if you detach the VM that
doesn't mean that hardware speculation wouldn't matter. Kernel threads
on other CPU's may still be doing TLB accesses.

Of course, I *think* that if we do an IPI on the thing, we also kick
those kernel threads out of using that mm. So it may actually work if
you also do that explicit TLB flush to make sure other CPU's don't
have this MM. I don't think switch_mm() does that for you, it only
does a local-cpu invalidate.

I didn't look at the code, though. Maybe I'm wrong in thinking that
you are wrong.

                Linus
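
For reference, the lazy-mm borrowing being described, roughly as the
scheduler did it at the time (simplified from context_switch() in
kernel/sched/core.c):

	if (!next->mm) {			/* kernel thread */
		next->active_mm = prev->active_mm;
		atomic_inc(&prev->active_mm->mm_count);
		enter_lazy_tlb(prev->active_mm, next);
	} else {
		switch_mm(prev->active_mm, next->mm, next);
	}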


^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 16:45                         ` Linus Torvalds
@ 2012-06-28 16:52                           ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 16:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Catalin Marinas, Benjamin Herrenschmidt, linux-kernel,
	linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar, akpm,
	Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike

On Thu, 2012-06-28 at 09:45 -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2012 at 9:20 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > How horrid is something like the below. It detaches the mm so that
> > hardware speculation simply doesn't matter.
> 
> Actually, that's wrong. Even when detached, kernel threads may still
> use that mm lazily. Now, that only happens on other CPU's (if any
> scheduling happens on *this* CPU, they will lazily take the mm of the
> thread it scheduled away from), but even if you detach the VM that
> doesn't mean that hardware speculation wouldn't matter. Kernel threads
> on other CPU's may still be doing TLB accesses.
> 
> Of course, I *think* that if we do an IPI on the thing, we also kick
> those kernel threads out of using that mm. So it may actually work if
> you also do that explicit TLB flush to make sure other CPU's don't
> have this MM. I don't think switch_mm() does that for you, it only
> does a local-cpu invalidate.
> 
> I didn't look at the code, though. Maybe I'm wrong in thinking that
> you are wrong.

No, I think you're right (as always).. an IPI will not force a
reschedule of the thread that might be running on the receiving cpu,
and we'd have to wait for any such reschedule to complete in order to
guarantee the mm isn't lazily used anymore.

Bugger.. it would've been nice to do this. I guess I'd better go special
case s390 for now until we can come up with something that would work.



^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 14/20] mm, sh: Convert sh to generic tlb
  2012-06-27 21:15   ` Peter Zijlstra
@ 2012-06-28 18:32     ` Paul Mundt
  -1 siblings, 0 replies; 120+ messages in thread
From: Paul Mundt @ 2012-06-28 18:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Linus Torvalds, Rik van Riel, Hugh Dickins, Mel Gorman,
	Nick Piggin, Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Jeff Dike

On Wed, Jun 27, 2012 at 11:15:54PM +0200, Peter Zijlstra wrote:
> Cc: Paul Mundt <lethal@linux-sh.org>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/sh/Kconfig           |    1 
>  arch/sh/include/asm/tlb.h |   98 ++--------------------------------------------
>  2 files changed, 6 insertions(+), 93 deletions(-)

This blows up in the same way as last time.

I direct you to the same bug report and patch as before:

http://marc.info/?l=linux-kernel&m=133722116507075&w=2


^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 14/20] mm, sh: Convert sh to generic tlb
  2012-06-28 18:32     ` Paul Mundt
@ 2012-06-28 20:27       ` Peter Zijlstra
  -1 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-28 20:27 UTC (permalink / raw)
  To: Paul Mundt
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Linus Torvalds, Rik van Riel, Hugh Dickins, Mel Gorman,
	Nick Piggin, Alex Shi, Nikunj A. Dadhania, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, David Miller, Russell King,
	Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Jeff Dike, Richard Weinberger, Hans-Christian Egtvedt

On Fri, 2012-06-29 at 03:32 +0900, Paul Mundt wrote:
> On Wed, Jun 27, 2012 at 11:15:54PM +0200, Peter Zijlstra wrote:
> > Cc: Paul Mundt <lethal@linux-sh.org>
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> >  arch/sh/Kconfig           |    1 
> >  arch/sh/include/asm/tlb.h |   98 ++--------------------------------------------
> >  2 files changed, 6 insertions(+), 93 deletions(-)
> 
> This blows up in the same way as last time.
> 
> I direct you to the same bug report and patch as before:
> 
> http://marc.info/?l=linux-kernel&m=133722116507075&w=2

Sorry about that.. /me goes amend.


^ permalink raw reply	[flat|nested] 120+ messages in thread


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 16:52                           ` Peter Zijlstra
@ 2012-06-28 21:57                             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-28 21:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Catalin Marinas, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin

On Thu, 2012-06-28 at 18:52 +0200, Peter Zijlstra wrote:
> No I think you're right (as always).. also an IPI will not force
> schedule the thread that might be running on the receiving cpu, also
> we'd have to wait for any such schedule to complete in order to
> guarantee the mm isn't lazily used anymore.
> 
> Bugger.. 

You can still do it if the mm count is 1 no ? Ie, current is the last
holder of a reference to the mm struct... which will probably be the
common case for short lived programs.

Cheers,
Ben.
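
A sketch of that test on top of the earlier detach_mm() patch
(hypothetical):

	if (fullmm && current->mm == mm &&
	    atomic_read(&mm->mm_count) == 1) {
		/* no lazy users possible, safe to just detach */
		detach_mm(mm, current);
	}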



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
@ 2012-06-28 21:57                             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-28 21:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Catalin Marinas, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, 2012-06-28 at 18:52 +0200, Peter Zijlstra wrote:
> No I think you're right (as always).. also an IPI will not force
> schedule the thread that might be running on the receiving cpu, also
> we'd have to wait for any such schedule to complete in order to
> guarantee the mm isn't lazily used anymore.
> 
> Bugger.. 

You can still do it if the mm count is 1 no ? Ie, current is the last
holder of a reference to the mm struct... which will probably be the
common case for short lived programs.

Cheers,
Ben.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 21:57                             ` Benjamin Herrenschmidt
@ 2012-06-28 21:58                               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-28 21:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Catalin Marinas, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Fri, 2012-06-29 at 07:57 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2012-06-28 at 18:52 +0200, Peter Zijlstra wrote:
> > No I think you're right (as always).. also an IPI will not force
> > schedule the thread that might be running on the receiving cpu, also
> > we'd have to wait for any such schedule to complete in order to
> > guarantee the mm isn't lazily used anymore.
> > 
> > Bugger.. 
> 
> You can still do it if the mm count is 1 no ? Ie, current is the last
> holder of a reference to the mm struct... which will probably be the
> common case for short lived programs.

Also, I just remembered... x86 flushes in SMP via IPIs, right? So maybe
you can invent a "detach and flush" variant of it?

Cheers,
Ben.
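
A rough sketch of what such a variant could look like on x86, reusing
the existing leave_mm() and on_each_cpu_mask() helpers; the
detach_and_flush() function itself is invented here, not part of any
posted patch:

static void detach_and_flush(void *info)
{
	struct mm_struct *mm = info;

	/* Lazy users (kernel threads) drop the mm entirely;
	 * everyone else just flushes, as the normal flush IPI
	 * would have done anyway. */
	if (!current->mm && current->active_mm == mm)
		leave_mm(smp_processor_id());
	else
		local_flush_tlb();
}

/* The sender side would then be a single round of IPIs:
 *	on_each_cpu_mask(mm_cpumask(mm), detach_and_flush, mm, 1);
 */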



* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 21:58                               ` Benjamin Herrenschmidt
@ 2012-06-29  8:49                                 ` Peter Zijlstra
  0 siblings, 0 replies; 120+ messages in thread
From: Peter Zijlstra @ 2012-06-29  8:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Catalin Marinas, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Fri, 2012-06-29 at 07:58 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2012-06-29 at 07:57 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2012-06-28 at 18:52 +0200, Peter Zijlstra wrote:
> > > No I think you're right (as always).. also an IPI will not force
> > > schedule the thread that might be running on the receiving cpu, also
> > > we'd have to wait for any such schedule to complete in order to
> > > guarantee the mm isn't lazily used anymore.
> > > 
> > > Bugger.. 
> > 
> > You can still do it if the mm count is 1 no ? Ie, current is the last
> > holder of a reference to the mm struct... which will probably be the
> > common case for short lived programs.
> 
> Also I just remembered... x86 flushes in SMP via IPIs right ? So maybe
> you can invent a "detach and flush" variant of it ? 

It's not just x86 I worry about... I want to share as much as possible
between all our architectures.

But yeah, I could do it for mm_count == 1, but I'd still need to
special-case s390 because it always wants the flush.
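
The shape of that decision, as a hedged sketch; CONFIG_ARCH_WANT_FULL_MM_FLUSH
is a made-up Kconfig symbol standing in for however the s390 special
case would be spelled:

static inline bool tlb_fullmm_fast_mode(struct mm_struct *mm)
{
#ifdef CONFIG_ARCH_WANT_FULL_MM_FLUSH
	/* e.g. s390: the architecture always wants the real flush. */
	return false;
#else
	/* Sole owner: nobody can be using this mm, not even lazily. */
	return atomic_read(&mm->mm_count) == 1;
#endif
}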


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-28 21:57                             ` Benjamin Herrenschmidt
@ 2012-06-29 15:26                               ` Catalin Marinas
  0 siblings, 0 replies; 120+ messages in thread
From: Catalin Marinas @ 2012-06-29 15:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Zijlstra, Linus Torvalds, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Thu, Jun 28, 2012 at 10:57:21PM +0100, Benjamin Herrenschmidt wrote:
> On Thu, 2012-06-28 at 18:52 +0200, Peter Zijlstra wrote:
> > No I think you're right (as always).. also an IPI will not force
> > schedule the thread that might be running on the receiving cpu, also
> > we'd have to wait for any such schedule to complete in order to
> > guarantee the mm isn't lazily used anymore.
> > 
> > Bugger.. 
> 
> You can still do it if the mm count is 1 no ? Ie, current is the last
> holder of a reference to the mm struct... which will probably be the
> common case for short lived programs.

BTW, can we not move the free_pgtables() call in exit_mmap() to
__mmdrop()? Something like below, but I'm not entirely sure about its
implications:


diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..507ee9f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1372,6 +1372,7 @@ extern void unlink_file_vma(struct vm_area_struct *);
 extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
 	unsigned long addr, unsigned long len, pgoff_t pgoff);
 extern void exit_mmap(struct mm_struct *);
+extern void exit_pgtables(struct mm_struct *mm);
 
 extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
diff --git a/kernel/fork.c b/kernel/fork.c
index ab5211b..3412b1a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -588,6 +588,7 @@ struct mm_struct *mm_alloc(void)
 void __mmdrop(struct mm_struct *mm)
 {
 	BUG_ON(mm == &init_mm);
+	exit_pgtables(mm);
 	mm_free_pgd(mm);
 	destroy_context(mm);
 	mmu_notifier_mm_destroy(mm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 074b487..d9ebfdb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2269,7 +2269,6 @@ void exit_mmap(struct mm_struct *mm)
 {
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
-	unsigned long nr_accounted = 0;
 
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
@@ -2291,11 +2290,23 @@ void exit_mmap(struct mm_struct *mm)
 
 	lru_add_drain();
 	flush_cache_mm(mm);
-	tlb_gather_mmu(&tlb, mm, 1);
+	tlb_gather_mmu(&tlb, mm, 0);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
+	tlb_finish_mmu(&tlb, 0, -1);
+}
+
+void exit_pgtables(struct mm_struct *mm)
+{
+	struct mmu_gather tlb;
+	struct vm_area_struct *vma;
+	unsigned long nr_accounted = 0;
 
+	vma = mm->mmap;
+	if (!vma)
+		return;
+	tlb_gather_mmu(&tlb, mm, 1);
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, TASK_SIZE);
 	tlb_finish_mmu(&tlb, 0, -1);
 

-- 
Catalin


* Re: [PATCH 08/20] mm: Optimize fullmm TLB flushing
  2012-06-29 15:26                               ` Catalin Marinas
@ 2012-06-29 22:11                                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-29 22:11 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Peter Zijlstra, Linus Torvalds, linux-kernel, linux-arch,
	linux-mm, Thomas Gleixner, Ingo Molnar, akpm, Rik van Riel,
	Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
	Nikunj A. Dadhania, Konrad Rzeszutek Wilk, David Miller,
	Russell King, Chris Metcalf, Martin Schwidefsky, Tony Luck,
	Paul Mundt, Jeff Dike, Richard Weinberger, Ralf Baechle,
	Kyle McMartin, James Bottomley, Chris Zankel

On Fri, 2012-06-29 at 16:26 +0100, Catalin Marinas wrote:
> On Thu, Jun 28, 2012 at 10:57:21PM +0100, Benjamin Herrenschmidt wrote:
> > On Thu, 2012-06-28 at 18:52 +0200, Peter Zijlstra wrote:
> > > No I think you're right (as always).. also an IPI will not force
> > > schedule the thread that might be running on the receiving cpu, also
> > > we'd have to wait for any such schedule to complete in order to
> > > guarantee the mm isn't lazily used anymore.
> > > 
> > > Bugger.. 
> > 
> > You can still do it if the mm count is 1 no ? Ie, current is the last
> > holder of a reference to the mm struct... which will probably be the
> > common case for short lived programs.
> 
> BTW, can we not move the free_pgtables() call in exit_mmap() to
> __mmdrop()? Something like below but I'm not entirely sure about its
> implications:

The main one is that the mm might remain active on another core for a
-loooong- time if that core is only running kernel threads or is
otherwise idle, thus wasting memory etc...

Also, mm_count being 1 is probably the common case for many short-lived
processes, so it should be fine. I don't think the count can ever
increase back at that point, can it? (We could make sure it doesn't:
mark the mm as dead and WARN loudly if somebody tries to increase the
count.)

The advantage of doing a "detach & flush" IPI if the count is larger is
that you already do the IPI for flushing anyway, so you just add a
detach to the path.

That avoids the problem of the mm staying around for too long as well.

Cheers,
Ben.
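
A rough illustration of the mark-the-mm-dead idea; MMF_DEAD and both
helpers are invented here (at this point in time references are taken
by open-coded atomic_inc(&mm->mm_count) calls, which would need to
funnel through the checked helper):

static inline void mm_mark_dead(struct mm_struct *mm)
{
	/* Teardown is committed; no new references allowed. */
	set_bit(MMF_DEAD, &mm->flags);
}

static inline void mm_grab_checked(struct mm_struct *mm)
{
	/* WARN loudly if somebody grabs a dying mm. */
	WARN_ON(test_bit(MMF_DEAD, &mm->flags));
	atomic_inc(&mm->mm_count);
}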

> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b36d08c..507ee9f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1372,6 +1372,7 @@ extern void unlink_file_vma(struct vm_area_struct *);
>  extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
>  	unsigned long addr, unsigned long len, pgoff_t pgoff);
>  extern void exit_mmap(struct mm_struct *);
> +extern void exit_pgtables(struct mm_struct *mm);
>  
>  extern int mm_take_all_locks(struct mm_struct *mm);
>  extern void mm_drop_all_locks(struct mm_struct *mm);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index ab5211b..3412b1a 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -588,6 +588,7 @@ struct mm_struct *mm_alloc(void)
>  void __mmdrop(struct mm_struct *mm)
>  {
>  	BUG_ON(mm == &init_mm);
> +	exit_pgtables(mm);
>  	mm_free_pgd(mm);
>  	destroy_context(mm);
>  	mmu_notifier_mm_destroy(mm);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 074b487..d9ebfdb 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2269,7 +2269,6 @@ void exit_mmap(struct mm_struct *mm)
>  {
>  	struct mmu_gather tlb;
>  	struct vm_area_struct *vma;
> -	unsigned long nr_accounted = 0;
>  
>  	/* mm's last user has gone, and its about to be pulled down */
>  	mmu_notifier_release(mm);
> @@ -2291,11 +2290,23 @@ void exit_mmap(struct mm_struct *mm)
>  
>  	lru_add_drain();
>  	flush_cache_mm(mm);
> -	tlb_gather_mmu(&tlb, mm, 1);
> +	tlb_gather_mmu(&tlb, mm, 0);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
>  	unmap_vmas(&tlb, vma, 0, -1);
> +	tlb_finish_mmu(&tlb, 0, -1);
> +}
> +
> +void exit_pgtables(struct mm_struct *mm)
> +{
> +	struct mmu_gather tlb;
> +	struct vm_area_struct *vma;
> +	unsigned long nr_accounted = 0;
>  
> +	vma = mm->mmap;
> +	if (!vma)
> +		return;
> +	tlb_gather_mmu(&tlb, mm, 1);
>  	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, TASK_SIZE);
>  	tlb_finish_mmu(&tlb, 0, -1);
>  
> 



* Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
  2012-06-27 23:01       ` Peter Zijlstra
@ 2012-07-24  5:12         ` Nikunj A Dadhania
  0 siblings, 0 replies; 120+ messages in thread
From: Nikunj A Dadhania @ 2012-07-24  5:12 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: linux-kernel, linux-arch, linux-mm, Thomas Gleixner, Ingo Molnar,
	akpm, Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin,
	Alex Shi, Konrad Rzeszutek Wilk, Benjamin Herrenschmidt,
	David Miller, Russell King, Catalin Marinas, Chris Metcalf,
	Martin Schwidefsky, Tony Luck, Paul Mundt, Jeff Dike,
	Richard Weinberger, Ralf Baechle, Kyle McMartin, James Bottomley,
	Chris Zankel

On Thu, 28 Jun 2012 01:01:46 +0200, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
  
> +#ifdef CONFIG_STRICT_TLB_FILL
> +/*
> + * Some architectures (sparc64, ppc) cannot refill TLBs after they've removed
> + * the PTE entries from their hash-table. Their hardware never looks at the
> + * linux page-table structures, so they don't need a hardware TLB invalidate
> + * when tearing down the page-table structure itself.
> + */
> +static inline void tlb_table_flush_mmu(struct mmu_gather *tlb) { }
> +#else
> +static inline void tlb_table_flush_mmu(struct mmu_gather *tlb)
> +{
> +	tlb_flush_mmu(tlb);
> +}
> +#endif
> +
>  void tlb_table_flush(struct mmu_gather *tlb)
>  {
>  	struct mmu_table_batch **batch = &tlb->batch;
>  
>  	if (*batch) {
> +		tlb_table_flush_mmu(tlb);
>  		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
>  		*batch = NULL;
>  	}

Hi Peter,

When running the munmap test (https://lkml.org/lkml/2012/5/17/59) with
KVM and the pvflush patches I got a crash. I have verified that the
crash also happens on the base (non-virt) kernel when I have
CONFIG_HAVE_RCU_TABLE_FREE defined. Here are the crash details and my
analysis:

-----------------------------------------------------------------------

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810d31d9>] __call_rcu+0x29/0x1c0
PGD 0 
Oops: 0002 [#1] SMP 
CPU 24 
Modules linked in: kvm_intel kvm [last unloaded: scsi_wait_scan]


Pid: 32643, comm: munmap Not tainted 3.5.0-rc7+ #46 IBM System x3850 X5 -[7042CR6]-[root@mx3850x5 ~/Node 1, Processor Card]# 
RIP: 0010:[<ffffffff810d31d9>]  [<ffffffff810d31d9>] __call_rcu+0x29/0x1c0
RSP: 0018:ffff88203164fc28  EFLAGS: 00010246
RAX: ffff88203164fba8 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffffff81e34280 RSI: ffffffff81130330 RDI: 0000000000000000
RBP: ffff88203164fc58 R08: ffffea00d2680340 R09: 0000000000000000
R10: ffff883c7fbd4ef8 R11: 0000000000000078 R12: ffffffff81130330
R13: 00007f09ee803000 R14: ffff883c2fa5bab0 R15: ffff88203164fe08
FS:  00007f09ee7ee700(0000) GS:ffff883c7fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000001e0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process munmap (pid: 32643, threadinfo ffff88203164e000, task ffff882030458a70)
Stack:
 ffff883c2fa5bab0 ffff88203164fe08 ffff88203164fc68 ffff88203164fe08
 ffff88203164fe08 00007f09ee803000 ffff88203164fc68 ffffffff810d33c7
 ffff88203164fc88 ffffffff81130e0d ffff88203164fc88 ffffea00d28e54f8
Call Trace:
 [<ffffffff810d33c7>] call_rcu_sched+0x17/0x20
 [<ffffffff81130e0d>] tlb_table_flush+0x2d/0x40
 [<ffffffff81130e80>] tlb_remove_table+0x60/0xc0
 [<ffffffff8103a5e3>] ___pte_free_tlb+0x63/0x70
 [<ffffffff81131b38>] free_pgd_range+0x298/0x4b0
 [<ffffffff81131e1e>] free_pgtables+0xce/0x120
 [<ffffffff81137247>] exit_mmap+0xa7/0x160
 [<ffffffff81043fdf>] mmput+0x6f/0xf0
 [<ffffffff8104c3f5>] exit_mm+0x105/0x130
 [<ffffffff810d6c7d>] ? taskstats_exit+0x17d/0x240
 [<ffffffff8104c596>] do_exit+0x176/0x480
 [<ffffffff8104c8f5>] do_group_exit+0x55/0xd0
 [<ffffffff8104c987>] sys_exit_group+0x17/0x20
 [<ffffffff818a3829>] system_call_fastpath+0x16/0x1b
Code: ff ff 55 48 89 e5 48 83 ec 30 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 40 f6 c7 03 48 89 fb 49 89 f4 0f 85 19 01 00 00 <4c> 89 63 08 48 c7 03 00 00 00 00 0f ae f0 9c 58 66 66 90 66 90 
RIP  [<ffffffff810d31d9>] __call_rcu+0x29/0x1c0
 RSP <ffff88203164fc28>
CR2: 0000000000000008
---[ end trace 3ed30a91ea7cb375 ]---

----------------------------------------------------------------------------

I think this is what is happening:

___pte_free_tlb
   tlb_remove_table
      tlb_table_flush
         tlb_table_flush_mmu
            tlb_flush_mmu
                Sets need_flush = 0
                tlb_table_flush (if CONFIG_HAVE_RCU_TABLE_FREE)
                    [Gets called twice with same *tlb!]

                    tlb_table_flush_mmu
                        tlb_flush_mmu(nop as need_flush is 0)
                    call_rcu_sched(&(*batch)->rcu,...);
                    *batch = NULL;
         call_rcu_sched(&(*batch)->rcu,...); <---- *batch would be NULL

I verified this by applying the following fix, and I no longer see the
crash:

diff --git a/mm/memory.c b/mm/memory.c
index 1797bc1..329fcb9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -367,7 +367,8 @@ void tlb_table_flush(struct mmu_gather *tlb)
 
 	if (*batch) {
 		tlb_table_flush_mmu(tlb);
-		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
+		if (*batch)
+			call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
 		*batch = NULL;
 	}
 }
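
The hazard, distilled into a standalone userspace C model (all names
invented): a routine that can re-enter itself through a helper must
re-check its shared state after the helper returns, which is all the
one-line fix above does.

#include <stdlib.h>

static int *batch;		/* stands in for tlb->batch */
static int need_flush = 1;	/* stands in for tlb->need_flush */

static void table_flush(void);

static void flush_mmu(void)	/* models tlb_flush_mmu() */
{
	if (need_flush) {
		need_flush = 0;
		table_flush();	/* the re-entrant call in the trace */
	}
}

static void table_flush(void)	/* models tlb_table_flush() */
{
	if (batch) {
		flush_mmu();
		if (batch)	/* the re-check: the inner call may
				   already have consumed the batch */
			free(batch);
		batch = NULL;
	}
}

int main(void)
{
	batch = malloc(sizeof(*batch));
	table_flush();		/* without the re-check this double-frees,
				   the userspace twin of the NULL
				   dereference in the oops above */
	return 0;
}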

Thanks
Nikunj



end of thread (newest: 2012-07-24  5:13 UTC)

Thread overview: 120+ messages
2012-06-27 21:15 [PATCH 00/20] Unify TLB gather implementations -v3 Peter Zijlstra
2012-06-27 21:15 ` [PATCH 01/20] mm, x86: Add HAVE_RCU_TABLE_FREE support Peter Zijlstra
2012-06-27 21:15 ` [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing Peter Zijlstra
2012-06-27 22:23   ` Linus Torvalds
2012-06-27 23:01     ` Peter Zijlstra
2012-06-27 23:42       ` Linus Torvalds
2012-06-28  7:09       ` Benjamin Herrenschmidt
2012-06-28 11:05         ` Peter Zijlstra
2012-06-28 12:00           ` Benjamin Herrenschmidt
2012-07-24  5:12       ` Nikunj A Dadhania
2012-06-27 21:15 ` [PATCH 03/20] mm, tlb: Remove a few #ifdefs Peter Zijlstra
2012-06-27 21:15 ` [PATCH 04/20] mm, s390: use generic RCU page-table freeing code Peter Zijlstra
2012-06-27 21:15 ` [PATCH 05/20] mm, powerpc: Dont use tlb_flush for external tlb flushes Peter Zijlstra
2012-06-27 21:15 ` [PATCH 06/20] mm, sparc64: " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 07/20] mm, arch: Remove tlb_flush() Peter Zijlstra
2012-06-27 21:15 ` [PATCH 08/20] mm: Optimize fullmm TLB flushing Peter Zijlstra
2012-06-27 22:26   ` Linus Torvalds
2012-06-27 23:02     ` Peter Zijlstra
2012-06-27 23:13       ` Peter Zijlstra
2012-06-27 23:23         ` Linus Torvalds
2012-06-27 23:33           ` Linus Torvalds
2012-06-28  9:16             ` Catalin Marinas
2012-06-28 10:39               ` Benjamin Herrenschmidt
2012-06-28 10:59                 ` Peter Zijlstra
2012-06-28 14:53                   ` Catalin Marinas
2012-06-28 16:20                     ` Peter Zijlstra
2012-06-28 16:38                       ` Peter Zijlstra
2012-06-28 16:45                       ` Linus Torvalds
2012-06-28 16:52                         ` Peter Zijlstra
2012-06-28 21:57                           ` Benjamin Herrenschmidt
2012-06-28 21:58                             ` Benjamin Herrenschmidt
2012-06-29  8:49                               ` Peter Zijlstra
2012-06-29 15:26                             ` Catalin Marinas
2012-06-29 22:11                               ` Benjamin Herrenschmidt
2012-06-28 10:55             ` Peter Zijlstra
2012-06-28 11:19               ` Martin Schwidefsky
2012-06-28 11:30                 ` Peter Zijlstra
2012-06-28 16:00                   ` Avi Kivity
2012-06-27 21:15 ` [PATCH 09/20] mm, arch: Add end argument to p??_free_tlb() Peter Zijlstra
2012-06-27 21:15 ` [PATCH 10/20] mm: Provide generic range tracking and flushing Peter Zijlstra
2012-06-27 21:15 ` [PATCH 11/20] mm, s390: Convert to use generic mmu_gather Peter Zijlstra
2012-06-27 22:13   ` Peter Zijlstra
2012-06-28  7:13     ` Martin Schwidefsky
2012-06-27 21:15 ` [PATCH 12/20] mm, arm: Convert arm to generic tlb Peter Zijlstra
2012-06-27 21:15 ` [PATCH 13/20] mm, ia64: Convert ia64 " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 14/20] mm, sh: Convert sh " Peter Zijlstra
2012-06-28 18:32   ` Paul Mundt
2012-06-28 20:27     ` Peter Zijlstra
2012-06-27 21:15 ` [PATCH 15/20] mm, um: Convert um " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 16/20] mm, avr32: Convert avr32 " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 17/20] mm, mips: Convert mips " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 18/20] mm, parisc: Convert parisc " Peter Zijlstra
2012-06-27 21:15 ` [PATCH 19/20] mm, sparc32: Convert sparc32 " Peter Zijlstra
2012-06-27 21:16 ` [PATCH 20/20] mm, xtensa: Convert xtensa " Peter Zijlstra
