* [RFC PATCH 0/8] Further radix TLB flush optimisations
@ 2017-09-07 14:51 Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 1/8] powerpc/64s/radix: Fix theoretical process table entry cache invalidation Nicholas Piggin
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

Here is a bit more TLB flush work that mostly attempts to
improve range flushes by reducing barriers, and by reducing
the cases where we resort to flushing the entire PID.

I haven't done much benchmarking yet to get good numbers for
the exact heuristic settings; at this stage I'm mainly
interested in comments on the overall idea.

Thanks,
Nick

Nicholas Piggin (8):
  powerpc/64s/radix: Fix theoretical process table entry cache
    invalidation
  powerpc/64s/radix: tlbie improve preempt handling
  powerpc/64s/radix: optimize TLB range flush barriers
  powerpc/64s/radix: Implement _tlbie(l)_va_range flush functions
  powerpc/64s/radix: Introduce local single page ceiling for TLB range
    flush
  powerpc/64s/radix: Optimize flush_tlb_range
  powerpc/64s/radix: Improve TLB flushing for unmaps that free a page
    table
  powerpc/64s/radix: Only flush local TLB for spurious fault flushes

 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |   7 +-
 arch/powerpc/include/asm/book3s/64/tlbflush.h      |  11 +
 arch/powerpc/include/asm/mmu_context.h             |   4 +
 arch/powerpc/mm/mmu_context_book3s64.c             |  23 +-
 arch/powerpc/mm/pgtable-book3s64.c                 |   5 +-
 arch/powerpc/mm/pgtable.c                          |   2 +-
 arch/powerpc/mm/tlb-radix.c                        | 263 +++++++++++++++------
 7 files changed, 234 insertions(+), 81 deletions(-)

-- 
2.13.3


* [RFC PATCH 1/8] powerpc/64s/radix: Fix theoretical process table entry cache invalidation
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 2/8] powerpc/64s/radix: tlbie improve preempt handling Nicholas Piggin
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

According to the architecture, the process table entry cache must be
flushed with RIC=2 tlbies. This problem doesn't hit in existing
implementations that do not cache process table entries over mtpid. The
PID is only destroyed and re-used after all CPUs have switched away from
the mm, guaranteeing its entry is not cached anywhere. But this is not
generally safe according to the ISA.

Fix this by clearing the process table entry before the final flush
(which is always a RIC=2 flush that invalidates the process table entry
cache).
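
As a sketch of the resulting teardown ordering (hedged;
fullmm_tlb_flush() is an illustrative stand-in for the flush done by
exit_mmap(), the other names are from the patch below):

	/* arch_exit_mmap(), with the PID still live: */
	process_tb[mm->context.id].prtb0 = 0;

	/*
	 * exit_mmap() then performs the "fullmm" TLB flush, which per
	 * the changelog above is a RIC=2 (RIC_FLUSH_ALL) invalidation.
	 * That flush begins with ptesync, ordering the store above
	 * before it, so the process table entry cache is clean before
	 * destroy_context() can recycle the PID.
	 */
	fullmm_tlb_flush(mm);	/* illustrative name, not a real symbol */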

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/mmu_context.h |  4 ++++
 arch/powerpc/mm/mmu_context_book3s64.c | 23 ++++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 309592589e30..0a70221adcf7 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -118,9 +118,13 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
 {
 }
 
+#ifndef CONFIG_PPC_BOOK3S_64
 static inline void arch_exit_mmap(struct mm_struct *mm)
 {
 }
+#else
+extern void arch_exit_mmap(struct mm_struct *mm);
+#endif
 
 static inline void arch_unmap(struct mm_struct *mm,
 			      struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 05e15386d4cb..feb3f43195c2 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -216,19 +216,32 @@ void destroy_context(struct mm_struct *mm)
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	WARN_ON_ONCE(!list_empty(&mm->context.iommu_group_mem_list));
 #endif
+	if (radix_enabled())
+		WARN_ON(process_tb[mm->context.id].prtb0 != 0);
+	else
+		subpage_prot_free(mm);
+	destroy_pagetable_page(mm);
+	__destroy_context(mm->context.id);
+	mm->context.id = MMU_NO_CONTEXT;
+}
+
+void arch_exit_mmap(struct mm_struct *mm)
+{
 	if (radix_enabled()) {
 		/*
 		 * Radix doesn't have a valid bit in the process table
 		 * entries. However we know that at least P9 implementation
 		 * will avoid caching an entry with an invalid RTS field,
 		 * and 0 is invalid. So this will do.
+		 *
+		 * This runs before the "fullmm" tlb flush in exit_mmap,
+		 * which does a RIC_FLUSH_ALL to clear the process table
+		 * entry. No barrier required here after the store because
+		 * this process will do the invalidate, which starts with
+		 * ptesync.
 		 */
 		process_tb[mm->context.id].prtb0 = 0;
-	} else
-		subpage_prot_free(mm);
-	destroy_pagetable_page(mm);
-	__destroy_context(mm->context.id);
-	mm->context.id = MMU_NO_CONTEXT;
+	}
 }
 
 #ifdef CONFIG_PPC_RADIX_MMU
-- 
2.13.3


* [RFC PATCH 2/8] powerpc/64s/radix: tlbie improve preempt handling
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 1/8] powerpc/64s/radix: Fix theoretical process table entry cache invalidation Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 3/8] powerpc/64s/radix: optimize TLB range flush barriers Nicholas Piggin
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

Preempt should be consistently disabled for mm_is_thread_local tests,
so bring the rest of these under preempt_disable().

Preempt does not need to be disabled for the mm->context.id tests, which
allows simplification and removal of gotos.
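
The resulting shape, roughly (a sketch distilled from the hunks below,
using radix__flush_tlb_mm() as the example):

	pid = mm->context.id;
	if (unlikely(pid == MMU_NO_CONTEXT))
		return;		/* safe to test without preemption off */

	preempt_disable();	/* keep mm_is_thread_local() stable */
	if (!mm_is_thread_local(mm))
		_tlbie_pid(pid, RIC_FLUSH_TLB);
	else
		_tlbiel_pid(pid, RIC_FLUSH_TLB);
	preempt_enable();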

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/tlb-radix.c | 47 +++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index b3e849c4886e..1ed61baf58da 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -186,16 +186,15 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
 {
 	unsigned long pid;
 
-	preempt_disable();
 	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-		goto no_context;
+		return;
 
+	preempt_disable();
 	if (!mm_is_thread_local(mm))
 		_tlbie_pid(pid, RIC_FLUSH_TLB);
 	else
 		_tlbiel_pid(pid, RIC_FLUSH_TLB);
-no_context:
 	preempt_enable();
 }
 EXPORT_SYMBOL(radix__flush_tlb_mm);
@@ -204,16 +203,15 @@ static void radix__flush_all_mm(struct mm_struct *mm)
 {
 	unsigned long pid;
 
-	preempt_disable();
 	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-		goto no_context;
+		return;
 
+	preempt_disable();
 	if (!mm_is_thread_local(mm))
 		_tlbie_pid(pid, RIC_FLUSH_ALL);
 	else
 		_tlbiel_pid(pid, RIC_FLUSH_ALL);
-no_context:
 	preempt_enable();
 }
 
@@ -229,15 +227,14 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 	unsigned long pid;
 	unsigned long ap = mmu_get_ap(psize);
 
-	preempt_disable();
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-		goto bail;
+		return;
+	preempt_disable();
 	if (!mm_is_thread_local(mm))
 		_tlbie_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
 	else
 		_tlbiel_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
-bail:
 	preempt_enable();
 }
 
@@ -322,46 +319,44 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 {
 	unsigned long pid;
 	unsigned long addr;
-	int local = mm_is_thread_local(mm);
+	bool local;
 	unsigned long ap = mmu_get_ap(psize);
 	unsigned long page_size = 1UL << mmu_psize_defs[psize].shift;
 
-
-	preempt_disable();
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-		goto err_out;
+		return;
 
+	preempt_disable();
+	local = mm_is_thread_local(mm);
 	if (end == TLB_FLUSH_ALL ||
 	    (end - start) > tlb_single_page_flush_ceiling * page_size) {
 		if (local)
 			_tlbiel_pid(pid, RIC_FLUSH_TLB);
 		else
 			_tlbie_pid(pid, RIC_FLUSH_TLB);
-		goto err_out;
-	}
-	for (addr = start; addr < end; addr += page_size) {
+	} else {
+		for (addr = start; addr < end; addr += page_size) {
 
-		if (local)
-			_tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
-		else
-			_tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
+			if (local)
+				_tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
+			else
+				_tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
+		}
 	}
-err_out:
 	preempt_enable();
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 {
-	int local = mm_is_thread_local(mm);
 	unsigned long ap = mmu_get_ap(mmu_virtual_psize);
 	unsigned long pid, end;
-
+	bool local;
 
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-		goto no_context;
+		return;
 
 	/* 4k page size, just blow the world */
 	if (PAGE_SIZE == 0x1000) {
@@ -369,6 +364,8 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 		return;
 	}
 
+	preempt_disable();
+	local = mm_is_thread_local(mm);
 	/* Otherwise first do the PWC */
 	if (local)
 		_tlbiel_pid(pid, RIC_FLUSH_PWC);
@@ -383,7 +380,7 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 		else
 			_tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
 	}
-no_context:
+
 	preempt_enable();
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
2.13.3


* [RFC PATCH 3/8] powerpc/64s/radix: optimize TLB range flush barriers
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 1/8] powerpc/64s/radix: Fix theoretical process table entry cache invalidation Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 2/8] powerpc/64s/radix: tlbie improve preempt handling Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 4/8] powerpc/64s/radix: Implement _tlbie(l)_va_range flush functions Nicholas Piggin
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

Short range flushes issue a sequence of tlbie(l) instructions for
individual effective addresses. These do not all require individual
barrier sequences; only one set is needed around the whole batch.

Commit f7327e0ba3 ("powerpc/mm/radix: Remove unnecessary ptesync")
made a similar optimization for tlbiel for PID flushing.

For tlbie, the ISA says:

    The tlbsync instruction provides an ordering function for the
    effects of all tlbie instructions executed by the thread executing
    the tlbsync instruction, with respect to the memory barrier
    created by a subsequent ptesync instruction executed by the same
    thread.
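
So the barriers can be hoisted out of the per-page loop. Roughly, as a
sketch of the intended instruction patterns (not literal code):

	/* local range flush */
	ptesync
	tlbiel ea0; tlbiel ea1; ...; tlbiel eaN
	ptesync

	/* global range flush */
	ptesync
	tlbie ea0; tlbie ea1; ...; tlbie eaN
	eieio; tlbsync; ptesync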

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/tlb-radix.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 1ed61baf58da..c30f3faf5356 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -84,7 +84,7 @@ static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
 	trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
-static inline void _tlbiel_va(unsigned long va, unsigned long pid,
+static inline void __tlbiel_va(unsigned long va, unsigned long pid,
 			      unsigned long ap, unsigned long ric)
 {
 	unsigned long rb,rs,prs,r;
@@ -95,14 +95,20 @@ static inline void _tlbiel_va(unsigned long va, unsigned long pid,
 	prs = 1; /* process scoped */
 	r = 1;   /* raidx format */
 
-	asm volatile("ptesync": : :"memory");
 	asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
 		     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
-	asm volatile("ptesync": : :"memory");
 	trace_tlbie(0, 1, rb, rs, ric, prs, r);
 }
 
-static inline void _tlbie_va(unsigned long va, unsigned long pid,
+static inline void _tlbiel_va(unsigned long va, unsigned long pid,
+			      unsigned long ap, unsigned long ric)
+{
+	asm volatile("ptesync": : :"memory");
+	__tlbiel_va(va, pid, ap, ric);
+	asm volatile("ptesync": : :"memory");
+}
+
+static inline void __tlbie_va(unsigned long va, unsigned long pid,
 			     unsigned long ap, unsigned long ric)
 {
 	unsigned long rb,rs,prs,r;
@@ -113,13 +119,20 @@ static inline void _tlbie_va(unsigned long va, unsigned long pid,
 	prs = 1; /* process scoped */
 	r = 1;   /* raidx format */
 
-	asm volatile("ptesync": : :"memory");
 	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
 		     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
-	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 	trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
+static inline void _tlbie_va(unsigned long va, unsigned long pid,
+			     unsigned long ap, unsigned long ric)
+{
+	asm volatile("ptesync": : :"memory");
+	__tlbie_va(va, pid, ap, ric);
+	asm volatile("eieio; tlbsync; ptesync": : :"memory");
+}
+
+
 /*
  * Base TLB flushing operations:
  *
@@ -335,14 +348,20 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 			_tlbiel_pid(pid, RIC_FLUSH_TLB);
 		else
 			_tlbie_pid(pid, RIC_FLUSH_TLB);
+
 	} else {
+		asm volatile("ptesync": : :"memory");
 		for (addr = start; addr < end; addr += page_size) {
 
 			if (local)
-				_tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
+				__tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
 			else
-				_tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
+				__tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
 		}
+		if (local)
+			asm volatile("ptesync": : :"memory");
+		else
+			asm volatile("eieio; tlbsync; ptesync": : :"memory");
 	}
 	preempt_enable();
 }
@@ -373,6 +392,7 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 		_tlbie_pid(pid, RIC_FLUSH_PWC);
 
 	/* Then iterate the pages */
+	asm volatile("ptesync": : :"memory");
 	end = addr + HPAGE_PMD_SIZE;
 	for (; addr < end; addr += PAGE_SIZE) {
 		if (local)
@@ -380,7 +400,10 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 		else
 			_tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
 	}
-
+	if (local)
+		asm volatile("ptesync": : :"memory");
+	else
+		asm volatile("eieio; tlbsync; ptesync": : :"memory");
 	preempt_enable();
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
2.13.3


* [RFC PATCH 4/8] powerpc/64s/radix: Implement _tlbie(l)_va_range flush functions
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
                   ` (2 preceding siblings ...)
  2017-09-07 14:51 ` [RFC PATCH 3/8] powerpc/64s/radix: optimize TLB range flush barriers Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 5/8] powerpc/64s/radix: Introduce local single page ceiling for TLB range flush Nicholas Piggin
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

Move the barriers and range iteration down into the _tlbie* level,
which improves readability.
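
Callers then reduce to a single call per variant, as in the
radix__flush_tlb_range_psize() hunk below:

	if (local)
		_tlbiel_va_range(start, end, pid, page_size, psize);
	else
		_tlbie_va_range(start, end, pid, page_size, psize);

with the barrier sequences (ptesync, or eieio; tlbsync; ptesync) now
emitted once inside the helpers.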

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/tlb-radix.c | 70 ++++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index c30f3faf5356..1d3cbc01596d 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -85,7 +85,7 @@ static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
 }
 
 static inline void __tlbiel_va(unsigned long va, unsigned long pid,
-			      unsigned long ap, unsigned long ric)
+			       unsigned long ap, unsigned long ric)
 {
 	unsigned long rb,rs,prs,r;
 
@@ -101,13 +101,28 @@ static inline void __tlbiel_va(unsigned long va, unsigned long pid,
 }
 
 static inline void _tlbiel_va(unsigned long va, unsigned long pid,
-			      unsigned long ap, unsigned long ric)
+			      unsigned long psize, unsigned long ric)
 {
+	unsigned long ap = mmu_get_ap(psize);
+
 	asm volatile("ptesync": : :"memory");
 	__tlbiel_va(va, pid, ap, ric);
 	asm volatile("ptesync": : :"memory");
 }
 
+static inline void _tlbiel_va_range(unsigned long start, unsigned long end,
+				    unsigned long pid, unsigned long page_size,
+				    unsigned long psize)
+{
+	unsigned long addr;
+	unsigned long ap = mmu_get_ap(psize);
+
+	asm volatile("ptesync": : :"memory");
+	for (addr = start; addr < end; addr += page_size)
+		__tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
+	asm volatile("ptesync": : :"memory");
+}
+
 static inline void __tlbie_va(unsigned long va, unsigned long pid,
 			     unsigned long ap, unsigned long ric)
 {
@@ -125,13 +140,27 @@ static inline void __tlbie_va(unsigned long va, unsigned long pid,
 }
 
 static inline void _tlbie_va(unsigned long va, unsigned long pid,
-			     unsigned long ap, unsigned long ric)
+			      unsigned long psize, unsigned long ric)
 {
+	unsigned long ap = mmu_get_ap(psize);
+
 	asm volatile("ptesync": : :"memory");
 	__tlbie_va(va, pid, ap, ric);
 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
+static inline void _tlbie_va_range(unsigned long start, unsigned long end,
+				    unsigned long pid, unsigned long page_size,
+				    unsigned long psize)
+{
+	unsigned long addr;
+	unsigned long ap = mmu_get_ap(psize);
+
+	asm volatile("ptesync": : :"memory");
+	for (addr = start; addr < end; addr += page_size)
+		__tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
+	asm volatile("eieio; tlbsync; ptesync": : :"memory");
+}
 
 /*
  * Base TLB flushing operations:
@@ -173,12 +202,11 @@ void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmadd
 				       int psize)
 {
 	unsigned long pid;
-	unsigned long ap = mmu_get_ap(psize);
 
 	preempt_disable();
 	pid = mm ? mm->context.id : 0;
 	if (pid != MMU_NO_CONTEXT)
-		_tlbiel_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
+		_tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
 	preempt_enable();
 }
 
@@ -238,16 +266,15 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 				 int psize)
 {
 	unsigned long pid;
-	unsigned long ap = mmu_get_ap(psize);
 
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		return;
 	preempt_disable();
 	if (!mm_is_thread_local(mm))
-		_tlbie_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
+		_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
 	else
-		_tlbiel_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
+		_tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
 	preempt_enable();
 }
 
@@ -331,9 +358,7 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 				  unsigned long end, int psize)
 {
 	unsigned long pid;
-	unsigned long addr;
 	bool local;
-	unsigned long ap = mmu_get_ap(psize);
 	unsigned long page_size = 1UL << mmu_psize_defs[psize].shift;
 
 	pid = mm ? mm->context.id : 0;
@@ -350,18 +375,10 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 			_tlbie_pid(pid, RIC_FLUSH_TLB);
 
 	} else {
-		asm volatile("ptesync": : :"memory");
-		for (addr = start; addr < end; addr += page_size) {
-
-			if (local)
-				__tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
-			else
-				__tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
-		}
 		if (local)
-			asm volatile("ptesync": : :"memory");
+			_tlbiel_va_range(start, end, pid, page_size, psize);
 		else
-			asm volatile("eieio; tlbsync; ptesync": : :"memory");
+			_tlbie_va_range(start, end, pid, page_size, psize);
 	}
 	preempt_enable();
 }
@@ -369,7 +386,6 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 {
-	unsigned long ap = mmu_get_ap(mmu_virtual_psize);
 	unsigned long pid, end;
 	bool local;
 
@@ -392,18 +408,12 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 		_tlbie_pid(pid, RIC_FLUSH_PWC);
 
 	/* Then iterate the pages */
-	asm volatile("ptesync": : :"memory");
 	end = addr + HPAGE_PMD_SIZE;
-	for (; addr < end; addr += PAGE_SIZE) {
-		if (local)
-			_tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
-		else
-			_tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
-	}
+
 	if (local)
-		asm volatile("ptesync": : :"memory");
+		_tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize);
 	else
-		asm volatile("eieio; tlbsync; ptesync": : :"memory");
+		_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize);
 	preempt_enable();
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
2.13.3


* [RFC PATCH 5/8] powerpc/64s/radix: Introduce local single page ceiling for TLB range flush
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
                   ` (3 preceding siblings ...)
  2017-09-07 14:51 ` [RFC PATCH 4/8] powerpc/64s/radix: Implement _tlbie(l)_va_range flush functions Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 6/8] powerpc/64s/radix: Optimize flush_tlb_range Nicholas Piggin
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

The single page flush ceiling is the cut-off point at which we switch
from invalidating individual pages to invalidating the entire process
address space in response to a range flush.

Introduce a local variant of this heuristic because local and global
tlbie have significantly different properties:
- Local tlbiel requires 128 instructions to invalidate a PID, global
  tlbie only 1 instruction.
- Global tlbie instructions are expensive broadcast operations.

The local ceiling has been made much higher, 2x the number of
instructions required to invalidate the entire PID (this has not
yet been benchmarked in detail).
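
The resulting decision, as a sketch (nr_pages here stands for
(end - start) >> page_shift; the local ceiling works out to 256 pages
assuming POWER9_TLB_SETS_RADIX is 128):

	if (mm_is_thread_local(mm)) {
		if (end == TLB_FLUSH_ALL || nr_pages >
				tlb_local_single_page_flush_ceiling)
			_tlbiel_pid(pid, RIC_FLUSH_TLB);   /* 128 tlbiel */
		else
			_tlbiel_va_range(start, end, pid, page_size, psize);
	} else {
		if (end == TLB_FLUSH_ALL || nr_pages >
				tlb_single_page_flush_ceiling)
			_tlbie_pid(pid, RIC_FLUSH_TLB);    /* 1 tlbie bcast */
		else
			_tlbie_va_range(start, end, pid, page_size, psize);
	}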
---
 arch/powerpc/mm/tlb-radix.c | 49 +++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 1d3cbc01596d..8ec59b57d46c 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -348,35 +348,41 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 }
 
 #define TLB_FLUSH_ALL -1UL
+
 /*
- * Number of pages above which we will do a bcast tlbie. Just a
- * number at this point copied from x86
+ * Number of pages above which we invalidate the entire PID rather than
+ * flush individual pages, for local and global flushes respectively.
+ *
+ * tlbie goes out to the interconnect and individual ops are more costly.
+ * It also does not iterate over sets like the local tlbiel variant when
+ * invalidating a full PID, so it has a far lower threshold to change from
+ * individual page flushes to full-pid flushes.
  */
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
+static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
 void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 				  unsigned long end, int psize)
 {
 	unsigned long pid;
-	bool local;
-	unsigned long page_size = 1UL << mmu_psize_defs[psize].shift;
+	unsigned int page_shift = mmu_psize_defs[psize].shift;
+	unsigned long page_size = 1UL << page_shift;
 
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		return;
 
 	preempt_disable();
-	local = mm_is_thread_local(mm);
-	if (end == TLB_FLUSH_ALL ||
-	    (end - start) > tlb_single_page_flush_ceiling * page_size) {
-		if (local)
+	if (mm_is_thread_local(mm)) {
+		if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
+					tlb_local_single_page_flush_ceiling)
 			_tlbiel_pid(pid, RIC_FLUSH_TLB);
 		else
-			_tlbie_pid(pid, RIC_FLUSH_TLB);
-
-	} else {
-		if (local)
 			_tlbiel_va_range(start, end, pid, page_size, psize);
+	} else {
+		if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
+					tlb_single_page_flush_ceiling)
+			_tlbie_pid(pid, RIC_FLUSH_TLB);
 		else
 			_tlbie_va_range(start, end, pid, page_size, psize);
 	}
@@ -387,7 +393,6 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 {
 	unsigned long pid, end;
-	bool local;
 
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
@@ -399,21 +404,17 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 		return;
 	}
 
-	preempt_disable();
-	local = mm_is_thread_local(mm);
-	/* Otherwise first do the PWC */
-	if (local)
-		_tlbiel_pid(pid, RIC_FLUSH_PWC);
-	else
-		_tlbie_pid(pid, RIC_FLUSH_PWC);
-
-	/* Then iterate the pages */
 	end = addr + HPAGE_PMD_SIZE;
 
-	if (local)
+	/* Otherwise first do the PWC, then iterate the pages. */
+	preempt_disable();
+	if (mm_is_thread_local(mm)) {
+		_tlbiel_pid(pid, RIC_FLUSH_PWC);
 		_tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize);
-	else
+	} else {
+		_tlbie_pid(pid, RIC_FLUSH_PWC);
 		_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize);
+	}
 	preempt_enable();
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
2.13.3


* [RFC PATCH 6/8] powerpc/64s/radix: Optimize flush_tlb_range
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
                   ` (4 preceding siblings ...)
  2017-09-07 14:51 ` [RFC PATCH 5/8] powerpc/64s/radix: Introduce local single page ceiling for TLB range flush Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 7/8] powerpc/64s/radix: Improve TLB flushing for unmaps that free a page table Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes Nicholas Piggin
  7 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

Currently for radix, flush_tlb_range flushes the entire PID, because
we don't know about THP vs regular pages. This is quite sub-optimal
for small mremap/mprotect/change_protection.

Instead, implement this with two range flush passes, one for each
page size. If the small-page range flush ends up doing the full PID
invalidation, the second flush is avoided. If not, the second flush
is an order of magnitude or two fewer operations than the first, so
it's relatively insignificant.

There is still room for improvement here with some changes to generic
APIs, particularly if there are a lot of huge pages in place.
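
The second pass only needs to cover PMD-aligned 2MB blocks that lie
wholly inside the range. Equivalently, as a sketch of the arithmetic
in the patch (written with masks rather than the patch's shifts, same
in effect):

	hstart = (start + HPAGE_PMD_SIZE - 1) & ~(HPAGE_PMD_SIZE - 1);
	hend = end & ~(HPAGE_PMD_SIZE - 1);
	if (hstart < hend)	/* at least one whole 2MB block */
		radix__flush_tlb_range_psize(mm, hstart, hend, MMU_PAGE_2M);

For example, flushing [1MB, 5MB) does the 4K pass over the whole range
and the 2M pass over [2MB, 4MB) only.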

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |  2 +-
 arch/powerpc/mm/tlb-radix.c                        | 52 +++++++++++++++++-----
 2 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 9b433a624bf3..b12460b306a7 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -12,7 +12,7 @@ static inline int mmu_get_ap(int psize)
 
 extern void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma,
 					   unsigned long start, unsigned long end);
-extern void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
+extern bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 					 unsigned long end, int psize);
 extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
 				       unsigned long start, unsigned long end);
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 8ec59b57d46c..1b0cac656680 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -299,17 +299,40 @@ void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
 }
 EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
 
-/*
- * Currently, for range flushing, we just do a full mm flush. Because
- * we use this in code path where we don' track the page size.
- */
 void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		     unsigned long end)
 
 {
 	struct mm_struct *mm = vma->vm_mm;
+	bool full;
 
-	radix__flush_tlb_mm(mm);
+#ifdef CONFIG_HUGETLB_PAGE
+	if (is_vm_hugetlb_page(vma))
+		return radix__flush_hugetlb_tlb_range(vma, start, end);
+#endif
+	full = radix__flush_tlb_range_psize(mm, start, end, mmu_virtual_psize);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (!full) {
+		/*
+		 * If the small page flush was not a full PID flush, we have
+		 * to do a second pass to flush transparent huge pages. This
+		 * will be a far smaller number of invalidates, so it's not
+		 * worth calculating.
+		 *
+		 * Range flushes are still sub-optimal for cases of all or
+		 * no hugepages (moreso the former), which should be improved
+		 * by changing the flush API.
+		 */
+		unsigned long hstart, hend;
+		hstart = (start + HPAGE_PMD_SIZE - 1) >> HPAGE_PMD_SHIFT;
+		hend = end >> HPAGE_PMD_SHIFT;
+		if (hstart != hend) {
+			hstart <<= HPAGE_PMD_SHIFT;
+			hend <<= HPAGE_PMD_SHIFT;
+			radix__flush_tlb_range_psize(mm, hstart, hend, MMU_PAGE_2M);
+		}
+	}
+#endif
 }
 EXPORT_SYMBOL(radix__flush_tlb_range);
 
@@ -361,32 +384,39 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
-void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
+bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 				  unsigned long end, int psize)
 {
 	unsigned long pid;
 	unsigned int page_shift = mmu_psize_defs[psize].shift;
 	unsigned long page_size = 1UL << page_shift;
+	bool full = false;
 
 	pid = mm ? mm->context.id : 0;
 	if (unlikely(pid == MMU_NO_CONTEXT))
-		return;
+		return full;
 
 	preempt_disable();
 	if (mm_is_thread_local(mm)) {
 		if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
-					tlb_local_single_page_flush_ceiling)
+					tlb_local_single_page_flush_ceiling) {
+			full = true;
 			_tlbiel_pid(pid, RIC_FLUSH_TLB);
-		else
+		} else {
 			_tlbiel_va_range(start, end, pid, page_size, psize);
+		}
 	} else {
 		if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
-					tlb_single_page_flush_ceiling)
+					tlb_single_page_flush_ceiling) {
+			full = true;
 			_tlbie_pid(pid, RIC_FLUSH_TLB);
-		else
+		} else {
 			_tlbie_va_range(start, end, pid, page_size, psize);
+		}
 	}
 	preempt_enable();
+
+	return full;
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-- 
2.13.3


* [RFC PATCH 7/8] powerpc/64s/radix: Improve TLB flushing for unmaps that free a page table
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
                   ` (5 preceding siblings ...)
  2017-09-07 14:51 ` [RFC PATCH 6/8] powerpc/64s/radix: Optimize flush_tlb_range Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 14:51 ` [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes Nicholas Piggin
  7 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

Unmaps that free page tables always flush the PID, which is
suboptimal. Allow those to do TLB range flushes with a separate PWC
flush.
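
The mmu_gather dispatch then becomes, roughly (a sketch of the logic
in the radix__tlb_flush() hunk below):

	if (tlb->fullmm) {
		radix__flush_all_mm(mm); /* TLB, PWC, process table cache */
	} else if ((psize = radix_get_mmu_psize(page_size)) == -1) {
		/* unrecognized page size: fall back to a full mm flush */
		if (tlb->need_flush_all)
			radix__flush_all_mm(mm);
		else
			radix__flush_tlb_mm(mm);
	} else if (tlb->need_flush_all) {
		/* page tables were freed: range flush plus PWC flush */
		radix__flush_tlb_pwc_range_psize(mm, tlb->start, tlb->end,
						 psize);
	} else {
		radix__flush_tlb_range_psize(mm, tlb->start, tlb->end, psize);
	}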

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/tlb-radix.c | 51 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 1b0cac656680..7452e1f4aa3c 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -351,23 +351,35 @@ static int radix_get_mmu_psize(int page_size)
 	return psize;
 }
 
+static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
+				  unsigned long end, int psize);
+
 void radix__tlb_flush(struct mmu_gather *tlb)
 {
 	int psize = 0;
 	struct mm_struct *mm = tlb->mm;
 	int page_size = tlb->page_size;
 
-	psize = radix_get_mmu_psize(page_size);
 	/*
 	 * if page size is not something we understand, do a full mm flush
 	 */
-	if (psize != -1 && !tlb->fullmm && !tlb->need_flush_all)
-		radix__flush_tlb_range_psize(mm, tlb->start, tlb->end, psize);
-	else if (tlb->need_flush_all) {
-		tlb->need_flush_all = 0;
+	if (tlb->fullmm) {
 		radix__flush_all_mm(mm);
-	} else
-		radix__flush_tlb_mm(mm);
+	} else if ((psize = radix_get_mmu_psize(page_size)) == -1) {
+		if (!tlb->need_flush_all)
+			radix__flush_tlb_mm(mm);
+		else
+			radix__flush_all_mm(mm);
+	} else {
+		unsigned long start = tlb->start;
+		unsigned long end = tlb->end;
+
+		if (!tlb->need_flush_all)
+			radix__flush_tlb_range_psize(mm, start, end, psize);
+		else
+			radix__flush_tlb_pwc_range_psize(mm, start, end, psize);
+	}
+	tlb->need_flush_all = 0;
 }
 
 #define TLB_FLUSH_ALL -1UL
@@ -384,8 +396,9 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
-bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
-				  unsigned long end, int psize)
+static bool __radix__flush_tlb_range_psize(struct mm_struct *mm,
+				unsigned long start, unsigned long end,
+				int psize, bool also_pwc)
 {
 	unsigned long pid;
 	unsigned int page_shift = mmu_psize_defs[psize].shift;
@@ -401,17 +414,21 @@ bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 		if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
 					tlb_local_single_page_flush_ceiling) {
 			full = true;
-			_tlbiel_pid(pid, RIC_FLUSH_TLB);
+			_tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
 		} else {
 			_tlbiel_va_range(start, end, pid, page_size, psize);
+			if (also_pwc)
+				_tlbiel_pid(pid, RIC_FLUSH_PWC);
 		}
 	} else {
 		if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
 					tlb_single_page_flush_ceiling) {
 			full = true;
-			_tlbie_pid(pid, RIC_FLUSH_TLB);
+			_tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
 		} else {
 			_tlbie_va_range(start, end, pid, page_size, psize);
+			if (also_pwc)
+				_tlbie_pid(pid, RIC_FLUSH_PWC);
 		}
 	}
 	preempt_enable();
@@ -419,6 +436,18 @@ bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 	return full;
 }
 
+bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
+				  unsigned long end, int psize)
+{
+	return __radix__flush_tlb_range_psize(mm, start, end, psize, false);
+}
+
+static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
+				  unsigned long end, int psize)
+{
+	__radix__flush_tlb_range_psize(mm, start, end, psize, true);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 {
-- 
2.13.3


* [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes
  2017-09-07 14:51 [RFC PATCH 0/8] Further radix TLB flush optimisations Nicholas Piggin
                   ` (6 preceding siblings ...)
  2017-09-07 14:51 ` [RFC PATCH 7/8] powerpc/64s/radix: Improve TLB flushing for unmaps that free a page table Nicholas Piggin
@ 2017-09-07 14:51 ` Nicholas Piggin
  2017-09-07 22:05   ` Benjamin Herrenschmidt
  2017-09-08  5:53   ` Aneesh Kumar K.V
  7 siblings, 2 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-07 14:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Benjamin Herrenschmidt,
	Anton Blanchard

When permissiveness is relaxed, or found to have been relaxed by
another thread, we flush that address out of the TLB to avoid a
future fault or micro-fault due to a stale TLB entry.

Currently for processes with TLBs on other CPUs, this flush is always
done with a global tlbie. Although that could reduce faults on remote
CPUs, a broadcast operation seems to be wasteful for something that
can be handled in-core by the remote CPU if it comes to it.

This is not benchmarked yet. It does seem to cut some tlbie
operations from the bus.
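
As a sketch of the resulting protocol (the CPU labels are
illustrative; the calls are those used in the patch below):

	/* CPU A relaxes a PTE's permissions and flushes only itself: */
	__ptep_set_access_flags(vma->vm_mm, ptep, entry, address);
	radix__local_flush_tlb_page(vma, address); /* tlbiel, no bcast */

	/*
	 * CPU B may still hold the stale, more-restrictive entry. If
	 * it ever trips on it, it takes a spurious fault, finds the
	 * PTE already permits the access, and repairs its own TLB
	 * locally via flush_tlb_fix_spurious_fault(), with no tlbie
	 * on the bus.
	 */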

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |  5 ++++
 arch/powerpc/include/asm/book3s/64/tlbflush.h      | 11 +++++++++
 arch/powerpc/mm/pgtable-book3s64.c                 |  5 +++-
 arch/powerpc/mm/pgtable.c                          |  2 +-
 arch/powerpc/mm/tlb-radix.c                        | 27 ++++++++++++++++++++++
 5 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index b12460b306a7..34cd864b8fc1 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -16,6 +16,8 @@ extern bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long sta
 					 unsigned long end, int psize);
 extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
 				       unsigned long start, unsigned long end);
+extern void radix__local_flush_pmd_tlb_range(struct vm_area_struct *vma,
+				unsigned long start, unsigned long end);
 extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end);
 extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
@@ -24,6 +26,9 @@ extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
 extern void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 					      int psize);
+extern void radix__local_flush_tlb_range_psize(struct mm_struct *mm,
+				unsigned long start, unsigned long end,
+				int psize);
 extern void radix__tlb_flush(struct mmu_gather *tlb);
 #ifdef CONFIG_SMP
 extern void radix__flush_tlb_mm(struct mm_struct *mm);
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 72b925f97bab..8a8b3e11a28e 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -83,6 +83,17 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 #define flush_tlb_mm(mm)		local_flush_tlb_mm(mm)
 #define flush_tlb_page(vma, addr)	local_flush_tlb_page(vma, addr)
 #endif /* CONFIG_SMP */
+
+#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
+static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
+						unsigned long address)
+{
+	if (radix_enabled())
+		radix__local_flush_tlb_page(vma, address);
+	else
+		flush_tlb_page(vma, address);
+}
+
 /*
  * flush the page walk cache for the address
  */
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 3b65917785a5..e46f346388d6 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -40,7 +40,10 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 	if (changed) {
 		__ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp),
 					pmd_pte(entry), address);
-		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+		if (radix_enabled())
+			radix__local_flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+		else
+			flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	}
 	return changed;
 }
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index a03ff3d99e0c..acd6ae8062ce 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -223,7 +223,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 		if (!is_vm_hugetlb_page(vma))
 			assert_pte_locked(vma->vm_mm, address);
 		__ptep_set_access_flags(vma->vm_mm, ptep, entry, address);
-		flush_tlb_page(vma, address);
+		flush_tlb_fix_spurious_fault(vma, address);
 	}
 	return changed;
 }
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 7452e1f4aa3c..bcb41d037593 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -396,6 +396,27 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
+void radix__local_flush_tlb_range_psize(struct mm_struct *mm,
+				unsigned long start, unsigned long end,
+				int psize)
+{
+	unsigned long pid;
+	unsigned int page_shift = mmu_psize_defs[psize].shift;
+	unsigned long page_size = 1UL << page_shift;
+
+	pid = mm ? mm->context.id : 0;
+	if (unlikely(pid == MMU_NO_CONTEXT))
+		return;
+
+	preempt_disable();
+	if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
+				tlb_local_single_page_flush_ceiling)
+		_tlbiel_pid(pid, RIC_FLUSH_TLB);
+	else
+		_tlbiel_va_range(start, end, pid, page_size, psize);
+	preempt_enable();
+}
+
 static bool __radix__flush_tlb_range_psize(struct mm_struct *mm,
 				unsigned long start, unsigned long end,
 				int psize, bool also_pwc)
@@ -518,6 +539,12 @@ void radix__flush_tlb_lpid(unsigned long lpid)
 }
 EXPORT_SYMBOL(radix__flush_tlb_lpid);
 
+void radix__local_flush_pmd_tlb_range(struct vm_area_struct *vma,
+				unsigned long start, unsigned long end)
+{
+	radix__local_flush_tlb_range_psize(vma->vm_mm, start, end, MMU_PAGE_2M);
+}
+
 void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
 				unsigned long start, unsigned long end)
 {
-- 
2.13.3


* Re: [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes
  2017-09-07 14:51 ` [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes Nicholas Piggin
@ 2017-09-07 22:05   ` Benjamin Herrenschmidt
  2017-09-08  4:44     ` Nicholas Piggin
  2017-09-08  5:53   ` Aneesh Kumar K.V
  1 sibling, 1 reply; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-07 22:05 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Anton Blanchard

On Fri, 2017-09-08 at 00:51 +1000, Nicholas Piggin wrote:
> When permissiveness is relaxed, or found to have been relaxed by
> another thread, we flush that address out of the TLB to avoid a
> future fault or micro-fault due to a stale TLB entry.
> 
> Currently for processes with TLBs on other CPUs, this flush is always
> done with a global tlbie. Although that could reduce faults on remote
> CPUs, a broadcast operation seems to be wasteful for something that
> can be handled in-core by the remote CPU if it comes to it.
> 
> This is not benchmarked yet. It does seem to cut some tlbie
> operations from the bus.

What happens with the nest MMU here ?

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  .../powerpc/include/asm/book3s/64/tlbflush-radix.h |  5 ++++
>  arch/powerpc/include/asm/book3s/64/tlbflush.h      | 11 +++++++++
>  arch/powerpc/mm/pgtable-book3s64.c                 |  5 +++-
>  arch/powerpc/mm/pgtable.c                          |  2 +-
>  arch/powerpc/mm/tlb-radix.c                        | 27 ++++++++++++++++++++++
>  5 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> index b12460b306a7..34cd864b8fc1 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> @@ -16,6 +16,8 @@ extern bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long sta
>  					 unsigned long end, int psize);
>  extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
>  				       unsigned long start, unsigned long end);
> +extern void radix__local_flush_pmd_tlb_range(struct vm_area_struct *vma,
> +				unsigned long start, unsigned long end);
>  extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>  			    unsigned long end);
>  extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
> @@ -24,6 +26,9 @@ extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
>  extern void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
>  extern void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
>  					      int psize);
> +extern void radix__local_flush_tlb_range_psize(struct mm_struct *mm,
> +				unsigned long start, unsigned long end,
> +				int psize);
>  extern void radix__tlb_flush(struct mmu_gather *tlb);
>  #ifdef CONFIG_SMP
>  extern void radix__flush_tlb_mm(struct mm_struct *mm);
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> index 72b925f97bab..8a8b3e11a28e 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> @@ -83,6 +83,17 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
>  #define flush_tlb_mm(mm)		local_flush_tlb_mm(mm)
>  #define flush_tlb_page(vma, addr)	local_flush_tlb_page(vma, addr)
>  #endif /* CONFIG_SMP */
> +
> +#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
> +static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> +						unsigned long address)
> +{
> +	if (radix_enabled())
> +		radix__local_flush_tlb_page(vma, address);
> +	else
> +		flush_tlb_page(vma, address);
> +}
> +
>  /*
>   * flush the page walk cache for the address
>   */
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 3b65917785a5..e46f346388d6 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -40,7 +40,10 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>  	if (changed) {
>  		__ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp),
>  					pmd_pte(entry), address);
> -		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> +		if (radix_enabled())
> +			radix__local_flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> +		else
> +			flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>  	}
>  	return changed;
>  }
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index a03ff3d99e0c..acd6ae8062ce 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -223,7 +223,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>  		if (!is_vm_hugetlb_page(vma))
>  			assert_pte_locked(vma->vm_mm, address);
>  		__ptep_set_access_flags(vma->vm_mm, ptep, entry, address);
> -		flush_tlb_page(vma, address);
> +		flush_tlb_fix_spurious_fault(vma, address);
>  	}
>  	return changed;
>  }
> diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
> index 7452e1f4aa3c..bcb41d037593 100644
> --- a/arch/powerpc/mm/tlb-radix.c
> +++ b/arch/powerpc/mm/tlb-radix.c
> @@ -396,6 +396,27 @@ void radix__tlb_flush(struct mmu_gather *tlb)
>  static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>  
> +void radix__local_flush_tlb_range_psize(struct mm_struct *mm,
> +				unsigned long start, unsigned long end,
> +				int psize)
> +{
> +	unsigned long pid;
> +	unsigned int page_shift = mmu_psize_defs[psize].shift;
> +	unsigned long page_size = 1UL << page_shift;
> +
> +	pid = mm ? mm->context.id : 0;
> +	if (unlikely(pid == MMU_NO_CONTEXT))
> +		return;
> +
> +	preempt_disable();
> +	if (end == TLB_FLUSH_ALL || ((end - start) >> page_shift) >
> +				tlb_local_single_page_flush_ceiling)
> +		_tlbiel_pid(pid, RIC_FLUSH_TLB);
> +	else
> +		_tlbiel_va_range(start, end, pid, page_size, psize);
> +	preempt_enable();
> +}
> +
>  static bool __radix__flush_tlb_range_psize(struct mm_struct *mm,
>  				unsigned long start, unsigned long end,
>  				int psize, bool also_pwc)
> @@ -518,6 +539,12 @@ void radix__flush_tlb_lpid(unsigned long lpid)
>  }
>  EXPORT_SYMBOL(radix__flush_tlb_lpid);
>  
> +void radix__local_flush_pmd_tlb_range(struct vm_area_struct *vma,
> +				unsigned long start, unsigned long end)
> +{
> +	radix__local_flush_tlb_range_psize(vma->vm_mm, start, end, MMU_PAGE_2M);
> +}
> +
>  void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
>  				unsigned long start, unsigned long end)
>  {


* Re: [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes
  2017-09-07 22:05   ` Benjamin Herrenschmidt
@ 2017-09-08  4:44     ` Nicholas Piggin
  2017-09-08  5:55       ` Benjamin Herrenschmidt
  2017-09-08  7:03       ` Nicholas Piggin
  0 siblings, 2 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-08  4:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Aneesh Kumar K . V, Anton Blanchard

On Fri, 08 Sep 2017 08:05:38 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2017-09-08 at 00:51 +1000, Nicholas Piggin wrote:
> > When permissiveness is relaxed, or found to have been relaxed by
> > another thread, we flush that address out of the TLB to avoid a
> > future fault or micro-fault due to a stale TLB entry.
> > 
> > Currently for processes with TLBs on other CPUs, this flush is always
> > done with a global tlbie. Although that could reduce faults on remote
> > CPUs, a broadcast operation seems to be wasteful for something that
> > can be handled in-core by the remote CPU if it comes to it.
> > 
> > This is not benchmarked yet. It does seem to cut some tlbie
> > operations from the bus.  
> 
> What happens with the nest MMU here ?

Good question, I'm not sure. I can't tell from the UM whether the
agent and NMMU must discard a cached translation when it takes a
permission fault on it. It's not clear from what I've read whether
it's relying on the host to send back a tlbie.

I'll keep digging.

Thanks,
Nick


* Re: [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes
  2017-09-07 14:51 ` [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes Nicholas Piggin
  2017-09-07 22:05   ` Benjamin Herrenschmidt
@ 2017-09-08  5:53   ` Aneesh Kumar K.V
  1 sibling, 0 replies; 14+ messages in thread
From: Aneesh Kumar K.V @ 2017-09-08  5:53 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: Nicholas Piggin, Benjamin Herrenschmidt, Anton Blanchard

Nicholas Piggin <npiggin@gmail.com> writes:

> When permissiveness is relaxed, or found to have been relaxed by
> another thread, we flush that address out of the TLB to avoid a
> future fault or micro-fault due to a stale TLB entry.
>
> Currently for processes with TLBs on other CPUs, this flush is always
> done with a global tlbie. Although that could reduce faults on remote
> CPUs, a broadcast operation seems to be wasteful for something that
> can be handled in-core by the remote CPU if it comes to it.
>
> This is not benchmarked yet. It does seem to cut some tlbie
> operations from the bus.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  .../powerpc/include/asm/book3s/64/tlbflush-radix.h |  5 ++++
>  arch/powerpc/include/asm/book3s/64/tlbflush.h      | 11 +++++++++
>  arch/powerpc/mm/pgtable-book3s64.c                 |  5 +++-
>  arch/powerpc/mm/pgtable.c                          |  2 +-
>  arch/powerpc/mm/tlb-radix.c                        | 27 ++++++++++++++++++++++
>  5 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> index b12460b306a7..34cd864b8fc1 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> @@ -16,6 +16,8 @@ extern bool radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long sta
>  					 unsigned long end, int psize);
>  extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
>  				       unsigned long start, unsigned long end);
> +extern void radix__local_flush_pmd_tlb_range(struct vm_area_struct *vma,
> +				unsigned long start, unsigned long end);
>  extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>  			    unsigned long end);
>  extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
> @@ -24,6 +26,9 @@ extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
>  extern void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
>  extern void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
>  					      int psize);
> +extern void radix__local_flush_tlb_range_psize(struct mm_struct *mm,
> +				unsigned long start, unsigned long end,
> +				int psize);
>  extern void radix__tlb_flush(struct mmu_gather *tlb);
>  #ifdef CONFIG_SMP
>  extern void radix__flush_tlb_mm(struct mm_struct *mm);
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> index 72b925f97bab..8a8b3e11a28e 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> @@ -83,6 +83,17 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
>  #define flush_tlb_mm(mm)		local_flush_tlb_mm(mm)
>  #define flush_tlb_page(vma, addr)	local_flush_tlb_page(vma, addr)
>  #endif /* CONFIG_SMP */
> +
> +#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
> +static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> +						unsigned long address)
> +{
> +	if (radix_enabled())
> +		radix__local_flush_tlb_page(vma, address);
> +	else
> +		flush_tlb_page(vma, address);
> +}
> +
>  /*
>   * flush the page walk cache for the address
>   */
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 3b65917785a5..e46f346388d6 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -40,7 +40,10 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>  	if (changed) {
>  		__ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp),
>  					pmd_pte(entry), address);
> -		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> +		if (radix_enabled())
> +			radix__local_flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> +		else
> +			flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
			^^^^  this is a no-op for hash.


>  	}
>  	return changed;
>  }
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index a03ff3d99e0c..acd6ae8062ce 100644


-aneesh


* Re: [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes
  2017-09-08  4:44     ` Nicholas Piggin
@ 2017-09-08  5:55       ` Benjamin Herrenschmidt
  2017-09-08  7:03       ` Nicholas Piggin
  1 sibling, 0 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-08  5:55 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Aneesh Kumar K . V, Anton Blanchard

On Fri, 2017-09-08 at 14:44 +1000, Nicholas Piggin wrote:
> On Fri, 08 Sep 2017 08:05:38 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Fri, 2017-09-08 at 00:51 +1000, Nicholas Piggin wrote:
> > > When permissiveness is relaxed, or found to have been relaxed by
> > > another thread, we flush that address out of the TLB to avoid a
> > > future fault or micro-fault due to a stale TLB entry.
> > > 
> > > Currently for processes with TLBs on other CPUs, this flush is always
> > > done with a global tlbie. Although that could reduce faults on remote
> > > CPUs, a broadcast operation seems to be wasteful for something that
> > > can be handled in-core by the remote CPU if it comes to it.
> > > 
> > > This is not benchmarked yet. It does seem to cut some tlbie
> > > operations from the bus.  
> > 
> > What happens with the nest MMU here ?
> 
> Good question, I'm not sure. I can't tell from the UM whether the
> agent and NMMU must discard a cached translation when it takes a
> permission fault on it. It's not clear from what I've read whether
> it's relying on the host to send back a tlbie.

I think it's supposed to re-do a tablewalk.

> I'll keep digging.
> 
> Thanks,
> Nick


* Re: [RFC PATCH 8/8] powerpc/64s/radix: Only flush local TLB for spurious fault flushes
  2017-09-08  4:44     ` Nicholas Piggin
  2017-09-08  5:55       ` Benjamin Herrenschmidt
@ 2017-09-08  7:03       ` Nicholas Piggin
  1 sibling, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2017-09-08  7:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Aneesh Kumar K . V, Anton Blanchard, Alistair Popple

On Fri, 8 Sep 2017 14:44:37 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Fri, 08 Sep 2017 08:05:38 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Fri, 2017-09-08 at 00:51 +1000, Nicholas Piggin wrote:  
> > > When permissiveness is relaxed, or found to have been relaxed by
> > > another thread, we flush that address out of the TLB to avoid a
> > > future fault or micro-fault due to a stale TLB entry.
> > > 
> > > Currently for processes with TLBs on other CPUs, this flush is always
> > > done with a global tlbie. Although that could reduce faults on remote
> > > CPUs, a broadcast operation seems to be wasteful for something that
> > > can be handled in-core by the remote CPU if it comes to it.
> > > 
> > > This is not benchmarked yet. It does seem to cut some tlbie
> > > operations from the bus.    
> > 
> > What happens with the nest MMU here ?  
> 
> Good question, I'm not sure. I can't tell from the UM whether the
> agent and NMMU must discard a cached translation when it takes a
> permission fault on it. It's not clear from what I've read whether
> it's relying on the host to send back a tlbie.
> 
> I'll keep digging.

Okay, after talking to Alistair: for NPU/CAPI this is a concern,
because the NMMU may not re-walk the page tables if it already has a
cached translation. That still has to be confirmed with hardware.

So we've got them wanting to hook into the core code and make it give
them tlbies.

For now I'll put this on hold until NPU, CAPI, and VAS get their
patches in and working, then maybe revisit. Might be good to get them
all moved to using a common nmmu.c driver that gives them a page fault
handler and uses mmu notifiers to do their nmmu invalidations, along
the lines of the sketch below.
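
Something like this, as a very rough sketch (nmmu_context and
nmmu_hw_invalidate() are hypothetical names; the notifier hook itself
is the existing mmu_notifier API):

	#include <linux/mmu_notifier.h>

	struct nmmu_context {
		struct mmu_notifier mn;
		/* ... handle for the agent's nest MMU context ... */
	};

	static void nmmu_invalidate_range(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end)
	{
		struct nmmu_context *ctx =
			container_of(mn, struct nmmu_context, mn);

		/*
		 * Push the invalidation to the nest MMU directly,
		 * rather than relying on the core MM code issuing
		 * global tlbie.
		 */
		nmmu_hw_invalidate(ctx, start, end); /* hypothetical op */
	}

	static const struct mmu_notifier_ops nmmu_notifier_ops = {
		.invalidate_range	= nmmu_invalidate_range,
	};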

Thanks,
Nick


