* [RFC PATCH 0/4] mm: mmu_gather changes to support explicit paging
@ 2018-07-25 14:06 ` Nicholas Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nicholas Piggin @ 2018-07-25 14:06 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-arch, linuxppc-dev, linux-arm-kernel, Nicholas Piggin

The first 3 patches in this series are generic mm changes I would like
to make, including a possible fix which may(?) be needed for ARM64.
Other than that bugfix, the first 3 patches should not change any
behaviour, so hopefully they aren't too controversial.

The powerpc patch is included for reference.

Thanks,
Nick

Nicholas Piggin (4):
  mm: move tlb_table_flush to tlb_flush_mmu_free
  mm: mmu_notifier fix for tlb_end_vma
  mm: allow arch to have tlb_flush called on an empty TLB range
  powerpc/64s/radix: optimise TLB flush with precise TLB ranges in
    mmu_gather

 arch/powerpc/include/asm/tlb.h | 34 +++++++++++++++++++++++++++++++++
 arch/powerpc/mm/tlb-radix.c    | 10 ++++++++++
 include/asm-generic/tlb.h      | 35 ++++++++++++++++++++++++++++++----
 mm/memory.c                    | 14 ++------------
 4 files changed, 77 insertions(+), 16 deletions(-)

-- 
2.17.0

* [RFC PATCH 1/4] mm: move tlb_table_flush to tlb_flush_mmu_free
@ 2018-07-25 14:06   ` Nicholas Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nicholas Piggin @ 2018-07-25 14:06 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-arch, linuxppc-dev, linux-arm-kernel, Nicholas Piggin

There is no need to call this from tlb_flush_mmu_tlbonly; it logically
belongs with tlb_flush_mmu_free. This allows some code consolidation
with a subsequent fix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7206a634270b..bc053d5e9d41 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -245,9 +245,6 @@ static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 
 	tlb_flush(tlb);
 	mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
 	__tlb_reset_range(tlb);
 }
 
@@ -255,6 +252,9 @@ static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
 	struct mmu_gather_batch *batch;
 
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
 	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
 		free_pages_and_swap_cache(batch->pages, batch->nr);
 		batch->nr = 0;
-- 
2.17.0

* [RFC PATCH 2/4] mm: mmu_notifier fix for tlb_end_vma
@ 2018-07-25 14:06   ` Nicholas Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nicholas Piggin @ 2018-07-25 14:06 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-arch, linuxppc-dev, linux-arm-kernel, Nicholas Piggin

The generic tlb_end_vma does not call the invalidate_range mmu notifier,
and it resets the mmu_gather range, which means the notifier won't be
called on part of the range in case of an unmap that spans multiple
vmas.

ARM64 seems to be the only arch I could see that has notifiers and
uses the generic tlb_end_vma. I have not actually tested it.
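
A rough sketch of the problem (illustration only, not part of the patch;
the call chain below is from my reading of the generic code): an unmap
that spans two vmas goes approximately through

  unmap_vmas()
    unmap_single_vma(vma1)      /* gathers range [A, B)                  */
      tlb_end_vma(tlb, vma1)    /* generic: tlb_flush() + range reset,
                                   but no mmu_notifier call for [A, B)   */
    unmap_single_vma(vma2)      /* gathers range [C, D)                  */
  tlb_finish_mmu(tlb)           /* notifier only ever invalidates [C, D) */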
---
 include/asm-generic/tlb.h | 17 +++++++++++++----
 mm/memory.c               | 10 ----------
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 3063125197ad..b3353e21f3b3 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -15,6 +15,7 @@
 #ifndef _ASM_GENERIC__TLB_H
 #define _ASM_GENERIC__TLB_H
 
+#include <linux/mmu_notifier.h>
 #include <linux/swap.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -138,6 +139,16 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 	}
 }
 
+static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
+{
+	if (!tlb->end)
+		return;
+
+	tlb_flush(tlb);
+	mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
+	__tlb_reset_range(tlb);
+}
+
 static inline void tlb_remove_page_size(struct mmu_gather *tlb,
 					struct page *page, int page_size)
 {
@@ -186,10 +197,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 
 #define __tlb_end_vma(tlb, vma)					\
 	do {							\
-		if (!tlb->fullmm && tlb->end) {			\
-			tlb_flush(tlb);				\
-			__tlb_reset_range(tlb);			\
-		}						\
+		if (!tlb->fullmm)				\
+			tlb_flush_mmu_tlbonly(tlb);		\
 	} while (0)
 
 #ifndef tlb_end_vma
diff --git a/mm/memory.c b/mm/memory.c
index bc053d5e9d41..135d18b31e44 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -238,16 +238,6 @@ void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 	__tlb_reset_range(tlb);
 }
 
-static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	if (!tlb->end)
-		return;
-
-	tlb_flush(tlb);
-	mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
-	__tlb_reset_range(tlb);
-}
-
 static void tlb_flush_mmu_free(struct mmu_gather *tlb)
 {
 	struct mmu_gather_batch *batch;
-- 
2.17.0

* [RFC PATCH 3/4] mm: allow arch to have tlb_flush called on an empty TLB range
@ 2018-07-25 14:06   ` Nicholas Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nicholas Piggin @ 2018-07-25 14:06 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-arch, linuxppc-dev, linux-arm-kernel, Nicholas Piggin

powerpc wants to decouple page table caching structure flushes from
TLB flushes, which will make it possible to have an mmu_gather with
freed page table pages but no TLB range. These still must be sent to
tlb_flush, so allow the arch to specify when an mmu_gather with an
empty range should have tlb_flush called.
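
As a sketch of how an arch might use this (the real powerpc definition
is in the next patch; the one here is only an illustrative example for
a hypothetical arch):

  /* arch/xxx/include/asm/tlb.h -- hypothetical arch example */
  #define arch_tlb_mustflush(tlb)	((tlb)->need_flush_all)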

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/asm-generic/tlb.h | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b3353e21f3b3..b320c0cc8996 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -139,14 +139,27 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
 	}
 }
 
+/*
+ * arch_tlb_mustflush specifies if tlb_flush is to be called even if the
+ * TLB range is empty (this can be the case for freeing page table pages
+ * if the arch does not adjust TLB range to cover them).
+ */
+#ifndef arch_tlb_mustflush
+#define arch_tlb_mustflush(tlb) false
+#endif
+
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
-	if (!tlb->end)
+	unsigned long start = tlb->start;
+	unsigned long end = tlb->end;
+
+	if (!(end || arch_tlb_mustflush(tlb)))
 		return;
 
 	tlb_flush(tlb);
-	mmu_notifier_invalidate_range(tlb->mm, tlb->start, tlb->end);
 	__tlb_reset_range(tlb);
+	if (end)
+		mmu_notifier_invalidate_range(tlb->mm, start, end);
 }
 
 static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-- 
2.17.0

* [RFC PATCH 4/4] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
@ 2018-07-25 14:06   ` Nicholas Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nicholas Piggin @ 2018-07-25 14:06 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-arch, linuxppc-dev, linux-arm-kernel, Nicholas Piggin

The mmu_gather APIs keep track of the invalidated address range, and
the generic page table freeing accessors expand the invalidated range
to cover the addresses corresponding to the page tables even if there
are no ptes and therefore no TLB entries to invalidate. This is done
for architectures that have paging structure caches that are
invalidated with their TLB invalidate instructions (e.g., x86).

powerpc/64s/radix does have a "page walk cache" (PWC), but it is
invalidated with a specific instruction and tracked independently in
the mmu_gather (using the need_flush_all flag to indicate PWC must be
flushed). Therefore TLB invalidation does not have to be expanded to
cover freed page tables.

This patch defines p??_free_tlb functions for 64s, which do not expand
the TLB flush range over page table pages. This brings the number of
tlbiel instructions required by a kernel compile from 33M down to 25M,
with most of the avoided flushes coming from exec => shift_arg_pages().
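
For reference, the generic versions in include/asm-generic/tlb.h (shown
approximately, for illustration) expand the gather range before freeing
the page table page, e.g.:

  #define pte_free_tlb(tlb, ptep, address)			\
	do {							\
		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
		__pte_free_tlb(tlb, ptep, address);		\
	} while (0)

The 64s definitions below simply drop the __tlb_adjust_range() call.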

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/tlb.h | 34 ++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/tlb-radix.c    | 10 ++++++++++
 include/asm-generic/tlb.h      |  5 +++++
 3 files changed, 49 insertions(+)

diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 9138baccebb0..5d3107f2b014 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -30,6 +30,40 @@
 #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
 #define tlb_remove_check_page_size_change tlb_remove_check_page_size_change
 
+#ifdef CONFIG_PPC_BOOK3S_64
+/*
+ * powerpc book3s hash does not have page table structure caches, and
+ * radix requires explicit management with PWC invalidate tlb type, so
+ * there is no need to expand the mmu_gather range over invalidated page
+ * table pages like the generic code does.
+ */
+
+#define pte_free_tlb(tlb, ptep, address)			\
+	do {							\
+		__pte_free_tlb(tlb, ptep, address);		\
+	} while (0)
+
+#define pmd_free_tlb(tlb, pmdp, address)			\
+	do {							\
+		__pmd_free_tlb(tlb, pmdp, address);		\
+	} while (0)
+
+#define pud_free_tlb(tlb, pudp, address)			\
+	do {							\
+		__pud_free_tlb(tlb, pudp, address);		\
+	} while (0)
+
+/*
+ * Radix sets need_flush_all when page table pages have been unmapped
+ * and the PWC needs flushing. Generic code must call our tlb_flush
+ * even on empty ranges in this case.
+ *
+ * This will always be false for hash.
+ */
+#define arch_tlb_mustflush(tlb) (tlb->need_flush_all)
+
+#endif
+
 extern void tlb_flush(struct mmu_gather *tlb);
 
 /* Get the generic bits... */
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 1135b43a597c..238b20a513e7 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -862,6 +862,16 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	unsigned long start = tlb->start;
 	unsigned long end = tlb->end;
 
+	/*
+	 * This can happen if need_flush_all is set due to a page table
+	 * invalidate, but no ptes under it freed (see arch_tlb_mustflush).
+	 * Set end = start to prevent any TLB flushing here (only PWC).
+	 */
+	if (!end) {
+		WARN_ON_ONCE(!tlb->need_flush_all);
+		end = start;
+	}
+
 	/*
 	 * if page size is not something we understand, do a full mm flush
 	 *
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b320c0cc8996..a55ef1425f0d 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -285,6 +285,11 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
  * http://lkml.kernel.org/r/CA+55aFzBggoXtNXQeng5d_mRoDnaMBE5Y+URs+PHR67nUpMtaw@mail.gmail.com
  *
  * For now w.r.t page table cache, mark the range_size as PAGE_SIZE
+ *
+ * Update: powerpc (Book3S 64-bit, radix MMU) has an architected page table
+ * cache (called PWC), and invalidates it specifically. It sets the
+ * need_flush_all flag to indicate the PWC requires flushing, so it defines
+ * its own p??_free_tlb functions which do not expand the TLB range.
  */
 
 #ifndef pte_free_tlb
-- 
2.17.0
