* [PATCH v4 0/5] TLB/I$ flush cleanups and improvements
@ 2019-03-27  0:41 Gary Guo
  2019-03-27  0:41 ` [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c Gary Guo
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Gary Guo @ 2019-03-27  0:41 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou
  Cc: Christoph Hellwig, Atish Patra, linux-riscv, Anup Patel, Gary Guo

From: Gary Guo <gary@garyguo.net>

This is v4 of the general TLB/I$ flush improvement series.
I still keep tlbi_method=ipi as the default, as opposed to what
Atish suggests, because:
* BBL is still in use in the wild.
* OpenSBI's support hasn't made it into a stable release yet.
* OpenSBI's support on the dev branch still has race issues to resolve.

Once most SBIs used in the wild can handle remote shootdown properly, we
can submit another patch to change the default back to tlbi_method=sbi.
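
For illustration (a hypothetical command line; the tlbi_max_ops value is
purely an example), an SBI-based shootdown would then be selected with:

    tlbi_method=sbi tlbi_max_ops=16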

This series does:
1. Move long and expensive functions away from header files.
2. Fix missing arguments for SBI calls.
3. Performance improvements for TLB flush (see the sketch after this list
   for the sfence.vma variants involved).
4. Implement IPI-based remote shootdown in case the SBI ignores the ASID
   and vaddr operands.
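
For context, a minimal sketch (illustration only, not part of the series)
of the two sfence.vma forms the series is built around -- rs1 takes a
virtual address and rs2 an ASID, and supplying them lets conforming
implementations flush less than the whole TLB:

	/* Sketch: address-only flush; also evicts entries for global
	 * mappings on conforming implementations. */
	static inline void sfence_vma_addr(unsigned long addr)
	{
		__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
	}

	/* Sketch: address + ASID flush; leaves global mappings alone. */
	static inline void sfence_vma_addr_asid(unsigned long addr,
						unsigned long asid)
	{
		__asm__ __volatile__ ("sfence.vma %0, %1"
				      : : "r" (addr), "r" (asid) : "memory");
	}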

Changes since v3:
 - Document tlbi_max_ops and tlbi_method in kernel-parameter.txt
 - Split IPI-based shootdown implementation into its own commit

Changes since v2:
 - Replace __setup with early_param
 - Rebase on top of for-next

Changes since v1:
 - Use kernel boot parameters instead of Kconfig
 - Style fixes

Gary Guo (5):
  riscv: move flush_icache_{all,mm} to cacheflush.c
  riscv: move switch_mm to its own file
  riscv: fix sbi_remote_sfence_vma{,_asid}.
  riscv: rewrite tlb flush for performance
  riscv: implement IPI-based remote TLB shootdown

 .../admin-guide/kernel-parameters.rst         |   1 +
 .../admin-guide/kernel-parameters.txt         |  13 ++
 arch/riscv/include/asm/cacheflush.h           |   2 +-
 arch/riscv/include/asm/mmu_context.h          |  59 +----
 arch/riscv/include/asm/pgtable.h              |   2 +-
 arch/riscv/include/asm/sbi.h                  |  19 +-
 arch/riscv/include/asm/tlbflush.h             |  76 +++----
 arch/riscv/kernel/smp.c                       |  49 ----
 arch/riscv/mm/Makefile                        |   2 +
 arch/riscv/mm/cacheflush.c                    |  61 +++++
 arch/riscv/mm/context.c                       |  77 +++++++
 arch/riscv/mm/init.c                          |   2 +-
 arch/riscv/mm/tlbflush.c                      | 215 ++++++++++++++++++
 13 files changed, 417 insertions(+), 161 deletions(-)
 create mode 100644 arch/riscv/mm/context.c
 create mode 100644 arch/riscv/mm/tlbflush.c

-- 
2.17.1



* [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c
  2019-03-27  0:41 [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Gary Guo
@ 2019-03-27  0:41 ` Gary Guo
  2019-03-27  7:06   ` Christoph Hellwig
  2019-03-28  6:45   ` Anup Patel
  2019-03-27  0:41 ` [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid} Gary Guo
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Gary Guo @ 2019-03-27  0:41 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou
  Cc: Christoph Hellwig, Atish Patra, linux-riscv, Anup Patel, Gary Guo

From: Gary Guo <gary@garyguo.net>

Currently, flush_icache_all is macro-expanded into an SBI call, yet no
asm/sbi.h is included in asm/cacheflush.h. It can be moved to
mm/cacheflush.c instead (the SBI call dominates performance-wise, so
there is no concern about it not being inlined).

Currently, flush_icache_mm stays in kernel/smp.c, which looks like a
hack to prevent it from being compiled when CONFIG_SMP=n. It should
also be in mm/cacheflush.c.
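
(Illustration of the problem, not part of the patch: with the macro form,
every caller of flush_icache_all() needs sbi_remote_fence_i() declared,
even though cacheflush.h never pulls in asm/sbi.h.)

	/* Before: only compiles if the caller happens to include asm/sbi.h. */
	#define flush_icache_all() sbi_remote_fence_i(NULL)

	/* After: out-of-line definition; the header only needs a prototype. */
	void flush_icache_all(void);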

Signed-off-by: Gary Guo <gary@garyguo.net>
---
 arch/riscv/include/asm/cacheflush.h |  2 +-
 arch/riscv/kernel/smp.c             | 49 -----------------------
 arch/riscv/mm/cacheflush.c          | 61 +++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 50 deletions(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 8f13074413a7..1f4ba68ab9aa 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -47,7 +47,7 @@ static inline void flush_dcache_page(struct page *page)
 
 #else /* CONFIG_SMP */
 
-#define flush_icache_all() sbi_remote_fence_i(NULL)
+void flush_icache_all(void);
 void flush_icache_mm(struct mm_struct *mm, bool local);
 
 #endif /* CONFIG_SMP */
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 0c41d07ec281..17f491e8ed0a 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -199,52 +199,3 @@ void smp_send_reschedule(int cpu)
 	send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
 }
 
-/*
- * Performs an icache flush for the given MM context.  RISC-V has no direct
- * mechanism for instruction cache shoot downs, so instead we send an IPI that
- * informs the remote harts they need to flush their local instruction caches.
- * To avoid pathologically slow behavior in a common case (a bunch of
- * single-hart processes on a many-hart machine, ie 'make -j') we avoid the
- * IPIs for harts that are not currently executing a MM context and instead
- * schedule a deferred local instruction cache flush to be performed before
- * execution resumes on each hart.
- */
-void flush_icache_mm(struct mm_struct *mm, bool local)
-{
-	unsigned int cpu;
-	cpumask_t others, hmask, *mask;
-
-	preempt_disable();
-
-	/* Mark every hart's icache as needing a flush for this MM. */
-	mask = &mm->context.icache_stale_mask;
-	cpumask_setall(mask);
-	/* Flush this hart's I$ now, and mark it as flushed. */
-	cpu = smp_processor_id();
-	cpumask_clear_cpu(cpu, mask);
-	local_flush_icache_all();
-
-	/*
-	 * Flush the I$ of other harts concurrently executing, and mark them as
-	 * flushed.
-	 */
-	cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
-	local |= cpumask_empty(&others);
-	if (mm != current->active_mm || !local) {
-		cpumask_clear(&hmask);
-		riscv_cpuid_to_hartid_mask(&others, &hmask);
-		sbi_remote_fence_i(hmask.bits);
-	} else {
-		/*
-		 * It's assumed that at least one strongly ordered operation is
-		 * performed on this hart between setting a hart's cpumask bit
-		 * and scheduling this MM context on that hart.  Sending an SBI
-		 * remote message will do this, but in the case where no
-		 * messages are sent we still need to order this hart's writes
-		 * with flush_icache_deferred().
-		 */
-		smp_mb();
-	}
-
-	preempt_enable();
-}
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 498c0a0814fe..497b7d07af0c 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -14,6 +14,67 @@
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
 
+#ifdef CONFIG_SMP
+
+#include <asm/sbi.h>
+
+void flush_icache_all(void)
+{
+	sbi_remote_fence_i(NULL);
+}
+
+/*
+ * Performs an icache flush for the given MM context.  RISC-V has no direct
+ * mechanism for instruction cache shoot downs, so instead we send an IPI that
+ * informs the remote harts they need to flush their local instruction caches.
+ * To avoid pathologically slow behavior in a common case (a bunch of
+ * single-hart processes on a many-hart machine, ie 'make -j') we avoid the
+ * IPIs for harts that are not currently executing a MM context and instead
+ * schedule a deferred local instruction cache flush to be performed before
+ * execution resumes on each hart.
+ */
+void flush_icache_mm(struct mm_struct *mm, bool local)
+{
+	unsigned int cpu;
+	cpumask_t others, hmask, *mask;
+
+	preempt_disable();
+
+	/* Mark every hart's icache as needing a flush for this MM. */
+	mask = &mm->context.icache_stale_mask;
+	cpumask_setall(mask);
+	/* Flush this hart's I$ now, and mark it as flushed. */
+	cpu = smp_processor_id();
+	cpumask_clear_cpu(cpu, mask);
+	local_flush_icache_all();
+
+	/*
+	 * Flush the I$ of other harts concurrently executing, and mark them as
+	 * flushed.
+	 */
+	cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
+	local |= cpumask_empty(&others);
+	if (mm != current->active_mm || !local) {
+		cpumask_clear(&hmask);
+		riscv_cpuid_to_hartid_mask(&others, &hmask);
+		sbi_remote_fence_i(hmask.bits);
+	} else {
+		/*
+		 * It's assumed that at least one strongly ordered operation is
+		 * performed on this hart between setting a hart's cpumask bit
+		 * and scheduling this MM context on that hart.  Sending an SBI
+		 * remote message will do this, but in the case where no
+		 * messages are sent we still need to order this hart's writes
+		 * with flush_icache_deferred().
+		 */
+		smp_mb();
+	}
+
+	preempt_enable();
+}
+
+#endif /* CONFIG_SMP */
+
 void flush_icache_pte(pte_t pte)
 {
 	struct page *page = pte_page(pte);
-- 
2.17.1



* [PATCH v4 2/5] riscv: move switch_mm to its own file
  2019-03-27  0:41 [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Gary Guo
                   ` (2 preceding siblings ...)
  2019-03-27  0:41 ` [PATCH v4 4/5] riscv: rewrite tlb flush for performance Gary Guo
@ 2019-03-27  0:41 ` Gary Guo
  2019-03-27  7:08   ` Christoph Hellwig
                     ` (2 more replies)
  2019-03-27  0:41 ` [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown Gary Guo
  2019-04-10  7:04 ` [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Christoph Hellwig
  5 siblings, 3 replies; 29+ messages in thread
From: Gary Guo @ 2019-03-27  0:41 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou
  Cc: Christoph Hellwig, Atish Patra, linux-riscv, Anup Patel, Gary Guo

From: Gary Guo <gary@garyguo.net>

switch_mm is an expensive operation that has two users.
flush_icache_deferred is only called within switch_mm and can be moved
along with it. The function is expected to become more complicated once
ASID support is added, so clean it up eagerly.

By moving them to a separate file we also remove some excessive
dependencies on tlbflush.h and cacheflush.h.

Signed-off-by: Gary Guo <gary@garyguo.net>
---
 arch/riscv/include/asm/mmu_context.h | 59 +----------------------
 arch/riscv/mm/Makefile               |  1 +
 arch/riscv/mm/context.c              | 71 ++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 57 deletions(-)
 create mode 100644 arch/riscv/mm/context.c

diff --git a/arch/riscv/include/asm/mmu_context.h b/arch/riscv/include/asm/mmu_context.h
index 336d60ec5698..bf4f097a9051 100644
--- a/arch/riscv/include/asm/mmu_context.h
+++ b/arch/riscv/include/asm/mmu_context.h
@@ -20,8 +20,6 @@
 
 #include <linux/mm.h>
 #include <linux/sched.h>
-#include <asm/tlbflush.h>
-#include <asm/cacheflush.h>
 
 static inline void enter_lazy_tlb(struct mm_struct *mm,
 	struct task_struct *task)
@@ -39,61 +37,8 @@ static inline void destroy_context(struct mm_struct *mm)
 {
 }
 
-/*
- * When necessary, performs a deferred icache flush for the given MM context,
- * on the local CPU.  RISC-V has no direct mechanism for instruction cache
- * shoot downs, so instead we send an IPI that informs the remote harts they
- * need to flush their local instruction caches.  To avoid pathologically slow
- * behavior in a common case (a bunch of single-hart processes on a many-hart
- * machine, ie 'make -j') we avoid the IPIs for harts that are not currently
- * executing a MM context and instead schedule a deferred local instruction
- * cache flush to be performed before execution resumes on each hart.  This
- * actually performs that local instruction cache flush, which implicitly only
- * refers to the current hart.
- */
-static inline void flush_icache_deferred(struct mm_struct *mm)
-{
-#ifdef CONFIG_SMP
-	unsigned int cpu = smp_processor_id();
-	cpumask_t *mask = &mm->context.icache_stale_mask;
-
-	if (cpumask_test_cpu(cpu, mask)) {
-		cpumask_clear_cpu(cpu, mask);
-		/*
-		 * Ensure the remote hart's writes are visible to this hart.
-		 * This pairs with a barrier in flush_icache_mm.
-		 */
-		smp_mb();
-		local_flush_icache_all();
-	}
-#endif
-}
-
-static inline void switch_mm(struct mm_struct *prev,
-	struct mm_struct *next, struct task_struct *task)
-{
-	if (likely(prev != next)) {
-		/*
-		 * Mark the current MM context as inactive, and the next as
-		 * active.  This is at least used by the icache flushing
-		 * routines in order to determine who should
-		 */
-		unsigned int cpu = smp_processor_id();
-
-		cpumask_clear_cpu(cpu, mm_cpumask(prev));
-		cpumask_set_cpu(cpu, mm_cpumask(next));
-
-		/*
-		 * Use the old spbtr name instead of using the current satp
-		 * name to support binutils 2.29 which doesn't know about the
-		 * privileged ISA 1.10 yet.
-		 */
-		csr_write(sptbr, virt_to_pfn(next->pgd) | SATP_MODE);
-		local_flush_tlb_all();
-
-		flush_icache_deferred(next);
-	}
-}
+void switch_mm(struct mm_struct *prev, struct mm_struct *next,
+	struct task_struct *task);
 
 static inline void activate_mm(struct mm_struct *prev,
 			       struct mm_struct *next)
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index eb22ab49b3e0..d75b035786d6 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -3,3 +3,4 @@ obj-y += fault.o
 obj-y += extable.o
 obj-y += ioremap.o
 obj-y += cacheflush.o
+obj-y += context.o
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
new file mode 100644
index 000000000000..fbb1cfe80267
--- /dev/null
+++ b/arch/riscv/mm/context.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2012 Regents of the University of California
+ * Copyright (C) 2017 SiFive
+ */
+
+#include <linux/mm.h>
+
+#include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
+
+/*
+ * When necessary, performs a deferred icache flush for the given MM context,
+ * on the local CPU.  RISC-V has no direct mechanism for instruction cache
+ * shoot downs, so instead we send an IPI that informs the remote harts they
+ * need to flush their local instruction caches.  To avoid pathologically slow
+ * behavior in a common case (a bunch of single-hart processes on a many-hart
+ * machine, ie 'make -j') we avoid the IPIs for harts that are not currently
+ * executing a MM context and instead schedule a deferred local instruction
+ * cache flush to be performed before execution resumes on each hart.  This
+ * actually performs that local instruction cache flush, which implicitly only
+ * refers to the current hart.
+ */
+static inline void flush_icache_deferred(struct mm_struct *mm)
+{
+#ifdef CONFIG_SMP
+	unsigned int cpu = smp_processor_id();
+	cpumask_t *mask = &mm->context.icache_stale_mask;
+
+	if (cpumask_test_cpu(cpu, mask)) {
+		cpumask_clear_cpu(cpu, mask);
+		/*
+		 * Ensure the remote hart's writes are visible to this hart.
+		 * This pairs with a barrier in flush_icache_mm.
+		 */
+		smp_mb();
+		local_flush_icache_all();
+	}
+
+#endif
+}
+
+void switch_mm(struct mm_struct *prev, struct mm_struct *next,
+	struct task_struct *task)
+{
+	unsigned int cpu;
+
+	if (unlikely(prev == next))
+		return;
+
+	/*
+	 * Mark the current MM context as inactive, and the next as
+	 * active.  This is at least used by the icache flushing
+	 * routines in order to determine who should be flushed.
+	 */
+	cpu = smp_processor_id();
+
+	cpumask_clear_cpu(cpu, mm_cpumask(prev));
+	cpumask_set_cpu(cpu, mm_cpumask(next));
+
+	/*
+	 * Use the old spbtr name instead of using the current satp
+	 * name to support binutils 2.29 which doesn't know about the
+	 * privileged ISA 1.10 yet.
+	 */
+	csr_write(sptbr, virt_to_pfn(next->pgd) | SATP_MODE);
+	local_flush_tlb_all();
+
+	flush_icache_deferred(next);
+}
+
-- 
2.17.1



* [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid}.
  2019-03-27  0:41 [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Gary Guo
  2019-03-27  0:41 ` [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c Gary Guo
@ 2019-03-27  0:41 ` Gary Guo
  2019-03-27  7:08   ` Christoph Hellwig
  2019-03-28  6:47   ` Anup Patel
  2019-03-27  0:41 ` [PATCH v4 4/5] riscv: rewrite tlb flush for performance Gary Guo
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Gary Guo @ 2019-03-27  0:41 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou
  Cc: Christoph Hellwig, Atish Patra, linux-riscv, Anup Patel, Gary Guo

From: Gary Guo <gary@garyguo.net>

Currently sbi_remote_sfence_vma{,_asid} do not pass their arguments
to the SBI at all, which is semantically incorrect.

Neither BBL nor OpenSBI is using these arguments at the moment; they
just do a global flush instead. However, we still need to provide the
correct arguments.
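
(For reference, an illustration based on the legacy SBI v0.1 calling
convention -- not part of the patch: the hart mask pointer goes in a0,
start in a1, size in a2, the ASID in a3 for the _asid variant, and the
function ID in a7.)

	/* Illustration: flush one page on all harts (NULL = all harts),
	 * now actually passing the address-range operands. */
	sbi_remote_sfence_vma(NULL, addr, PAGE_SIZE);
	/* -> ecall with a0 = NULL, a1 = addr, a2 = PAGE_SIZE,
	 *    a7 = SBI_REMOTE_SFENCE_VMA */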

Signed-off-by: Gary Guo <gary@garyguo.net>
---
 arch/riscv/include/asm/sbi.h | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index b6bb10b92fe2..19f231615510 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -26,22 +26,27 @@
 #define SBI_REMOTE_SFENCE_VMA_ASID 7
 #define SBI_SHUTDOWN 8
 
-#define SBI_CALL(which, arg0, arg1, arg2) ({			\
+#define SBI_CALL(which, arg0, arg1, arg2, arg3) ({		\
 	register uintptr_t a0 asm ("a0") = (uintptr_t)(arg0);	\
 	register uintptr_t a1 asm ("a1") = (uintptr_t)(arg1);	\
 	register uintptr_t a2 asm ("a2") = (uintptr_t)(arg2);	\
+	register uintptr_t a3 asm ("a3") = (uintptr_t)(arg3);	\
 	register uintptr_t a7 asm ("a7") = (uintptr_t)(which);	\
 	asm volatile ("ecall"					\
 		      : "+r" (a0)				\
-		      : "r" (a1), "r" (a2), "r" (a7)		\
+		      : "r" (a1), "r" (a2), "r" (a3), "r" (a7)	\
 		      : "memory");				\
 	a0;							\
 })
 
 /* Lazy implementations until SBI is finalized */
-#define SBI_CALL_0(which) SBI_CALL(which, 0, 0, 0)
-#define SBI_CALL_1(which, arg0) SBI_CALL(which, arg0, 0, 0)
-#define SBI_CALL_2(which, arg0, arg1) SBI_CALL(which, arg0, arg1, 0)
+#define SBI_CALL_0(which) SBI_CALL(which, 0, 0, 0, 0)
+#define SBI_CALL_1(which, arg0) SBI_CALL(which, arg0, 0, 0, 0)
+#define SBI_CALL_2(which, arg0, arg1) SBI_CALL(which, arg0, arg1, 0, 0)
+#define SBI_CALL_3(which, arg0, arg1, arg2) \
+		SBI_CALL(which, arg0, arg1, arg2, 0)
+#define SBI_CALL_4(which, arg0, arg1, arg2, arg3) \
+		SBI_CALL(which, arg0, arg1, arg2, arg3)
 
 static inline void sbi_console_putchar(int ch)
 {
@@ -86,7 +91,7 @@ static inline void sbi_remote_sfence_vma(const unsigned long *hart_mask,
 					 unsigned long start,
 					 unsigned long size)
 {
-	SBI_CALL_1(SBI_REMOTE_SFENCE_VMA, hart_mask);
+	SBI_CALL_3(SBI_REMOTE_SFENCE_VMA, hart_mask, start, size);
 }
 
 static inline void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
@@ -94,7 +99,7 @@ static inline void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
 					      unsigned long size,
 					      unsigned long asid)
 {
-	SBI_CALL_1(SBI_REMOTE_SFENCE_VMA_ASID, hart_mask);
+	SBI_CALL_4(SBI_REMOTE_SFENCE_VMA_ASID, hart_mask, start, size, asid);
 }
 
 #endif
-- 
2.17.1



* [PATCH v4 4/5] riscv: rewrite tlb flush for performance
  2019-03-27  0:41 [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Gary Guo
  2019-03-27  0:41 ` [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c Gary Guo
  2019-03-27  0:41 ` [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid} Gary Guo
@ 2019-03-27  0:41 ` Gary Guo
  2019-03-27  7:25   ` Christoph Hellwig
  2019-03-27  0:41 ` [PATCH v4 2/5] riscv: move switch_mm to its own file Gary Guo
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Gary Guo @ 2019-03-27  0:41 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou
  Cc: Christoph Hellwig, Atish Patra, linux-riscv, Anup Patel, Gary Guo

From: Gary Guo <gary@garyguo.net>

This patch rewrites the logic related to TLB flushing, both to clean up
the code and to improve performance.

We now use the sfence.vma variant with a specified ASID and virtual
address whenever possible.  Even though only ASID 0 is used, this still
improves performance by preventing global mappings from being flushed
from the TLB.
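
(A sketch of the resulting flush decision, assuming a hypothetical boot
setting of tlbi_max_ops=8, i.e. a threshold of 8 * PAGE_SIZE --
illustration only:)

	if (end - start > 8 * PAGE_SIZE) {
		/* Too many pages: one ASID-wide flush, sparing globals. */
		local_flush_tlb_mm(vma->vm_mm);
	} else {
		/* Small range: per-page, ASID-qualified flushes. */
		for (; start < end; start += PAGE_SIZE)
			local_flush_tlb_page(vma, start);
	}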

Signed-off-by: Gary Guo <gary@garyguo.net>
Tested-by: Atish Patra <atish.patra@wdc.com>
---
 .../admin-guide/kernel-parameters.rst         |   1 +
 .../admin-guide/kernel-parameters.txt         |   8 ++
 arch/riscv/include/asm/pgtable.h              |   2 +-
 arch/riscv/include/asm/tlbflush.h             |  76 +++++------
 arch/riscv/mm/Makefile                        |   1 +
 arch/riscv/mm/context.c                       |   8 +-
 arch/riscv/mm/init.c                          |   2 +-
 arch/riscv/mm/tlbflush.c                      | 128 ++++++++++++++++++
 8 files changed, 178 insertions(+), 48 deletions(-)
 create mode 100644 arch/riscv/mm/tlbflush.c

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index b8d0bc07ed0a..8037db7ab25c 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -139,6 +139,7 @@ parameter is applicable::
 	PS2	Appropriate PS/2 support is enabled.
 	RAM	RAM disk support is enabled.
 	RDT	Intel Resource Director Technology.
+	RV	RISC-V architecture is enabled.
 	S390	S390 architecture is enabled.
 	SCSI	Appropriate SCSI support is enabled.
 			A lot of drivers have their options described inside
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 858b6c0b9a15..7a60edef09d2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4544,6 +4544,14 @@
 			Force threading of all interrupt handlers except those
 			marked explicitly IRQF_NO_THREAD.
 
+	tlbi_max_ops=	[RV]
+			Format: <int> (must be >= 1 and < PTRS_PER_PTE)
+			Default: 1
+			Controls the maximum number of page-level sfence.vma
+			instructions that the kernel can issue when a range
+			needs to be flushed.
+			See arch/riscv/mm/tlbflush.c
+
 	tmem		[KNL,XEN]
 			Enable the Transcendent memory driver if built-in.
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 1141364d990e..19d1aeb059da 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -279,7 +279,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
 	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
 	 */
-	local_flush_tlb_page(address);
+	local_flush_tlb_page(vma, address);
 }
 
 #define __HAVE_ARCH_PTE_SAME
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 54fee0cadb1e..29a780ca232a 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -1,22 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Copyright (C) 2009 Chen Liqin <liqin.chen@sunplusct.com>
  * Copyright (C) 2012 Regents of the University of California
- *
- *   This program is free software; you can redistribute it and/or
- *   modify it under the terms of the GNU General Public License
- *   as published by the Free Software Foundation, version 2.
- *
- *   This program is distributed in the hope that it will be useful,
- *   but WITHOUT ANY WARRANTY; without even the implied warranty of
- *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *   GNU General Public License for more details.
+ * Copyright (C) 2019 Gary Guo, University of Cambridge
  */
 
 #ifndef _ASM_RISCV_TLBFLUSH_H
 #define _ASM_RISCV_TLBFLUSH_H
 
 #include <linux/mm_types.h>
-#include <asm/smp.h>
 
 /*
  * Flush entire local TLB.  'sfence.vma' implicitly fences with the instruction
@@ -27,53 +19,47 @@ static inline void local_flush_tlb_all(void)
 	__asm__ __volatile__ ("sfence.vma" : : : "memory");
 }
 
-/* Flush one page from local TLB */
-static inline void local_flush_tlb_page(unsigned long addr)
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
 {
-	__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
+	/* Flush ASID 0 so that global mappings are not affected */
+	__asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (0) : "memory");
 }
 
-#ifndef CONFIG_SMP
-
-#define flush_tlb_all() local_flush_tlb_all()
-#define flush_tlb_page(vma, addr) local_flush_tlb_page(addr)
+static inline void local_flush_tlb_page(struct vm_area_struct *vma,
+	unsigned long addr)
+{
+	__asm__ __volatile__ ("sfence.vma %0, %1"
+			      : : "r" (addr), "r" (0)
+			      : "memory");
+}
 
-static inline void flush_tlb_range(struct vm_area_struct *vma,
-		unsigned long start, unsigned long end)
+static inline void local_flush_tlb_kernel_page(unsigned long addr)
 {
-	local_flush_tlb_all();
+	__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
 }
 
-#define flush_tlb_mm(mm) flush_tlb_all()
+void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+	unsigned long end);
+void local_flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
-#else /* CONFIG_SMP */
+#ifdef CONFIG_SMP
 
-#include <asm/sbi.h>
+void flush_tlb_all(void);
+void flush_tlb_mm(struct mm_struct *mm);
+void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
+void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+	unsigned long end);
+void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
-static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
-				     unsigned long size)
-{
-	struct cpumask hmask;
-
-	cpumask_clear(&hmask);
-	riscv_cpuid_to_hartid_mask(cmask, &hmask);
-	sbi_remote_sfence_vma(hmask.bits, start, size);
-}
+#else /* CONFIG_SMP */
 
-#define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
-#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
-#define flush_tlb_range(vma, start, end) \
-	remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - (start))
-#define flush_tlb_mm(mm) \
-	remote_sfence_vma(mm_cpumask(mm), 0, -1)
+#define flush_tlb_all() local_flush_tlb_all()
+#define flush_tlb_mm(mm) local_flush_tlb_mm(mm)
+#define flush_tlb_page(vma, addr) local_flush_tlb_page(vma, addr)
+#define flush_tlb_range(vma, start, end) local_flush_tlb_range(vma, start, end)
+#define flush_tlb_kernel_range(start, end) \
+	local_flush_tlb_kernel_range(start, end)
 
 #endif /* CONFIG_SMP */
 
-/* Flush a range of kernel pages */
-static inline void flush_tlb_kernel_range(unsigned long start,
-	unsigned long end)
-{
-	flush_tlb_all();
-}
-
 #endif /* _ASM_RISCV_TLBFLUSH_H */
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index d75b035786d6..53b68fd3cb45 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -4,3 +4,4 @@ obj-y += extable.o
 obj-y += ioremap.o
 obj-y += cacheflush.o
 obj-y += context.o
+obj-y += tlbflush.o
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index fbb1cfe80267..0f787bcd3a7a 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -64,7 +64,13 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * privileged ISA 1.10 yet.
 	 */
 	csr_write(sptbr, virt_to_pfn(next->pgd) | SATP_MODE);
-	local_flush_tlb_all();
+
+	/*
+	 * sfence.vma after the SATP write.  We flush only this MM context
+	 * instead of calling local_flush_tlb_all to prevent global mappings
+	 * from being affected.
+	 */
+	local_flush_tlb_mm(next);
 
 	flush_icache_deferred(next);
 }
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index b379a75ac6a6..858f55e8b219 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -168,7 +168,7 @@ void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
 		set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, prot));
 	} else {
 		pte_clear(&init_mm, addr, ptep);
-		local_flush_tlb_page(addr);
+		local_flush_tlb_kernel_page(addr);
 	}
 }
 
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
new file mode 100644
index 000000000000..33083f48a936
--- /dev/null
+++ b/arch/riscv/mm/tlbflush.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Gary Guo, University of Cambridge
+ */
+
+#include <linux/mm.h>
+#include <asm/sbi.h>
+
+#define SFENCE_VMA_FLUSH_ALL ((unsigned long) -1)
+
+/*
+ * This controls the maximum number of page-level sfence.vma instructions the
+ * kernel can issue when it needs to flush a range from the TLB.  If the size
+ * of the range goes beyond this threshold, a full sfence.vma is issued.
+ *
+ * Increasing this number can negatively impact performance on
+ * implementations where sfence.vma's address operand is ignored and a
+ * global TLB flush is always performed.  On the other hand, implementations
+ * with page-level TLB flush support can benefit from a larger number.
+ */
+static unsigned long tlbi_range_threshold = PAGE_SIZE;
+
+static int __init setup_tlbi_max_ops(char *str)
+{
+	int value = 0;
+
+	get_option(&str, &value);
+
+	/*
+	 * This value cannot be greater than or equal to PTRS_PER_PTE, as we
+	 * need to do a full flush for any non-leaf page table change.  The
+	 * value also has to be at least 1.
+	 */
+	if (value >= PTRS_PER_PTE || value < 1)
+		return -EINVAL;
+
+	tlbi_range_threshold = value * PAGE_SIZE;
+	return 0;
+}
+early_param("tlbi_max_ops", setup_tlbi_max_ops);
+
+void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+			   unsigned long end)
+{
+	if (end - start > tlbi_range_threshold) {
+		local_flush_tlb_mm(vma->vm_mm);
+		return;
+	}
+
+	while (start < end) {
+		__asm__ __volatile__ ("sfence.vma %0, %1"
+				      : : "r" (start), "r" (0)
+				      : "memory");
+		start += PAGE_SIZE;
+	}
+}
+
+void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+	if (end - start > tlbi_range_threshold) {
+		local_flush_tlb_all();
+		return;
+	}
+
+	while (start < end) {
+		__asm__ __volatile__ ("sfence.vma %0"
+				      : : "r" (start)
+				      : "memory");
+		start += PAGE_SIZE;
+	}
+}
+
+#ifdef CONFIG_SMP
+
+static void remote_sfence_vma(unsigned long start, unsigned long size)
+{
+	sbi_remote_sfence_vma(NULL, start, size);
+}
+
+static void remote_sfence_vma_asid(cpumask_t *mask, unsigned long start,
+				   unsigned long size, unsigned long asid)
+{
+	cpumask_t hmask;
+
+	cpumask_clear(&hmask);
+	riscv_cpuid_to_hartid_mask(mask, &hmask);
+	sbi_remote_sfence_vma_asid(hmask.bits, start, size, asid);
+}
+
+
+void flush_tlb_all(void)
+{
+	remote_sfence_vma(0, SFENCE_VMA_FLUSH_ALL);
+}
+
+void flush_tlb_mm(struct mm_struct *mm)
+{
+	remote_sfence_vma_asid(mm_cpumask(mm), 0, SFENCE_VMA_FLUSH_ALL, 0);
+}
+
+void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
+{
+	remote_sfence_vma_asid(mm_cpumask(vma->vm_mm), addr, PAGE_SIZE, 0);
+}
+
+
+void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+		     unsigned long end)
+{
+	if (end - start > tlbi_range_threshold) {
+		flush_tlb_mm(vma->vm_mm);
+		return;
+	}
+
+	remote_sfence_vma_asid(mm_cpumask(vma->vm_mm), start, end - start, 0);
+}
+
+void flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+	if (end - start > tlbi_range_threshold) {
+		flush_tlb_all();
+		return;
+	}
+
+	remote_sfence_vma(start, end - start);
+}
+
+#endif /* CONFIG_SMP */
-- 
2.17.1



* [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown
  2019-03-27  0:41 [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Gary Guo
                   ` (3 preceding siblings ...)
  2019-03-27  0:41 ` [PATCH v4 2/5] riscv: move switch_mm to its own file Gary Guo
@ 2019-03-27  0:41 ` Gary Guo
  2019-03-27  7:31   ` Christoph Hellwig
  2019-03-28  6:50   ` Anup Patel
  2019-04-10  7:04 ` [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Christoph Hellwig
  5 siblings, 2 replies; 29+ messages in thread
From: Gary Guo @ 2019-03-27  0:41 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou
  Cc: Christoph Hellwig, Atish Patra, linux-riscv, Anup Patel, Gary Guo

From: Gary Guo <gary@garyguo.net>

This patch implements IPI-based remote TLB shootdown, which is useful
at this stage for testing because BBL/OpenSBI ignore the operands of
sbi_remote_sfence_vma_asid and always perform a global TLB flush.
The SBI-based remote TLB shootdown can still be opted into using the
boot cmdline "tlbi_method=sbi".

Signed-off-by: Gary Guo <gary@garyguo.net>
Tested-by: Atish Patra <atish.patra@wdc.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +
 arch/riscv/mm/tlbflush.c                      | 99 +++++++++++++++++--
 2 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7a60edef09d2..afd34fa1db91 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4552,6 +4552,11 @@
 			flushed.
 			See arch/riscv/mm/tlbflush.c
 
+	tlbi_method=	[RV]
+			Format: { "sbi", "ipi" }
+			Default: "ipi"
+			Specify the method used to perform remote TLB shootdown.
+
 	tmem		[KNL,XEN]
 			Enable the Transcendent memory driver if built-in.
 
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 33083f48a936..ceee76f14a0a 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -72,19 +72,106 @@ void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
 
 #ifdef CONFIG_SMP
 
+/*
+ * SBI has interfaces for remote TLB shootdown.  If there is no hardware
+ * remote TLB shootdown support, the SBI performs IPIs itself instead.  Some
+ * SBI implementations may also ignore the ASID and address range provided,
+ * doing a full TLB flush instead; in these cases we may want IPIs ourselves.
+ *
+ * This parameter allows the approach (IPI/SBI) to be selected on the boot
+ * cmdline.
+ */
+static bool tlbi_ipi = true;
+
+static int __init setup_tlbi_method(char *str)
+{
+	if (strcmp(str, "ipi") == 0)
+		tlbi_ipi = true;
+	else if (strcmp(str, "sbi") == 0)
+		tlbi_ipi = false;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+early_param("tlbi_method", setup_tlbi_method);
+
+
+struct tlbi {
+	unsigned long start;
+	unsigned long size;
+	unsigned long asid;
+};
+
+static void ipi_remote_sfence_vma(void *info)
+{
+	struct tlbi *data = info;
+	unsigned long start = data->start;
+	unsigned long size = data->size;
+	unsigned long i;
+
+	if (size == SFENCE_VMA_FLUSH_ALL) {
+		local_flush_tlb_all();
+	}
+
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		__asm__ __volatile__ ("sfence.vma %0"
+				      : : "r" (start + i)
+				      : "memory");
+	}
+}
+
+static void ipi_remote_sfence_vma_asid(void *info)
+{
+	struct tlbi *data = info;
+	unsigned long asid = data->asid;
+	unsigned long start = data->start;
+	unsigned long size = data->size;
+	unsigned long i;
+
+	if (size == SFENCE_VMA_FLUSH_ALL) {
+		__asm__ __volatile__ ("sfence.vma x0, %0"
+				      : : "r" (asid)
+				      : "memory");
+		return;
+	}
+
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		__asm__ __volatile__ ("sfence.vma %0, %1"
+				      : : "r" (start + i), "r" (asid)
+				      : "memory");
+	}
+}
+
 static void remote_sfence_vma(unsigned long start, unsigned long size)
 {
-	sbi_remote_sfence_vma(NULL, start, size);
+	if (tlbi_ipi) {
+		struct tlbi info = {
+			.start = start,
+			.size = size,
+		};
+		on_each_cpu(ipi_remote_sfence_vma, &info, 1);
+	} else
+		sbi_remote_sfence_vma(NULL, start, size);
 }
 
 static void remote_sfence_vma_asid(cpumask_t *mask, unsigned long start,
 				   unsigned long size, unsigned long asid)
 {
-	cpumask_t hmask;
-
-	cpumask_clear(&hmask);
-	riscv_cpuid_to_hartid_mask(mask, &hmask);
-	sbi_remote_sfence_vma_asid(hmask.bits, start, size, asid);
+	if (tlbi_ipi) {
+		struct tlbi info = {
+			.start = start,
+			.size = size,
+			.asid = asid,
+		};
+		on_each_cpu_mask(mask, ipi_remote_sfence_vma_asid, &info, 1);
+	} else {
+		cpumask_t hmask;
+
+		cpumask_clear(&hmask);
+		riscv_cpuid_to_hartid_mask(mask, &hmask);
+		sbi_remote_sfence_vma_asid(hmask.bits, start, size, asid);
+	}
 }
 
 
-- 
2.17.1



* Re: [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c
  2019-03-27  0:41 ` [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c Gary Guo
@ 2019-03-27  7:06   ` Christoph Hellwig
  2019-03-28  6:45   ` Anup Patel
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-27  7:06 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v4 2/5] riscv: move switch_mm to its own file
  2019-03-27  0:41 ` [PATCH v4 2/5] riscv: move switch_mm to its own file Gary Guo
@ 2019-03-27  7:08   ` Christoph Hellwig
  2019-03-27  7:18   ` Christoph Hellwig
  2019-03-28  6:47   ` Anup Patel
  2 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-27  7:08 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

> +#include <linux/mm.h>
> +
> +#include <asm/tlbflush.h>
> +#include <asm/cacheflush.h>

Minor nitpick: no real need for the empty line above.

Otherwise this looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid}.
  2019-03-27  0:41 ` [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid} Gary Guo
@ 2019-03-27  7:08   ` Christoph Hellwig
  2019-03-28  6:47   ` Anup Patel
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-27  7:08 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v4 2/5] riscv: move switch_mm to its own file
  2019-03-27  0:41 ` [PATCH v4 2/5] riscv: move switch_mm to its own file Gary Guo
  2019-03-27  7:08   ` Christoph Hellwig
@ 2019-03-27  7:18   ` Christoph Hellwig
  2019-03-28  6:47   ` Anup Patel
  2 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-27  7:18 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

> +	csr_write(sptbr, virt_to_pfn(next->pgd) | SATP_MODE);
> +	local_flush_tlb_all();
> +
> +	flush_icache_deferred(next);
> +}
> +

Btw, git-am complains about adding a blank line at EOF here, please
consider dropping that empty line.


* Re: [PATCH v4 4/5] riscv: rewrite tlb flush for performance
  2019-03-27  0:41 ` [PATCH v4 4/5] riscv: rewrite tlb flush for performance Gary Guo
@ 2019-03-27  7:25   ` Christoph Hellwig
  2019-03-27 13:56     ` Gary Guo
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-27  7:25 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

> @@ -27,53 +19,47 @@ static inline void local_flush_tlb_all(void)
>  	__asm__ __volatile__ ("sfence.vma" : : : "memory");
>  }
>  
> -/* Flush one page from local TLB */
> -static inline void local_flush_tlb_page(unsigned long addr)
> +static inline void local_flush_tlb_mm(struct mm_struct *mm)
>  {
> -	__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
> +	/* Flush ASID 0 so that global mappings are not affected */
> +	__asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (0) : "memory");
>  }
>  
> -#ifndef CONFIG_SMP
> -
> -#define flush_tlb_all() local_flush_tlb_all()
> -#define flush_tlb_page(vma, addr) local_flush_tlb_page(addr)
> +static inline void local_flush_tlb_page(struct vm_area_struct *vma,
> +	unsigned long addr)
> +{
> +	__asm__ __volatile__ ("sfence.vma %0, %1"
> +			      : : "r" (addr), "r" (0)
> +			      : "memory");
> +}

Why do we pass the vma argument here even if it is never used?  That
just seems to create some rather pointless churn.  Also I'd add
local_flush_tlb_mm below local_flush_tlb_page to avoid churn as well,
nevermind that it seems the more logical order to me.

> +void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> +	unsigned long end);
> +void local_flush_tlb_kernel_range(unsigned long start, unsigned long end);

As far as I can tell these are only used for the !SMP case and only
to implement the non-local prefixed versions.  In that case we should
just drop the local_ prefix and implement those APIs directly, and only
for !SMP builds.

> +
> +#include <linux/mm.h>
> +#include <asm/sbi.h>
> +
> +#define SFENCE_VMA_FLUSH_ALL ((unsigned long) -1)
> +
> +/*
> + * This controls the maximum number of page-level sfence.vma instructions the
> + * kernel can issue when it needs to flush a range from the TLB.  If the size
> + * of the range goes beyond this threshold, a full sfence.vma is issued.
> + *
> + * Increasing this number can negatively impact performance on
> + * implementations where sfence.vma's address operand is ignored and a
> + * global TLB flush is always performed.  On the other hand, implementations
> + * with page-level TLB flush support can benefit from a larger number.
> + */
> +static unsigned long tlbi_range_threshold = PAGE_SIZE;

I really hate having this as a tunable in the kernel code.  I think
the right answer is to have a device tree entry to carry this number
so that the platform can supply it.  Btw, what are examples of
platforms that flush globally vs per-page at the moment?  What is a good
larger value for the latter based on your testing?

Also I wonder if we should split this tunable and the optional
global flush into a separate patch.  That is, have the first patch
just make use of the ASID, and then another patch add the threshold
for doing the full flush.

> +void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> +			   unsigned long end)
> +{
> +	if (end - start > tlbi_range_threshold) {
> +		local_flush_tlb_mm(vma->vm_mm);
> +		return;
> +	}
> +
> +	while (start < end) {
> +		__asm__ __volatile__ ("sfence.vma %0, %1"
> +				      : : "r" (start), "r" (0)
> +				      : "memory");

I think this should just call local_flush_tlb_page.

> +		start += PAGE_SIZE;
> +	}

And maybe use a for loop to short cut it a bit:

	for (; start < end; start += PAGE_SIZE)
		local_flush_tlb_page(start);

> +void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
> +{
> +	if (end - start > tlbi_range_threshold) {
> +		local_flush_tlb_all();
> +		return;
> +	}
> +
> +	while (start < end) {
> +		__asm__ __volatile__ ("sfence.vma %0"
> +				      : : "r" (start)
> +				      : "memory");
> +		start += PAGE_SIZE;

Same here, just with local_flush_tlb_kernel_page.


* Re: [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown
  2019-03-27  0:41 ` [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown Gary Guo
@ 2019-03-27  7:31   ` Christoph Hellwig
  2019-03-27 14:03     ` Gary Guo
  2019-03-28  6:50   ` Anup Patel
  1 sibling, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-27  7:31 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

On Wed, Mar 27, 2019 at 12:41:30AM +0000, Gary Guo wrote:
> From: Gary Guo <gary@garyguo.net>
> 
> This patch implements IPI-based remote TLB shootdown, which is useful
> at this stage for testing because BBL/OpenSBI ignore the operands of
> sbi_remote_sfence_vma_asid and always perform a global TLB flush.
> The SBI-based remote TLB shootdown can still be opted into using the
> boot cmdline "tlbi_method=sbi".

I think Anup has now fixed OpenSBI.  Do you have benchmarks vs BBL,
old OpenSBI and new OpenSBI?

> +static void ipi_remote_sfence_vma(void *info)
> +{
> +	struct tlbi *data = info;
> +	unsigned long start = data->start;
> +	unsigned long size = data->size;
> +	unsigned long i;
> +
> +	if (size == SFENCE_VMA_FLUSH_ALL) {
> +		local_flush_tlb_all();
> +	}

Doesn't this need a return to skip the latter code?  Also it might
be worth splitting the all case into entirely separate helpers,
that way the on_each_cpu calls don't need to pass a private data
argument at all.

> +
> +	for (i = 0; i < size; i += PAGE_SIZE) {
> +		__asm__ __volatile__ ("sfence.vma %0"
> +				      : : "r" (start + i)
> +				      : "memory");
> +	}

local_flush_tlb_kernel_page?

> +static void ipi_remote_sfence_vma_asid(void *info)
> +{
> +	struct tlbi *data = info;
> +	unsigned long asid = data->asid;
> +	unsigned long start = data->start;
> +	unsigned long size = data->size;
> +	unsigned long i;
> +
> +	if (size == SFENCE_VMA_FLUSH_ALL) {
> +		__asm__ __volatile__ ("sfence.vma x0, %0"
> +				      : : "r" (asid)
> +				      : "memory");
> +		return;
> +	}
> +
> +	for (i = 0; i < size; i += PAGE_SIZE) {
> +		__asm__ __volatile__ ("sfence.vma %0, %1"
> +				      : : "r" (start + i), "r" (asid)
> +				      : "memory");
> +	}

local_flush_tlb_range?


* Re: [PATCH v4 4/5] riscv: rewrite tlb flush for performance
  2019-03-27  7:25   ` Christoph Hellwig
@ 2019-03-27 13:56     ` Gary Guo
  2019-03-28 16:17       ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Gary Guo @ 2019-03-27 13:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-riscv



On 27/03/2019 07:25, Christoph Hellwig wrote:
>> @@ -27,53 +19,47 @@ static inline void local_flush_tlb_all(void)
>>   	__asm__ __volatile__ ("sfence.vma" : : : "memory");
>>   }
>>   
>> -/* Flush one page from local TLB */
>> -static inline void local_flush_tlb_page(unsigned long addr)
>> +static inline void local_flush_tlb_mm(struct mm_struct *mm)
>>   {
>> -	__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
>> +	/* Flush ASID 0 so that global mappings are not affected */
>> +	__asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (0) : "memory");
>>   }
>>   
>> -#ifndef CONFIG_SMP
>> -
>> -#define flush_tlb_all() local_flush_tlb_all()
>> -#define flush_tlb_page(vma, addr) local_flush_tlb_page(addr)
>> +static inline void local_flush_tlb_page(struct vm_area_struct *vma,
>> +	unsigned long addr)
>> +{
>> +	__asm__ __volatile__ ("sfence.vma %0, %1"
>> +			      : : "r" (addr), "r" (0)
>> +			      : "memory");
>> +}
> 
> Why do we pass the vma argument here even if it is never used?  That
> just seems to create some rather pointless churn.  Also I'd add
> local_flush_tlb_mm below local_flush_tlb_page to avoid churn as well,
> nevermind that it seems the more logical order to me.
>
This isn't used now, but we need it for ASID support. It is also more
consistent with the non-SMP flush signature, and with the code of other
architectures.
> 
>> +void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>> +	unsigned long end);
>> +void local_flush_tlb_kernel_range(unsigned long start, unsigned long end);
> 
> As far as I can tell these are only used for the !SMP case and only
> to implement the non-local prefixed versions.  In that case we should
> just drop the local_ prefix and implement those APIs directly, and only
> for !SMP builds.
> 
Ok, in that case I'll also move it to tlbflush.c.
>> +
>> +#include <linux/mm.h>
>> +#include <asm/sbi.h>
>> +
>> +#define SFENCE_VMA_FLUSH_ALL ((unsigned long) -1)
>> +
>> +/*
>> + * This controls the maximum number of page-level sfence.vma instructions the
>> + * kernel can issue when it needs to flush a range from the TLB.  If the size
>> + * of the range goes beyond this threshold, a full sfence.vma is issued.
>> + *
>> + * Increasing this number can negatively impact performance on
>> + * implementations where sfence.vma's address operand is ignored and a
>> + * global TLB flush is always performed.  On the other hand, implementations
>> + * with page-level TLB flush support can benefit from a larger number.
>> + */
>> +static unsigned long tlbi_range_threshold = PAGE_SIZE;
> 
> I really hate having this as a tunable in the kernel code.  I think
> the right answer is to have a device tree entry to carry this number
> so that the platform can supply it.  Btw, what are examples of
> platforms that flush globally vs per-page at the moment?  What is a good
> larger value for the latter based on your testing?
> 
This was discussed in previous versions of this patch, and we arrived at
the conclusion that a boot parameter is the best way to do it for now, as
at the moment we have no other way to get this information. The right
value really depends on the implementation: one with a very large TLB,
where full invalidation would be very expensive, might even want the
value to be 511.

> Also I wonder if we should split this tunable and the optional
> global flush into a separate patch.  That is, have the first patch
> just make use of the ASID, and then another patch add the threshold
> for doing the full flush.
I don't think we should. This patch is more like a rewrite of the old
logic than an incremental patch-up.
> 
>> +void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>> +			   unsigned long end)
>> +{
>> +	if (end - start > tlbi_range_threshold) {
>> +		local_flush_tlb_mm(vma->vm_mm);
>> +		return;
>> +	}
>> +
>> +	while (start < end) {
>> +		__asm__ __volatile__ ("sfence.vma %0, %1"
>> +				      : : "r" (start), "r" (0)
>> +				      : "memory");
> 
> I think this should just call local_flush_tlb_page.
> 
I do this to minimise the changes we need when we add ASID support (in
which case we want to avoid retrieving the ASID from an atomic variable
multiple times).
>> +		start += PAGE_SIZE;
>> +	}
> 
> And maybe use a for loop to short cut it a bit:
> 
> 	for (; start < end; start += PAGE_SIZE)
> 		local_flush_tlb_page(start);
> 
Ok
>> +void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
>> +{
>> +	if (end - start > tlbi_range_threshold) {
>> +		local_flush_tlb_all();
>> +		return;
>> +	}
>> +
>> +	while (start < end) {
>> +		__asm__ __volatile__ ("sfence.vma %0"
>> +				      : : "r" (start)
>> +				      : "memory");
>> +		start += PAGE_SIZE;
> 
> Same here, just with local_flush_tlb_kernel_page.
> 
Ok

* Re: [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown
  2019-03-27  7:31   ` Christoph Hellwig
@ 2019-03-27 14:03     ` Gary Guo
  2019-03-28 16:36       ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Gary Guo @ 2019-03-27 14:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Atish Patra, Anup Patel, Palmer Dabbelt, linux-riscv, Albert Ou



On 27/03/2019 07:31, Christoph Hellwig wrote:
> On Wed, Mar 27, 2019 at 12:41:30AM +0000, Gary Guo wrote:
>> From: Gary Guo <gary@garyguo.net>
>>
>> This patch implements IPI-based remote TLB shootdown, which is useful
>> at this stage for testing because BBL/OpenSBI ignore the operands of
>> sbi_remote_sfence_vma_asid and always perform a global TLB flush.
>> The SBI-based remote TLB shootdown can still be opted into using the
>> boot cmdline "tlbi_method=sbi".
> 
> I think Anup now fixes OpenSBI.  Do you have benchmarks vs BBL,
> old OpenSBI and new OpenSBI?
> 
See the cover letter. OpenSBI's code hasn't made it into a stable release
yet, and it currently has race conditions.
>> +static void ipi_remote_sfence_vma(void *info)
>> +{
>> +	struct tlbi *data = info;
>> +	unsigned long start = data->start;
>> +	unsigned long size = data->size;
>> +	unsigned long i;
>> +
>> +	if (size == SFENCE_VMA_FLUSH_ALL) {
>> +		local_flush_tlb_all();
>> +	}
> 
> Doesn't this need a return to skip the latter code?  Also it might
> be worth splitting the all case into entirely separate helpers,
> that way the on_each_cpu calls don't need to pass a private data
> argument at all.
I managed to drop the return while rebasing... my fault. I don't think
it is necessary to make these into separate helpers, as the private data
argument isn't expensive at all.
> 
>> +
>> +	for (i = 0; i < size; i += PAGE_SIZE) {
>> +		__asm__ __volatile__ ("sfence.vma %0"
>> +				      : : "r" (start + i)
>> +				      : "memory");
>> +	}
> 
> local_flush_tlb_kernel_page?
Good catch
> 
>> +static void ipi_remote_sfence_vma_asid(void *info)
>> +{
>> +	struct tlbi *data = info;
>> +	unsigned long asid = data->asid;
>> +	unsigned long start = data->start;
>> +	unsigned long size = data->size;
>> +	unsigned long i;
>> +
>> +	if (size == SFENCE_VMA_FLUSH_ALL) {
>> +		__asm__ __volatile__ ("sfence.vma x0, %0"
>> +				      : : "r" (asid)
>> +				      : "memory");
>> +		return;
>> +	}
>> +
>> +	for (i = 0; i < size; i += PAGE_SIZE) {
>> +		__asm__ __volatile__ ("sfence.vma %0, %1"
>> +				      : : "r" (start + i), "r" (asid)
>> +				      : "memory");
>> +	}
> 
> local_flush_tlb_range?
> 
We don't have the VMA structure here, so no. This is related to the
future ASID patch as well.

* Re: [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c
  2019-03-27  0:41 ` [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c Gary Guo
  2019-03-27  7:06   ` Christoph Hellwig
@ 2019-03-28  6:45   ` Anup Patel
  1 sibling, 0 replies; 29+ messages in thread
From: Anup Patel @ 2019-03-28  6:45 UTC (permalink / raw)
  To: Gary Guo
  Cc: Anup Patel, Palmer Dabbelt, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

On Wed, Mar 27, 2019 at 6:11 AM Gary Guo <gary@garyguo.net> wrote:
>
> From: Gary Guo <gary@garyguo.net>
>
> Currently, flush_icache_all is macro-expanded into an SBI call, yet no
> asm/sbi.h is included in asm/cacheflush.h. It can be moved to
> mm/cacheflush.c instead (the SBI call dominates performance-wise, so
> there is no concern about it not being inlined).
>
> Currently, flush_icache_mm stays in kernel/smp.c, which looks like a
> hack to prevent it from being compiled when CONFIG_SMP=n. It should
> also be in mm/cacheflush.c.
>
> Signed-off-by: Gary Guo <gary@garyguo.net>
> ---
>  arch/riscv/include/asm/cacheflush.h |  2 +-
>  arch/riscv/kernel/smp.c             | 49 -----------------------
>  arch/riscv/mm/cacheflush.c          | 61 +++++++++++++++++++++++++++++
>  3 files changed, 62 insertions(+), 50 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 8f13074413a7..1f4ba68ab9aa 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -47,7 +47,7 @@ static inline void flush_dcache_page(struct page *page)
>
>  #else /* CONFIG_SMP */
>
> -#define flush_icache_all() sbi_remote_fence_i(NULL)
> +void flush_icache_all(void);
>  void flush_icache_mm(struct mm_struct *mm, bool local);
>
>  #endif /* CONFIG_SMP */
> diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
> index 0c41d07ec281..17f491e8ed0a 100644
> --- a/arch/riscv/kernel/smp.c
> +++ b/arch/riscv/kernel/smp.c
> @@ -199,52 +199,3 @@ void smp_send_reschedule(int cpu)
>         send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE);
>  }
>
> -/*
> - * Performs an icache flush for the given MM context.  RISC-V has no direct
> - * mechanism for instruction cache shoot downs, so instead we send an IPI that
> - * informs the remote harts they need to flush their local instruction caches.
> - * To avoid pathologically slow behavior in a common case (a bunch of
> - * single-hart processes on a many-hart machine, ie 'make -j') we avoid the
> - * IPIs for harts that are not currently executing a MM context and instead
> - * schedule a deferred local instruction cache flush to be performed before
> - * execution resumes on each hart.
> - */
> -void flush_icache_mm(struct mm_struct *mm, bool local)
> -{
> -       unsigned int cpu;
> -       cpumask_t others, hmask, *mask;
> -
> -       preempt_disable();
> -
> -       /* Mark every hart's icache as needing a flush for this MM. */
> -       mask = &mm->context.icache_stale_mask;
> -       cpumask_setall(mask);
> -       /* Flush this hart's I$ now, and mark it as flushed. */
> -       cpu = smp_processor_id();
> -       cpumask_clear_cpu(cpu, mask);
> -       local_flush_icache_all();
> -
> -       /*
> -        * Flush the I$ of other harts concurrently executing, and mark them as
> -        * flushed.
> -        */
> -       cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
> -       local |= cpumask_empty(&others);
> -       if (mm != current->active_mm || !local) {
> -               cpumask_clear(&hmask);
> -               riscv_cpuid_to_hartid_mask(&others, &hmask);
> -               sbi_remote_fence_i(hmask.bits);
> -       } else {
> -               /*
> -                * It's assumed that at least one strongly ordered operation is
> -                * performed on this hart between setting a hart's cpumask bit
> -                * and scheduling this MM context on that hart.  Sending an SBI
> -                * remote message will do this, but in the case where no
> -                * messages are sent we still need to order this hart's writes
> -                * with flush_icache_deferred().
> -                */
> -               smp_mb();
> -       }
> -
> -       preempt_enable();
> -}
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 498c0a0814fe..497b7d07af0c 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -14,6 +14,67 @@
>  #include <asm/pgtable.h>
>  #include <asm/cacheflush.h>
>
> +#ifdef CONFIG_SMP
> +
> +#include <asm/sbi.h>
> +
> +void flush_icache_all(void)
> +{
> +       sbi_remote_fence_i(NULL);
> +}
> +
> +/*
> + * Performs an icache flush for the given MM context.  RISC-V has no direct
> + * mechanism for instruction cache shoot downs, so instead we send an IPI that
> + * informs the remote harts they need to flush their local instruction caches.
> + * To avoid pathologically slow behavior in a common case (a bunch of
> + * single-hart processes on a many-hart machine, ie 'make -j') we avoid the
> + * IPIs for harts that are not currently executing a MM context and instead
> + * schedule a deferred local instruction cache flush to be performed before
> + * execution resumes on each hart.
> + */
> +void flush_icache_mm(struct mm_struct *mm, bool local)
> +{
> +       unsigned int cpu;
> +       cpumask_t others, hmask, *mask;
> +
> +       preempt_disable();
> +
> +       /* Mark every hart's icache as needing a flush for this MM. */
> +       mask = &mm->context.icache_stale_mask;
> +       cpumask_setall(mask);
> +       /* Flush this hart's I$ now, and mark it as flushed. */
> +       cpu = smp_processor_id();
> +       cpumask_clear_cpu(cpu, mask);
> +       local_flush_icache_all();
> +
> +       /*
> +        * Flush the I$ of other harts concurrently executing, and mark them as
> +        * flushed.
> +        */
> +       cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
> +       local |= cpumask_empty(&others);
> +       if (mm != current->active_mm || !local) {
> +               cpumask_clear(&hmask);
> +               riscv_cpuid_to_hartid_mask(&others, &hmask);
> +               sbi_remote_fence_i(hmask.bits);
> +       } else {
> +               /*
> +                * It's assumed that at least one strongly ordered operation is
> +                * performed on this hart between setting a hart's cpumask bit
> +                * and scheduling this MM context on that hart.  Sending an SBI
> +                * remote message will do this, but in the case where no
> +                * messages are sent we still need to order this hart's writes
> +                * with flush_icache_deferred().
> +                */
> +               smp_mb();
> +       }
> +
> +       preempt_enable();
> +}
> +
> +#endif /* CONFIG_SMP */
> +
>  void flush_icache_pte(pte_t pte)
>  {
>         struct page *page = pte_page(pte);
> --
> 2.17.1
>

LGTM.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

* Re: [PATCH v4 2/5] riscv: move switch_mm to its own file
  2019-03-27  0:41 ` [PATCH v4 2/5] riscv: move switch_mm to its own file Gary Guo
  2019-03-27  7:08   ` Christoph Hellwig
  2019-03-27  7:18   ` Christoph Hellwig
@ 2019-03-28  6:47   ` Anup Patel
  2 siblings, 0 replies; 29+ messages in thread
From: Anup Patel @ 2019-03-28  6:47 UTC (permalink / raw)
  To: Gary Guo
  Cc: Anup Patel, Palmer Dabbelt, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

On Wed, Mar 27, 2019 at 6:11 AM Gary Guo <gary@garyguo.net> wrote:
>
> From: Gary Guo <gary@garyguo.net>
>
> switch_mm is an expensive operation that has two users.
> flush_icache_deferred is only called within switch_mm and can be moved
> along with it. The function is expected to become more complicated when
> ASID support is added, so clean it up eagerly.
>
> By moving them to a separate file we also remove some excessive
> dependencies on tlbflush.h and cacheflush.h.
>
> Signed-off-by: Gary Guo <gary@garyguo.net>
> ---
>  arch/riscv/include/asm/mmu_context.h | 59 +----------------------
>  arch/riscv/mm/Makefile               |  1 +
>  arch/riscv/mm/context.c              | 71 ++++++++++++++++++++++++++++
>  3 files changed, 74 insertions(+), 57 deletions(-)
>  create mode 100644 arch/riscv/mm/context.c
>
> diff --git a/arch/riscv/include/asm/mmu_context.h b/arch/riscv/include/asm/mmu_context.h
> index 336d60ec5698..bf4f097a9051 100644
> --- a/arch/riscv/include/asm/mmu_context.h
> +++ b/arch/riscv/include/asm/mmu_context.h
> @@ -20,8 +20,6 @@
>
>  #include <linux/mm.h>
>  #include <linux/sched.h>
> -#include <asm/tlbflush.h>
> -#include <asm/cacheflush.h>
>
>  static inline void enter_lazy_tlb(struct mm_struct *mm,
>         struct task_struct *task)
> @@ -39,61 +37,8 @@ static inline void destroy_context(struct mm_struct *mm)
>  {
>  }
>
> -/*
> - * When necessary, performs a deferred icache flush for the given MM context,
> - * on the local CPU.  RISC-V has no direct mechanism for instruction cache
> - * shoot downs, so instead we send an IPI that informs the remote harts they
> - * need to flush their local instruction caches.  To avoid pathologically slow
> - * behavior in a common case (a bunch of single-hart processes on a many-hart
> - * machine, ie 'make -j') we avoid the IPIs for harts that are not currently
> - * executing a MM context and instead schedule a deferred local instruction
> - * cache flush to be performed before execution resumes on each hart.  This
> - * actually performs that local instruction cache flush, which implicitly only
> - * refers to the current hart.
> - */
> -static inline void flush_icache_deferred(struct mm_struct *mm)
> -{
> -#ifdef CONFIG_SMP
> -       unsigned int cpu = smp_processor_id();
> -       cpumask_t *mask = &mm->context.icache_stale_mask;
> -
> -       if (cpumask_test_cpu(cpu, mask)) {
> -               cpumask_clear_cpu(cpu, mask);
> -               /*
> -                * Ensure the remote hart's writes are visible to this hart.
> -                * This pairs with a barrier in flush_icache_mm.
> -                */
> -               smp_mb();
> -               local_flush_icache_all();
> -       }
> -#endif
> -}
> -
> -static inline void switch_mm(struct mm_struct *prev,
> -       struct mm_struct *next, struct task_struct *task)
> -{
> -       if (likely(prev != next)) {
> -               /*
> -                * Mark the current MM context as inactive, and the next as
> -                * active.  This is at least used by the icache flushing
> -                * routines in order to determine who should
> -                */
> -               unsigned int cpu = smp_processor_id();
> -
> -               cpumask_clear_cpu(cpu, mm_cpumask(prev));
> -               cpumask_set_cpu(cpu, mm_cpumask(next));
> -
> -               /*
> -                * Use the old spbtr name instead of using the current satp
> -                * name to support binutils 2.29 which doesn't know about the
> -                * privileged ISA 1.10 yet.
> -                */
> -               csr_write(sptbr, virt_to_pfn(next->pgd) | SATP_MODE);
> -               local_flush_tlb_all();
> -
> -               flush_icache_deferred(next);
> -       }
> -}
> +void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> +       struct task_struct *task);
>
>  static inline void activate_mm(struct mm_struct *prev,
>                                struct mm_struct *next)
> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> index eb22ab49b3e0..d75b035786d6 100644
> --- a/arch/riscv/mm/Makefile
> +++ b/arch/riscv/mm/Makefile
> @@ -3,3 +3,4 @@ obj-y += fault.o
>  obj-y += extable.o
>  obj-y += ioremap.o
>  obj-y += cacheflush.o
> +obj-y += context.o
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> new file mode 100644
> index 000000000000..fbb1cfe80267
> --- /dev/null
> +++ b/arch/riscv/mm/context.c
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2012 Regents of the University of California
> + * Copyright (C) 2017 SiFive
> + */
> +
> +#include <linux/mm.h>
> +
> +#include <asm/tlbflush.h>
> +#include <asm/cacheflush.h>
> +
> +/*
> + * When necessary, performs a deferred icache flush for the given MM context,
> + * on the local CPU.  RISC-V has no direct mechanism for instruction cache
> + * shoot downs, so instead we send an IPI that informs the remote harts they
> + * need to flush their local instruction caches.  To avoid pathologically slow
> + * behavior in a common case (a bunch of single-hart processes on a many-hart
> + * machine, ie 'make -j') we avoid the IPIs for harts that are not currently
> + * executing a MM context and instead schedule a deferred local instruction
> + * cache flush to be performed before execution resumes on each hart.  This
> + * actually performs that local instruction cache flush, which implicitly only
> + * refers to the current hart.
> + */
> +static inline void flush_icache_deferred(struct mm_struct *mm)
> +{
> +#ifdef CONFIG_SMP
> +       unsigned int cpu = smp_processor_id();
> +       cpumask_t *mask = &mm->context.icache_stale_mask;
> +
> +       if (cpumask_test_cpu(cpu, mask)) {
> +               cpumask_clear_cpu(cpu, mask);
> +               /*
> +                * Ensure the remote hart's writes are visible to this hart.
> +                * This pairs with a barrier in flush_icache_mm.
> +                */
> +               smp_mb();
> +               local_flush_icache_all();
> +       }
> +
> +#endif
> +}
> +
> +void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> +       struct task_struct *task)
> +{
> +       unsigned int cpu;
> +
> +       if (unlikely(prev == next))
> +               return;
> +
> +       /*
> +        * Mark the current MM context as inactive, and the next as
> +        * active.  This is at least used by the icache flushing
> +        * routines in order to determine who should be flushed.
> +        */
> +       cpu = smp_processor_id();
> +
> +       cpumask_clear_cpu(cpu, mm_cpumask(prev));
> +       cpumask_set_cpu(cpu, mm_cpumask(next));
> +
> +       /*
> +        * Use the old spbtr name instead of using the current satp
> +        * name to support binutils 2.29 which doesn't know about the
> +        * privileged ISA 1.10 yet.
> +        */
> +       csr_write(sptbr, virt_to_pfn(next->pgd) | SATP_MODE);
> +       local_flush_tlb_all();
> +
> +       flush_icache_deferred(next);
> +}
> +
> --
> 2.17.1
>

Apart from nit pointed out by Christoph, looks good to me.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

* Re: [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid}.
  2019-03-27  0:41 ` [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid} Gary Guo
  2019-03-27  7:08   ` Christoph Hellwig
@ 2019-03-28  6:47   ` Anup Patel
  1 sibling, 0 replies; 29+ messages in thread
From: Anup Patel @ 2019-03-28  6:47 UTC (permalink / raw)
  To: Gary Guo
  Cc: Anup Patel, Palmer Dabbelt, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

On Wed, Mar 27, 2019 at 6:11 AM Gary Guo <gary@garyguo.net> wrote:
>
> From: Gary Guo <gary@garyguo.net>
>
> Currently sbi_remote_sfence_vma{,_asid} do not pass their arguments
> to the SBI at all, which is semantically incorrect.
>
> Neither BBL nor OpenSBI is using these arguments at the moment, and
> they just do a global flush instead. However, we still need to provide
> the correct arguments.
>
> Signed-off-by: Gary Guo <gary@garyguo.net>
> ---
>  arch/riscv/include/asm/sbi.h | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index b6bb10b92fe2..19f231615510 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -26,22 +26,27 @@
>  #define SBI_REMOTE_SFENCE_VMA_ASID 7
>  #define SBI_SHUTDOWN 8
>
> -#define SBI_CALL(which, arg0, arg1, arg2) ({                   \
> +#define SBI_CALL(which, arg0, arg1, arg2, arg3) ({             \
>         register uintptr_t a0 asm ("a0") = (uintptr_t)(arg0);   \
>         register uintptr_t a1 asm ("a1") = (uintptr_t)(arg1);   \
>         register uintptr_t a2 asm ("a2") = (uintptr_t)(arg2);   \
> +       register uintptr_t a3 asm ("a3") = (uintptr_t)(arg3);   \
>         register uintptr_t a7 asm ("a7") = (uintptr_t)(which);  \
>         asm volatile ("ecall"                                   \
>                       : "+r" (a0)                               \
> -                     : "r" (a1), "r" (a2), "r" (a7)            \
> +                     : "r" (a1), "r" (a2), "r" (a3), "r" (a7)  \
>                       : "memory");                              \
>         a0;                                                     \
>  })
>
>  /* Lazy implementations until SBI is finalized */
> -#define SBI_CALL_0(which) SBI_CALL(which, 0, 0, 0)
> -#define SBI_CALL_1(which, arg0) SBI_CALL(which, arg0, 0, 0)
> -#define SBI_CALL_2(which, arg0, arg1) SBI_CALL(which, arg0, arg1, 0)
> +#define SBI_CALL_0(which) SBI_CALL(which, 0, 0, 0, 0)
> +#define SBI_CALL_1(which, arg0) SBI_CALL(which, arg0, 0, 0, 0)
> +#define SBI_CALL_2(which, arg0, arg1) SBI_CALL(which, arg0, arg1, 0, 0)
> +#define SBI_CALL_3(which, arg0, arg1, arg2) \
> +               SBI_CALL(which, arg0, arg1, arg2, 0)
> +#define SBI_CALL_4(which, arg0, arg1, arg2, arg3) \
> +               SBI_CALL(which, arg0, arg1, arg2, arg3)
>
>  static inline void sbi_console_putchar(int ch)
>  {
> @@ -86,7 +91,7 @@ static inline void sbi_remote_sfence_vma(const unsigned long *hart_mask,
>                                          unsigned long start,
>                                          unsigned long size)
>  {
> -       SBI_CALL_1(SBI_REMOTE_SFENCE_VMA, hart_mask);
> +       SBI_CALL_3(SBI_REMOTE_SFENCE_VMA, hart_mask, start, size);
>  }
>
>  static inline void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
> @@ -94,7 +99,7 @@ static inline void sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
>                                               unsigned long size,
>                                               unsigned long asid)
>  {
> -       SBI_CALL_1(SBI_REMOTE_SFENCE_VMA_ASID, hart_mask);
> +       SBI_CALL_4(SBI_REMOTE_SFENCE_VMA_ASID, hart_mask, start, size, asid);
>  }
>
>  #endif
> --
> 2.17.1
>

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

* Re: [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown
  2019-03-27  0:41 ` [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown Gary Guo
  2019-03-27  7:31   ` Christoph Hellwig
@ 2019-03-28  6:50   ` Anup Patel
  1 sibling, 0 replies; 29+ messages in thread
From: Anup Patel @ 2019-03-28  6:50 UTC (permalink / raw)
  To: Gary Guo
  Cc: Anup Patel, Palmer Dabbelt, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

On Wed, Mar 27, 2019 at 6:11 AM Gary Guo <gary@garyguo.net> wrote:
>
> From: Gary Guo <gary@garyguo.net>
>
> This patch implements IPI-based remote TLB shootdown, which is useful
> at this stage for testing because BBL/OpenSBI ignores the operands of
> sbi_remote_sfence_vma_asid and always performs a global TLB flush.
> The SBI-based remote TLB shootdown can still be opted into using the
> boot cmdline "tlbi_method=sbi".
>
> Signed-off-by: Gary Guo <gary@garyguo.net>
> Tested-by: Atish Patra <atish.patra@wdc.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 +
>  arch/riscv/mm/tlbflush.c                      | 99 +++++++++++++++++--
>  2 files changed, 98 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7a60edef09d2..afd34fa1db91 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4552,6 +4552,11 @@
>                         flushed.
>                         See arch/riscv/mm/tlbflush.c
>
> +       tlbi_method=    [RV]
> +                       Format: { "sbi", "ipi" }
> +                       Default: "ipi"
> +                       Specify the method used to perform remote TLB shootdown.
> +
>         tmem            [KNL,XEN]
>                         Enable the Transcendent memory driver if built-in.
>
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 33083f48a936..ceee76f14a0a 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -72,19 +72,106 @@ void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
>
>  #ifdef CONFIG_SMP
>
> +/*
> > + * SBI has interfaces for remote TLB shootdown.  If there is no hardware
> > + * remote TLB shootdown support, the SBI performs IPIs itself instead.  Some
> > + * SBI implementations may also ignore the ASID and address ranges provided
> > + * and do a full TLB flush instead.  In these cases we might want to do IPIs
> > + * ourselves.
> + *
> + * This parameter allows the approach (IPI/SBI) to be specified using boot
> + * cmdline.
> + */
> +static bool tlbi_ipi = true;
> +
> +static int __init setup_tlbi_method(char *str)
> +{
> +       if (strcmp(str, "ipi") == 0)
> +               tlbi_ipi = true;
> +       else if (strcmp(str, "sbi") == 0)
> +               tlbi_ipi = false;
> +       else
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +early_param("tlbi_method", setup_tlbi_method);
> +
> +
> +struct tlbi {
> +       unsigned long start;
> +       unsigned long size;
> +       unsigned long asid;
> +};
> +
> +static void ipi_remote_sfence_vma(void *info)
> +{
> +       struct tlbi *data = info;
> +       unsigned long start = data->start;
> +       unsigned long size = data->size;
> +       unsigned long i;
> +
> +       if (size == SFENCE_VMA_FLUSH_ALL) {
> +               local_flush_tlb_all();
> +       }

No brace required here.

> +
> +       for (i = 0; i < size; i += PAGE_SIZE) {
> +               __asm__ __volatile__ ("sfence.vma %0"
> +                                     : : "r" (start + i)
> +                                     : "memory");
> +       }

No brace required here as well.

> +}
> +
> +static void ipi_remote_sfence_vma_asid(void *info)
> +{
> +       struct tlbi *data = info;
> +       unsigned long asid = data->asid;
> +       unsigned long start = data->start;
> +       unsigned long size = data->size;
> +       unsigned long i;
> +
> +       if (size == SFENCE_VMA_FLUSH_ALL) {
> +               __asm__ __volatile__ ("sfence.vma x0, %0"
> +                                     : : "r" (asid)
> +                                     : "memory");
> +               return;
> +       }
> +
> +       for (i = 0; i < size; i += PAGE_SIZE) {
> +               __asm__ __volatile__ ("sfence.vma %0, %1"
> +                                     : : "r" (start + i), "r" (asid)
> +                                     : "memory");
> +       }

No brace required here as well.

> +}
> +
>  static void remote_sfence_vma(unsigned long start, unsigned long size)
>  {
> -       sbi_remote_sfence_vma(NULL, start, size);
> +       if (tlbi_ipi) {
> +               struct tlbi info = {
> +                       .start = start,
> +                       .size = size,
> +               };
> +               on_each_cpu(ipi_remote_sfence_vma, &info, 1);
> +       } else
> +               sbi_remote_sfence_vma(NULL, start, size);
>  }
>
>  static void remote_sfence_vma_asid(cpumask_t *mask, unsigned long start,
>                                    unsigned long size, unsigned long asid)
>  {
> -       cpumask_t hmask;
> -
> -       cpumask_clear(&hmask);
> -       riscv_cpuid_to_hartid_mask(mask, &hmask);
> -       sbi_remote_sfence_vma_asid(hmask.bits, start, size, asid);
> +       if (tlbi_ipi) {
> +               struct tlbi info = {
> +                       .start = start,
> +                       .size = size,
> +                       .asid = asid,
> +               };
> +               on_each_cpu_mask(mask, ipi_remote_sfence_vma_asid, &info, 1);
> +       } else {
> +               cpumask_t hmask;
> +
> +               cpumask_clear(&hmask);
> +               riscv_cpuid_to_hartid_mask(mask, &hmask);
> +               sbi_remote_sfence_vma_asid(hmask.bits, start, size, asid);
> +       }
>  }
>
>
> --
> 2.17.1
>

Regards,
Anup

* Re: [PATCH v4 4/5] riscv: rewrite tlb flush for performance
  2019-03-27 13:56     ` Gary Guo
@ 2019-03-28 16:17       ` Christoph Hellwig
  2019-03-28 16:39         ` Gary Guo
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-28 16:17 UTC (permalink / raw)
  To: Gary Guo; +Cc: Christoph Hellwig, linux-riscv

On Wed, Mar 27, 2019 at 01:56:28PM +0000, Gary Guo wrote:
> >> +static inline void local_flush_tlb_page(struct vm_area_struct *vma,
> >> +	unsigned long addr)
> >> +{
> >> +	__asm__ __volatile__ ("sfence.vma %0, %1"
> >> +			      : : "r" (addr), "r" (0)
> >> +			      : "memory");
> >> +}
> > 
> > Why do we pass the vma argument here even if it is never used?  That
> > just seems to create some rather pointless churn.  Also I'd add
> > local_flush_tlb_mm below local_flush_tlb_page to avoid churn as well,
> > nevermind that it seems the more logical order to me.
> This isn't used now, but we need that for ASID support. It is also more
> consistent with the non-SMP flush signature, and more consistent with
> the code of other architectures.

I'd rather keep it simple for now.  For ASID support I suspect you'll
only need it to get the asid from the mm_struct pointed to by the
vma, right?  I'd rather pass the asid directly in that case.
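Something like this, as a pure sketch (name and exact form to be
decided):

	static inline void local_flush_tlb_page_asid(unsigned long addr,
						     unsigned long asid)
	{
		__asm__ __volatile__ ("sfence.vma %0, %1"
				      : : "r" (addr), "r" (asid)
				      : "memory");
	}

local_flush_tlb_page then becomes a trivial wrapper passing ASID 0.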

> >> +static unsigned long tlbi_range_threshold = PAGE_SIZE;
> > 
> > I really hate having this as a tunable in the kernel code.  I think
> > the right answer is to have a device tree entry to carry this number
> > so that the platform can supply it.  Btw, what are examples of
> > platforms that flush globally vs per-page at the moment?  What is a good
> > larger value for the latter based on your testing?
> > 
> This is discussed in previous versions of this patch, and we arrived at 
> the conclusion that a boot parameter is the best way to do it now, as at 
> the moment we have no other ways to get this information. The actual 
> value really depends on the actual implementation. If the implementation 
> has a super large TLB where full invalidation would be super expensive, 
> they might even want the value to be 511.

Sorry, I might not have been clear above - the tunable is ok for
playing around and benchmarking, but it is not the kind of interface we
should have regular users poke at for good performance.  So I don't
mind keeping the parameter in, but we also really need to define a way
for the value to be passed through the device tree so that we get
a good default.
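E.g. something along these lines - entirely made up, neither the
binding nor the code exists yet:

	#include <linux/of.h>

	/* hypothetical "riscv,tlbi-range-threshold" property, in pages */
	static void __init tlbi_threshold_from_dt(void)
	{
		struct device_node *cpus = of_find_node_by_path("/cpus");
		u32 pages;

		if (cpus && !of_property_read_u32(cpus,
				"riscv,tlbi-range-threshold", &pages))
			tlbi_range_threshold = pages * PAGE_SIZE;
		of_node_put(cpus);
	}

with the early_param still available as an override.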

And I'd still like an answer to my second question above - what
were the good values for, say, the SiFive U54 and QEMU in your tests?

> > Also I wonder if we should split this tunable and the optional
> > global flush into separate patches.  That is, the first patch
> > just makes use of the asid, and then another patch adds the threshold
> > for doing the full flush.
> I don't think we should. This patch is more like a rewrite of the old
> logic rather than patching things up incrementally.

Well, we have two pretty distinct changes - one is to use a threshold
to do a global(-ish) flush instead of a per-page one, and the other is
to use ASID 0 explicitly.  In Linux we generally try to split things
into the smallest logical changes.  I'm not going to push hard for this,
but that is just how we normally do it.


> >> +	while (start < end) {
> >> +		__asm__ __volatile__ ("sfence.vma %0, %1"
> >> +				      : : "r" (start), "r" (0)
> >> +				      : "memory");
> > 
> > I think this should just call local_flush_tlb_page.
> > 
> I do this to minimise the changes we need if we want to add ASID support
> (in which case we want to avoid retrieving the ASID from an atomic
> variable multiple times).

We can take care of that later, preferably with a nice helper that gets
the ASID as an argument (see my local_flush_tlb_page comment above).

* Re: [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown
  2019-03-27 14:03     ` Gary Guo
@ 2019-03-28 16:36       ` Christoph Hellwig
  2019-03-28 16:47         ` Gary Guo
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-28 16:36 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

> > 
> > I think Anup has now fixed OpenSBI.  Do you have benchmarks vs BBL,
> > old OpenSBI and new OpenSBI?
> > 
> See the cover letter. OpenSBI's support hasn't made it into a stable
> release yet, and it currently has race conditions.

Well, I'd still like to know numbers.  That is how people generally
justify more complex code that claims to be more optimal :)

> > local_flush_tlb_range?
> > 
> We don't have the VMA structure here, so no. This is related to the
> future ASID patch as well.

Well, sprinkling inline assembly all over is generally not a good idea,
so I'd like to have some helper.  And as pointed out before, IFF we pass
an asid instead of the vma to local_flush_tlb_page once ASID support
is added, local_flush_tlb_page would nicely do that job.  If we really
want another indirection it could be local_flush_tlb_page_asid instead,
but I don't really see the point.

* Re: [PATCH v4 4/5] riscv: rewrite tlb flush for performance
  2019-03-28 16:17       ` Christoph Hellwig
@ 2019-03-28 16:39         ` Gary Guo
  2019-03-28 16:55           ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Gary Guo @ 2019-03-28 16:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-riscv



On 28/03/2019 16:17, Christoph Hellwig wrote:
> On Wed, Mar 27, 2019 at 01:56:28PM +0000, Gary Guo wrote:
>>>> +static inline void local_flush_tlb_page(struct vm_area_struct *vma,
>>>> +	unsigned long addr)
>>>> +{
>>>> +	__asm__ __volatile__ ("sfence.vma %0, %1"
>>>> +			      : : "r" (addr), "r" (0)
>>>> +			      : "memory");
>>>> +}
>>>
>>> Why do we pass the vma argument here even if it is never used?  That
>>> just seems to create some rather pointless churn.  Also I'd add
>>> local_flush_tlb_mm below local_flush_tlb_page to avoid churn as well,
>>> nevermind that it seems the more logical order to me.
>> This isn't used now, but we need that for ASID support. It is also more
>> consistent with the non-SMP flush signature, and more consistent with
>> the code of other architectures.
> 
> I'd rather keep it simple for now.  For ASID support I suspect you'll
> only need it to get the asid from the mm_struct pointed to by the
> vma, right?  I'd rather pass the asid directly in that case.
> 

Yes, it just takes it from mm_struct. But the key point is to keep it
similar to the non-SMP flush_tlb_page.

>>>> +static unsigned long tlbi_range_threshold = PAGE_SIZE;
>>>
>>> I really hate having this is a tunable in the kernel code.  I think
>>> the right answer is to have a device tree entry to carry this number
>>> so that the platform can supply it.  Btw, what are examples of
>>> platforms that flush globalls vs per-page at the moment?  What is a good
>>> larger value for the latter based on your testing?
>>>
>> This is discussed in previous versions of this patch, and we arrived at
>> the conclusion that a boot parameter is the best way to do it now, as at
>> the moment we have no other ways to get this information. The actual
>> value really depends on the actual implementation. If the implementation
>> has a super large TLB where full invalidation would be super expensive,
>> they might even want the value to be 511.
> 
> Sorry, I might not have been clear above - the tunable is ok for
> playing around and benchmarking, but it is not the kind of interface we
> should have regular users poke at for good performance.  So I don't
> mind keeping the parameter in, but we also really need to define a way
> for the value to be passed through the device tree so that we get
> a good default.
> 
> And I'd still like an answer to my second question above - what
> were the good values for, say, the SiFive U54 and QEMU in your tests?
> 
QEMU currently treats all SFENCE.VMA as global. Technically QEMU's
implementation could be modified to do page-level flushes instead, but
the performance would not differ, as the dominating factor is resetting
the jump cache.

I don't have a SiFive board so I can't tell what's a good value for that.

On a hypothetical platform that I am working on (simulation only) we can
benefit even when setting it to 511 (the maximum allowed, since if the
value is >= 512 we don't know whether a non-leaf entry was changed, in
which case the spec mandates a full flush).

So this really depends on the platform.
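The tradeoff being tuned is roughly the following (illustration only,
with made-up numbers - none of these constants come from real hardware):

	static bool flush_per_page_is_cheaper(unsigned long nr_pages)
	{
		const unsigned long cost_per_page = 50;   /* assumed */
		const unsigned long cost_full     = 100;  /* assumed */
		const unsigned long cost_refill   = 5000; /* assumed */

		/* per-page flushes win until their total cost exceeds a
		 * full flush plus the expected TLB refill penalty */
		return nr_pages * cost_per_page < cost_full + cost_refill;
	}

and all three constants vary wildly between implementations.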

>>> Also I wonder if we should split this tunable and the optional
>>> global flush into separate patches.  That is, the first patch
>>> just makes use of the asid, and then another patch adds the threshold
>>> for doing the full flush.
>> I don't think we should. This patch is more like a rewrite of the old
>> logic rather than patching things up incrementally.
> 
> Well, we have two pretty distinct changes - one is to use a threshold
> to do a global(-ish) flush instead of a per-page one, and the other is
> to use ASID 0 explicitly.  In Linux we generally try to split things
> into the smallest logical changes.  I'm not going to push hard for this,
> but that is just how we normally do it.
> 
> 
>>>> +	while (start < end) {
>>>> +		__asm__ __volatile__ ("sfence.vma %0, %1"
>>>> +				      : : "r" (start), "r" (0)
>>>> +				      : "memory");
>>>
>>> I think this should just call local_flush_tlb_page.
>>>
>> I do this to minimise the changes we need if we want to add ASID support
>> (in which case we want to avoid retrieving the ASID from an atomic
>> variable multiple times).
> 
> We can take care of that later, preferably with a nice helper that gets
> the ASID as an argument (see my local_flush_tlb_page comment above).
> 
* Re: [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown
  2019-03-28 16:36       ` Christoph Hellwig
@ 2019-03-28 16:47         ` Gary Guo
  2019-03-28 16:57           ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Gary Guo @ 2019-03-28 16:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Atish Patra, linux-riscv, Anup Patel, Palmer Dabbelt, Albert Ou



On 28/03/2019 16:36, Christoph Hellwig wrote:
>>>
>>> I think Anup has now fixed OpenSBI.  Do you have benchmarks vs BBL,
>>> old OpenSBI and new OpenSBI?
>>>
>> See the cover letter. OpenSBI's support hasn't made it into a stable
>> release yet, and it currently has race conditions.
> 
> Well, I'd still like to know numbers.  That is how people generally
> justify more complex code that claims to be more optimal :)
> 
No visible difference on QEMU: all SFENCE.VMAs are global there, so we
don't save anything, and the added cost of doing IPIs is barely visible.

I don't have a board, so I can't test it out.

The main gain is to allow hardware devs to test their TLB implementation
with Linux. If OpenSBI implements this properly, we probably don't need
the IPI code, I guess.

>>> local_flush_tlb_range?
>>>
>> We don't have the VMA structure here, so no. This is related to the
>> future ASID patch as well.
> 
> Well, sprinkling inline assembly all over is generally not a good idea,
> so I'd like to have some helper.  And as pointed out before, IFF we pass
> an asid instead of the vma to local_flush_tlb_page once ASID support
> is added, local_flush_tlb_page would nicely do that job.  If we really
> want another indirection it could be local_flush_tlb_page_asid instead,
> but I don't really see the point.
> 
Callers of local_flush_tlb_page (and of the non-local version) shouldn't
care about the ASID. They only care about a particular MM. flush_tlb_page
always takes an MM as its argument, and I see no reason why we shouldn't
do the same in the local version.
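To be concrete, with this series the SMP flush_tlb_page path looks
roughly like this (paraphrased sketch, not the literal code from
patch 4):

	static inline void flush_tlb_page(struct vm_area_struct *vma,
					  unsigned long addr)
	{
		/* the ASID (currently always 0) is derived internally;
		 * callers only ever hand over the MM via the vma */
		remote_sfence_vma_asid(mm_cpumask(vma->vm_mm), addr,
				       PAGE_SIZE, 0);
	}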

* Re: [PATCH v4 4/5] riscv: rewrite tlb flush for performance
  2019-03-28 16:39         ` Gary Guo
@ 2019-03-28 16:55           ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-28 16:55 UTC (permalink / raw)
  To: Gary Guo; +Cc: Christoph Hellwig, linux-riscv

On Thu, Mar 28, 2019 at 04:39:53PM +0000, Gary Guo wrote:
> > I'd rather keep it simple for now.  For ASID support I suspect you'll
> > only need it to get the asid from the mm_struct pointed to by the
> > vma, right?  I'd rather pass the asid directly in that case.
> > 
> 
> Yes, it just takes it from mm_struct. But the key point is to keep it
> similar to the non-SMP flush_tlb_page.

And I'd much rather not pass an unused argument that also requires
duplicating the inline assembly code.

> > And I'd still like an answer to my second question above - what
> > were the good values for, say, the SiFive U54 and QEMU in your tests?
> > 
> QEMU currently treats all SFENCE.VMA as global. Technically QEMU's
> implementation could be modified to do page-level flushes instead, but
> the performance would not differ, as the dominating factor is resetting
> the jump cache.
> 
> I don't have a SiFive board so I can't tell what's a good value for that.
> 
> On a hypothetical platform that I am working on (simulation only) we can
> benefit even when setting it to 511 (the maximum allowed, since if the
> value is >= 512 we don't know whether a non-leaf entry was changed, in
> which case the spec mandates a full flush).
> 
> So this really depends on the platform.

Ok, so before moving on we really should figure out a good value for
the currently only supported (well, mostly supported) hardware platform
and figure out a way to pass that through DT.

* Re: [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown
  2019-03-28 16:47         ` Gary Guo
@ 2019-03-28 16:57           ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2019-03-28 16:57 UTC (permalink / raw)
  To: Gary Guo
  Cc: Palmer Dabbelt, Anup Patel, Christoph Hellwig, Atish Patra,
	Albert Ou, linux-riscv

On Thu, Mar 28, 2019 at 04:47:36PM +0000, Gary Guo wrote:
> Callers of local_flush_tlb_page (and of the non-local version) shouldn't
> care about the ASID. They only care about a particular MM. flush_tlb_page
> always takes an MM as its argument, and I see no reason why we shouldn't
> do the same in the local version.

There are exactly two callers in the tree with your patches, one of
which is the !SMP version of flush_tlb_page, and the other is
update_mmu_cache, so I'm really not worried about spreading ASID
information too far.  And all that is only a concern once we get
ASID support anyway.

* Re: [PATCH v4 0/5] TLB/I$ flush cleanups and improvements
  2019-03-27  0:41 [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Gary Guo
                   ` (4 preceding siblings ...)
  2019-03-27  0:41 ` [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown Gary Guo
@ 2019-04-10  7:04 ` Christoph Hellwig
  2019-04-10  9:01   ` Anup Patel
  5 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2019-04-10  7:04 UTC (permalink / raw)
  To: Gary Guo
  Cc: Anup Patel, Palmer Dabbelt, Christoph Hellwig, Atish Patra,
	Albert Ou, opensbi, linux-riscv

On Wed, Mar 27, 2019 at 12:41:11AM +0000, Gary Guo wrote:
> From: Gary Guo <gary@garyguo.net>
> 
> This is the v4 of the general TLB/I$ flush improvement series.
> I still have tlbi_method=ipi being the default as opposed to what
> Atish suggests, as:
> * There are still usage of BBL in the wild
> * OpenSBI's support isn't made into stable yet.
> * OpenSBI's support on the dev branch has some racing issues yet to resolve.

Can you clarify the races?  I know Anup had some FIFO-order commits
in opensbi about a week ago, did they address your concerns?

Anup, do you have performance numbers for the old opensbi vs your
implementation of the optimized TLB flushing vs this patch?

* Re: [PATCH v4 0/5] TLB/I$ flush cleanups and improvements
  2019-04-10  7:04 ` [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Christoph Hellwig
@ 2019-04-10  9:01   ` Anup Patel
  2019-04-10 10:11     ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Anup Patel @ 2019-04-10  9:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Albert Ou, Gary Guo, Anup Patel, Atish Patra, Palmer Dabbelt,
	opensbi, linux-riscv

On Wed, Apr 10, 2019 at 12:34 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Mar 27, 2019 at 12:41:11AM +0000, Gary Guo wrote:
> > From: Gary Guo <gary@garyguo.net>
> >
> > This is the v4 of the general TLB/I$ flush improvement series.
> > I still have tlbi_method=ipi being the default as opposed to what
> > Atish suggests, as:
> > * There are still usage of BBL in the wild
> > * OpenSBI's support isn't made into stable yet.
> > * OpenSBI's support on the dev branch has some racing issues yet to resolve.
>
> Can you clarify the races?  I know Anup had some FIFO-order commits
> in opensbi about a week ago, did they address you concerns?
>
> Anup, do you have performance numbers for the old opensbi vs your
> implementation of the optimized TLB flushing vs this patch?

Atish had posted performance numbers on his GitHub PR at:
https://github.com/riscv/opensbi/pull/111

These performance numbers are as follows.....

Benchmark used: A microbenchmark that mmap a ramdisk (1G) and
multiple threads access 50MB of memory randomly.

https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c

The result is averaged over 25 iterations for 8 threads on HiFive
Unleashed board. In both cases around ~1M remote tlb flushes are triggered.

                       IPI         SBI        Gain
Average Write Time     2.53183     2.43263    +4.34%
Average Read Time      1.32198     1.24643    +6.09%
Total Time             97.7589     92.859     +5.01%


I believe he has more optimizations in-pipeline for OpenSBI
so we might see even better numbers.

Regards,
Anup

* Re: [PATCH v4 0/5] TLB/I$ flush cleanups and improvements
  2019-04-10  9:01   ` Anup Patel
@ 2019-04-10 10:11     ` Christoph Hellwig
  2019-04-10 10:22       ` Anup Patel
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2019-04-10 10:11 UTC (permalink / raw)
  To: Anup Patel
  Cc: Albert Ou, opensbi, Anup Patel, Christoph Hellwig, Atish Patra,
	Palmer Dabbelt, Gary Guo, linux-riscv

On Wed, Apr 10, 2019 at 02:31:04PM +0530, Anup Patel wrote:
> > Can you clarify the races?  I know Anup had some FIFO-order commits
> > in opensbi about a week ago, did they address your concerns?
> >
> > Anup, do you have performance numbers for the old opensbi vs your
> > implementation of the optimized TLB flushing vs this patch?
> 
> Atish had posted performance numbers on his GitHub PR at:
> https://github.com/riscv/opensbi/pull/111
> 
> These performance numbers are as follows.....
> 
> Benchmark used: A microbenchmark that mmap a ramdisk (1G) and
> multiple threads access 50MB of memory randomly.
> 
> https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c
> 
> The result is averaged over 25 iterations for 8 threads on HiFive
> Unleashed board. In both cases around ~1M remote tlb flushes are triggered.
> 
>                                  IPI               SBI            Gain
> Average Write Time             2.53183        2.43263           +4.34%
> Average Read Time             1.32198         1.24643           +6.09%
> Total Time                    97.7589         92.859            +5.01%

So what does this mean?  I assume the codebases are latest(-ish)
opensbi and latest(-ish) kernel with the patches from Gary, and
IPI is with the kernel-based code enabled, and SBI is with the SBI
calls?

* Re: [PATCH v4 0/5] TLB/I$ flush cleanups and improvements
  2019-04-10 10:11     ` Christoph Hellwig
@ 2019-04-10 10:22       ` Anup Patel
  2019-04-11  1:24         ` Atish Patra
  0 siblings, 1 reply; 29+ messages in thread
From: Anup Patel @ 2019-04-10 10:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Albert Ou, Gary Guo, Anup Patel, Atish Patra, Palmer Dabbelt,
	opensbi, linux-riscv

On Wed, Apr 10, 2019 at 3:41 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Apr 10, 2019 at 02:31:04PM +0530, Anup Patel wrote:
> > > Can you clarify the races?  I know Anup had some FIFO-order commits
> > > in opensbi about a week ago, did they address your concerns?
> > >
> > > Anup, do you have performance numbers for the old opensbi vs your
> > > implementation of the optimized TLB flushing vs this patch?
> >
> > Atish had posted performance numbers on his GitHub PR at:
> > https://github.com/riscv/opensbi/pull/111
> >
> > These performance numbers are as follows.....
> >
> > Benchmark used: A microbenchmark that mmap a ramdisk (1G) and
> > multiple threads access 50MB of memory randomly.
> >
> > https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c
> >
> > The result is averaged over 25 iterations for 8 threads on HiFive
> > Unleashed board. In both cases around ~1M remote tlb flushes are triggered.
> >
> >                                  IPI               SBI            Gain
> > Average Write Time             2.53183        2.43263           +4.34%
> > Average Read Time             1.32198         1.24643           +6.09%
> > Total Time                    97.7589         92.859            +5.01%
>
> So what does this mean?  I assume the codebases are latest(-ish)
> opensbi and latest(-ish) kernel with the patches from Gary, and
> IPI is with the kernel-based code enabled, and SBI is with the SBI
> calls?

Yes, this is measured using Gary's v4 patches. The IPI numbers are
with the in-kernel remote TLB flush, whereas the SBI numbers are with
the SBI-based remote TLB flush.

Atish's changes for remote TLB flushes are available in latest OpenSBI.

Regards,
Anup

* Re: [PATCH v4 0/5] TLB/I$ flush cleanups and improvements
  2019-04-10 10:22       ` Anup Patel
@ 2019-04-11  1:24         ` Atish Patra
  0 siblings, 0 replies; 29+ messages in thread
From: Atish Patra @ 2019-04-11  1:24 UTC (permalink / raw)
  To: Anup Patel, Christoph Hellwig
  Cc: Albert Ou, Gary Guo, Anup Patel, Palmer Dabbelt, opensbi, linux-riscv

On 4/10/19 3:22 AM, Anup Patel wrote:
> On Wed, Apr 10, 2019 at 3:41 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Wed, Apr 10, 2019 at 02:31:04PM +0530, Anup Patel wrote:
>>>> Can you clarify the races?  I know Anup had some FIFO-order commits
>>>> in opensbi about a week ago, did they address your concerns?
>>>>
>>>> Anup, do you have performance numbers for the old opensbi vs your
>>>> implementation of the optimized TLB flushing vs this patch?
>>>
>>> Atish had posted performance numbers on his GitHub PR at:
>>> https://github.com/riscv/opensbi/pull/111
>>>
>>> These performance numbers are as follows.....
>>>
>>> Benchmark used: A microbenchmark that mmap a ramdisk (1G) and
>>> multiple threads access 50MB of memory randomly.
>>>
>>> https://github.com/westerndigitalcorporation/hmmap/blob/master/userspace/hmmap_uspace_common.c
>>>
>>> The result is averaged over 25 iterations for 8 threads on HiFive
>>> Unleashed board. In both cases around ~1M remote tlb flushes are triggered.
>>>
>>>                                   IPI               SBI            Gain
>>> Average Write Time             2.53183        2.43263           +4.34%
>>> Average Read Time             1.32198         1.24643           +6.09%
>>> Total Time                    97.7589         92.859            +5.01%
>>
>> So what does this mean?  I assume the codebases are latest(-ish)
>> opensbi and latest(-ish) kernel with the patches from Gary, and
>> IPI is with the kernel-based code enabled, and SBI is with the SBI
>> calls?
> 
> Yes, this is measured using Gary's v4 patches. The IPI numbers are
> with the in-kernel remote TLB flush, whereas the SBI numbers are with
> the SBI-based remote TLB flush.
> 
> Atish's changes for remote TLB flushes are available in latest OpenSBI.
> 
> Regards,
> Anup
> 

The patch to access the tlb statistics in vmstat can be found here.

https://patchwork.kernel.org/project/linux-riscv/list/?series=103939

Regards,
Atish

end of thread, other threads:[~2019-04-11  1:24 UTC | newest]

Thread overview: 29+ messages
2019-03-27  0:41 [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Gary Guo
2019-03-27  0:41 ` [PATCH v4 1/5] riscv: move flush_icache_{all,mm} to cacheflush.c Gary Guo
2019-03-27  7:06   ` Christoph Hellwig
2019-03-28  6:45   ` Anup Patel
2019-03-27  0:41 ` [PATCH v4 3/5] riscv: fix sbi_remote_sfence_vma{,_asid} Gary Guo
2019-03-27  7:08   ` Christoph Hellwig
2019-03-28  6:47   ` Anup Patel
2019-03-27  0:41 ` [PATCH v4 4/5] riscv: rewrite tlb flush for performance Gary Guo
2019-03-27  7:25   ` Christoph Hellwig
2019-03-27 13:56     ` Gary Guo
2019-03-28 16:17       ` Christoph Hellwig
2019-03-28 16:39         ` Gary Guo
2019-03-28 16:55           ` Christoph Hellwig
2019-03-27  0:41 ` [PATCH v4 2/5] riscv: move switch_mm to its own file Gary Guo
2019-03-27  7:08   ` Christoph Hellwig
2019-03-27  7:18   ` Christoph Hellwig
2019-03-28  6:47   ` Anup Patel
2019-03-27  0:41 ` [PATCH v4 5/5] riscv: implement IPI-based remote TLB shootdown Gary Guo
2019-03-27  7:31   ` Christoph Hellwig
2019-03-27 14:03     ` Gary Guo
2019-03-28 16:36       ` Christoph Hellwig
2019-03-28 16:47         ` Gary Guo
2019-03-28 16:57           ` Christoph Hellwig
2019-03-28  6:50   ` Anup Patel
2019-04-10  7:04 ` [PATCH v4 0/5] TLB/I$ flush cleanups and improvements Christoph Hellwig
2019-04-10  9:01   ` Anup Patel
2019-04-10 10:11     ` Christoph Hellwig
2019-04-10 10:22       ` Anup Patel
2019-04-11  1:24         ` Atish Patra
