linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/4] riscv: tlb flush improvements
@ 2023-09-11 13:12 Alexandre Ghiti
  2023-09-11 13:12 ` [PATCH v4 1/4] riscv: Improve flush_tlb() Alexandre Ghiti
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Alexandre Ghiti @ 2023-09-11 13:12 UTC
  To: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Lad Prabhakar
  Cc: Alexandre Ghiti

This series optimizes TLB flushes on riscv, which used to simply flush
the whole TLB regardless of the size of the range to flush or the size
of the stride.

Patch 3 introduces a microarchitecture-specific threshold that vendors
will very likely want to modify; I'm not sure yet which mechanism we'll
use for that (dt? alternatives? vendor initialization code?).

Next steps would be to implement:
- svinval extension as Mayuresh did here [1]
- BATCHED_UNMAP_TLB_FLUSH (I'll wait for arm64 patchset to land)
- MMU_GATHER_RCU_TABLE_FREE
- MMU_GATHER_MERGE_VMAS

Any other ideas are welcome.

[1] https://lore.kernel.org/linux-riscv/20230623123849.1425805-1-mchitale@ventanamicro.com/

Changes in v4:
- Correctly handle the stride size for a NAPOT hugepage, thanks to Aaron Durbin!
- Fix flush_tlb_kernel_range() which passed a wrong argument to __flush_tlb_range()
- Factor out the code that handles asid/no asid flushes
- Fix kernel flush bug where I used to pass 0 instead of x0, big thanks to Samuel for finding that!

Changes in v3:
- Add RB from Andrew, thanks!
- Unwrap a few lines, as suggested by Andrew
- Introduce defines for -1 constants used in tlbflush.c, as suggested by Andrew and Conor
- Use huge_page_size() directly instead of using the shift, as suggested by Andrew
- Remove misleading comments as suggested by Conor

Changes in v2:
- Make tlb_flush_all_threshold static, we'll figure out later how to
  override this value on a vendor basis, as suggested by Conor and Palmer
- Fix nommu build, as reported by Conor

Alexandre Ghiti (4):
  riscv: Improve flush_tlb()
  riscv: Improve flush_tlb_range() for hugetlb pages
  riscv: Make __flush_tlb_range() loop over pte instead of flushing the
    whole tlb
  riscv: Improve flush_tlb_kernel_range()

 arch/riscv/include/asm/sbi.h      |   3 -
 arch/riscv/include/asm/tlb.h      |   8 +-
 arch/riscv/include/asm/tlbflush.h |  15 ++-
 arch/riscv/kernel/sbi.c           |  32 ++---
 arch/riscv/mm/tlbflush.c          | 192 ++++++++++++++++++++----------
 5 files changed, 155 insertions(+), 95 deletions(-)

-- 
2.39.2



* [PATCH v4 1/4] riscv: Improve flush_tlb()
  2023-09-11 13:12 [PATCH v4 0/4] riscv: tlb flush improvements Alexandre Ghiti
@ 2023-09-11 13:12 ` Alexandre Ghiti
  2023-09-19 12:07   ` Lad, Prabhakar
  2023-10-09 17:53   ` Samuel Holland
  2023-09-11 13:12 ` [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages Alexandre Ghiti
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Alexandre Ghiti @ 2023-09-11 13:12 UTC
  To: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Lad Prabhakar
  Cc: Alexandre Ghiti, Andrew Jones

For now, tlb_flush() simply calls flush_tlb_mm(), which results in a
flush of the whole TLB. So let's use the mmu_gather fields to provide a
more fine-grained flush of the TLB.
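
To illustrate the decision described above, here is a minimal user-space
sketch (not kernel code). struct gather_model, enum flush_kind,
struct flush_request and model_tlb_flush() are hypothetical names made up
for this illustration; only the fullmm, need_flush_all, start and end
fields mirror what the patch reads from struct mmu_gather.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mmu_gather fields consulted by the patch. */
struct gather_model {
	unsigned long start;
	unsigned long end;
	bool fullmm;
	bool need_flush_all;
};

enum flush_kind { FLUSH_WHOLE_MM, FLUSH_RANGE };

/* Hypothetical description of the flush that would be issued. */
struct flush_request {
	enum flush_kind kind;
	unsigned long start;
	unsigned long size;
};

static struct flush_request model_tlb_flush(const struct gather_model *tlb)
{
	struct flush_request req = { FLUSH_WHOLE_MM, 0, 0 };

	/* Tearing down the whole mm, or a forced full flush: no range. */
	if (tlb->fullmm || tlb->need_flush_all)
		return req;

	/* Otherwise only the gathered [start, end) range is flushed. */
	assert(tlb->end >= tlb->start);
	req.kind = FLUSH_RANGE;
	req.start = tlb->start;
	req.size = tlb->end - tlb->start;
	return req;
}
```

In the patch itself, the range case additionally passes
tlb_get_unmap_size(tlb) as the stride; the sketch leaves that out.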

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/include/asm/tlb.h      | 8 +++++++-
 arch/riscv/include/asm/tlbflush.h | 3 +++
 arch/riscv/mm/tlbflush.c          | 7 +++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
index 120bcf2ed8a8..1eb5682b2af6 100644
--- a/arch/riscv/include/asm/tlb.h
+++ b/arch/riscv/include/asm/tlb.h
@@ -15,7 +15,13 @@ static void tlb_flush(struct mmu_gather *tlb);
 
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	flush_tlb_mm(tlb->mm);
+#ifdef CONFIG_MMU
+	if (tlb->fullmm || tlb->need_flush_all)
+		flush_tlb_mm(tlb->mm);
+	else
+		flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
+				   tlb_get_unmap_size(tlb));
+#endif
 }
 
 #endif /* _ASM_RISCV_TLB_H */
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index a09196f8de68..f5c4fb0ae642 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -32,6 +32,8 @@ static inline void local_flush_tlb_page(unsigned long addr)
 #if defined(CONFIG_SMP) && defined(CONFIG_MMU)
 void flush_tlb_all(void);
 void flush_tlb_mm(struct mm_struct *mm);
+void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
+			unsigned long end, unsigned int page_size);
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
 void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		     unsigned long end);
@@ -52,6 +54,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 }
 
 #define flush_tlb_mm(mm) flush_tlb_all()
+#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
 #endif /* !CONFIG_SMP || !CONFIG_MMU */
 
 /* Flush a range of kernel pages */
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 77be59aadc73..fa03289853d8 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -132,6 +132,13 @@ void flush_tlb_mm(struct mm_struct *mm)
 	__flush_tlb_range(mm, 0, -1, PAGE_SIZE);
 }
 
+void flush_tlb_mm_range(struct mm_struct *mm,
+			unsigned long start, unsigned long end,
+			unsigned int page_size)
+{
+	__flush_tlb_range(mm, start, end - start, page_size);
+}
+
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 {
 	__flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
-- 
2.39.2



* [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages
  2023-09-11 13:12 [PATCH v4 0/4] riscv: tlb flush improvements Alexandre Ghiti
  2023-09-11 13:12 ` [PATCH v4 1/4] riscv: Improve flush_tlb() Alexandre Ghiti
@ 2023-09-11 13:12 ` Alexandre Ghiti
  2023-09-19 12:07   ` Lad, Prabhakar
  2023-10-09 17:53   ` Samuel Holland
  2023-09-11 13:12 ` [PATCH v4 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb Alexandre Ghiti
  2023-09-11 13:12 ` [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range() Alexandre Ghiti
  3 siblings, 2 replies; 16+ messages in thread
From: Alexandre Ghiti @ 2023-09-11 13:12 UTC
  To: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Lad Prabhakar
  Cc: Alexandre Ghiti, Andrew Jones

flush_tlb_range() uses a fixed stride of PAGE_SIZE, so in its current form,
when a hugetlb mapping needs to be flushed, it flushes the whole TLB. Set
the stride to the size of the hugetlb mapping so that only that mapping is
flushed. However, if the hugepage is a NAPOT region, all PTEs that
constitute this mapping must be invalidated, so the stride must actually be
the size of the PTE.

Note that THPs are directly handled by flush_pmd_tlb_range().
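
The stride selection can be sketched in user space as follows. This is an
illustration only: it assumes three page-table levels with 4K base pages
(so the P4D and PGDIR cases of the patch are left out), the MODEL_* sizes
and model_stride() are hypothetical names, and the is_napot flag stands in
for the for_each_napot_order() lookup done in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes for the sketch: 4K pages, 2M PMD, 1G PUD. */
#define MODEL_PAGE_SIZE (1UL << 12)
#define MODEL_PMD_SIZE  (1UL << 21)
#define MODEL_PUD_SIZE  (1UL << 30)

/*
 * Return the stride used to walk a hugetlb mapping of huge_size bytes.
 * A regular hugepage is covered by one entry, so the stride is its size.
 * A NAPOT hugepage is made of several PTEs that must each be invalidated,
 * so the stride falls back to the size of one PTE at that level.
 */
static unsigned long model_stride(unsigned long huge_size, bool is_napot)
{
	assert(huge_size >= MODEL_PAGE_SIZE);

	if (!is_napot)
		return huge_size;

	if (huge_size >= MODEL_PUD_SIZE)
		return MODEL_PUD_SIZE;
	if (huge_size >= MODEL_PMD_SIZE)
		return MODEL_PMD_SIZE;
	return MODEL_PAGE_SIZE;
}
```

For instance, a 64K NAPOT hugepage gets a PAGE_SIZE stride (16 entries to
invalidate), while a regular 2M hugepage keeps a 2M stride (one entry).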

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/mm/tlbflush.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index fa03289853d8..5bda6d4fed90 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -3,6 +3,7 @@
 #include <linux/mm.h>
 #include <linux/smp.h>
 #include <linux/sched.h>
+#include <linux/hugetlb.h>
 #include <asm/sbi.h>
 #include <asm/mmu_context.h>
 
@@ -147,7 +148,43 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		     unsigned long end)
 {
-	__flush_tlb_range(vma->vm_mm, start, end - start, PAGE_SIZE);
+	unsigned long stride_size;
+
+	stride_size = is_vm_hugetlb_page(vma) ?
+				huge_page_size(hstate_vma(vma)) :
+				PAGE_SIZE;
+
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+	/*
+	 * As stated in the privileged specification, every PTE in a NAPOT
+	 * region must be invalidated, so reset the stride in that case.
+	 */
+	if (has_svnapot()) {
+		unsigned long order, napot_size;
+
+		for_each_napot_order(order) {
+			napot_size = napot_cont_size(order);
+
+			if (stride_size != napot_size)
+				continue;
+
+			if (napot_size >= PGDIR_SIZE)
+				stride_size = PGDIR_SIZE;
+			else if (napot_size >= P4D_SIZE)
+				stride_size = P4D_SIZE;
+			else if (napot_size >= PUD_SIZE)
+				stride_size = PUD_SIZE;
+			else if (napot_size >= PMD_SIZE)
+				stride_size = PMD_SIZE;
+			else
+				stride_size = PAGE_SIZE;
+
+			break;
+		}
+	}
+#endif
+
+	__flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
 }
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
-- 
2.39.2



* [PATCH v4 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
  2023-09-11 13:12 [PATCH v4 0/4] riscv: tlb flush improvements Alexandre Ghiti
  2023-09-11 13:12 ` [PATCH v4 1/4] riscv: Improve flush_tlb() Alexandre Ghiti
  2023-09-11 13:12 ` [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages Alexandre Ghiti
@ 2023-09-11 13:12 ` Alexandre Ghiti
  2023-09-19 12:09   ` Lad, Prabhakar
  2023-09-11 13:12 ` [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range() Alexandre Ghiti
  3 siblings, 1 reply; 16+ messages in thread
From: Alexandre Ghiti @ 2023-09-11 13:12 UTC
  To: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Lad Prabhakar
  Cc: Alexandre Ghiti, Andrew Jones

Currently, when the range to flush covers more than one page (a 4K page or
a hugepage), __flush_tlb_range() flushes the whole TLB. Flushing the whole
TLB costs more than flushing a single entry, so we should flush single
entries up to a certain threshold, chosen so that:
threshold * cost of flushing a single entry < cost of flushing the whole
TLB.
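
As a rough user-space sketch of that decision (not the kernel code):
MODEL_FLUSH_ALL_THRESHOLD, model_nr_entries() and model_flush_per_entry()
are hypothetical names, and 64 is simply the default this patch picks. The
sketch keeps the entry count in an unsigned long, which is wider than the
u16 used in the diff, so that a count above 65535 cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>

/* Default value taken from this patch; expected to be tuned per vendor. */
#define MODEL_FLUSH_ALL_THRESHOLD 64UL

/* Number of entries of 'stride' bytes needed to cover 'size' bytes. */
static unsigned long model_nr_entries(unsigned long size, unsigned long stride)
{
	assert(stride != 0);
	/* Round up without risking overflow of size + stride - 1. */
	return size / stride + (size % stride != 0);
}

/*
 * True when the range should be flushed entry by entry, false when
 * flushing the whole TLB is assumed to be cheaper.
 */
static bool model_flush_per_entry(unsigned long size, unsigned long stride)
{
	return model_nr_entries(size, stride) <= MODEL_FLUSH_ALL_THRESHOLD;
}
```

With a 4K stride, a 256K range (64 entries) is still flushed entry by
entry, while anything larger falls back to a full flush.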

Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/include/asm/sbi.h      |   3 -
 arch/riscv/include/asm/tlbflush.h |   3 +
 arch/riscv/kernel/sbi.c           |  32 +++------
 arch/riscv/mm/tlbflush.c          | 115 +++++++++++++++---------------
 4 files changed, 72 insertions(+), 81 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 5b4a1bf5f439..b79d0228144f 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -273,9 +273,6 @@ void sbi_set_timer(uint64_t stime_value);
 void sbi_shutdown(void);
 void sbi_send_ipi(unsigned int cpu);
 int sbi_remote_fence_i(const struct cpumask *cpu_mask);
-int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
-			   unsigned long start,
-			   unsigned long size);
 
 int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
 				unsigned long start,
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index f5c4fb0ae642..170a49c531c6 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -11,6 +11,9 @@
 #include <asm/smp.h>
 #include <asm/errata_list.h>
 
+#define FLUSH_TLB_MAX_SIZE      ((unsigned long)-1)
+#define FLUSH_TLB_NO_ASID       ((unsigned long)-1)
+
 #ifdef CONFIG_MMU
 extern unsigned long asid_mask;
 
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index c672c8ba9a2a..5a62ed1da453 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -11,6 +11,7 @@
 #include <linux/reboot.h>
 #include <asm/sbi.h>
 #include <asm/smp.h>
+#include <asm/tlbflush.h>
 
 /* default SBI version is 0.1 */
 unsigned long sbi_spec_version __ro_after_init = SBI_SPEC_VERSION_DEFAULT;
@@ -376,32 +377,15 @@ int sbi_remote_fence_i(const struct cpumask *cpu_mask)
 }
 EXPORT_SYMBOL(sbi_remote_fence_i);
 
-/**
- * sbi_remote_sfence_vma() - Execute SFENCE.VMA instructions on given remote
- *			     harts for the specified virtual address range.
- * @cpu_mask: A cpu mask containing all the target harts.
- * @start: Start of the virtual address
- * @size: Total size of the virtual address range.
- *
- * Return: 0 on success, appropriate linux error code otherwise.
- */
-int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
-			   unsigned long start,
-			   unsigned long size)
-{
-	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
-			    cpu_mask, start, size, 0, 0);
-}
-EXPORT_SYMBOL(sbi_remote_sfence_vma);
-
 /**
  * sbi_remote_sfence_vma_asid() - Execute SFENCE.VMA instructions on given
- * remote harts for a virtual address range belonging to a specific ASID.
+ * remote harts for a virtual address range belonging to a specific ASID or not.
  *
  * @cpu_mask: A cpu mask containing all the target harts.
  * @start: Start of the virtual address
  * @size: Total size of the virtual address range.
- * @asid: The value of address space identifier (ASID).
+ * @asid: The value of address space identifier (ASID), or FLUSH_TLB_NO_ASID
+ * for flushing all address spaces.
  *
  * Return: 0 on success, appropriate linux error code otherwise.
  */
@@ -410,8 +394,12 @@ int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
 				unsigned long size,
 				unsigned long asid)
 {
-	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
-			    cpu_mask, start, size, asid, 0);
+	if (asid == FLUSH_TLB_NO_ASID)
+		return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
+				    cpu_mask, start, size, 0, 0);
+	else
+		return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
+				    cpu_mask, start, size, asid, 0);
 }
 EXPORT_SYMBOL(sbi_remote_sfence_vma_asid);
 
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 5bda6d4fed90..2c1136d73411 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -9,28 +9,50 @@
 
 static inline void local_flush_tlb_all_asid(unsigned long asid)
 {
-	__asm__ __volatile__ ("sfence.vma x0, %0"
-			:
-			: "r" (asid)
-			: "memory");
+	if (asid != FLUSH_TLB_NO_ASID)
+		__asm__ __volatile__ ("sfence.vma x0, %0"
+				:
+				: "r" (asid)
+				: "memory");
+	else
+		local_flush_tlb_all();
 }
 
 static inline void local_flush_tlb_page_asid(unsigned long addr,
 		unsigned long asid)
 {
-	__asm__ __volatile__ ("sfence.vma %0, %1"
-			:
-			: "r" (addr), "r" (asid)
-			: "memory");
+	if (asid != FLUSH_TLB_NO_ASID)
+		__asm__ __volatile__ ("sfence.vma %0, %1"
+				:
+				: "r" (addr), "r" (asid)
+				: "memory");
+	else
+		local_flush_tlb_page(addr);
 }
 
-static inline void local_flush_tlb_range(unsigned long start,
-		unsigned long size, unsigned long stride)
+/*
+ * Flush entire TLB if number of entries to be flushed is greater
+ * than the threshold below.
+ */
+static unsigned long tlb_flush_all_threshold __read_mostly = 64;
+
+static void local_flush_tlb_range_threshold_asid(unsigned long start,
+						 unsigned long size,
+						 unsigned long stride,
+						 unsigned long asid)
 {
-	if (size <= stride)
-		local_flush_tlb_page(start);
-	else
-		local_flush_tlb_all();
+	u16 nr_ptes_in_range = DIV_ROUND_UP(size, stride);
+	int i;
+
+	if (nr_ptes_in_range > tlb_flush_all_threshold) {
+		local_flush_tlb_all_asid(asid);
+		return;
+	}
+
+	for (i = 0; i < nr_ptes_in_range; ++i) {
+		local_flush_tlb_page_asid(start, asid);
+		start += stride;
+	}
 }
 
 static inline void local_flush_tlb_range_asid(unsigned long start,
@@ -38,8 +60,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start,
 {
 	if (size <= stride)
 		local_flush_tlb_page_asid(start, asid);
-	else
+	else if (size == FLUSH_TLB_MAX_SIZE)
 		local_flush_tlb_all_asid(asid);
+	else
+		local_flush_tlb_range_threshold_asid(start, size, stride, asid);
 }
 
 static void __ipi_flush_tlb_all(void *info)
@@ -52,7 +76,7 @@ void flush_tlb_all(void)
 	if (riscv_use_ipi_for_rfence())
 		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
 	else
-		sbi_remote_sfence_vma(NULL, 0, -1);
+		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
 }
 
 struct flush_tlb_range_data {
@@ -69,18 +93,12 @@ static void __ipi_flush_tlb_range_asid(void *info)
 	local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid);
 }
 
-static void __ipi_flush_tlb_range(void *info)
-{
-	struct flush_tlb_range_data *d = info;
-
-	local_flush_tlb_range(d->start, d->size, d->stride);
-}
-
 static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
 			      unsigned long size, unsigned long stride)
 {
 	struct flush_tlb_range_data ftd;
 	struct cpumask *cmask = mm_cpumask(mm);
+	unsigned long asid = FLUSH_TLB_NO_ASID;
 	unsigned int cpuid;
 	bool broadcast;
 
@@ -90,39 +108,24 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
 	cpuid = get_cpu();
 	/* check if the tlbflush needs to be sent to other CPUs */
 	broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
-	if (static_branch_unlikely(&use_asid_allocator)) {
-		unsigned long asid = atomic_long_read(&mm->context.id) & asid_mask;
-
-		if (broadcast) {
-			if (riscv_use_ipi_for_rfence()) {
-				ftd.asid = asid;
-				ftd.start = start;
-				ftd.size = size;
-				ftd.stride = stride;
-				on_each_cpu_mask(cmask,
-						 __ipi_flush_tlb_range_asid,
-						 &ftd, 1);
-			} else
-				sbi_remote_sfence_vma_asid(cmask,
-							   start, size, asid);
-		} else {
-			local_flush_tlb_range_asid(start, size, stride, asid);
-		}
+
+	if (static_branch_unlikely(&use_asid_allocator))
+		asid = atomic_long_read(&mm->context.id) & asid_mask;
+
+	if (broadcast) {
+		if (riscv_use_ipi_for_rfence()) {
+			ftd.asid = asid;
+			ftd.start = start;
+			ftd.size = size;
+			ftd.stride = stride;
+			on_each_cpu_mask(cmask,
+					 __ipi_flush_tlb_range_asid,
+					 &ftd, 1);
+		} else
+			sbi_remote_sfence_vma_asid(cmask,
+						   start, size, asid);
 	} else {
-		if (broadcast) {
-			if (riscv_use_ipi_for_rfence()) {
-				ftd.asid = 0;
-				ftd.start = start;
-				ftd.size = size;
-				ftd.stride = stride;
-				on_each_cpu_mask(cmask,
-						 __ipi_flush_tlb_range,
-						 &ftd, 1);
-			} else
-				sbi_remote_sfence_vma(cmask, start, size);
-		} else {
-			local_flush_tlb_range(start, size, stride);
-		}
+		local_flush_tlb_range_asid(start, size, stride, asid);
 	}
 
 	put_cpu();
@@ -130,7 +133,7 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	__flush_tlb_range(mm, 0, -1, PAGE_SIZE);
+	__flush_tlb_range(mm, 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
 }
 
 void flush_tlb_mm_range(struct mm_struct *mm,
-- 
2.39.2



* [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range()
  2023-09-11 13:12 [PATCH v4 0/4] riscv: tlb flush improvements Alexandre Ghiti
                   ` (2 preceding siblings ...)
  2023-09-11 13:12 ` [PATCH v4 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb Alexandre Ghiti
@ 2023-09-11 13:12 ` Alexandre Ghiti
  2023-09-13  8:04   ` Alexandre Ghiti
  2023-09-19 12:09   ` Lad, Prabhakar
  3 siblings, 2 replies; 16+ messages in thread
From: Alexandre Ghiti @ 2023-09-11 13:12 UTC
  To: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Lad Prabhakar
  Cc: Alexandre Ghiti, Andrew Jones

This function used to simply flush the whole TLB of all harts; be more
subtle and try to flush only the requested range.

The problem is that we can only use PAGE_SIZE as the stride, since we don't
know the size of the underlying mapping, so this function is improved only
if the size of the region to flush is <= threshold * PAGE_SIZE.
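
To put a number on that limit, here is a small user-space sketch. It
assumes 4K pages; MODEL_PAGE_SIZE and model_kernel_cutoff_bytes() are
hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Assumed base page size for the sketch. */
#define MODEL_PAGE_SIZE 4096UL

/*
 * Largest kernel range, in bytes, that is still flushed page by page
 * for a given threshold; larger ranges fall back to a full flush.
 */
static unsigned long model_kernel_cutoff_bytes(unsigned long threshold)
{
	/* Guard against overflow of the multiplication below. */
	assert(threshold <= (unsigned long)-1 / MODEL_PAGE_SIZE);
	return threshold * MODEL_PAGE_SIZE;
}
```

With the default threshold of 64 from patch 3, the cutoff is 256K, so a
2M kernel mapping would still be handled with a full flush.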

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/include/asm/tlbflush.h | 11 ++++++-----
 arch/riscv/mm/tlbflush.c          | 33 ++++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 170a49c531c6..8f3418c5f172 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -40,6 +40,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
 void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		     unsigned long end);
+void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
 void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
@@ -56,15 +57,15 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 	local_flush_tlb_all();
 }
 
-#define flush_tlb_mm(mm) flush_tlb_all()
-#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
-#endif /* !CONFIG_SMP || !CONFIG_MMU */
-
 /* Flush a range of kernel pages */
 static inline void flush_tlb_kernel_range(unsigned long start,
 	unsigned long end)
 {
-	flush_tlb_all();
+	local_flush_tlb_all();
 }
 
+#define flush_tlb_mm(mm) flush_tlb_all()
+#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
+#endif /* !CONFIG_SMP || !CONFIG_MMU */
+
 #endif /* _ASM_RISCV_TLBFLUSH_H */
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 2c1136d73411..28cd8539b575 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -97,19 +97,27 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
 			      unsigned long size, unsigned long stride)
 {
 	struct flush_tlb_range_data ftd;
-	struct cpumask *cmask = mm_cpumask(mm);
+	struct cpumask *cmask, full_cmask;
 	unsigned long asid = FLUSH_TLB_NO_ASID;
-	unsigned int cpuid;
 	bool broadcast;
 
-	if (cpumask_empty(cmask))
-		return;
+	if (mm) {
+		unsigned int cpuid;
+
+		cmask = mm_cpumask(mm);
+		if (cpumask_empty(cmask))
+			return;
 
-	cpuid = get_cpu();
-	/* check if the tlbflush needs to be sent to other CPUs */
-	broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
+		cpuid = get_cpu();
+		/* check if the tlbflush needs to be sent to other CPUs */
+		broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
+	} else {
+		cpumask_setall(&full_cmask);
+		cmask = &full_cmask;
+		broadcast = true;
+	}
 
-	if (static_branch_unlikely(&use_asid_allocator))
+	if (static_branch_unlikely(&use_asid_allocator) && mm)
 		asid = atomic_long_read(&mm->context.id) & asid_mask;
 
 	if (broadcast) {
@@ -128,7 +136,8 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
 		local_flush_tlb_range_asid(start, size, stride, asid);
 	}
 
-	put_cpu();
+	if (mm)
+		put_cpu();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
@@ -189,6 +198,12 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 
 	__flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
 }
+
+void flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+	__flush_tlb_range(NULL, start, end - start, PAGE_SIZE);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 			unsigned long end)
-- 
2.39.2



* Re: [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range()
  2023-09-11 13:12 ` [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range() Alexandre Ghiti
@ 2023-09-13  8:04   ` Alexandre Ghiti
  2023-09-13  8:23     ` Lad, Prabhakar
  2023-09-19 12:09   ` Lad, Prabhakar
  1 sibling, 1 reply; 16+ messages in thread
From: Alexandre Ghiti @ 2023-09-13  8:04 UTC
  To: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Lad Prabhakar
  Cc: Andrew Jones

@Lad, Prabhakar Any chance you could give this new patchset a try, so
that we can make sure Samuel found your issue? :)



* Re: [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range()
  2023-09-13  8:04   ` Alexandre Ghiti
@ 2023-09-13  8:23     ` Lad, Prabhakar
  2023-09-13  8:32       ` Alexandre Ghiti
  0 siblings, 1 reply; 16+ messages in thread
From: Lad, Prabhakar @ 2023-09-13  8:23 UTC
  To: Alexandre Ghiti
  Cc: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Andrew Jones

Hi Alexandre,

On Wed, Sep 13, 2023 at 9:04 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> @Lad, Prabhakar Any chance you give a try to this new patchset? So
> that we make sure Samuel found your issue :)
>
I have given the patches a try and have not seen the module load failures
seen previously. I have some rigorous tests which test the complete
platform. I'm just waiting for them to complete before I give my Tested-by.

Cheers,
Prabhakar

> > +               broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
> > +       } else {
> > +               cpumask_setall(&full_cmask);
> > +               cmask = &full_cmask;
> > +               broadcast = true;
> > +       }
> >
> > -       if (static_branch_unlikely(&use_asid_allocator))
> > +       if (static_branch_unlikely(&use_asid_allocator) && mm)
> >                 asid = atomic_long_read(&mm->context.id) & asid_mask;
> >
> >         if (broadcast) {
> > @@ -128,7 +136,8 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
> >                 local_flush_tlb_range_asid(start, size, stride, asid);
> >         }
> >
> > -       put_cpu();
> > +       if (mm)
> > +               put_cpu();
> >  }
> >
> >  void flush_tlb_mm(struct mm_struct *mm)
> > @@ -189,6 +198,12 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >
> >         __flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
> >  }
> > +
> > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > +{
> > +       __flush_tlb_range(NULL, start, end - start, PAGE_SIZE);
> > +}
> > +
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >                         unsigned long end)
> > --
> > 2.39.2
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range()
  2023-09-13  8:23     ` Lad, Prabhakar
@ 2023-09-13  8:32       ` Alexandre Ghiti
  0 siblings, 0 replies; 16+ messages in thread
From: Alexandre Ghiti @ 2023-09-13  8:32 UTC (permalink / raw)
  To: Lad, Prabhakar
  Cc: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Andrew Jones

On Wed, Sep 13, 2023 at 10:24 AM Lad, Prabhakar
<prabhakar.csengg@gmail.com> wrote:
>
> Hi Alexandre,
>
> On Wed, Sep 13, 2023 at 9:04 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > @Lad, Prabhakar Any chance you could give this new patchset a try? That
> > way we can make sure Samuel found your issue :)
> >
> I have given the patches a try and no longer see the module load failures
> I saw previously. I have some rigorous tests that exercise the complete
> platform. I'm just waiting for them to complete before I give a Tested-by.
>

Awesome, thanks for the update! Well done @Samuel Holland

> Cheers,
> Prabhakar
>
> > On Mon, Sep 11, 2023 at 3:16 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >
> > > This function used to simply flush the whole tlb of all harts; be more
> > > subtle and try to only flush the range.
> > >
> > > The problem is that we can only use PAGE_SIZE as stride since we don't know
> > > the size of the underlying mapping, so this function only improves on a full
> > > flush if the size of the region to flush is < threshold * PAGE_SIZE.
> > >
> > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> > > ---
> > >  arch/riscv/include/asm/tlbflush.h | 11 ++++++-----
> > >  arch/riscv/mm/tlbflush.c          | 33 ++++++++++++++++++++++---------
> > >  2 files changed, 30 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> > > index 170a49c531c6..8f3418c5f172 100644
> > > --- a/arch/riscv/include/asm/tlbflush.h
> > > +++ b/arch/riscv/include/asm/tlbflush.h
> > > @@ -40,6 +40,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> > >  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
> > >  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > >                      unsigned long end);
> > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end);
> > >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > >  #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
> > >  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > > @@ -56,15 +57,15 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
> > >         local_flush_tlb_all();
> > >  }
> > >
> > > -#define flush_tlb_mm(mm) flush_tlb_all()
> > > -#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
> > > -#endif /* !CONFIG_SMP || !CONFIG_MMU */
> > > -
> > >  /* Flush a range of kernel pages */
> > >  static inline void flush_tlb_kernel_range(unsigned long start,
> > >         unsigned long end)
> > >  {
> > > -       flush_tlb_all();
> > > +       local_flush_tlb_all();
> > >  }
> > >
> > > +#define flush_tlb_mm(mm) flush_tlb_all()
> > > +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
> > > +#endif /* !CONFIG_SMP || !CONFIG_MMU */
> > > +
> > >  #endif /* _ASM_RISCV_TLBFLUSH_H */
> > > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > > index 2c1136d73411..28cd8539b575 100644
> > > --- a/arch/riscv/mm/tlbflush.c
> > > +++ b/arch/riscv/mm/tlbflush.c
> > > @@ -97,19 +97,27 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
> > >                               unsigned long size, unsigned long stride)
> > >  {
> > >         struct flush_tlb_range_data ftd;
> > > -       struct cpumask *cmask = mm_cpumask(mm);
> > > +       struct cpumask *cmask, full_cmask;
> > >         unsigned long asid = FLUSH_TLB_NO_ASID;
> > > -       unsigned int cpuid;
> > >         bool broadcast;
> > >
> > > -       if (cpumask_empty(cmask))
> > > -               return;
> > > +       if (mm) {
> > > +               unsigned int cpuid;
> > > +
> > > +               cmask = mm_cpumask(mm);
> > > +               if (cpumask_empty(cmask))
> > > +                       return;
> > >
> > > -       cpuid = get_cpu();
> > > -       /* check if the tlbflush needs to be sent to other CPUs */
> > > -       broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
> > > +               cpuid = get_cpu();
> > > +               /* check if the tlbflush needs to be sent to other CPUs */
> > > +               broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
> > > +       } else {
> > > +               cpumask_setall(&full_cmask);
> > > +               cmask = &full_cmask;
> > > +               broadcast = true;
> > > +       }
> > >
> > > -       if (static_branch_unlikely(&use_asid_allocator))
> > > +       if (static_branch_unlikely(&use_asid_allocator) && mm)
> > >                 asid = atomic_long_read(&mm->context.id) & asid_mask;
> > >
> > >         if (broadcast) {
> > > @@ -128,7 +136,8 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
> > >                 local_flush_tlb_range_asid(start, size, stride, asid);
> > >         }
> > >
> > > -       put_cpu();
> > > +       if (mm)
> > > +               put_cpu();
> > >  }
> > >
> > >  void flush_tlb_mm(struct mm_struct *mm)
> > > @@ -189,6 +198,12 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > >
> > >         __flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
> > >  }
> > > +
> > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > +{
> > > +       __flush_tlb_range(NULL, start, end - start, PAGE_SIZE);
> > > +}
> > > +
> > >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > >  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > >                         unsigned long end)
> > > --
> > > 2.39.2
> > >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/4] riscv: Improve flush_tlb()
  2023-09-11 13:12 ` [PATCH v4 1/4] riscv: Improve flush_tlb() Alexandre Ghiti
@ 2023-09-19 12:07   ` Lad, Prabhakar
  2023-10-09 17:53   ` Samuel Holland
  1 sibling, 0 replies; 16+ messages in thread
From: Lad, Prabhakar @ 2023-09-19 12:07 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Andrew Jones

On Mon, Sep 11, 2023 at 2:13 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> For now, tlb_flush() simply calls flush_tlb_mm() which results in a
> flush of the whole TLB. So let's use mmu_gather fields to provide a more
> fine-grained flush of the TLB.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/include/asm/tlb.h      | 8 +++++++-
>  arch/riscv/include/asm/tlbflush.h | 3 +++
>  arch/riscv/mm/tlbflush.c          | 7 +++++++
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC

Cheers,
Prabhakar

> diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
> index 120bcf2ed8a8..1eb5682b2af6 100644
> --- a/arch/riscv/include/asm/tlb.h
> +++ b/arch/riscv/include/asm/tlb.h
> @@ -15,7 +15,13 @@ static void tlb_flush(struct mmu_gather *tlb);
>
>  static inline void tlb_flush(struct mmu_gather *tlb)
>  {
> -       flush_tlb_mm(tlb->mm);
> +#ifdef CONFIG_MMU
> +       if (tlb->fullmm || tlb->need_flush_all)
> +               flush_tlb_mm(tlb->mm);
> +       else
> +               flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
> +                                  tlb_get_unmap_size(tlb));
> +#endif
>  }
>
>  #endif /* _ASM_RISCV_TLB_H */
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index a09196f8de68..f5c4fb0ae642 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -32,6 +32,8 @@ static inline void local_flush_tlb_page(unsigned long addr)
>  #if defined(CONFIG_SMP) && defined(CONFIG_MMU)
>  void flush_tlb_all(void);
>  void flush_tlb_mm(struct mm_struct *mm);
> +void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> +                       unsigned long end, unsigned int page_size);
>  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
>  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>                      unsigned long end);
> @@ -52,6 +54,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>  }
>
>  #define flush_tlb_mm(mm) flush_tlb_all()
> +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
>  #endif /* !CONFIG_SMP || !CONFIG_MMU */
>
>  /* Flush a range of kernel pages */
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 77be59aadc73..fa03289853d8 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -132,6 +132,13 @@ void flush_tlb_mm(struct mm_struct *mm)
>         __flush_tlb_range(mm, 0, -1, PAGE_SIZE);
>  }
>
> +void flush_tlb_mm_range(struct mm_struct *mm,
> +                       unsigned long start, unsigned long end,
> +                       unsigned int page_size)
> +{
> +       __flush_tlb_range(mm, start, end - start, page_size);
> +}
> +
>  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
>  {
>         __flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
> --
> 2.39.2
>
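
As a reading aid, the decision made by the new tlb_flush() can be modelled
in plain user-space C. This is only a sketch: struct gather_model,
struct flush_plan and plan_tlb_flush() are hypothetical names invented for
illustration, and unmap_size stands in for the value the patch obtains from
tlb_get_unmap_size(). Nothing here touches a real TLB.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few mmu_gather fields the patch reads. */
struct gather_model {
	bool fullmm;
	bool need_flush_all;
	unsigned long start;
	unsigned long end;
	unsigned long unmap_size;	/* models tlb_get_unmap_size(tlb) */
};

enum flush_kind { FLUSH_WHOLE_MM, FLUSH_MM_RANGE };

/* Hypothetical description of the flush that would be requested. */
struct flush_plan {
	enum flush_kind kind;
	unsigned long start;
	unsigned long size;
	unsigned long stride;
};

static struct flush_plan plan_tlb_flush(const struct gather_model *tlb)
{
	struct flush_plan plan = { FLUSH_WHOLE_MM, 0, 0, 0 };

	/* Whole address space going away, or caller asked for everything. */
	if (tlb->fullmm || tlb->need_flush_all)
		return plan;

	/* Otherwise only the gathered range, stepping by the unmap size. */
	assert(tlb->end >= tlb->start);
	plan.kind = FLUSH_MM_RANGE;
	plan.start = tlb->start;
	plan.size = tlb->end - tlb->start;
	plan.stride = tlb->unmap_size;
	return plan;
}
```

With start = 0x1000, end = 0x5000 and unmap_size = 0x1000, the model asks
for a ranged flush of 0x4000 bytes with a 0x1000 stride; setting fullmm
switches it back to a whole-mm flush.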

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages
  2023-09-11 13:12 ` [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages Alexandre Ghiti
@ 2023-09-19 12:07   ` Lad, Prabhakar
  2023-10-09 17:53   ` Samuel Holland
  1 sibling, 0 replies; 16+ messages in thread
From: Lad, Prabhakar @ 2023-09-19 12:07 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Andrew Jones

On Mon, Sep 11, 2023 at 2:14 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> flush_tlb_range() uses a fixed stride of PAGE_SIZE and in its current form,
> when a hugetlb mapping needs to be flushed, flush_tlb_range() flushes the
> whole tlb: so set a stride of the size of the hugetlb mapping in order to
> only flush the hugetlb mapping. However, if the hugepage is a NAPOT region,
> all PTEs that constitute this mapping must be invalidated, so the stride
> size must actually be the size of the PTE.
>
> Note that THPs are directly handled by flush_pmd_tlb_range().
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/mm/tlbflush.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC

Cheers,
Prabhakar

> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index fa03289853d8..5bda6d4fed90 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -3,6 +3,7 @@
>  #include <linux/mm.h>
>  #include <linux/smp.h>
>  #include <linux/sched.h>
> +#include <linux/hugetlb.h>
>  #include <asm/sbi.h>
>  #include <asm/mmu_context.h>
>
> @@ -147,7 +148,43 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
>  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>                      unsigned long end)
>  {
> -       __flush_tlb_range(vma->vm_mm, start, end - start, PAGE_SIZE);
> +       unsigned long stride_size;
> +
> +       stride_size = is_vm_hugetlb_page(vma) ?
> +                               huge_page_size(hstate_vma(vma)) :
> +                               PAGE_SIZE;
> +
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +       /*
> +        * As stated in the privileged specification, every PTE in a NAPOT
> +        * region must be invalidated, so reset the stride in that case.
> +        */
> +       if (has_svnapot()) {
> +               unsigned long order, napot_size;
> +
> +               for_each_napot_order(order) {
> +                       napot_size = napot_cont_size(order);
> +
> +                       if (stride_size != napot_size)
> +                               continue;
> +
> +                       if (napot_size >= PGDIR_SIZE)
> +                               stride_size = PGDIR_SIZE;
> +                       else if (napot_size >= P4D_SIZE)
> +                               stride_size = P4D_SIZE;
> +                       else if (napot_size >= PUD_SIZE)
> +                               stride_size = PUD_SIZE;
> +                       else if (napot_size >= PMD_SIZE)
> +                               stride_size = PMD_SIZE;
> +                       else
> +                               stride_size = PAGE_SIZE;
> +
> +                       break;
> +               }
> +       }
> +#endif
> +
> +       __flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
>  }
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> --
> 2.39.2
>
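
The NAPOT stride reset described above can be sketched outside the kernel
as follows. The SKETCH_* sizes assume a three-level, Sv39-like layout and
are illustrative only, napot_stride() is a hypothetical helper, and the
table of NAPOT sizes is passed in by the caller rather than taken from
for_each_napot_order().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for an Sv39-like layout; illustration only. */
#define SKETCH_PAGE_SIZE (1UL << 12)
#define SKETCH_PMD_SIZE  (1UL << 21)
#define SKETCH_PUD_SIZE  (1UL << 30)

/*
 * If stride_size matches one of the NAPOT contiguous sizes, return the
 * size of the page-table level whose PTEs make up that region, so that
 * every constituent PTE gets invalidated. Otherwise keep the stride.
 */
static unsigned long napot_stride(unsigned long stride_size,
				  const unsigned long *napot_sizes, size_t n)
{
	assert(stride_size != 0);

	for (size_t i = 0; i < n; i++) {
		if (stride_size != napot_sizes[i])
			continue;

		if (stride_size >= SKETCH_PUD_SIZE)
			return SKETCH_PUD_SIZE;
		if (stride_size >= SKETCH_PMD_SIZE)
			return SKETCH_PMD_SIZE;
		return SKETCH_PAGE_SIZE;
	}

	return stride_size;
}
```

For a 64K NAPOT hugepage the stride falls back to the 4K base page, while a
regular 2M hugepage that is not in the NAPOT table keeps its 2M stride.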

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
  2023-09-11 13:12 ` [PATCH v4 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb Alexandre Ghiti
@ 2023-09-19 12:09   ` Lad, Prabhakar
  0 siblings, 0 replies; 16+ messages in thread
From: Lad, Prabhakar @ 2023-09-19 12:09 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Andrew Jones

On Mon, Sep 11, 2023 at 2:15 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Currently, when the range to flush covers more than one page (a 4K page or
> a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
> tlb comes with a greater cost than flushing a single entry so we should
> flush single entries up to a certain threshold so that:
> threshold * cost of flushing a single entry < cost of flushing the whole
> tlb.
>
> Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/include/asm/sbi.h      |   3 -
>  arch/riscv/include/asm/tlbflush.h |   3 +
>  arch/riscv/kernel/sbi.c           |  32 +++------
>  arch/riscv/mm/tlbflush.c          | 115 +++++++++++++++---------------
>  4 files changed, 72 insertions(+), 81 deletions(-)
>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC

Cheers,
Prabhakar

> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 5b4a1bf5f439..b79d0228144f 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -273,9 +273,6 @@ void sbi_set_timer(uint64_t stime_value);
>  void sbi_shutdown(void);
>  void sbi_send_ipi(unsigned int cpu);
>  int sbi_remote_fence_i(const struct cpumask *cpu_mask);
> -int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
> -                          unsigned long start,
> -                          unsigned long size);
>
>  int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
>                                 unsigned long start,
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index f5c4fb0ae642..170a49c531c6 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -11,6 +11,9 @@
>  #include <asm/smp.h>
>  #include <asm/errata_list.h>
>
> +#define FLUSH_TLB_MAX_SIZE      ((unsigned long)-1)
> +#define FLUSH_TLB_NO_ASID       ((unsigned long)-1)
> +
>  #ifdef CONFIG_MMU
>  extern unsigned long asid_mask;
>
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index c672c8ba9a2a..5a62ed1da453 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -11,6 +11,7 @@
>  #include <linux/reboot.h>
>  #include <asm/sbi.h>
>  #include <asm/smp.h>
> +#include <asm/tlbflush.h>
>
>  /* default SBI version is 0.1 */
>  unsigned long sbi_spec_version __ro_after_init = SBI_SPEC_VERSION_DEFAULT;
> @@ -376,32 +377,15 @@ int sbi_remote_fence_i(const struct cpumask *cpu_mask)
>  }
>  EXPORT_SYMBOL(sbi_remote_fence_i);
>
> -/**
> - * sbi_remote_sfence_vma() - Execute SFENCE.VMA instructions on given remote
> - *                          harts for the specified virtual address range.
> - * @cpu_mask: A cpu mask containing all the target harts.
> - * @start: Start of the virtual address
> - * @size: Total size of the virtual address range.
> - *
> - * Return: 0 on success, appropriate linux error code otherwise.
> - */
> -int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
> -                          unsigned long start,
> -                          unsigned long size)
> -{
> -       return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
> -                           cpu_mask, start, size, 0, 0);
> -}
> -EXPORT_SYMBOL(sbi_remote_sfence_vma);
> -
>  /**
>   * sbi_remote_sfence_vma_asid() - Execute SFENCE.VMA instructions on given
> - * remote harts for a virtual address range belonging to a specific ASID.
> + * remote harts for a virtual address range belonging to a specific ASID or not.
>   *
>   * @cpu_mask: A cpu mask containing all the target harts.
>   * @start: Start of the virtual address
>   * @size: Total size of the virtual address range.
> - * @asid: The value of address space identifier (ASID).
> + * @asid: The value of address space identifier (ASID), or FLUSH_TLB_NO_ASID
> + * for flushing all address spaces.
>   *
>   * Return: 0 on success, appropriate linux error code otherwise.
>   */
> @@ -410,8 +394,12 @@ int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
>                                 unsigned long size,
>                                 unsigned long asid)
>  {
> -       return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
> -                           cpu_mask, start, size, asid, 0);
> +       if (asid == FLUSH_TLB_NO_ASID)
> +               return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
> +                                   cpu_mask, start, size, 0, 0);
> +       else
> +               return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
> +                                   cpu_mask, start, size, asid, 0);
>  }
>  EXPORT_SYMBOL(sbi_remote_sfence_vma_asid);
>
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 5bda6d4fed90..2c1136d73411 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -9,28 +9,50 @@
>
>  static inline void local_flush_tlb_all_asid(unsigned long asid)
>  {
> -       __asm__ __volatile__ ("sfence.vma x0, %0"
> -                       :
> -                       : "r" (asid)
> -                       : "memory");
> +       if (asid != FLUSH_TLB_NO_ASID)
> +               __asm__ __volatile__ ("sfence.vma x0, %0"
> +                               :
> +                               : "r" (asid)
> +                               : "memory");
> +       else
> +               local_flush_tlb_all();
>  }
>
>  static inline void local_flush_tlb_page_asid(unsigned long addr,
>                 unsigned long asid)
>  {
> -       __asm__ __volatile__ ("sfence.vma %0, %1"
> -                       :
> -                       : "r" (addr), "r" (asid)
> -                       : "memory");
> +       if (asid != FLUSH_TLB_NO_ASID)
> +               __asm__ __volatile__ ("sfence.vma %0, %1"
> +                               :
> +                               : "r" (addr), "r" (asid)
> +                               : "memory");
> +       else
> +               local_flush_tlb_page(addr);
>  }
>
> -static inline void local_flush_tlb_range(unsigned long start,
> -               unsigned long size, unsigned long stride)
> +/*
> + * Flush entire TLB if number of entries to be flushed is greater
> + * than the threshold below.
> + */
> +static unsigned long tlb_flush_all_threshold __read_mostly = 64;
> +
> +static void local_flush_tlb_range_threshold_asid(unsigned long start,
> +                                                unsigned long size,
> +                                                unsigned long stride,
> +                                                unsigned long asid)
>  {
> -       if (size <= stride)
> -               local_flush_tlb_page(start);
> -       else
> -               local_flush_tlb_all();
> +       u16 nr_ptes_in_range = DIV_ROUND_UP(size, stride);
> +       int i;
> +
> +       if (nr_ptes_in_range > tlb_flush_all_threshold) {
> +               local_flush_tlb_all_asid(asid);
> +               return;
> +       }
> +
> +       for (i = 0; i < nr_ptes_in_range; ++i) {
> +               local_flush_tlb_page_asid(start, asid);
> +               start += stride;
> +       }
>  }
>
>  static inline void local_flush_tlb_range_asid(unsigned long start,
> @@ -38,8 +60,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start,
>  {
>         if (size <= stride)
>                 local_flush_tlb_page_asid(start, asid);
> -       else
> +       else if (size == FLUSH_TLB_MAX_SIZE)
>                 local_flush_tlb_all_asid(asid);
> +       else
> +               local_flush_tlb_range_threshold_asid(start, size, stride, asid);
>  }
>
>  static void __ipi_flush_tlb_all(void *info)
> @@ -52,7 +76,7 @@ void flush_tlb_all(void)
>         if (riscv_use_ipi_for_rfence())
>                 on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
>         else
> -               sbi_remote_sfence_vma(NULL, 0, -1);
> +               sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
>  }
>
>  struct flush_tlb_range_data {
> @@ -69,18 +93,12 @@ static void __ipi_flush_tlb_range_asid(void *info)
>         local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid);
>  }
>
> -static void __ipi_flush_tlb_range(void *info)
> -{
> -       struct flush_tlb_range_data *d = info;
> -
> -       local_flush_tlb_range(d->start, d->size, d->stride);
> -}
> -
>  static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
>                               unsigned long size, unsigned long stride)
>  {
>         struct flush_tlb_range_data ftd;
>         struct cpumask *cmask = mm_cpumask(mm);
> +       unsigned long asid = FLUSH_TLB_NO_ASID;
>         unsigned int cpuid;
>         bool broadcast;
>
> @@ -90,39 +108,24 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
>         cpuid = get_cpu();
>         /* check if the tlbflush needs to be sent to other CPUs */
>         broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
> -       if (static_branch_unlikely(&use_asid_allocator)) {
> -               unsigned long asid = atomic_long_read(&mm->context.id) & asid_mask;
> -
> -               if (broadcast) {
> -                       if (riscv_use_ipi_for_rfence()) {
> -                               ftd.asid = asid;
> -                               ftd.start = start;
> -                               ftd.size = size;
> -                               ftd.stride = stride;
> -                               on_each_cpu_mask(cmask,
> -                                                __ipi_flush_tlb_range_asid,
> -                                                &ftd, 1);
> -                       } else
> -                               sbi_remote_sfence_vma_asid(cmask,
> -                                                          start, size, asid);
> -               } else {
> -                       local_flush_tlb_range_asid(start, size, stride, asid);
> -               }
> +
> +       if (static_branch_unlikely(&use_asid_allocator))
> +               asid = atomic_long_read(&mm->context.id) & asid_mask;
> +
> +       if (broadcast) {
> +               if (riscv_use_ipi_for_rfence()) {
> +                       ftd.asid = asid;
> +                       ftd.start = start;
> +                       ftd.size = size;
> +                       ftd.stride = stride;
> +                       on_each_cpu_mask(cmask,
> +                                        __ipi_flush_tlb_range_asid,
> +                                        &ftd, 1);
> +               } else
> +                       sbi_remote_sfence_vma_asid(cmask,
> +                                                  start, size, asid);
>         } else {
> -               if (broadcast) {
> -                       if (riscv_use_ipi_for_rfence()) {
> -                               ftd.asid = 0;
> -                               ftd.start = start;
> -                               ftd.size = size;
> -                               ftd.stride = stride;
> -                               on_each_cpu_mask(cmask,
> -                                                __ipi_flush_tlb_range,
> -                                                &ftd, 1);
> -                       } else
> -                               sbi_remote_sfence_vma(cmask, start, size);
> -               } else {
> -                       local_flush_tlb_range(start, size, stride);
> -               }
> +               local_flush_tlb_range_asid(start, size, stride, asid);
>         }
>
>         put_cpu();
> @@ -130,7 +133,7 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
>
>  void flush_tlb_mm(struct mm_struct *mm)
>  {
> -       __flush_tlb_range(mm, 0, -1, PAGE_SIZE);
> +       __flush_tlb_range(mm, 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
>  }
>
>  void flush_tlb_mm_range(struct mm_struct *mm,
> --
> 2.39.2
>
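
The threshold rule from the commit message can be written as a small
user-space model. It follows the order of checks in
local_flush_tlb_range_asid() from the patch, but plan_range_flush(),
struct range_plan and the SKETCH_* names are hypothetical, and the model
only reports which kind of flush would be issued.

```c
#include <assert.h>

#define SKETCH_FLUSH_ALL_THRESHOLD 64UL			/* value used in the patch */
#define SKETCH_FLUSH_MAX_SIZE      ((unsigned long)-1)	/* "flush everything" */

enum range_action { FLUSH_SINGLE, FLUSH_PER_ENTRY, FLUSH_ALL };

struct range_plan {
	enum range_action action;
	unsigned long nr_entries;	/* single-entry flushes to issue */
};

static struct range_plan plan_range_flush(unsigned long size,
					  unsigned long stride)
{
	struct range_plan plan = { FLUSH_ALL, 0 };
	unsigned long nr;

	assert(stride != 0);

	/* One entry covers the whole range. */
	if (size <= stride) {
		plan.action = FLUSH_SINGLE;
		plan.nr_entries = 1;
		return plan;
	}

	/* Caller explicitly asked for the whole address space. */
	if (size == SKETCH_FLUSH_MAX_SIZE)
		return plan;

	/* Round up without overflowing size + stride - 1. */
	nr = size / stride + (size % stride != 0);

	/* Past the threshold a full flush is assumed to be cheaper. */
	if (nr > SKETCH_FLUSH_ALL_THRESHOLD)
		return plan;

	plan.action = FLUSH_PER_ENTRY;
	plan.nr_entries = nr;
	return plan;
}
```

With 4K entries, a 256K range (64 entries) is still flushed entry by entry
and a 260K range (65 entries) becomes a full flush. The count is kept in an
unsigned long here; that is a choice of this sketch, since a 16-bit counter
would wrap for ranges of 65536 entries or more.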

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range()
  2023-09-11 13:12 ` [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range() Alexandre Ghiti
  2023-09-13  8:04   ` Alexandre Ghiti
@ 2023-09-19 12:09   ` Lad, Prabhakar
  1 sibling, 0 replies; 16+ messages in thread
From: Lad, Prabhakar @ 2023-09-19 12:09 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Mayuresh Chitale, Vincent Chen, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-arch, linux-mm, linux-riscv,
	linux-kernel, Samuel Holland, Andrew Jones

On Mon, Sep 11, 2023 at 2:16 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> This function used to simply flush the whole tlb of all harts; be more
> subtle and try to only flush the range.
>
> The problem is that we can only use PAGE_SIZE as stride since we don't know
> the size of the underlying mapping, so this function only improves on a full
> flush if the size of the region to flush is < threshold * PAGE_SIZE.
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/include/asm/tlbflush.h | 11 ++++++-----
>  arch/riscv/mm/tlbflush.c          | 33 ++++++++++++++++++++++---------
>  2 files changed, 30 insertions(+), 14 deletions(-)
>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC

Cheers,
Prabhakar

> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 170a49c531c6..8f3418c5f172 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -40,6 +40,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
>  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>                      unsigned long end);
> +void flush_tlb_kernel_range(unsigned long start, unsigned long end);
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
>  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> @@ -56,15 +57,15 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>         local_flush_tlb_all();
>  }
>
> -#define flush_tlb_mm(mm) flush_tlb_all()
> -#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
> -#endif /* !CONFIG_SMP || !CONFIG_MMU */
> -
>  /* Flush a range of kernel pages */
>  static inline void flush_tlb_kernel_range(unsigned long start,
>         unsigned long end)
>  {
> -       flush_tlb_all();
> +       local_flush_tlb_all();
>  }
>
> +#define flush_tlb_mm(mm) flush_tlb_all()
> +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
> +#endif /* !CONFIG_SMP || !CONFIG_MMU */
> +
>  #endif /* _ASM_RISCV_TLBFLUSH_H */
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 2c1136d73411..28cd8539b575 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -97,19 +97,27 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
>                               unsigned long size, unsigned long stride)
>  {
>         struct flush_tlb_range_data ftd;
> -       struct cpumask *cmask = mm_cpumask(mm);
> +       struct cpumask *cmask, full_cmask;
>         unsigned long asid = FLUSH_TLB_NO_ASID;
> -       unsigned int cpuid;
>         bool broadcast;
>
> -       if (cpumask_empty(cmask))
> -               return;
> +       if (mm) {
> +               unsigned int cpuid;
> +
> +               cmask = mm_cpumask(mm);
> +               if (cpumask_empty(cmask))
> +                       return;
>
> -       cpuid = get_cpu();
> -       /* check if the tlbflush needs to be sent to other CPUs */
> -       broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
> +               cpuid = get_cpu();
> +               /* check if the tlbflush needs to be sent to other CPUs */
> +               broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
> +       } else {
> +               cpumask_setall(&full_cmask);
> +               cmask = &full_cmask;
> +               broadcast = true;
> +       }
>
> -       if (static_branch_unlikely(&use_asid_allocator))
> +       if (static_branch_unlikely(&use_asid_allocator) && mm)
>                 asid = atomic_long_read(&mm->context.id) & asid_mask;
>
>         if (broadcast) {
> @@ -128,7 +136,8 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
>                 local_flush_tlb_range_asid(start, size, stride, asid);
>         }
>
> -       put_cpu();
> +       if (mm)
> +               put_cpu();
>  }
>
>  void flush_tlb_mm(struct mm_struct *mm)
> @@ -189,6 +198,12 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>
>         __flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
>  }
> +
> +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> +{
> +       __flush_tlb_range(NULL, start, end - start, PAGE_SIZE);
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
>                         unsigned long end)
> --
> 2.39.2
>

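For readers following the "< threshold * PAGE_SIZE" remark in the commit
message above, here is a minimal userspace sketch of that decision. It is not
the kernel code: the names are hypothetical, the threshold of 64 and the 4 KiB
page size are assumed values (the cover letter says the real threshold is
microarchitecture specific), and treating the boundary as inclusive is also an
assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, for illustration only. */
#define SKETCH_PAGE_SIZE           4096UL
#define SKETCH_FLUSH_ALL_THRESHOLD 64UL

/*
 * Number of stride-sized entries needed to cover `size` bytes, rounded up.
 * Written as quotient plus remainder check so that a size of -1UL
 * ("everything") does not overflow.
 */
static inline unsigned long sketch_nr_entries(unsigned long size,
					      unsigned long stride)
{
	assert(stride != 0);
	return size / stride + (size % stride != 0);
}

/* True when flushing entry by entry is expected to beat a full flush. */
static inline bool sketch_use_range_flush(unsigned long size,
					  unsigned long stride)
{
	return sketch_nr_entries(size, stride) <= SKETCH_FLUSH_ALL_THRESHOLD;
}
```

Because flush_tlb_kernel_range() can only pass PAGE_SIZE as the stride, under
these assumed values any kernel range larger than 256 KiB would still fall
back to a full flush.
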
^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/4] riscv: Improve flush_tlb()
  2023-09-11 13:12 ` [PATCH v4 1/4] riscv: Improve flush_tlb() Alexandre Ghiti
  2023-09-19 12:07   ` Lad, Prabhakar
@ 2023-10-09 17:53   ` Samuel Holland
  2023-10-18 11:26     ` Alexandre Ghiti
  1 sibling, 1 reply; 16+ messages in thread
From: Samuel Holland @ 2023-10-09 17:53 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Andrew Jones, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Mayuresh Chitale, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-arch, linux-mm,
	linux-riscv, linux-kernel, Samuel Holland, Lad Prabhakar

On 2023-09-11 8:12 AM, Alexandre Ghiti wrote:
> For now, flush_tlb() simply calls flush_tlb_mm() which results in a

s/flush_tlb/tlb_flush/ here and in the subject.

Otherwise:
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>

> flush of the whole TLB. So let's use mmu_gather fields to provide a more
> fine-grained flush of the TLB.
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/include/asm/tlb.h      | 8 +++++++-
>  arch/riscv/include/asm/tlbflush.h | 3 +++
>  arch/riscv/mm/tlbflush.c          | 7 +++++++
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
> index 120bcf2ed8a8..1eb5682b2af6 100644
> --- a/arch/riscv/include/asm/tlb.h
> +++ b/arch/riscv/include/asm/tlb.h
> @@ -15,7 +15,13 @@ static void tlb_flush(struct mmu_gather *tlb);
>  
>  static inline void tlb_flush(struct mmu_gather *tlb)
>  {
> -	flush_tlb_mm(tlb->mm);
> +#ifdef CONFIG_MMU
> +	if (tlb->fullmm || tlb->need_flush_all)
> +		flush_tlb_mm(tlb->mm);
> +	else
> +		flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
> +				   tlb_get_unmap_size(tlb));
> +#endif
>  }
>  
>  #endif /* _ASM_RISCV_TLB_H */
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index a09196f8de68..f5c4fb0ae642 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -32,6 +32,8 @@ static inline void local_flush_tlb_page(unsigned long addr)
>  #if defined(CONFIG_SMP) && defined(CONFIG_MMU)
>  void flush_tlb_all(void);
>  void flush_tlb_mm(struct mm_struct *mm);
> +void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> +			unsigned long end, unsigned int page_size);
>  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
>  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>  		     unsigned long end);
> @@ -52,6 +54,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>  }
>  
>  #define flush_tlb_mm(mm) flush_tlb_all()
> +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
>  #endif /* !CONFIG_SMP || !CONFIG_MMU */
>  
>  /* Flush a range of kernel pages */
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 77be59aadc73..fa03289853d8 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -132,6 +132,13 @@ void flush_tlb_mm(struct mm_struct *mm)
>  	__flush_tlb_range(mm, 0, -1, PAGE_SIZE);
>  }
>  
> +void flush_tlb_mm_range(struct mm_struct *mm,
> +			unsigned long start, unsigned long end,
> +			unsigned int page_size)
> +{
> +	__flush_tlb_range(mm, start, end - start, page_size);
> +}
> +
>  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
>  {
>  	__flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);

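The choice tlb_flush() makes in the hunk above can be sketched in userspace as
follows. This is only an illustration: struct sketch_gather is a hypothetical,
trimmed-down stand-in for struct mmu_gather, the shifts assume an Sv39-style
layout, and sketch_unmap_size() is loosely modeled on the generic
tlb_get_unmap_size() helper, so treat its details as assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed Sv39-style shifts, for illustration only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PUD_SHIFT  30

/* Hypothetical stand-in for the mmu_gather fields used by tlb_flush(). */
struct sketch_gather {
	unsigned long start, end;
	bool fullmm, need_flush_all;
	bool cleared_ptes, cleared_pmds, cleared_puds;
};

enum sketch_action { SKETCH_FLUSH_MM, SKETCH_FLUSH_RANGE };

/* The smallest page-table level that was cleared decides the stride. */
static inline unsigned long sketch_unmap_size(const struct sketch_gather *g)
{
	if (g->cleared_ptes)
		return 1UL << SKETCH_PAGE_SHIFT;
	if (g->cleared_pmds)
		return 1UL << SKETCH_PMD_SHIFT;
	if (g->cleared_puds)
		return 1UL << SKETCH_PUD_SHIFT;
	return 1UL << SKETCH_PAGE_SHIFT;
}

/*
 * Whole-mm flush when the gather covers the full address space or asked
 * for a full flush; otherwise report the range and stride to flush.
 */
static inline enum sketch_action sketch_tlb_flush(const struct sketch_gather *g,
						  unsigned long *size,
						  unsigned long *stride)
{
	if (g->fullmm || g->need_flush_all)
		return SKETCH_FLUSH_MM;

	assert(g->end >= g->start);
	*size = g->end - g->start;
	*stride = sketch_unmap_size(g);
	return SKETCH_FLUSH_RANGE;
}
```
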

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages
  2023-09-11 13:12 ` [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages Alexandre Ghiti
  2023-09-19 12:07   ` Lad, Prabhakar
@ 2023-10-09 17:53   ` Samuel Holland
  2023-10-18 11:32     ` Alexandre Ghiti
  1 sibling, 1 reply; 16+ messages in thread
From: Samuel Holland @ 2023-10-09 17:53 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Andrew Jones, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Mayuresh Chitale, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-arch, linux-mm,
	linux-riscv, linux-kernel, Samuel Holland, Lad Prabhakar

Hi Alex,

On 2023-09-11 8:12 AM, Alexandre Ghiti wrote:
> flush_tlb_range() uses a fixed stride of PAGE_SIZE and, in its current form,
> flushes the whole tlb whenever a hugetlb mapping needs to be flushed. Set
> the stride to the size of the hugetlb mapping so that only that mapping is
> flushed. However, if the hugepage is a NAPOT region, all PTEs that constitute
> this mapping must be invalidated, so the stride size must actually be the
> size of the PTE.
> 
> Note that THPs are directly handled by flush_pmd_tlb_range().
> 
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/mm/tlbflush.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index fa03289853d8..5bda6d4fed90 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -3,6 +3,7 @@
>  #include <linux/mm.h>
>  #include <linux/smp.h>
>  #include <linux/sched.h>
> +#include <linux/hugetlb.h>
>  #include <asm/sbi.h>
>  #include <asm/mmu_context.h>
>  
> @@ -147,7 +148,43 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
>  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>  		     unsigned long end)
>  {
> -	__flush_tlb_range(vma->vm_mm, start, end - start, PAGE_SIZE);
> +	unsigned long stride_size;
> +
> +	stride_size = is_vm_hugetlb_page(vma) ?
> +				huge_page_size(hstate_vma(vma)) :
> +				PAGE_SIZE;
> +
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +	/*
> +	 * As stated in the privileged specification, every PTE in a NAPOT
> +	 * region must be invalidated, so reset the stride in that case.
> +	 */
> +	if (has_svnapot()) {

This whole block should probably go inside the is_vm_hugetlb_page(vma) check,
since we have to perform that check anyway.

> +		unsigned long order, napot_size;
> +
> +		for_each_napot_order(order) {
> +			napot_size = napot_cont_size(order);
> +
> +			if (stride_size != napot_size)
> +				continue;
> +
> +			if (napot_size >= PGDIR_SIZE)

Can you check stride_size here directly, and drop the loop? We should be able to
assume that the huge page size is valid. Non-NAPOT hugepages will hit one of the
equal-to cases below, which is fine.

Regards,
Samuel

> +				stride_size = PGDIR_SIZE;
> +			else if (napot_size >= P4D_SIZE)
> +				stride_size = P4D_SIZE;
> +			else if (napot_size >= PUD_SIZE)
> +				stride_size = PUD_SIZE;
> +			else if (napot_size >= PMD_SIZE)
> +				stride_size = PMD_SIZE;
> +			else
> +				stride_size = PAGE_SIZE;
> +
> +			break;
> +		}
> +	}
> +#endif
> +
> +	__flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
>  }
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,

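The two review comments above (move the NAPOT handling under the hugetlb
check, and compare stride_size directly instead of looping over NAPOT orders)
could be sketched like this. It is a userspace illustration under stated
assumptions, not the respun patch: the function name and parameters are
hypothetical, and the sizes assume a three-level Sv39 layout, so the
PGDIR_SIZE and P4D_SIZE cases from the patch are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed Sv39 geometry, for illustration only. */
#define SKETCH_PAGE_SIZE (1UL << 12)
#define SKETCH_PMD_SIZE  (1UL << 21)
#define SKETCH_PUD_SIZE  (1UL << 30)

/*
 * Pick the flush stride for a VMA. Non-hugetlb mappings use the base page
 * size. With Svnapot present, a hugepage size is rounded down to the size
 * of the PTE level that backs it, so every PTE of a NAPOT region gets
 * invalidated. Non-NAPOT hugepage sizes hit an equal-to case and are left
 * unchanged.
 */
static inline unsigned long sketch_stride_size(bool is_hugetlb,
					       unsigned long huge_size,
					       bool has_svnapot)
{
	unsigned long stride;

	if (!is_hugetlb)
		return SKETCH_PAGE_SIZE;

	assert(huge_size >= SKETCH_PAGE_SIZE);
	stride = huge_size;

	if (has_svnapot) {
		if (stride >= SKETCH_PUD_SIZE)
			stride = SKETCH_PUD_SIZE;
		else if (stride >= SKETCH_PMD_SIZE)
			stride = SKETCH_PMD_SIZE;
		else
			stride = SKETCH_PAGE_SIZE;
	}

	return stride;
}
```

Under these assumptions a 64 KiB NAPOT hugepage is flushed with a 4 KiB
stride, while a regular 2 MiB hugepage keeps its 2 MiB stride.
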

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/4] riscv: Improve flush_tlb()
  2023-10-09 17:53   ` Samuel Holland
@ 2023-10-18 11:26     ` Alexandre Ghiti
  0 siblings, 0 replies; 16+ messages in thread
From: Alexandre Ghiti @ 2023-10-18 11:26 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Andrew Jones, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Mayuresh Chitale, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-arch, linux-mm,
	linux-riscv, linux-kernel, Samuel Holland, Lad Prabhakar

Hi Samuel,

On Mon, Oct 9, 2023 at 7:53 PM Samuel Holland <samuel.holland@sifive.com> wrote:
>
> On 2023-09-11 8:12 AM, Alexandre Ghiti wrote:
> > For now, flush_tlb() simply calls flush_tlb_mm() which results in a
>
> s/flush_tlb/tlb_flush/ here and in the subject.
>
> Otherwise:
> Reviewed-by: Samuel Holland <samuel.holland@sifive.com>

Ahah good catch, thanks for that and the RB!

Alex

>
> > flush of the whole TLB. So let's use mmu_gather fields to provide a more
> > fine-grained flush of the TLB.
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  arch/riscv/include/asm/tlb.h      | 8 +++++++-
> >  arch/riscv/include/asm/tlbflush.h | 3 +++
> >  arch/riscv/mm/tlbflush.c          | 7 +++++++
> >  3 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
> > index 120bcf2ed8a8..1eb5682b2af6 100644
> > --- a/arch/riscv/include/asm/tlb.h
> > +++ b/arch/riscv/include/asm/tlb.h
> > @@ -15,7 +15,13 @@ static void tlb_flush(struct mmu_gather *tlb);
> >
> >  static inline void tlb_flush(struct mmu_gather *tlb)
> >  {
> > -     flush_tlb_mm(tlb->mm);
> > +#ifdef CONFIG_MMU
> > +     if (tlb->fullmm || tlb->need_flush_all)
> > +             flush_tlb_mm(tlb->mm);
> > +     else
> > +             flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
> > +                                tlb_get_unmap_size(tlb));
> > +#endif
> >  }
> >
> >  #endif /* _ASM_RISCV_TLB_H */
> > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> > index a09196f8de68..f5c4fb0ae642 100644
> > --- a/arch/riscv/include/asm/tlbflush.h
> > +++ b/arch/riscv/include/asm/tlbflush.h
> > @@ -32,6 +32,8 @@ static inline void local_flush_tlb_page(unsigned long addr)
> >  #if defined(CONFIG_SMP) && defined(CONFIG_MMU)
> >  void flush_tlb_all(void);
> >  void flush_tlb_mm(struct mm_struct *mm);
> > +void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> > +                     unsigned long end, unsigned int page_size);
> >  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr);
> >  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >                    unsigned long end);
> > @@ -52,6 +54,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
> >  }
> >
> >  #define flush_tlb_mm(mm) flush_tlb_all()
> > +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
> >  #endif /* !CONFIG_SMP || !CONFIG_MMU */
> >
> >  /* Flush a range of kernel pages */
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 77be59aadc73..fa03289853d8 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -132,6 +132,13 @@ void flush_tlb_mm(struct mm_struct *mm)
> >       __flush_tlb_range(mm, 0, -1, PAGE_SIZE);
> >  }
> >
> > +void flush_tlb_mm_range(struct mm_struct *mm,
> > +                     unsigned long start, unsigned long end,
> > +                     unsigned int page_size)
> > +{
> > +     __flush_tlb_range(mm, start, end - start, page_size);
> > +}
> > +
> >  void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
> >  {
> >       __flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages
  2023-10-09 17:53   ` Samuel Holland
@ 2023-10-18 11:32     ` Alexandre Ghiti
  0 siblings, 0 replies; 16+ messages in thread
From: Alexandre Ghiti @ 2023-10-18 11:32 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Andrew Jones, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Mayuresh Chitale, Vincent Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-arch, linux-mm,
	linux-riscv, linux-kernel, Samuel Holland, Lad Prabhakar

On Mon, Oct 9, 2023 at 7:53 PM Samuel Holland <samuel.holland@sifive.com> wrote:
>
> Hi Alex,
>
> On 2023-09-11 8:12 AM, Alexandre Ghiti wrote:
> > flush_tlb_range() uses a fixed stride of PAGE_SIZE and, in its current form,
> > flushes the whole tlb whenever a hugetlb mapping needs to be flushed. Set
> > the stride to the size of the hugetlb mapping so that only that mapping is
> > flushed. However, if the hugepage is a NAPOT region, all PTEs that constitute
> > this mapping must be invalidated, so the stride size must actually be the
> > size of the PTE.
> >
> > Note that THPs are directly handled by flush_pmd_tlb_range().
> >
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  arch/riscv/mm/tlbflush.c | 39 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index fa03289853d8..5bda6d4fed90 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -3,6 +3,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/smp.h>
> >  #include <linux/sched.h>
> > +#include <linux/hugetlb.h>
> >  #include <asm/sbi.h>
> >  #include <asm/mmu_context.h>
> >
> > @@ -147,7 +148,43 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
> >  void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >                    unsigned long end)
> >  {
> > -     __flush_tlb_range(vma->vm_mm, start, end - start, PAGE_SIZE);
> > +     unsigned long stride_size;
> > +
> > +     stride_size = is_vm_hugetlb_page(vma) ?
> > +                             huge_page_size(hstate_vma(vma)) :
> > +                             PAGE_SIZE;
> > +
> > +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> > +     /*
> > +      * As stated in the privileged specification, every PTE in a NAPOT
> > +      * region must be invalidated, so reset the stride in that case.
> > +      */
> > +     if (has_svnapot()) {
>
> This whole block should probably go inside the is_vm_hugetlb_page(vma) check,
> since we have to perform that check anyway.

Yes, you're right.

>
> > +             unsigned long order, napot_size;
> > +
> > +             for_each_napot_order(order) {
> > +                     napot_size = napot_cont_size(order);
> > +
> > +                     if (stride_size != napot_size)
> > +                             continue;
> > +
> > +                     if (napot_size >= PGDIR_SIZE)
>
> Can you check stride_size here directly, and drop the loop? We should be able to
> assume that the huge page size is valid. Non-NAPOT hugepages will hit one of the
> equal-to cases below, which is fine.

Yes, again, you're right.

I'll respin a new version now, let it go through our CI, and send it tomorrow.

Thanks,

Alex

>
> Regards,
> Samuel
>
> > +                             stride_size = PGDIR_SIZE;
> > +                     else if (napot_size >= P4D_SIZE)
> > +                             stride_size = P4D_SIZE;
> > +                     else if (napot_size >= PUD_SIZE)
> > +                             stride_size = PUD_SIZE;
> > +                     else if (napot_size >= PMD_SIZE)
> > +                             stride_size = PMD_SIZE;
> > +                     else
> > +                             stride_size = PAGE_SIZE;
> > +
> > +                     break;
> > +             }
> > +     }
> > +#endif
> > +
> > +     __flush_tlb_range(vma->vm_mm, start, end - start, stride_size);
> >  }
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-10-18 11:32 UTC | newest]

Thread overview: 16+ messages
2023-09-11 13:12 [PATCH v4 0/4] riscv: tlb flush improvements Alexandre Ghiti
2023-09-11 13:12 ` [PATCH v4 1/4] riscv: Improve flush_tlb() Alexandre Ghiti
2023-09-19 12:07   ` Lad, Prabhakar
2023-10-09 17:53   ` Samuel Holland
2023-10-18 11:26     ` Alexandre Ghiti
2023-09-11 13:12 ` [PATCH v4 2/4] riscv: Improve flush_tlb_range() for hugetlb pages Alexandre Ghiti
2023-09-19 12:07   ` Lad, Prabhakar
2023-10-09 17:53   ` Samuel Holland
2023-10-18 11:32     ` Alexandre Ghiti
2023-09-11 13:12 ` [PATCH v4 3/4] riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb Alexandre Ghiti
2023-09-19 12:09   ` Lad, Prabhakar
2023-09-11 13:12 ` [PATCH v4 4/4] riscv: Improve flush_tlb_kernel_range() Alexandre Ghiti
2023-09-13  8:04   ` Alexandre Ghiti
2023-09-13  8:23     ` Lad, Prabhakar
2023-09-13  8:32       ` Alexandre Ghiti
2023-09-19 12:09   ` Lad, Prabhakar
