* [PATCH v6 00/13] riscv: ASID-related and UP-related TLB flush enhancements
From: Samuel Holland @ 2024-03-27  4:49 UTC
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

This series converts uniprocessor kernel builds to use the same TLB
flushing code as SMP builds, to take advantage of batching and existing
range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
SBI calls based on the online CPU count, which also covers the scenario
where SMP was enabled at build time but only one CPU is present/online.
A final optimization is to use single-ASID flushes wherever possible, to
avoid unnecessary TLB misses for kernel mappings.
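
As background on that last point: a single-ASID flush passes the ASID
in the rs2 operand of sfence.vma, so TLB entries for global (G-bit)
kernel mappings stay resident. A minimal sketch of such a flush, for
illustration only (the series itself uses the existing
local_flush_tlb_all_asid() and local_flush_tlb_range_asid() helpers):

    /* Sketch: flush all non-global entries for one ASID. */
    static inline void local_flush_tlb_asid(unsigned long asid)
    {
        __asm__ __volatile__ ("sfence.vma x0, %0"
                              : : "r" (asid) : "memory");
    }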

This series has a semantic conflict with the AIA patches that are in
linux-next due to the removal of the third parameter of
riscv_ipi_set_virq_range(), which is called from imsic_ipi_domain_init()
in drivers/irqchip/irq-riscv-imsic-early.c. The resolution is to remove
the extra argument from the call site.
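
Concretely, the resolution drops the final argument at the call site;
a sketch of the resolved hunk (the other arguments shown here are
illustrative, not taken from the AIA series):

    -	riscv_ipi_set_virq_range(virq, nr_ipis, true);
    +	riscv_ipi_set_virq_range(virq, nr_ipis);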

Here are some UnixBench numbers from the single-core Allwinner D1,
which show the performance impact:

v6.9-rc1:
 System Benchmarks Partial Index              BASELINE       RESULT    INDEX
 Execl Throughput                                 43.0        198.5     46.2
 File Copy 1024 bufsize 2000 maxblocks          3960.0      73934.4    186.7
 File Copy 256 bufsize 500 maxblocks            1655.0      20242.6    122.3
 File Copy 4096 bufsize 8000 maxblocks          5800.0     197706.4    340.9
 Pipe Throughput                               12440.0     176974.2    142.3
 Pipe-based Context Switching                   4000.0      23626.8     59.1
 Process Creation                                126.0        449.9     35.7
 Shell Scripts (1 concurrent)                     42.4        544.4    128.4
 Shell Scripts (16 concurrent)                     ---         35.3      ---
 Shell Scripts (8 concurrent)                      6.0         71.6    119.3
 System Call Overhead                          15000.0     248072.6    165.4
                                                                    ========
 System Benchmarks Index Score (Partial Only)                          110.6

v6.9-rc1 + this patch series:
 System Benchmarks Partial Index              BASELINE       RESULT    INDEX
 Execl Throughput                                 43.0        196.8     45.8
 File Copy 1024 bufsize 2000 maxblocks          3960.0      71782.2    181.3
 File Copy 256 bufsize 500 maxblocks            1655.0      21269.4    128.5
 File Copy 4096 bufsize 8000 maxblocks          5800.0     199424.0    343.8
 Pipe Throughput                               12440.0     196468.6    157.9
 Pipe-based Context Switching                   4000.0      24261.8     60.7
 Process Creation                                126.0        459.0     36.4
 Shell Scripts (1 concurrent)                     42.4        543.8    128.2
 Shell Scripts (16 concurrent)                     ---         35.5      ---
 Shell Scripts (8 concurrent)                      6.0         71.7    119.6
 System Call Overhead                          15000.0     259415.2    172.9
                                                                    ========
 System Benchmarks Index Score (Partial Only)                          113.0

Changes in v6:
 - Move riscv_tlb_remove_ptdesc() definition to fix 32-bit build
 - Clarify the commit message for patch 3 based on ML discussion
 - Clarify the commit message for patch 8 based on ML discussion
 - Rebased on v6.9-rc1

Changes in v5:
 - Rebase on v6.8-rc1 + riscv/for-next (for the fast GUP implementation)
 - Add patch for minor refactoring in asm/pgalloc.h
 - Also switch to riscv_use_sbi_for_rfence() in asm/pgalloc.h
 - Leave use_asid_allocator declared in asm/mmu_context.h

Changes in v4:
 - Fix a possible race between flush_icache_*() and SMP bringup
 - Refactor riscv_use_ipi_for_rfence() to make later changes cleaner
 - Optimize kernel TLB flushes with only one CPU online
 - Optimize global cache/TLB flushes with only one CPU online
 - Merge the two copies of __flush_tlb_range() and rely on the compiler
   to optimize out the broadcast path (both clang and gcc do this)
 - Merge the two copies of flush_tlb_all() and rely on constant folding
 - Only set tlb_flush_all_threshold when CONFIG_MMU=y.

Changes in v3:
 - Fixed a performance regression caused by executing sfence.vma in a
   loop on implementations affected by SiFive CIP-1200
 - Rebased on v6.7-rc1

Changes in v2:
 - Move the SMP/UP merge earlier in the series to avoid build issues
 - Make a copy of __flush_tlb_range() instead of adding ifdefs inside
 - local_flush_tlb_all() is the only function used on !MMU (smpboot.c)

Samuel Holland (13):
  riscv: Flush the instruction cache during SMP bringup
  riscv: Factor out page table TLB synchronization
  riscv: Use IPIs for remote cache/TLB flushes by default
  riscv: mm: Broadcast kernel TLB flushes only when needed
  riscv: Only send remote fences when some other CPU is online
  riscv: mm: Combine the SMP and UP TLB flush code
  riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
  riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
  riscv: mm: Introduce cntx2asid/cntx2version helper macros
  riscv: mm: Use a fixed layout for the MM context ID
  riscv: mm: Make asid_bits a local variable
  riscv: mm: Preserve global TLB entries when switching contexts
  riscv: mm: Always use an ASID to flush mm contexts

 arch/riscv/Kconfig                   |  2 +-
 arch/riscv/errata/sifive/errata.c    |  5 ++
 arch/riscv/include/asm/errata_list.h | 12 ++++-
 arch/riscv/include/asm/mmu.h         |  3 ++
 arch/riscv/include/asm/pgalloc.h     | 32 ++++++------
 arch/riscv/include/asm/sbi.h         |  4 ++
 arch/riscv/include/asm/smp.h         | 15 +-----
 arch/riscv/include/asm/tlbflush.h    | 52 ++++++++-----------
 arch/riscv/kernel/sbi-ipi.c          | 11 +++-
 arch/riscv/kernel/smp.c              | 11 +---
 arch/riscv/kernel/smpboot.c          |  7 +--
 arch/riscv/mm/Makefile               |  5 +-
 arch/riscv/mm/cacheflush.c           |  7 +--
 arch/riscv/mm/context.c              | 23 ++++-----
 arch/riscv/mm/tlbflush.c             | 75 ++++++++--------------------
 drivers/clocksource/timer-clint.c    |  2 +-
 16 files changed, 114 insertions(+), 152 deletions(-)

-- 
2.43.1


* [PATCH v6 01/13] riscv: Flush the instruction cache during SMP bringup
From: Samuel Holland @ 2024-03-27  4:49 UTC
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

Instruction cache flush IPIs are sent only to CPUs in cpu_online_mask,
so they will not target a CPU until it calls set_cpu_online() earlier in
smp_callin(). As a result, if instruction memory is modified between the
CPU coming out of reset and that point, then its instruction cache may
contain stale data. Therefore, the instruction cache must be flushed
after the set_cpu_online() synchronization point.

Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v4)

Changes in v4:
 - New patch for v4

 arch/riscv/kernel/smpboot.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index d41090fc3203..4b3c50da48ba 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -26,7 +26,7 @@
 #include <linux/sched/task_stack.h>
 #include <linux/sched/mm.h>
 
-#include <asm/cpufeature.h>
+#include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
 #include <asm/irq.h>
 #include <asm/mmu_context.h>
@@ -234,9 +234,10 @@ asmlinkage __visible void smp_callin(void)
 	riscv_user_isa_enable();
 
 	/*
-	 * Remote TLB flushes are ignored while the CPU is offline, so emit
-	 * a local TLB flush right now just in case.
+	 * Remote cache and TLB flushes are ignored while the CPU is offline,
+	 * so flush them both right now just in case.
 	 */
+	local_flush_icache_all();
 	local_flush_tlb_all();
 	complete(&cpu_running);
 	/*
-- 
2.43.1


* [PATCH v6 02/13] riscv: Factor out page table TLB synchronization
From: Samuel Holland @ 2024-03-27  4:49 UTC
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

The logic is the same for all page table levels. See commit 69be3fb111e7
("riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU").

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v6:
 - Move riscv_tlb_remove_ptdesc() definition to fix 32-bit build

Changes in v5:
 - New patch for v5

 arch/riscv/include/asm/pgalloc.h | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index deaf971253a2..b34587da8882 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -15,6 +15,14 @@
 #define __HAVE_ARCH_PUD_FREE
 #include <asm-generic/pgalloc.h>
 
+static inline void riscv_tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
+{
+	if (riscv_use_ipi_for_rfence())
+		tlb_remove_page_ptdesc(tlb, pt);
+	else
+		tlb_remove_ptdesc(tlb, pt);
+}
+
 static inline void pmd_populate_kernel(struct mm_struct *mm,
 	pmd_t *pmd, pte_t *pte)
 {
@@ -102,10 +110,7 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
 		struct ptdesc *ptdesc = virt_to_ptdesc(pud);
 
 		pagetable_pud_dtor(ptdesc);
-		if (riscv_use_ipi_for_rfence())
-			tlb_remove_page_ptdesc(tlb, ptdesc);
-		else
-			tlb_remove_ptdesc(tlb, ptdesc);
+		riscv_tlb_remove_ptdesc(tlb, ptdesc);
 	}
 }
 
@@ -139,12 +144,8 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
 static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
 				  unsigned long addr)
 {
-	if (pgtable_l5_enabled) {
-		if (riscv_use_ipi_for_rfence())
-			tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
-		else
-			tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
-	}
+	if (pgtable_l5_enabled)
+		riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
 
@@ -176,10 +177,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 	struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
 
 	pagetable_pmd_dtor(ptdesc);
-	if (riscv_use_ipi_for_rfence())
-		tlb_remove_page_ptdesc(tlb, ptdesc);
-	else
-		tlb_remove_ptdesc(tlb, ptdesc);
+	riscv_tlb_remove_ptdesc(tlb, ptdesc);
 }
 
 #endif /* __PAGETABLE_PMD_FOLDED */
@@ -190,10 +188,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 	struct ptdesc *ptdesc = page_ptdesc(pte);
 
 	pagetable_pte_dtor(ptdesc);
-	if (riscv_use_ipi_for_rfence())
-		tlb_remove_page_ptdesc(tlb, ptdesc);
-	else
-		tlb_remove_ptdesc(tlb, ptdesc);
+	riscv_tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif /* CONFIG_MMU */
 
-- 
2.43.1


* [PATCH v6 03/13] riscv: Use IPIs for remote cache/TLB flushes by default
From: Samuel Holland @ 2024-03-27  4:49 UTC
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland, Anup Patel

An IPI backend is always required in an SMP configuration, but an SBI
implementation is not. For example, SBI will be unavailable when the
kernel runs in M mode. For this reason, consider IPI delivery of cache
and TLB flushes to be the base case, and any other implementation (such
as the SBI remote fence extension) to be an optimization.

Generally, if IPIs can be delivered without firmware assistance, they
are assumed to be faster than SBI calls due to the SBI context switch
overhead. However, when SBI is used as the IPI backend, then the context
switch cost must be paid anyway, and performing the cache/TLB flush
directly in the SBI implementation is more efficient than injecting an
interrupt to S-mode. This is the only existing scenario where
riscv_ipi_set_virq_range() is called with use_for_rfence set to false.

sbi_ipi_init() already checks riscv_ipi_have_virq_range(), so it only
calls riscv_ipi_set_virq_range() when no other IPI device is available.
This allows moving the static key and dropping the use_for_rfence
parameter. This decouples the static key from the irqchip driver probe
order.

Furthermore, the static branch only makes sense when CONFIG_RISCV_SBI is
enabled. Otherwise, IPIs must be used. Add a fallback definition of
riscv_use_sbi_for_rfence() which handles this case and removes the need
to check CONFIG_RISCV_SBI elsewhere, such as in cacheflush.c.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v6:
 - Clarify the commit message for patch 3 based on ML discussion

Changes in v5:
 - Also switch to riscv_use_sbi_for_rfence() in asm/pgalloc.h

Changes in v4:
 - New patch for v4

 arch/riscv/include/asm/pgalloc.h  |  7 ++++---
 arch/riscv/include/asm/sbi.h      |  4 ++++
 arch/riscv/include/asm/smp.h      | 15 ++-------------
 arch/riscv/kernel/sbi-ipi.c       | 11 ++++++++++-
 arch/riscv/kernel/smp.c           | 11 +----------
 arch/riscv/mm/cacheflush.c        |  5 ++---
 arch/riscv/mm/tlbflush.c          | 31 ++++++++++++++-----------------
 drivers/clocksource/timer-clint.c |  2 +-
 8 files changed, 38 insertions(+), 48 deletions(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index b34587da8882..f52264304f77 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -8,6 +8,7 @@
 #define _ASM_RISCV_PGALLOC_H
 
 #include <linux/mm.h>
+#include <asm/sbi.h>
 #include <asm/tlb.h>
 
 #ifdef CONFIG_MMU
@@ -17,10 +18,10 @@
 
 static inline void riscv_tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
 {
-	if (riscv_use_ipi_for_rfence())
-		tlb_remove_page_ptdesc(tlb, pt);
-	else
+	if (riscv_use_sbi_for_rfence())
 		tlb_remove_ptdesc(tlb, pt);
+	else
+		tlb_remove_page_ptdesc(tlb, pt);
 }
 
 static inline void pmd_populate_kernel(struct mm_struct *mm,
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 6e68f8dff76b..ea84392ca9d7 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -375,8 +375,12 @@ unsigned long riscv_cached_marchid(unsigned int cpu_id);
 unsigned long riscv_cached_mimpid(unsigned int cpu_id);
 
 #if IS_ENABLED(CONFIG_SMP) && IS_ENABLED(CONFIG_RISCV_SBI)
+DECLARE_STATIC_KEY_FALSE(riscv_sbi_for_rfence);
+#define riscv_use_sbi_for_rfence() \
+	static_branch_unlikely(&riscv_sbi_for_rfence)
 void sbi_ipi_init(void);
 #else
+static inline bool riscv_use_sbi_for_rfence(void) { return false; }
 static inline void sbi_ipi_init(void) { }
 #endif
 
diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index 0d555847cde6..7ac80e9f2288 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -49,12 +49,7 @@ void riscv_ipi_disable(void);
 bool riscv_ipi_have_virq_range(void);
 
 /* Set the IPI interrupt numbers for arch (called by irqchip drivers) */
-void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence);
-
-/* Check if we can use IPIs for remote FENCEs */
-DECLARE_STATIC_KEY_FALSE(riscv_ipi_for_rfence);
-#define riscv_use_ipi_for_rfence() \
-	static_branch_unlikely(&riscv_ipi_for_rfence)
+void riscv_ipi_set_virq_range(int virq, int nr);
 
 /* Check other CPUs stop or not */
 bool smp_crash_stop_failed(void);
@@ -104,16 +99,10 @@ static inline bool riscv_ipi_have_virq_range(void)
 	return false;
 }
 
-static inline void riscv_ipi_set_virq_range(int virq, int nr,
-					    bool use_for_rfence)
+static inline void riscv_ipi_set_virq_range(int virq, int nr)
 {
 }
 
-static inline bool riscv_use_ipi_for_rfence(void)
-{
-	return false;
-}
-
 #endif /* CONFIG_SMP */
 
 #if defined(CONFIG_HOTPLUG_CPU) && (CONFIG_SMP)
diff --git a/arch/riscv/kernel/sbi-ipi.c b/arch/riscv/kernel/sbi-ipi.c
index a4559695ce62..1026e22955cc 100644
--- a/arch/riscv/kernel/sbi-ipi.c
+++ b/arch/riscv/kernel/sbi-ipi.c
@@ -13,6 +13,9 @@
 #include <linux/irqdomain.h>
 #include <asm/sbi.h>
 
+DEFINE_STATIC_KEY_FALSE(riscv_sbi_for_rfence);
+EXPORT_SYMBOL_GPL(riscv_sbi_for_rfence);
+
 static int sbi_ipi_virq;
 
 static void sbi_ipi_handle(struct irq_desc *desc)
@@ -72,6 +75,12 @@ void __init sbi_ipi_init(void)
 			  "irqchip/sbi-ipi:starting",
 			  sbi_ipi_starting_cpu, NULL);
 
-	riscv_ipi_set_virq_range(virq, BITS_PER_BYTE, false);
+	riscv_ipi_set_virq_range(virq, BITS_PER_BYTE);
 	pr_info("providing IPIs using SBI IPI extension\n");
+
+	/*
+	 * Use the SBI remote fence extension to avoid
+	 * the extra context switch needed to handle IPIs.
+	 */
+	static_branch_enable(&riscv_sbi_for_rfence);
 }
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 45dd4035416e..8e6eb64459af 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -171,10 +171,7 @@ bool riscv_ipi_have_virq_range(void)
 	return (ipi_virq_base) ? true : false;
 }
 
-DEFINE_STATIC_KEY_FALSE(riscv_ipi_for_rfence);
-EXPORT_SYMBOL_GPL(riscv_ipi_for_rfence);
-
-void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence)
+void riscv_ipi_set_virq_range(int virq, int nr)
 {
 	int i, err;
 
@@ -197,12 +194,6 @@ void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence)
 
 	/* Enabled IPIs for boot CPU immediately */
 	riscv_ipi_enable();
-
-	/* Update RFENCE static key */
-	if (use_for_rfence)
-		static_branch_enable(&riscv_ipi_for_rfence);
-	else
-		static_branch_disable(&riscv_ipi_for_rfence);
 }
 
 static const char * const ipi_names[] = {
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index bc61ee5975e4..d76fc73e594b 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -21,7 +21,7 @@ void flush_icache_all(void)
 {
 	local_flush_icache_all();
 
-	if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
+	if (riscv_use_sbi_for_rfence())
 		sbi_remote_fence_i(NULL);
 	else
 		on_each_cpu(ipi_remote_fence_i, NULL, 1);
@@ -69,8 +69,7 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 		 * with flush_icache_deferred().
 		 */
 		smp_mb();
-	} else if (IS_ENABLED(CONFIG_RISCV_SBI) &&
-		   !riscv_use_ipi_for_rfence()) {
+	} else if (riscv_use_sbi_for_rfence()) {
 		sbi_remote_fence_i(&others);
 	} else {
 		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 893566e004b7..0435605b07d0 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -79,10 +79,10 @@ static void __ipi_flush_tlb_all(void *info)
 
 void flush_tlb_all(void)
 {
-	if (riscv_use_ipi_for_rfence())
-		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
-	else
+	if (riscv_use_sbi_for_rfence())
 		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
+	else
+		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
 }
 
 struct flush_tlb_range_data {
@@ -103,7 +103,6 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
 			      unsigned long start, unsigned long size,
 			      unsigned long stride)
 {
-	struct flush_tlb_range_data ftd;
 	bool broadcast;
 
 	if (cpumask_empty(cmask))
@@ -119,20 +118,18 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
 		broadcast = true;
 	}
 
-	if (broadcast) {
-		if (riscv_use_ipi_for_rfence()) {
-			ftd.asid = asid;
-			ftd.start = start;
-			ftd.size = size;
-			ftd.stride = stride;
-			on_each_cpu_mask(cmask,
-					 __ipi_flush_tlb_range_asid,
-					 &ftd, 1);
-		} else
-			sbi_remote_sfence_vma_asid(cmask,
-						   start, size, asid);
-	} else {
+	if (!broadcast) {
 		local_flush_tlb_range_asid(start, size, stride, asid);
+	} else if (riscv_use_sbi_for_rfence()) {
+		sbi_remote_sfence_vma_asid(cmask, start, size, asid);
+	} else {
+		struct flush_tlb_range_data ftd;
+
+		ftd.asid = asid;
+		ftd.start = start;
+		ftd.size = size;
+		ftd.stride = stride;
+		on_each_cpu_mask(cmask, __ipi_flush_tlb_range_asid, &ftd, 1);
 	}
 
 	if (cmask != cpu_online_mask)
diff --git a/drivers/clocksource/timer-clint.c b/drivers/clocksource/timer-clint.c
index 09fd292eb83d..0bdd9d7ec545 100644
--- a/drivers/clocksource/timer-clint.c
+++ b/drivers/clocksource/timer-clint.c
@@ -251,7 +251,7 @@ static int __init clint_timer_init_dt(struct device_node *np)
 	}
 
 	irq_set_chained_handler(clint_ipi_irq, clint_ipi_interrupt);
-	riscv_ipi_set_virq_range(rc, BITS_PER_BYTE, true);
+	riscv_ipi_set_virq_range(rc, BITS_PER_BYTE);
 	clint_clear_ipi();
 #endif
 
-- 
2.43.1


* [PATCH v6 04/13] riscv: mm: Broadcast kernel TLB flushes only when needed
From: Samuel Holland @ 2024-03-27  4:49 UTC
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

__flush_tlb_range() avoids broadcasting TLB flushes when an mm context
is only active on the local CPU. Apply this same optimization to TLB
flushes of kernel memory when only one CPU is online. This check can be
constant-folded when SMP is disabled.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v4)

Changes in v4:
 - New patch for v4

 arch/riscv/mm/tlbflush.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 0435605b07d0..da821315d43e 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -103,22 +103,15 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
 			      unsigned long start, unsigned long size,
 			      unsigned long stride)
 {
-	bool broadcast;
+	unsigned int cpu;
 
 	if (cpumask_empty(cmask))
 		return;
 
-	if (cmask != cpu_online_mask) {
-		unsigned int cpuid;
+	cpu = get_cpu();
 
-		cpuid = get_cpu();
-		/* check if the tlbflush needs to be sent to other CPUs */
-		broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
-	} else {
-		broadcast = true;
-	}
-
-	if (!broadcast) {
+	/* Check if the TLB flush needs to be sent to other CPUs. */
+	if (cpumask_any_but(cmask, cpu) >= nr_cpu_ids) {
 		local_flush_tlb_range_asid(start, size, stride, asid);
 	} else if (riscv_use_sbi_for_rfence()) {
 		sbi_remote_sfence_vma_asid(cmask, start, size, asid);
@@ -132,8 +125,7 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
 		on_each_cpu_mask(cmask, __ipi_flush_tlb_range_asid, &ftd, 1);
 	}
 
-	if (cmask != cpu_online_mask)
-		put_cpu();
+	put_cpu();
 }
 
 static inline unsigned long get_mm_asid(struct mm_struct *mm)
-- 
2.43.1


* [PATCH v6 05/13] riscv: Only send remote fences when some other CPU is online
From: Samuel Holland @ 2024-03-27  4:49 UTC
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

If no other CPU is online, a local cache or TLB flush is sufficient.
These checks can be constant-folded when SMP is disabled.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v4)

Changes in v4:
 - New patch for v4

 arch/riscv/mm/cacheflush.c | 4 +++-
 arch/riscv/mm/tlbflush.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index d76fc73e594b..f5be1fec8191 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -21,7 +21,9 @@ void flush_icache_all(void)
 {
 	local_flush_icache_all();
 
-	if (riscv_use_sbi_for_rfence())
+	if (num_online_cpus() < 2)
+		return;
+	else if (riscv_use_sbi_for_rfence())
 		sbi_remote_fence_i(NULL);
 	else
 		on_each_cpu(ipi_remote_fence_i, NULL, 1);
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index da821315d43e..0901aa47b58f 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -79,7 +79,9 @@ static void __ipi_flush_tlb_all(void *info)
 
 void flush_tlb_all(void)
 {
-	if (riscv_use_sbi_for_rfence())
+	if (num_online_cpus() < 2)
+		local_flush_tlb_all();
+	else if (riscv_use_sbi_for_rfence())
 		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
 	else
 		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
-- 
2.43.1


* [PATCH v6 06/13] riscv: mm: Combine the SMP and UP TLB flush code
From: Samuel Holland @ 2024-03-27  4:49 UTC
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

In SMP configurations, all TLB flushing narrower than flush_tlb_all()
goes through __flush_tlb_range(). Do the same in UP configurations.

This allows UP configurations to take advantage of recent improvements
to the code in tlbflush.c, such as support for huge pages and flushing
multiple-page ranges.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v4)

Changes in v4:
 - Merge the two copies of __flush_tlb_range() and rely on the compiler
   to optimize out the broadcast path (both clang and gcc do this)
 - Merge the two copies of flush_tlb_all() and rely on constant folding

Changes in v2:
 - Move the SMP/UP merge earlier in the series to avoid build issues
 - Make a copy of __flush_tlb_range() instead of adding ifdefs inside
 - local_flush_tlb_all() is the only function used on !MMU (smpboot.c)

 arch/riscv/Kconfig                |  2 +-
 arch/riscv/include/asm/tlbflush.h | 31 +++----------------------------
 arch/riscv/mm/Makefile            |  5 +----
 3 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..442532819a44 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -60,7 +60,7 @@ config RISCV
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USES_CFI_TRAPS if CFI_CLANG
-	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if SMP && MMU
+	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if MMU
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 4112cc8d1d69..4f86424b1ba5 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -27,12 +27,7 @@ static inline void local_flush_tlb_page(unsigned long addr)
 {
 	ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
 }
-#else /* CONFIG_MMU */
-#define local_flush_tlb_all()			do { } while (0)
-#define local_flush_tlb_page(addr)		do { } while (0)
-#endif /* CONFIG_MMU */
 
-#if defined(CONFIG_SMP) && defined(CONFIG_MMU)
 void flush_tlb_all(void);
 void flush_tlb_mm(struct mm_struct *mm);
 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
@@ -54,28 +49,8 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			       unsigned long uaddr);
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
-
-#else /* CONFIG_SMP && CONFIG_MMU */
-
-#define flush_tlb_all() local_flush_tlb_all()
-#define flush_tlb_page(vma, addr) local_flush_tlb_page(addr)
-
-static inline void flush_tlb_range(struct vm_area_struct *vma,
-		unsigned long start, unsigned long end)
-{
-	local_flush_tlb_all();
-}
-
-/* Flush a range of kernel pages */
-static inline void flush_tlb_kernel_range(unsigned long start,
-	unsigned long end)
-{
-	local_flush_tlb_all();
-}
-
-#define flush_tlb_mm(mm) flush_tlb_all()
-#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
-#define local_flush_tlb_kernel_range(start, end) flush_tlb_all()
-#endif /* !CONFIG_SMP || !CONFIG_MMU */
+#else /* CONFIG_MMU */
+#define local_flush_tlb_all()			do { } while (0)
+#endif /* CONFIG_MMU */
 
 #endif /* _ASM_RISCV_TLBFLUSH_H */
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index 2c869f8026a8..cbe4d775ef56 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -13,14 +13,11 @@ endif
 KCOV_INSTRUMENT_init.o := n
 
 obj-y += init.o
-obj-$(CONFIG_MMU) += extable.o fault.o pageattr.o pgtable.o
+obj-$(CONFIG_MMU) += extable.o fault.o pageattr.o pgtable.o tlbflush.o
 obj-y += cacheflush.o
 obj-y += context.o
 obj-y += pmem.o
 
-ifeq ($(CONFIG_MMU),y)
-obj-$(CONFIG_SMP) += tlbflush.o
-endif
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
 obj-$(CONFIG_KASAN)   += kasan_init.o
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v6 07/13] riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
  2024-03-27  4:49 ` Samuel Holland
@ 2024-03-27  4:49   ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27  4:49 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

Commit 3f1e782998cd ("riscv: add ASID-based tlbflushing methods") added
calls to the sfence.vma instruction with rs2 != x0. These single-ASID
instruction variants are also affected by SiFive erratum CIP-1200.

Until now, the erratum workaround was not needed for the single-ASID
sfence.vma variants, because they were only used when the ASID allocator
was enabled, and the affected SiFive platforms do not support multiple
ASIDs. However, we are going to start using those sfence.vma variants
regardless of ASID support, so now we need alternatives covering them.
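
For illustration, the net runtime effect of the new macros is roughly the
following (a minimal sketch only: the real code in errata_list.h patches
the instruction in place at boot via the ALTERNATIVE() framework, so
there is no branch, and cip_1200_affected is a hypothetical stand-in for
that patching decision):

  static bool cip_1200_affected;	/* hypothetical, decided at boot */

  static inline void sfence_vma_asid_sketch(unsigned long asid)
  {
  	if (cip_1200_affected)
  		__asm__ __volatile__ ("sfence.vma" : : : "memory");
  	else
  		__asm__ __volatile__ ("sfence.vma x0, %0"
  				      : : "r" (asid) : "memory");
  }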

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - Rebase on Alexandre's "riscv: tlb flush improvements" series v5

 arch/riscv/include/asm/errata_list.h | 12 +++++++++++-
 arch/riscv/include/asm/tlbflush.h    | 19 ++++++++++++++++++-
 arch/riscv/mm/tlbflush.c             | 23 -----------------------
 3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 1f2dbfb8a8bf..35ce26899960 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -43,11 +43,21 @@ ALTERNATIVE(__stringify(RISCV_PTR do_page_fault),			\
 	    CONFIG_ERRATA_SIFIVE_CIP_453)
 #else /* !__ASSEMBLY__ */
 
-#define ALT_FLUSH_TLB_PAGE(x)						\
+#define ALT_SFENCE_VMA_ASID(asid)					\
+asm(ALTERNATIVE("sfence.vma x0, %0", "sfence.vma", SIFIVE_VENDOR_ID,	\
+		ERRATA_SIFIVE_CIP_1200, CONFIG_ERRATA_SIFIVE_CIP_1200)	\
+		: : "r" (asid) : "memory")
+
+#define ALT_SFENCE_VMA_ADDR(addr)					\
 asm(ALTERNATIVE("sfence.vma %0", "sfence.vma", SIFIVE_VENDOR_ID,	\
 		ERRATA_SIFIVE_CIP_1200, CONFIG_ERRATA_SIFIVE_CIP_1200)	\
 		: : "r" (addr) : "memory")
 
+#define ALT_SFENCE_VMA_ADDR_ASID(addr, asid)				\
+asm(ALTERNATIVE("sfence.vma %0, %1", "sfence.vma", SIFIVE_VENDOR_ID,	\
+		ERRATA_SIFIVE_CIP_1200, CONFIG_ERRATA_SIFIVE_CIP_1200)	\
+		: : "r" (addr), "r" (asid) : "memory")
+
 /*
  * _val is marked as "will be overwritten", so need to set it to 0
  * in the default case.
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 4f86424b1ba5..463b615d7728 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -22,10 +22,27 @@ static inline void local_flush_tlb_all(void)
 	__asm__ __volatile__ ("sfence.vma" : : : "memory");
 }
 
+static inline void local_flush_tlb_all_asid(unsigned long asid)
+{
+	if (asid != FLUSH_TLB_NO_ASID)
+		ALT_SFENCE_VMA_ASID(asid);
+	else
+		local_flush_tlb_all();
+}
+
 /* Flush one page from local TLB */
 static inline void local_flush_tlb_page(unsigned long addr)
 {
-	ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
+	ALT_SFENCE_VMA_ADDR(addr);
+}
+
+static inline void local_flush_tlb_page_asid(unsigned long addr,
+					     unsigned long asid)
+{
+	if (asid != FLUSH_TLB_NO_ASID)
+		ALT_SFENCE_VMA_ADDR_ASID(addr, asid);
+	else
+		local_flush_tlb_page(addr);
 }
 
 void flush_tlb_all(void);
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 0901aa47b58f..ad7bdcfcc219 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -7,29 +7,6 @@
 #include <asm/sbi.h>
 #include <asm/mmu_context.h>
 
-static inline void local_flush_tlb_all_asid(unsigned long asid)
-{
-	if (asid != FLUSH_TLB_NO_ASID)
-		__asm__ __volatile__ ("sfence.vma x0, %0"
-				:
-				: "r" (asid)
-				: "memory");
-	else
-		local_flush_tlb_all();
-}
-
-static inline void local_flush_tlb_page_asid(unsigned long addr,
-		unsigned long asid)
-{
-	if (asid != FLUSH_TLB_NO_ASID)
-		__asm__ __volatile__ ("sfence.vma %0, %1"
-				:
-				: "r" (addr), "r" (asid)
-				: "memory");
-	else
-		local_flush_tlb_page(addr);
-}
-
 /*
  * Flush entire TLB if number of entries to be flushed is greater
  * than the threshold below.
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v6 08/13] riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
  2024-03-27  4:49 ` Samuel Holland
@ 2024-03-27  4:49   ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27  4:49 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

Implementations affected by SiFive erratum CIP-1200 have a bug which
forces the kernel to always use the global variant of the sfence.vma
instruction. When affected by this erratum, do not attempt to flush a
range of addresses, since each iteration of the loop would actually
flush the whole TLB; instead, minimize the overall number of sfence.vma
instructions.
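
Setting tlb_flush_all_threshold to zero makes the range-flush helper take
the full-flush path immediately, so an affected core issues exactly one
(global) sfence.vma per flush. A simplified sketch of that decision,
using approximate names (the real logic is in __flush_tlb_range() and
local_flush_tlb_range_threshold_asid() in tlbflush.c):

  if (DIV_ROUND_UP(size, stride) > tlb_flush_all_threshold) {
  	local_flush_tlb_all_asid(asid);		/* one sfence.vma */
  	return;
  }

  for (unsigned long addr = start; addr < start + size; addr += stride)
  	local_flush_tlb_page_asid(addr, asid);	/* one sfence.vma each */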

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v6:
 - Clarify the commit message for patch 8 based on ML discussion

Changes in v4:
 - Only set tlb_flush_all_threshold when CONFIG_MMU=y.

Changes in v3:
 - New patch for v3

 arch/riscv/errata/sifive/errata.c | 5 +++++
 arch/riscv/include/asm/tlbflush.h | 2 ++
 arch/riscv/mm/tlbflush.c          | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
index 3d9a32d791f7..716cfedad3a2 100644
--- a/arch/riscv/errata/sifive/errata.c
+++ b/arch/riscv/errata/sifive/errata.c
@@ -42,6 +42,11 @@ static bool errata_cip_1200_check_func(unsigned long  arch_id, unsigned long imp
 		return false;
 	if ((impid & 0xffffff) > 0x200630 || impid == 0x1200626)
 		return false;
+
+#ifdef CONFIG_MMU
+	tlb_flush_all_threshold = 0;
+#endif
+
 	return true;
 }
 
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 463b615d7728..8e329721375b 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -66,6 +66,8 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			       unsigned long uaddr);
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
+
+extern unsigned long tlb_flush_all_threshold;
 #else /* CONFIG_MMU */
 #define local_flush_tlb_all()			do { } while (0)
 #endif /* CONFIG_MMU */
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index ad7bdcfcc219..18af7b5053af 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -11,7 +11,7 @@
  * Flush entire TLB if number of entries to be flushed is greater
  * than the threshold below.
  */
-static unsigned long tlb_flush_all_threshold __read_mostly = 64;
+unsigned long tlb_flush_all_threshold __read_mostly = 64;
 
 static void local_flush_tlb_range_threshold_asid(unsigned long start,
 						 unsigned long size,
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v6 09/13] riscv: mm: Introduce cntx2asid/cntx2version helper macros
  2024-03-27  4:49 ` Samuel Holland
@ 2024-03-27  4:49   ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27  4:49 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

When using the ASID allocator, the MM context ID contains two values:
the ASID in the lower bits, and the allocator version number in the
remaining bits. Use macros to make this separation more obvious.
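
For example, assuming a hypothetical 16-bit ASID field (asid_mask ==
0xffff), a context ID decomposes as follows:

  unsigned long cntx = 0x30000UL | 0x2aUL;	/* version | ASID */

  cntx2asid(cntx);	/* == 0x2a: the ASID lives in the low bits */
  cntx2version(cntx);	/* == 0x30000: the version is the remainder */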

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 arch/riscv/include/asm/mmu.h |  3 +++
 arch/riscv/mm/context.c      | 12 ++++++------
 arch/riscv/mm/tlbflush.c     |  2 +-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index 355504b37f8e..a550fbf770be 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -26,6 +26,9 @@ typedef struct {
 #endif
 } mm_context_t;
 
+#define cntx2asid(cntx)		((cntx) & asid_mask)
+#define cntx2version(cntx)	((cntx) & ~asid_mask)
+
 void __init create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa,
 			       phys_addr_t sz, pgprot_t prot);
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index ba8eb3944687..b562b3c44487 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -81,7 +81,7 @@ static void __flush_context(void)
 		if (cntx == 0)
 			cntx = per_cpu(reserved_context, i);
 
-		__set_bit(cntx & asid_mask, context_asid_map);
+		__set_bit(cntx2asid(cntx), context_asid_map);
 		per_cpu(reserved_context, i) = cntx;
 	}
 
@@ -102,7 +102,7 @@ static unsigned long __new_context(struct mm_struct *mm)
 	lockdep_assert_held(&context_lock);
 
 	if (cntx != 0) {
-		unsigned long newcntx = ver | (cntx & asid_mask);
+		unsigned long newcntx = ver | cntx2asid(cntx);
 
 		/*
 		 * If our current CONTEXT was active during a rollover, we
@@ -115,7 +115,7 @@ static unsigned long __new_context(struct mm_struct *mm)
 		 * We had a valid CONTEXT in a previous life, so try to
 		 * re-use it if possible.
 		 */
-		if (!__test_and_set_bit(cntx & asid_mask, context_asid_map))
+		if (!__test_and_set_bit(cntx2asid(cntx), context_asid_map))
 			return newcntx;
 	}
 
@@ -168,7 +168,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 	 */
 	old_active_cntx = atomic_long_read(&per_cpu(active_context, cpu));
 	if (old_active_cntx &&
-	    ((cntx & ~asid_mask) == atomic_long_read(&current_version)) &&
+	    (cntx2version(cntx) == atomic_long_read(&current_version)) &&
 	    atomic_long_cmpxchg_relaxed(&per_cpu(active_context, cpu),
 					old_active_cntx, cntx))
 		goto switch_mm_fast;
@@ -177,7 +177,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 
 	/* Check that our ASID belongs to the current_version. */
 	cntx = atomic_long_read(&mm->context.id);
-	if ((cntx & ~asid_mask) != atomic_long_read(&current_version)) {
+	if (cntx2version(cntx) != atomic_long_read(&current_version)) {
 		cntx = __new_context(mm);
 		atomic_long_set(&mm->context.id, cntx);
 	}
@@ -191,7 +191,7 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 
 switch_mm_fast:
 	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
-		  ((cntx & asid_mask) << SATP_ASID_SHIFT) |
+		  (cntx2asid(cntx) << SATP_ASID_SHIFT) |
 		  satp_mode);
 
 	if (need_flush_tlb)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 18af7b5053af..35266dd9a9a2 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -110,7 +110,7 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
 static inline unsigned long get_mm_asid(struct mm_struct *mm)
 {
 	return static_branch_unlikely(&use_asid_allocator) ?
-			atomic_long_read(&mm->context.id) & asid_mask : FLUSH_TLB_NO_ASID;
+			cntx2asid(atomic_long_read(&mm->context.id)) : FLUSH_TLB_NO_ASID;
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v6 10/13] riscv: mm: Use a fixed layout for the MM context ID
  2024-03-27  4:49 ` Samuel Holland
@ 2024-03-27  4:49   ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27  4:49 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

Currently, the size of the ASID field in the MM context ID dynamically
depends on the number of hardware-supported ASID bits. This requires
reading a global variable to extract either field from the context ID.
Instead, allocate the maximum possible number of bits to the ASID field,
so the layout of the context ID is known at compile-time.
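
With the fixed layout (assuming rv64, where SATP_ASID_BITS is 16 and
SATP_ASID_MASK is 0xffff), the context ID becomes:

  /*
   *   63                    16 15                     0
   *  +------------------------+-----------------------+
   *  |   allocator version    |         ASID          |
   *  +------------------------+-----------------------+
   */

so extracting either field uses a compile-time constant mask, and
bumping the version is a constant addition of BIT(SATP_ASID_BITS), as
seen in the __new_context() hunk below.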

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 arch/riscv/include/asm/mmu.h      | 4 ++--
 arch/riscv/include/asm/tlbflush.h | 2 --
 arch/riscv/mm/context.c           | 6 ++----
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index a550fbf770be..dc0273f7905f 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -26,8 +26,8 @@ typedef struct {
 #endif
 } mm_context_t;
 
-#define cntx2asid(cntx)		((cntx) & asid_mask)
-#define cntx2version(cntx)	((cntx) & ~asid_mask)
+#define cntx2asid(cntx)		((cntx) & SATP_ASID_MASK)
+#define cntx2version(cntx)	((cntx) & ~SATP_ASID_MASK)
 
 void __init create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa,
 			       phys_addr_t sz, pgprot_t prot);
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 8e329721375b..72e559934952 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -15,8 +15,6 @@
 #define FLUSH_TLB_NO_ASID       ((unsigned long)-1)
 
 #ifdef CONFIG_MMU
-extern unsigned long asid_mask;
-
 static inline void local_flush_tlb_all(void)
 {
 	__asm__ __volatile__ ("sfence.vma" : : : "memory");
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index b562b3c44487..5315af06cd4d 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -22,7 +22,6 @@ DEFINE_STATIC_KEY_FALSE(use_asid_allocator);
 
 static unsigned long asid_bits;
 static unsigned long num_asids;
-unsigned long asid_mask;
 
 static atomic_long_t current_version;
 
@@ -128,7 +127,7 @@ static unsigned long __new_context(struct mm_struct *mm)
 		goto set_asid;
 
 	/* We're out of ASIDs, so increment current_version */
-	ver = atomic_long_add_return_relaxed(num_asids, &current_version);
+	ver = atomic_long_add_return_relaxed(BIT(SATP_ASID_BITS), &current_version);
 
 	/* Flush everything  */
 	__flush_context();
@@ -247,7 +246,6 @@ static int __init asids_init(void)
 	/* Pre-compute ASID details */
 	if (asid_bits) {
 		num_asids = 1 << asid_bits;
-		asid_mask = num_asids - 1;
 	}
 
 	/*
@@ -255,7 +253,7 @@ static int __init asids_init(void)
 	 * at-least twice more than CPUs
 	 */
 	if (num_asids > (2 * num_possible_cpus())) {
-		atomic_long_set(&current_version, num_asids);
+		atomic_long_set(&current_version, BIT(SATP_ASID_BITS));
 
 		context_asid_map = bitmap_zalloc(num_asids, GFP_KERNEL);
 		if (!context_asid_map)
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v6 11/13] riscv: mm: Make asid_bits a local variable
  2024-03-27  4:49 ` Samuel Holland
@ 2024-03-27  4:49   ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27  4:49 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

This variable is only used inside asids_init().

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 arch/riscv/mm/context.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 5315af06cd4d..0bf6d0070a14 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -20,7 +20,6 @@
 
 DEFINE_STATIC_KEY_FALSE(use_asid_allocator);
 
-static unsigned long asid_bits;
 static unsigned long num_asids;
 
 static atomic_long_t current_version;
@@ -226,7 +225,7 @@ static inline void set_mm(struct mm_struct *prev,
 
 static int __init asids_init(void)
 {
-	unsigned long old;
+	unsigned long asid_bits, old;
 
 	/* Figure-out number of ASID bits in HW */
 	old = csr_read(CSR_SATP);
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v6 12/13] riscv: mm: Preserve global TLB entries when switching contexts
  2024-03-27  4:49 ` Samuel Holland
@ 2024-03-27  4:49   ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27  4:49 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

If the CPU does not support multiple ASIDs, all MM contexts use ASID 0.
In this case, it is still beneficial to flush the TLB by ASID, as the
single-ASID variant of the sfence.vma instruction preserves TLB entries
for global (kernel) pages.

This optimization is recommended by the RISC-V privileged specification:

  If the implementation does not provide ASIDs, or software chooses
  to always use ASID 0, then after every satp write, software should
  execute SFENCE.VMA with rs1=x0. In the common case that no global
  translations have been modified, rs2 should be set to a register
  other than x0 but which contains the value zero, so that global
  translations are not flushed.

It is not possible to apply this optimization when using the ASID
allocator, because that code must flush the TLB for all ASIDs at once
when incrementing the version number.
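
In instruction terms, the change swaps a fully-global flush for the
register-zero form the specification recommends (a sketch; the kernel
reaches these through local_flush_tlb_all() and
local_flush_tlb_all_asid(0) respectively):

  unsigned long zero = 0;

  /* before: flushes everything, including global kernel mappings */
  __asm__ __volatile__ ("sfence.vma" : : : "memory");

  /* after: rs1=x0, rs2 holds 0, so global mappings are preserved */
  __asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (zero) : "memory");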

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 arch/riscv/mm/context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 0bf6d0070a14..60cb0b82240e 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -200,7 +200,7 @@ static void set_mm_noasid(struct mm_struct *mm)
 {
 	/* Switch the page table and blindly nuke entire local TLB */
 	csr_write(CSR_SATP, virt_to_pfn(mm->pgd) | satp_mode);
-	local_flush_tlb_all();
+	local_flush_tlb_all_asid(0);
 }
 
 static inline void set_mm(struct mm_struct *prev,
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v6 13/13] riscv: mm: Always use an ASID to flush mm contexts
  2024-03-27  4:49 ` Samuel Holland
@ 2024-03-27  4:49   ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27  4:49 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Samuel Holland

Even if multiple ASIDs are not supported, using the single-ASID variant
of the sfence.vma instruction preserves TLB entries for global (kernel)
pages, so it is always more efficient to use the single-ASID code path.
This is safe even without the ASID allocator: in that case
mm->context.id stays zero, so cntx2asid() returns ASID 0, which matches
the ASID that set_mm_noasid() programs into satp.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v5)

Changes in v5:
 - Leave use_asid_allocator declared in asm/mmu_context.h

Changes in v4:
 - There is now only one copy of __flush_tlb_range()

Changes in v2:
 - Update both copies of __flush_tlb_range()

 arch/riscv/mm/tlbflush.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 35266dd9a9a2..44e7ed4e194f 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -109,8 +109,7 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
 
 static inline unsigned long get_mm_asid(struct mm_struct *mm)
 {
-	return static_branch_unlikely(&use_asid_allocator) ?
-			cntx2asid(atomic_long_read(&mm->context.id)) : FLUSH_TLB_NO_ASID;
+	return cntx2asid(atomic_long_read(&mm->context.id));
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [External] [PATCH v6 05/13] riscv: Only send remote fences when some other CPU is online
  2024-03-27  4:49   ` Samuel Holland
@ 2024-03-27  6:16     ` yunhui cui
  -1 siblings, 0 replies; 48+ messages in thread
From: yunhui cui @ 2024-03-27  6:16 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, linux-mm,
	Alexandre Ghiti, Jisheng Zhang

Hi Samuel,

On Wed, Mar 27, 2024 at 12:50 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> If no other CPU is online, a local cache or TLB flush is sufficient.
> These checks can be constant-folded when SMP is disabled.
>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> (no changes since v4)
>
> Changes in v4:
>  - New patch for v4
>
>  arch/riscv/mm/cacheflush.c | 4 +++-
>  arch/riscv/mm/tlbflush.c   | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index d76fc73e594b..f5be1fec8191 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -21,7 +21,9 @@ void flush_icache_all(void)
>  {
>         local_flush_icache_all();
>
> -       if (riscv_use_sbi_for_rfence())
> +       if (num_online_cpus() < 2)
> +               return;
> +       else if (riscv_use_sbi_for_rfence())
>                 sbi_remote_fence_i(NULL);
>         else
>                 on_each_cpu(ipi_remote_fence_i, NULL, 1);
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index da821315d43e..0901aa47b58f 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -79,7 +79,9 @@ static void __ipi_flush_tlb_all(void *info)
>
>  void flush_tlb_all(void)
>  {
> -       if (riscv_use_sbi_for_rfence())
> +       if (num_online_cpus() < 2)
> +               local_flush_tlb_all();
> +       else if (riscv_use_sbi_for_rfence())
>                 sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
>         else
>                 on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
> --
> 2.43.1
>

At first glance, the modification here does not seem necessary:
on_each_cpu() already contains similar logic. Can you share your
test data?


Thanks,
Yunhui

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [External] [PATCH v6 06/13] riscv: mm: Combine the SMP and UP TLB flush code
  2024-03-27  4:49   ` Samuel Holland
@ 2024-03-27  6:23     ` yunhui cui
  -1 siblings, 0 replies; 48+ messages in thread
From: yunhui cui @ 2024-03-27  6:23 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, linux-mm,
	Alexandre Ghiti, Jisheng Zhang

Hi Samuel,

On Wed, Mar 27, 2024 at 12:51 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> In SMP configurations, all TLB flushing narrower than flush_tlb_all()
> goes through __flush_tlb_range(). Do the same in UP configurations.
>
> This allows UP configurations to take advantage of recent improvements
> to the code in tlbflush.c, such as support for huge pages and flushing
> multiple-page ranges.
>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> (no changes since v4)
>
> Changes in v4:
>  - Merge the two copies of __flush_tlb_range() and rely on the compiler
>    to optimize out the broadcast path (both clang and gcc do this)
>  - Merge the two copies of flush_tlb_all() and rely on constant folding
>
> Changes in v2:
>  - Move the SMP/UP merge earlier in the series to avoid build issues
>  - Make a copy of __flush_tlb_range() instead of adding ifdefs inside
>  - local_flush_tlb_all() is the only function used on !MMU (smpboot.c)
>
>  arch/riscv/Kconfig                |  2 +-
>  arch/riscv/include/asm/tlbflush.h | 31 +++----------------------------
>  arch/riscv/mm/Makefile            |  5 +----
>  3 files changed, 5 insertions(+), 33 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index be09c8836d56..442532819a44 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -60,7 +60,7 @@ config RISCV
>         select ARCH_USE_MEMTEST
>         select ARCH_USE_QUEUED_RWLOCKS
>         select ARCH_USES_CFI_TRAPS if CFI_CLANG
> -       select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if SMP && MMU
> +       select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if MMU
>         select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>         select ARCH_WANT_FRAME_POINTERS
>         select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 4112cc8d1d69..4f86424b1ba5 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -27,12 +27,7 @@ static inline void local_flush_tlb_page(unsigned long addr)
>  {
>         ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
>  }
> -#else /* CONFIG_MMU */
> -#define local_flush_tlb_all()                  do { } while (0)
> -#define local_flush_tlb_page(addr)             do { } while (0)
> -#endif /* CONFIG_MMU */
>
> -#if defined(CONFIG_SMP) && defined(CONFIG_MMU)
>  void flush_tlb_all(void);
>  void flush_tlb_mm(struct mm_struct *mm);
>  void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> @@ -54,28 +49,8 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>                                unsigned long uaddr);
>  void arch_flush_tlb_batched_pending(struct mm_struct *mm);
>  void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
> -
> -#else /* CONFIG_SMP && CONFIG_MMU */
> -
> -#define flush_tlb_all() local_flush_tlb_all()
> -#define flush_tlb_page(vma, addr) local_flush_tlb_page(addr)
> -
> -static inline void flush_tlb_range(struct vm_area_struct *vma,
> -               unsigned long start, unsigned long end)
> -{
> -       local_flush_tlb_all();
> -}
> -
> -/* Flush a range of kernel pages */
> -static inline void flush_tlb_kernel_range(unsigned long start,
> -       unsigned long end)
> -{
> -       local_flush_tlb_all();
> -}
> -
> -#define flush_tlb_mm(mm) flush_tlb_all()
> -#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
> -#define local_flush_tlb_kernel_range(start, end) flush_tlb_all()
> -#endif /* !CONFIG_SMP || !CONFIG_MMU */
> +#else /* CONFIG_MMU */
> +#define local_flush_tlb_all()                  do { } while (0)
> +#endif /* CONFIG_MMU */
>
>  #endif /* _ASM_RISCV_TLBFLUSH_H */
> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> index 2c869f8026a8..cbe4d775ef56 100644
> --- a/arch/riscv/mm/Makefile
> +++ b/arch/riscv/mm/Makefile
> @@ -13,14 +13,11 @@ endif
>  KCOV_INSTRUMENT_init.o := n
>
>  obj-y += init.o
> -obj-$(CONFIG_MMU) += extable.o fault.o pageattr.o pgtable.o
> +obj-$(CONFIG_MMU) += extable.o fault.o pageattr.o pgtable.o tlbflush.o
>  obj-y += cacheflush.o
>  obj-y += context.o
>  obj-y += pmem.o
>
> -ifeq ($(CONFIG_MMU),y)
> -obj-$(CONFIG_SMP) += tlbflush.o
> -endif
>  obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
>  obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
>  obj-$(CONFIG_KASAN)   += kasan_init.o
> --
> 2.43.1
>

This is cleaner than my previous modification to support
__flush_tlb_range() on UP. So:
Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>

Thanks,
Yunhui

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [External] [PATCH v6 08/13] riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
  2024-03-27  4:49   ` Samuel Holland
@ 2024-03-27  6:27     ` yunhui cui
  -1 siblings, 0 replies; 48+ messages in thread
From: yunhui cui @ 2024-03-27  6:27 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, linux-mm,
	Alexandre Ghiti, Jisheng Zhang

Hi Samuel,

On Wed, Mar 27, 2024 at 12:51 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Implementations affected by SiFive errata CIP-1200 have a bug which
> forces the kernel to always use the global variant of the sfence.vma
> instruction. When affected by this erratum, do not attempt to flush a
> range of addresses; each iteration of the loop would actually flush the
> whole TLB. Instead, minimize the overall number of sfence.vma
> instructions.
>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> Changes in v6:
>  - Clarify the commit message for patch 8 based on ML discussion
>
> Changes in v4:
>  - Only set tlb_flush_all_threshold when CONFIG_MMU=y.
>
> Changes in v3:
>  - New patch for v3
>
>  arch/riscv/errata/sifive/errata.c | 5 +++++
>  arch/riscv/include/asm/tlbflush.h | 2 ++
>  arch/riscv/mm/tlbflush.c          | 2 +-
>  3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
> index 3d9a32d791f7..716cfedad3a2 100644
> --- a/arch/riscv/errata/sifive/errata.c
> +++ b/arch/riscv/errata/sifive/errata.c
> @@ -42,6 +42,11 @@ static bool errata_cip_1200_check_func(unsigned long  arch_id, unsigned long imp
>                 return false;
>         if ((impid & 0xffffff) > 0x200630 || impid == 0x1200626)
>                 return false;
> +
> +#ifdef CONFIG_MMU
> +       tlb_flush_all_threshold = 0;
> +#endif
> +
>         return true;
>  }
>
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 463b615d7728..8e329721375b 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -66,6 +66,8 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
>                                unsigned long uaddr);
>  void arch_flush_tlb_batched_pending(struct mm_struct *mm);
>  void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
> +
> +extern unsigned long tlb_flush_all_threshold;
>  #else /* CONFIG_MMU */
>  #define local_flush_tlb_all()                  do { } while (0)
>  #endif /* CONFIG_MMU */
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index ad7bdcfcc219..18af7b5053af 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -11,7 +11,7 @@
>   * Flush entire TLB if number of entries to be flushed is greater
>   * than the threshold below.
>   */
> -static unsigned long tlb_flush_all_threshold __read_mostly = 64;
> +unsigned long tlb_flush_all_threshold __read_mostly = 64;
>
>  static void local_flush_tlb_range_threshold_asid(unsigned long start,
>                                                  unsigned long size,
> --
> 2.43.1
>

Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>

Thanks,
Yunhui

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [External] [PATCH v6 05/13] riscv: Only send remote fences when some other CPU is online
  2024-03-27  6:16     ` yunhui cui
@ 2024-03-27 20:14       ` Samuel Holland
  -1 siblings, 0 replies; 48+ messages in thread
From: Samuel Holland @ 2024-03-27 20:14 UTC (permalink / raw)
  To: yunhui cui
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, linux-mm,
	Alexandre Ghiti, Jisheng Zhang

Hi Yunhui,

On 2024-03-27 1:16 AM, yunhui cui wrote:
> On Wed, Mar 27, 2024 at 12:50 PM Samuel Holland
> <samuel.holland@sifive.com> wrote:
>>
>> If no other CPU is online, a local cache or TLB flush is sufficient.
>> These checks can be constant-folded when SMP is disabled.
>>
>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>> ---
>>
>> (no changes since v4)
>>
>> Changes in v4:
>>  - New patch for v4
>>
>>  arch/riscv/mm/cacheflush.c | 4 +++-
>>  arch/riscv/mm/tlbflush.c   | 4 +++-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> index d76fc73e594b..f5be1fec8191 100644
>> --- a/arch/riscv/mm/cacheflush.c
>> +++ b/arch/riscv/mm/cacheflush.c
>> @@ -21,7 +21,9 @@ void flush_icache_all(void)
>>  {
>>         local_flush_icache_all();
>>
>> -       if (riscv_use_sbi_for_rfence())
>> +       if (num_online_cpus() < 2)
>> +               return;
>> +       else if (riscv_use_sbi_for_rfence())
>>                 sbi_remote_fence_i(NULL);
>>         else
>>                 on_each_cpu(ipi_remote_fence_i, NULL, 1);
>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
>> index da821315d43e..0901aa47b58f 100644
>> --- a/arch/riscv/mm/tlbflush.c
>> +++ b/arch/riscv/mm/tlbflush.c
>> @@ -79,7 +79,9 @@ static void __ipi_flush_tlb_all(void *info)
>>
>>  void flush_tlb_all(void)
>>  {
>> -       if (riscv_use_sbi_for_rfence())
>> +       if (num_online_cpus() < 2)
>> +               local_flush_tlb_all();
>> +       else if (riscv_use_sbi_for_rfence())
>>                 sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
>>         else
>>                 on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
>> --
>> 2.43.1
>>
> 
> From a perceptual point of view, the modification here is not
> necessary. There is such logic in on_each_cpu(). Can you share your
> test data?

The logic in on_each_cpu() doesn't apply when riscv_use_sbi_for_rfence() is
true, so we would make unnecessary SBI calls, and it cannot be optimized out
when CONFIG_SMP=n. The cover letter includes benchmarks for a representative
single-core system (D1). There was no measurable performance impact from this
portion of the series on multi-core systems. If there are specific benchmarks
you think I should run, please let me know.
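
Concretely, here is a sketch of why the check folds away on UP builds,
assuming the usual !SMP definition from include/linux/cpumask.h (the
exact definition may vary by kernel version):

	/* include/linux/cpumask.h, !CONFIG_SMP */
	#define num_online_cpus()	1U

	void flush_tlb_all(void)
	{
		if (num_online_cpus() < 2)	/* 1U < 2 is constant-true */
			local_flush_tlb_all();	/* only this path survives */
		else if (riscv_use_sbi_for_rfence())
			sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE,
						   FLUSH_TLB_NO_ASID);
		else
			on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
	}

Since num_online_cpus() is a compile-time constant there, the compiler
drops the SBI and IPI branches entirely. The check inside on_each_cpu()
cannot provide that, because the SBI path never reaches it.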

Regards,
Samuel


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [External] [PATCH v6 05/13] riscv: Only send remote fences when some other CPU is online
  2024-03-27 20:14       ` Samuel Holland
@ 2024-03-28  2:21         ` yunhui cui
  -1 siblings, 0 replies; 48+ messages in thread
From: yunhui cui @ 2024-03-28  2:21 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, linux-kernel, linux-mm,
	Alexandre Ghiti, Jisheng Zhang

Hi Samuel,

On Thu, Mar 28, 2024 at 4:14 AM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Yunhui,
>
> On 2024-03-27 1:16 AM, yunhui cui wrote:
> > On Wed, Mar 27, 2024 at 12:50 PM Samuel Holland
> > <samuel.holland@sifive.com> wrote:
> >>
> >> If no other CPU is online, a local cache or TLB flush is sufficient.
> >> These checks can be constant-folded when SMP is disabled.
> >>
> >> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> >> ---
> >>
> >> (no changes since v4)
> >>
> >> Changes in v4:
> >>  - New patch for v4
> >>
> >>  arch/riscv/mm/cacheflush.c | 4 +++-
> >>  arch/riscv/mm/tlbflush.c   | 4 +++-
> >>  2 files changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> >> index d76fc73e594b..f5be1fec8191 100644
> >> --- a/arch/riscv/mm/cacheflush.c
> >> +++ b/arch/riscv/mm/cacheflush.c
> >> @@ -21,7 +21,9 @@ void flush_icache_all(void)
> >>  {
> >>         local_flush_icache_all();
> >>
> >> -       if (riscv_use_sbi_for_rfence())
> >> +       if (num_online_cpus() < 2)
> >> +               return;
> >> +       else if (riscv_use_sbi_for_rfence())
> >>                 sbi_remote_fence_i(NULL);
> >>         else
> >>                 on_each_cpu(ipi_remote_fence_i, NULL, 1);
> >> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> >> index da821315d43e..0901aa47b58f 100644
> >> --- a/arch/riscv/mm/tlbflush.c
> >> +++ b/arch/riscv/mm/tlbflush.c
> >> @@ -79,7 +79,9 @@ static void __ipi_flush_tlb_all(void *info)
> >>
> >>  void flush_tlb_all(void)
> >>  {
> >> -       if (riscv_use_sbi_for_rfence())
> >> +       if (num_online_cpus() < 2)
> >> +               local_flush_tlb_all();
> >> +       else if (riscv_use_sbi_for_rfence())
> >>                 sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
> >>         else
> >>                 on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
> >> --
> >> 2.43.1
> >>
> >
> > From a perceptual point of view, the modification here is not
> > necessary. There is such logic in on_each_cpu(). Can you share your
> > test data?
>
> The logic in on_each_cpu() doesn't apply when riscv_use_sbi_for_rfence() is
> true, so we would make unnecessary SBI calls, and cannot be oppimized out when
> CONFIG_SMP=n.

Is it possible to pass the online CPU mask directly, i.e.
"sbi_remote_sfence_vma_asid(cpu_online_mask, ...);", instead of adding
the "if (num_online_cpus() < 2)" check?

Thanks,
Yunhui

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v6 02/13] riscv: Factor out page table TLB synchronization
  2024-03-27  4:49   ` Samuel Holland
@ 2024-04-04  7:48     ` Alexandre Ghiti
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexandre Ghiti @ 2024-04-04  7:48 UTC (permalink / raw)
  To: Samuel Holland, Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang, Yunhui Cui


On 27/03/2024 05:49, Samuel Holland wrote:
> The logic is the same for all page table levels. See commit 69be3fb111e7
> ("riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU").
>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> Changes in v6:
>   - Move riscv_tlb_remove_ptdesc() definition to fix 32-bit build
>
> Changes in v5:
>   - New patch for v5
>
>   arch/riscv/include/asm/pgalloc.h | 31 +++++++++++++------------------
>   1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index deaf971253a2..b34587da8882 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -15,6 +15,14 @@
>   #define __HAVE_ARCH_PUD_FREE
>   #include <asm-generic/pgalloc.h>
>   
> +static inline void riscv_tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
> +{
> +	if (riscv_use_ipi_for_rfence())
> +		tlb_remove_page_ptdesc(tlb, pt);
> +	else
> +		tlb_remove_ptdesc(tlb, pt);
> +}
> +
>   static inline void pmd_populate_kernel(struct mm_struct *mm,
>   	pmd_t *pmd, pte_t *pte)
>   {
> @@ -102,10 +110,7 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
>   		struct ptdesc *ptdesc = virt_to_ptdesc(pud);
>   
>   		pagetable_pud_dtor(ptdesc);
> -		if (riscv_use_ipi_for_rfence())
> -			tlb_remove_page_ptdesc(tlb, ptdesc);
> -		else
> -			tlb_remove_ptdesc(tlb, ptdesc);
> +		riscv_tlb_remove_ptdesc(tlb, ptdesc);
>   	}
>   }
>   
> @@ -139,12 +144,8 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
>   static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
>   				  unsigned long addr)
>   {
> -	if (pgtable_l5_enabled) {
> -		if (riscv_use_ipi_for_rfence())
> -			tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> -		else
> -			tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
> -	}
> +	if (pgtable_l5_enabled)
> +		riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
>   }
>   #endif /* __PAGETABLE_PMD_FOLDED */
>   
> @@ -176,10 +177,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
>   	struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>   
>   	pagetable_pmd_dtor(ptdesc);
> -	if (riscv_use_ipi_for_rfence())
> -		tlb_remove_page_ptdesc(tlb, ptdesc);
> -	else
> -		tlb_remove_ptdesc(tlb, ptdesc);
> +	riscv_tlb_remove_ptdesc(tlb, ptdesc);
>   }
>   
>   #endif /* __PAGETABLE_PMD_FOLDED */
> @@ -190,10 +188,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
>   	struct ptdesc *ptdesc = page_ptdesc(pte);
>   
>   	pagetable_pte_dtor(ptdesc);
> -	if (riscv_use_ipi_for_rfence())
> -		tlb_remove_page_ptdesc(tlb, ptdesc);
> -	else
> -		tlb_remove_ptdesc(tlb, ptdesc);
> +	riscv_tlb_remove_ptdesc(tlb, ptdesc);
>   }
>   #endif /* CONFIG_MMU */
>   


Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v6 03/13] riscv: Use IPIs for remote cache/TLB flushes by default
  2024-03-27  4:49   ` Samuel Holland
@ 2024-04-04  7:56     ` Alexandre Ghiti
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexandre Ghiti @ 2024-04-04  7:56 UTC (permalink / raw)
  To: Samuel Holland, Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang,
	Yunhui Cui, Anup Patel


On 27/03/2024 05:49, Samuel Holland wrote:
> An IPI backend is always required in an SMP configuration, but an SBI
> implementation is not. For example, SBI will be unavailable when the
> kernel runs in M mode. For this reason, consider IPI delivery of cache
> and TLB flushes to be the base case, and any other implementation (such
> as the SBI remote fence extension) to be an optimization.
>
> Generally, if IPIs can be delivered without firmware assistance, they
> are assumed to be faster than SBI calls due to the SBI context switch
> overhead. However, when SBI is used as the IPI backend, then the context
> switch cost must be paid anyway, and performing the cache/TLB flush
> directly in the SBI implementation is more efficient than injecting an
> interrupt to S-mode. This is the only existing scenario where
> riscv_ipi_set_virq_range() is called with use_for_rfence set to false.
>
> sbi_ipi_init() already checks riscv_ipi_have_virq_range(), so it only
> calls riscv_ipi_set_virq_range() when no other IPI device is available.
> This allows moving the static key and dropping the use_for_rfence
> parameter. This decouples the static key from the irqchip driver probe
> order.
>
> Furthermore, the static branch only makes sense when CONFIG_RISCV_SBI is
> enabled. Otherwise, IPIs must be used. Add a fallback definition of
> riscv_use_sbi_for_rfence() which handles this case and removes the need
> to check CONFIG_RISCV_SBI elsewhere, such as in cacheflush.c.
>
> Reviewed-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> Changes in v6:
>   - Clarify the commit message for patch 3 based on ML discussion
>
> Changes in v5:
>   - Also switch to riscv_use_sbi_for_rfence() in asm/pgalloc.h
>
> Changes in v4:
>   - New patch for v4
>
>   arch/riscv/include/asm/pgalloc.h  |  7 ++++---
>   arch/riscv/include/asm/sbi.h      |  4 ++++
>   arch/riscv/include/asm/smp.h      | 15 ++-------------
>   arch/riscv/kernel/sbi-ipi.c       | 11 ++++++++++-
>   arch/riscv/kernel/smp.c           | 11 +----------
>   arch/riscv/mm/cacheflush.c        |  5 ++---
>   arch/riscv/mm/tlbflush.c          | 31 ++++++++++++++-----------------
>   drivers/clocksource/timer-clint.c |  2 +-
>   8 files changed, 38 insertions(+), 48 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index b34587da8882..f52264304f77 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -8,6 +8,7 @@
>   #define _ASM_RISCV_PGALLOC_H
>   
>   #include <linux/mm.h>
> +#include <asm/sbi.h>
>   #include <asm/tlb.h>
>   
>   #ifdef CONFIG_MMU
> @@ -17,10 +18,10 @@
>   
>   static inline void riscv_tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
>   {
> -	if (riscv_use_ipi_for_rfence())
> -		tlb_remove_page_ptdesc(tlb, pt);
> -	else
> +	if (riscv_use_sbi_for_rfence())
>   		tlb_remove_ptdesc(tlb, pt);
> +	else
> +		tlb_remove_page_ptdesc(tlb, pt);
>   }
>   
>   static inline void pmd_populate_kernel(struct mm_struct *mm,
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 6e68f8dff76b..ea84392ca9d7 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -375,8 +375,12 @@ unsigned long riscv_cached_marchid(unsigned int cpu_id);
>   unsigned long riscv_cached_mimpid(unsigned int cpu_id);
>   
>   #if IS_ENABLED(CONFIG_SMP) && IS_ENABLED(CONFIG_RISCV_SBI)
> +DECLARE_STATIC_KEY_FALSE(riscv_sbi_for_rfence);
> +#define riscv_use_sbi_for_rfence() \
> +	static_branch_unlikely(&riscv_sbi_for_rfence)
>   void sbi_ipi_init(void);
>   #else
> +static inline bool riscv_use_sbi_for_rfence(void) { return false; }
>   static inline void sbi_ipi_init(void) { }
>   #endif
>   
> diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
> index 0d555847cde6..7ac80e9f2288 100644
> --- a/arch/riscv/include/asm/smp.h
> +++ b/arch/riscv/include/asm/smp.h
> @@ -49,12 +49,7 @@ void riscv_ipi_disable(void);
>   bool riscv_ipi_have_virq_range(void);
>   
>   /* Set the IPI interrupt numbers for arch (called by irqchip drivers) */
> -void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence);
> -
> -/* Check if we can use IPIs for remote FENCEs */
> -DECLARE_STATIC_KEY_FALSE(riscv_ipi_for_rfence);
> -#define riscv_use_ipi_for_rfence() \
> -	static_branch_unlikely(&riscv_ipi_for_rfence)
> +void riscv_ipi_set_virq_range(int virq, int nr);
>   
>   /* Check other CPUs stop or not */
>   bool smp_crash_stop_failed(void);
> @@ -104,16 +99,10 @@ static inline bool riscv_ipi_have_virq_range(void)
>   	return false;
>   }
>   
> -static inline void riscv_ipi_set_virq_range(int virq, int nr,
> -					    bool use_for_rfence)
> +static inline void riscv_ipi_set_virq_range(int virq, int nr)
>   {
>   }
>   
> -static inline bool riscv_use_ipi_for_rfence(void)
> -{
> -	return false;
> -}
> -
>   #endif /* CONFIG_SMP */
>   
>   #if defined(CONFIG_HOTPLUG_CPU) && (CONFIG_SMP)
> diff --git a/arch/riscv/kernel/sbi-ipi.c b/arch/riscv/kernel/sbi-ipi.c
> index a4559695ce62..1026e22955cc 100644
> --- a/arch/riscv/kernel/sbi-ipi.c
> +++ b/arch/riscv/kernel/sbi-ipi.c
> @@ -13,6 +13,9 @@
>   #include <linux/irqdomain.h>
>   #include <asm/sbi.h>
>   
> +DEFINE_STATIC_KEY_FALSE(riscv_sbi_for_rfence);
> +EXPORT_SYMBOL_GPL(riscv_sbi_for_rfence);
> +
>   static int sbi_ipi_virq;
>   
>   static void sbi_ipi_handle(struct irq_desc *desc)
> @@ -72,6 +75,12 @@ void __init sbi_ipi_init(void)
>   			  "irqchip/sbi-ipi:starting",
>   			  sbi_ipi_starting_cpu, NULL);
>   
> -	riscv_ipi_set_virq_range(virq, BITS_PER_BYTE, false);
> +	riscv_ipi_set_virq_range(virq, BITS_PER_BYTE);
>   	pr_info("providing IPIs using SBI IPI extension\n");
> +
> +	/*
> +	 * Use the SBI remote fence extension to avoid
> +	 * the extra context switch needed to handle IPIs.
> +	 */
> +	static_branch_enable(&riscv_sbi_for_rfence);
>   }
> diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
> index 45dd4035416e..8e6eb64459af 100644
> --- a/arch/riscv/kernel/smp.c
> +++ b/arch/riscv/kernel/smp.c
> @@ -171,10 +171,7 @@ bool riscv_ipi_have_virq_range(void)
>   	return (ipi_virq_base) ? true : false;
>   }
>   
> -DEFINE_STATIC_KEY_FALSE(riscv_ipi_for_rfence);
> -EXPORT_SYMBOL_GPL(riscv_ipi_for_rfence);
> -
> -void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence)
> +void riscv_ipi_set_virq_range(int virq, int nr)
>   {
>   	int i, err;
>   
> @@ -197,12 +194,6 @@ void riscv_ipi_set_virq_range(int virq, int nr, bool use_for_rfence)
>   
>   	/* Enabled IPIs for boot CPU immediately */
>   	riscv_ipi_enable();
> -
> -	/* Update RFENCE static key */
> -	if (use_for_rfence)
> -		static_branch_enable(&riscv_ipi_for_rfence);
> -	else
> -		static_branch_disable(&riscv_ipi_for_rfence);
>   }
>   
>   static const char * const ipi_names[] = {
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index bc61ee5975e4..d76fc73e594b 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -21,7 +21,7 @@ void flush_icache_all(void)
>   {
>   	local_flush_icache_all();
>   
> -	if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
> +	if (riscv_use_sbi_for_rfence())
>   		sbi_remote_fence_i(NULL);
>   	else
>   		on_each_cpu(ipi_remote_fence_i, NULL, 1);
> @@ -69,8 +69,7 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
>   		 * with flush_icache_deferred().
>   		 */
>   		smp_mb();
> -	} else if (IS_ENABLED(CONFIG_RISCV_SBI) &&
> -		   !riscv_use_ipi_for_rfence()) {
> +	} else if (riscv_use_sbi_for_rfence()) {
>   		sbi_remote_fence_i(&others);
>   	} else {
>   		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 893566e004b7..0435605b07d0 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -79,10 +79,10 @@ static void __ipi_flush_tlb_all(void *info)
>   
>   void flush_tlb_all(void)
>   {
> -	if (riscv_use_ipi_for_rfence())
> -		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
> -	else
> +	if (riscv_use_sbi_for_rfence())
>   		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
> +	else
> +		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);
>   }
>   
>   struct flush_tlb_range_data {
> @@ -103,7 +103,6 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
>   			      unsigned long start, unsigned long size,
>   			      unsigned long stride)
>   {
> -	struct flush_tlb_range_data ftd;
>   	bool broadcast;
>   
>   	if (cpumask_empty(cmask))
> @@ -119,20 +118,18 @@ static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid,
>   		broadcast = true;
>   	}
>   
> -	if (broadcast) {
> -		if (riscv_use_ipi_for_rfence()) {
> -			ftd.asid = asid;
> -			ftd.start = start;
> -			ftd.size = size;
> -			ftd.stride = stride;
> -			on_each_cpu_mask(cmask,
> -					 __ipi_flush_tlb_range_asid,
> -					 &ftd, 1);
> -		} else
> -			sbi_remote_sfence_vma_asid(cmask,
> -						   start, size, asid);
> -	} else {
> +	if (!broadcast) {
>   		local_flush_tlb_range_asid(start, size, stride, asid);
> +	} else if (riscv_use_sbi_for_rfence()) {
> +		sbi_remote_sfence_vma_asid(cmask, start, size, asid);
> +	} else {
> +		struct flush_tlb_range_data ftd;
> +
> +		ftd.asid = asid;
> +		ftd.start = start;
> +		ftd.size = size;
> +		ftd.stride = stride;
> +		on_each_cpu_mask(cmask, __ipi_flush_tlb_range_asid, &ftd, 1);
>   	}
>   
>   	if (cmask != cpu_online_mask)
> diff --git a/drivers/clocksource/timer-clint.c b/drivers/clocksource/timer-clint.c
> index 09fd292eb83d..0bdd9d7ec545 100644
> --- a/drivers/clocksource/timer-clint.c
> +++ b/drivers/clocksource/timer-clint.c
> @@ -251,7 +251,7 @@ static int __init clint_timer_init_dt(struct device_node *np)
>   	}
>   
>   	irq_set_chained_handler(clint_ipi_irq, clint_ipi_interrupt);
> -	riscv_ipi_set_virq_range(rc, BITS_PER_BYTE, true);
> +	riscv_ipi_set_virq_range(rc, BITS_PER_BYTE);
>   	clint_clear_ipi();
>   #endif
>   


Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v6 05/13] riscv: Only send remote fences when some other CPU is online
  2024-03-27  4:49   ` Samuel Holland
@ 2024-04-04  8:04     ` Alexandre Ghiti
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexandre Ghiti @ 2024-04-04  8:04 UTC (permalink / raw)
  To: Samuel Holland, Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang, Yunhui Cui


On 27/03/2024 05:49, Samuel Holland wrote:
> If no other CPU is online, a local cache or TLB flush is sufficient.
> These checks can be constant-folded when SMP is disabled.
>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> (no changes since v4)
>
> Changes in v4:
>   - New patch for v4
>
>   arch/riscv/mm/cacheflush.c | 4 +++-
>   arch/riscv/mm/tlbflush.c   | 4 +++-
>   2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index d76fc73e594b..f5be1fec8191 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -21,7 +21,9 @@ void flush_icache_all(void)
>   {
>   	local_flush_icache_all();
>   
> -	if (riscv_use_sbi_for_rfence())
> +	if (num_online_cpus() < 2)
> +		return;
> +	else if (riscv_use_sbi_for_rfence())
>   		sbi_remote_fence_i(NULL);
>   	else
>   		on_each_cpu(ipi_remote_fence_i, NULL, 1);
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index da821315d43e..0901aa47b58f 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -79,7 +79,9 @@ static void __ipi_flush_tlb_all(void *info)
>   
>   void flush_tlb_all(void)
>   {
> -	if (riscv_use_sbi_for_rfence())
> +	if (num_online_cpus() < 2)
> +		local_flush_tlb_all();
> +	else if (riscv_use_sbi_for_rfence())
>   		sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID);
>   	else
>   		on_each_cpu(__ipi_flush_tlb_all, NULL, 1);


Could this be done directly in __sbi_rfence() instead?
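
Something along these lines, purely illustrative (I am sketching the
__sbi_rfence_v02() signature from memory of arch/riscv/kernel/sbi.c, so
it may differ, and flush_tlb_all() would still need its local fallback):

	static int __sbi_rfence_v02(int fid, const struct cpumask *cpu_mask,
				    unsigned long start, unsigned long size,
				    unsigned long arg4, unsigned long arg5)
	{
		/* No other hart online: nothing to fence remotely. */
		if (num_online_cpus() < 2)
			return 0;

		/* ... existing per-hart SBI ecalls ... */
	}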

Otherwise:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v6 01/13] riscv: Flush the instruction cache during SMP bringup
  2024-03-27  4:49   ` Samuel Holland
@ 2024-04-24 20:50     ` Alexandre Ghiti
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexandre Ghiti @ 2024-04-24 20:50 UTC (permalink / raw)
  To: Samuel Holland, Palmer Dabbelt, linux-riscv
  Cc: linux-kernel, linux-mm, Alexandre Ghiti, Jisheng Zhang, Yunhui Cui

Hi Samuel,

On 27/03/2024 05:49, Samuel Holland wrote:
> Instruction cache flush IPIs are sent only to CPUs in cpu_online_mask,
> so they will not target a CPU until it calls set_cpu_online() earlier in
> smp_callin(). As a result, if instruction memory is modified between the
> CPU coming out of reset and that point, then its instruction cache may
> contain stale data. Therefore, the instruction cache must be flushed
> after the set_cpu_online() synchronization point.
>
> Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable")
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> (no changes since v4)
>
> Changes in v4:
>   - New patch for v4
>
>   arch/riscv/kernel/smpboot.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index d41090fc3203..4b3c50da48ba 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -26,7 +26,7 @@
>   #include <linux/sched/task_stack.h>
>   #include <linux/sched/mm.h>
>   
> -#include <asm/cpufeature.h>
> +#include <asm/cacheflush.h>
>   #include <asm/cpu_ops.h>
>   #include <asm/irq.h>
>   #include <asm/mmu_context.h>
> @@ -234,9 +234,10 @@ asmlinkage __visible void smp_callin(void)
>   	riscv_user_isa_enable();
>   
>   	/*
> -	 * Remote TLB flushes are ignored while the CPU is offline, so emit
> -	 * a local TLB flush right now just in case.
> +	 * Remote cache and TLB flushes are ignored while the CPU is offline,
> +	 * so flush them both right now just in case.
>   	 */
> +	local_flush_icache_all();
>   	local_flush_tlb_all();
>   	complete(&cpu_running);
>   	/*


This should go into -fixes; would you mind sending this patch on its
own? I think it is easier for Palmer.

Thanks,

Alex


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v6 00/13] riscv: ASID-related and UP-related TLB flush enhancements
  2024-03-27  4:49 ` Samuel Holland
@ 2024-05-14 14:00   ` patchwork-bot+linux-riscv
  0 siblings, 0 replies; 48+ messages in thread
From: patchwork-bot+linux-riscv @ 2024-05-14 14:00 UTC (permalink / raw)
  To: Samuel Holland
  Cc: linux-riscv, palmer, linux-kernel, linux-mm, alexghiti, jszhang,
	cuiyunhui

Hello:

This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Tue, 26 Mar 2024 21:49:41 -0700 you wrote:
> This series converts uniprocessor kernel builds to use the same TLB
> flushing code as SMP builds, to take advantage of batching and existing
> range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
> SBI calls based on the online CPU count, which also covers the scenario
> where SMP was enabled at build time but only one CPU is present/online.
> A final optimization is to use single-ASID flushes wherever possible, to
> avoid unnecessary TLB misses for kernel mappings.
> 
> [...]

Here is the summary with links:
  - [v6,01/13] riscv: Flush the instruction cache during SMP bringup
    https://git.kernel.org/riscv/c/58661a30f1bc
  - [v6,02/13] riscv: Factor out page table TLB synchronization
    https://git.kernel.org/riscv/c/aaa56c8f378d
  - [v6,03/13] riscv: Use IPIs for remote cache/TLB flushes by default
    https://git.kernel.org/riscv/c/dc892fb44322
  - [v6,04/13] riscv: mm: Broadcast kernel TLB flushes only when needed
    https://git.kernel.org/riscv/c/038ac18aae93
  - [v6,05/13] riscv: Only send remote fences when some other CPU is online
    https://git.kernel.org/riscv/c/9546f00410ed
  - [v6,06/13] riscv: mm: Combine the SMP and UP TLB flush code
    https://git.kernel.org/riscv/c/c6026d35b6ab
  - [v6,07/13] riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
    https://git.kernel.org/riscv/c/20e03d702e00
  - [v6,08/13] riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
    https://git.kernel.org/riscv/c/d6dcdabafcd7
  - [v6,09/13] riscv: mm: Introduce cntx2asid/cntx2version helper macros
    https://git.kernel.org/riscv/c/74cd17792d28
  - [v6,10/13] riscv: mm: Use a fixed layout for the MM context ID
    https://git.kernel.org/riscv/c/f58e5dc45fa9
  - [v6,11/13] riscv: mm: Make asid_bits a local variable
    https://git.kernel.org/riscv/c/8d3e7613f97e
  - [v6,12/13] riscv: mm: Preserve global TLB entries when switching contexts
    https://git.kernel.org/riscv/c/8fc21cc672e8
  - [v6,13/13] riscv: mm: Always use an ASID to flush mm contexts
    https://git.kernel.org/riscv/c/daef19263fc1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




end of thread

Thread overview: 48+ messages
2024-03-27  4:49 [PATCH v6 00/13] riscv: ASID-related and UP-related TLB flush enhancements Samuel Holland
2024-03-27  4:49 ` [PATCH v6 01/13] riscv: Flush the instruction cache during SMP bringup Samuel Holland
2024-04-24 20:50   ` Alexandre Ghiti
2024-03-27  4:49 ` [PATCH v6 02/13] riscv: Factor out page table TLB synchronization Samuel Holland
2024-04-04  7:48   ` Alexandre Ghiti
2024-03-27  4:49 ` [PATCH v6 03/13] riscv: Use IPIs for remote cache/TLB flushes by default Samuel Holland
2024-04-04  7:56   ` Alexandre Ghiti
2024-03-27  4:49 ` [PATCH v6 04/13] riscv: mm: Broadcast kernel TLB flushes only when needed Samuel Holland
2024-03-27  4:49 ` [PATCH v6 05/13] riscv: Only send remote fences when some other CPU is online Samuel Holland
2024-03-27  6:16   ` [External] " yunhui cui
2024-03-27 20:14     ` Samuel Holland
2024-03-28  2:21       ` yunhui cui
2024-04-04  8:04   ` Alexandre Ghiti
2024-03-27  4:49 ` [PATCH v6 06/13] riscv: mm: Combine the SMP and UP TLB flush code Samuel Holland
2024-03-27  6:23   ` [External] " yunhui cui
2024-03-27  4:49 ` [PATCH v6 07/13] riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma Samuel Holland
2024-03-27  4:49 ` [PATCH v6 08/13] riscv: Avoid TLB flush loops when affected by SiFive CIP-1200 Samuel Holland
2024-03-27  6:27   ` [External] " yunhui cui
2024-03-27  4:49 ` [PATCH v6 09/13] riscv: mm: Introduce cntx2asid/cntx2version helper macros Samuel Holland
2024-03-27  4:49 ` [PATCH v6 10/13] riscv: mm: Use a fixed layout for the MM context ID Samuel Holland
2024-03-27  4:49 ` [PATCH v6 11/13] riscv: mm: Make asid_bits a local variable Samuel Holland
2024-03-27  4:49 ` [PATCH v6 12/13] riscv: mm: Preserve global TLB entries when switching contexts Samuel Holland
2024-03-27  4:49 ` [PATCH v6 13/13] riscv: mm: Always use an ASID to flush mm contexts Samuel Holland
2024-05-14 14:00 ` [PATCH v6 00/13] riscv: ASID-related and UP-related TLB flush enhancements patchwork-bot+linux-riscv
