* [RFC PATCH v2 0/2] Risc-V Svinval support
@ 2022-04-24 9:02 Mayuresh Chitale
2022-04-24 9:02 ` [RFC PATCH v2 1/2] riscv: enum for svinval extension Mayuresh Chitale
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Mayuresh Chitale @ 2022-04-24 9:02 UTC (permalink / raw)
To: Palmer Dabbelt, Paul Walmsley, Albert Ou
Cc: Mayuresh Chitale, Atish Patra, Anup Patel, linux-riscv
This patchset adds support for the Svinval extension as defined in the
RISC-V privileged specification. It depends on:
https://patchwork.kernel.org/project/linux-riscv/list/?series=632984
The feature was tested with QEMU built from the latest master branch,
with the following additional patch applied:
https://lists.nongnu.org/archive/html/qemu-riscv/2022-03/msg00142.html
Changes in V2:
- Rebased on 5.18-rc3
- Updated riscv_fill_hwcap to probe the Svinval extension
Mayuresh Chitale (2):
riscv: enum for svinval extension
riscv: mm: use svinval instructions instead of sfence.vma
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/tlbflush.h | 12 ++++
arch/riscv/kernel/cpu.c | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/setup.c | 1 +
arch/riscv/mm/tlbflush.c | 116 ++++++++++++++++++++++++++++--
6 files changed, 126 insertions(+), 6 deletions(-)
--
2.17.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [RFC PATCH v2 1/2] riscv: enum for svinval extension
2022-04-24 9:02 [RFC PATCH v2 0/2] Risc-V Svinval support Mayuresh Chitale
@ 2022-04-24 9:02 ` Mayuresh Chitale
2022-04-24 9:02 ` [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma Mayuresh Chitale
2022-06-02 3:52 ` [RFC PATCH v2 0/2] Risc-V Svinval support Palmer Dabbelt
2 siblings, 0 replies; 8+ messages in thread
From: Mayuresh Chitale @ 2022-04-24 9:02 UTC (permalink / raw)
To: Palmer Dabbelt, Paul Walmsley, Albert Ou
Cc: Mayuresh Chitale, Atish Patra, Anup Patel, linux-riscv
Similar to the other ISA extensions, this patch enables
callers to check for the presence of the Svinval extension.
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
---
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/kernel/cpu.c | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 0734e42f74f2..c58c7ddd8a2a 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
*/
enum riscv_isa_ext_id {
RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
+ RISCV_ISA_EXT_SVINVAL,
RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
};
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index ccb617791e56..62db19f88704 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -88,6 +88,7 @@ int riscv_of_parent_hartid(struct device_node *node)
*/
static struct riscv_isa_ext_data isa_ext_arr[] = {
__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
+ __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
};
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 1b2d42d7f589..febe744c7f29 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -192,6 +192,7 @@ void __init riscv_fill_hwcap(void)
set_bit(*ext - 'a', this_isa);
} else {
SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
+ SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
}
#undef SET_ISA_EXT_MAP
}
--
2.17.1
* [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma
2022-04-24 9:02 [RFC PATCH v2 0/2] Risc-V Svinval support Mayuresh Chitale
2022-04-24 9:02 ` [RFC PATCH v2 1/2] riscv: enum for svinval extension Mayuresh Chitale
@ 2022-04-24 9:02 ` Mayuresh Chitale
2022-12-08 19:03 ` Palmer Dabbelt
2022-06-02 3:52 ` [RFC PATCH v2 0/2] Risc-V Svinval support Palmer Dabbelt
2 siblings, 1 reply; 8+ messages in thread
From: Mayuresh Chitale @ 2022-04-24 9:02 UTC (permalink / raw)
To: Palmer Dabbelt, Paul Walmsley, Albert Ou
Cc: Mayuresh Chitale, Atish Patra, Anup Patel, linux-riscv
When Svinval is supported, the local_flush_tlb_page*
functions use the following sequence, instead of a simple
sfence.vma, to optimize the TLB flushes:
sfence.w.inval
sinval.vma
.
.
sinval.vma
sfence.inval.ir
The maximum number of consecutive sinval.vma instructions
executed in the local_flush_tlb_page* functions is limited
to PTRS_PER_PTE. This is required to avoid soft lockups,
and the approach is similar to the one used on arm64.
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
---
arch/riscv/include/asm/tlbflush.h | 12 ++++
arch/riscv/kernel/setup.c | 1 +
arch/riscv/mm/tlbflush.c | 116 ++++++++++++++++++++++++++++--
3 files changed, 123 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 801019381dea..b535467c99f0 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -22,6 +22,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
{
ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
}
+
+void riscv_tlbflush_init(void);
+void __riscv_sfence_w_inval(void);
+void __riscv_sfence_inval_ir(void);
+void __riscv_sinval_vma(unsigned long addr);
+void __riscv_sinval_vma_asid(unsigned long addr, unsigned long asid);
+
+/* Check if we can use sinval for tlb flush */
+DECLARE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
+#define riscv_use_flush_tlb_svinval() \
+ static_branch_unlikely(&riscv_flush_tlb_svinval)
+
#else /* CONFIG_MMU */
#define local_flush_tlb_all() do { } while (0)
#define local_flush_tlb_page(addr) do { } while (0)
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 834eb652a7b9..13de04259de9 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -295,6 +295,7 @@ void __init setup_arch(char **cmdline_p)
#endif
riscv_fill_hwcap();
+ riscv_tlbflush_init();
}
static int __init topology_init(void)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 27a7db8eb2c4..800953f9121e 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -1,11 +1,14 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) "riscv: " fmt
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/sched.h>
#include <asm/sbi.h>
#include <asm/mmu_context.h>
+static unsigned long tlb_flush_all_threshold __read_mostly = PTRS_PER_PTE;
+
static inline void local_flush_tlb_all_asid(unsigned long asid)
{
__asm__ __volatile__ ("sfence.vma x0, %0"
@@ -23,22 +26,110 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
: "memory");
}
+static inline void riscv_sfence_inval_ir(void)
+{
+ /*
+ * SFENCE.INVAL.IR
+ * 0001100 00001 00000 000 00000 1110011
+ */
+ asm volatile (".word 0x18100073" ::: "memory");
+}
+
+static inline void riscv_sfence_w_inval(void)
+{
+ /*
+ * SFENCE.W.INVAL
+ * 0001100 00000 00000 000 00000 1110011
+ */
+ asm volatile (".word 0x18000073" ::: "memory");
+}
+
+static inline void riscv_sinval_vma_asid(unsigned long vma, unsigned long asid)
+{
+ /*
+ * rs1 = a0 (VMA)
+ * rs2 = a1 (asid)
+ * SINVAL.VMA a0, a1
+ * 0001011 01011 01010 000 00000 1110011
+ */
+ asm volatile ("srli a0, %0, 2\n"
+ "add a1, %1, zero\n"
+ ".word 0x16B50073\n"
+ :: "r" (vma), "r" (asid)
+ : "a0", "a1", "memory");
+}
+
+static inline void riscv_sinval_vma(unsigned long vma)
+{
+ /*
+ * rs1 = a0 (VMA)
+ * rs2 = 0
+ * SINVAL.VMA a0
+ * 0001011 00000 01010 000 00000 1110011
+ */
+ asm volatile ("srli a0, %0, 2\n"
+ ".word 0x16050073\n"
+ :: "r" (vma) : "a0", "memory");
+}
+
static inline void local_flush_tlb_range(unsigned long start,
unsigned long size, unsigned long stride)
{
- if (size <= stride)
- local_flush_tlb_page(start);
- else
+ if ((size / stride) <= tlb_flush_all_threshold) {
+ if (riscv_use_flush_tlb_svinval()) {
+ riscv_sfence_w_inval();
+ while (size) {
+ riscv_sinval_vma(start);
+ start += stride;
+ if (size > stride)
+ size -= stride;
+ else
+ size = 0;
+ }
+ riscv_sfence_inval_ir();
+ } else {
+ while (size) {
+ local_flush_tlb_page(start);
+ start += stride;
+ if (size > stride)
+ size -= stride;
+ else
+ size = 0;
+ }
+ }
+ } else {
local_flush_tlb_all();
+ }
}
static inline void local_flush_tlb_range_asid(unsigned long start,
unsigned long size, unsigned long stride, unsigned long asid)
{
- if (size <= stride)
- local_flush_tlb_page_asid(start, asid);
- else
+ if ((size / stride) <= tlb_flush_all_threshold) {
+ if (riscv_use_flush_tlb_svinval()) {
+ riscv_sfence_w_inval();
+ while (size) {
+ riscv_sinval_vma_asid(start, asid);
+ start += stride;
+ if (size > stride)
+ size -= stride;
+ else
+ size = 0;
+ }
+ riscv_sfence_inval_ir();
+ } else {
+ while (size) {
+ local_flush_tlb_page_asid(start, asid);
+ start += stride;
+ if (size > stride)
+ size -= stride;
+ else
+ size = 0;
+ }
+ }
+ } else {
local_flush_tlb_all_asid(asid);
+ }
}
static void __ipi_flush_tlb_all(void *info)
@@ -149,3 +240,16 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
__flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
}
#endif
+
+DEFINE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
+EXPORT_SYMBOL_GPL(riscv_flush_tlb_svinval);
+
+void riscv_tlbflush_init(void)
+{
+ if (riscv_isa_extension_available(NULL, SVINVAL)) {
+ pr_info("Svinval extension supported\n");
+ static_branch_enable(&riscv_flush_tlb_svinval);
+ } else {
+ static_branch_disable(&riscv_flush_tlb_svinval);
+ }
+}
--
2.17.1
* Re: [RFC PATCH v2 0/2] Risc-V Svinval support
2022-04-24 9:02 [RFC PATCH v2 0/2] Risc-V Svinval support Mayuresh Chitale
2022-04-24 9:02 ` [RFC PATCH v2 1/2] riscv: enum for svinval extension Mayuresh Chitale
2022-04-24 9:02 ` [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma Mayuresh Chitale
@ 2022-06-02 3:52 ` Palmer Dabbelt
2 siblings, 0 replies; 8+ messages in thread
From: Palmer Dabbelt @ 2022-06-02 3:52 UTC (permalink / raw)
To: mchitale; +Cc: Paul Walmsley, aou, mchitale, atishp, anup, linux-riscv
On Sun, 24 Apr 2022 02:02:14 PDT (-0700), mchitale@ventanamicro.com wrote:
>
> This patch adds support for the Svinval extension as defined in the
> Risc V Privileged specification. This patchset depends on:
>
> https://patchwork.kernel.org/project/linux-riscv/list/?series=632984
IIRC Marc still has some unresolved feedback on that one, for the IRQ
bits?
>
> The feature was tested with qemu from latest master branch with
> following additional patch:
> https://lists.nongnu.org/archive/html/qemu-riscv/2022-03/msg00142.html
>
> Changes in V2:
>
> - Rebased on 5.18-rc3
> - update riscv_fill_hwcap to probe Svinval extension
>
> Mayuresh Chitale (2):
> riscv: enum for svinval extension
> riscv: mm: use svinval instructions instead of sfence.vma
>
> arch/riscv/include/asm/hwcap.h | 1 +
> arch/riscv/include/asm/tlbflush.h | 12 ++++
> arch/riscv/kernel/cpu.c | 1 +
> arch/riscv/kernel/cpufeature.c | 1 +
> arch/riscv/kernel/setup.c | 1 +
> arch/riscv/mm/tlbflush.c | 116 ++++++++++++++++++++++++++++--
> 6 files changed, 126 insertions(+), 6 deletions(-)
* Re: [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma
2022-04-24 9:02 ` [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma Mayuresh Chitale
@ 2022-12-08 19:03 ` Palmer Dabbelt
2023-06-13 4:02 ` Mayuresh Chitale
0 siblings, 1 reply; 8+ messages in thread
From: Palmer Dabbelt @ 2022-12-08 19:03 UTC (permalink / raw)
To: mchitale; +Cc: Paul Walmsley, aou, mchitale, atishp, anup, linux-riscv
On Sun, 24 Apr 2022 02:02:16 PDT (-0700), mchitale@ventanamicro.com wrote:
> When svinval is supported the local_flush_tlb_page*
> functions would prefer to use the following sequence
> to optimize the tlb flushes instead of a simple sfence.vma:
>
> sfence.w.inval
> svinval.vma
> .
> .
> svinval.vma
> sfence.inval.ir
>
> The maximum number of consecutive svinval.vma instructions
> that can be executed in local_flush_tlb_page* functions is
> limited to PTRS_PER_PTE. This is required to avoid soft
> lockups and the approach is similar to that used in arm64.
>
> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> ---
> arch/riscv/include/asm/tlbflush.h | 12 ++++
> arch/riscv/kernel/setup.c | 1 +
> arch/riscv/mm/tlbflush.c | 116 ++++++++++++++++++++++++++++--
> 3 files changed, 123 insertions(+), 6 deletions(-)
>
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 801019381dea..b535467c99f0 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -22,6 +22,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
> {
> ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
> }
> +
> +void riscv_tlbflush_init(void);
> +void __riscv_sfence_w_inval(void);
> +void __riscv_sfence_inval_ir(void);
> +void __riscv_sinval_vma(unsigned long addr);
> +void __riscv_sinval_vma_asid(unsigned long addr, unsigned long asid);
> +
> +/* Check if we can use sinval for tlb flush */
> +DECLARE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
> +#define riscv_use_flush_tlb_svinval() \
> + static_branch_unlikely(&riscv_flush_tlb_svinval)
> +
> #else /* CONFIG_MMU */
> #define local_flush_tlb_all() do { } while (0)
> #define local_flush_tlb_page(addr) do { } while (0)
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 834eb652a7b9..13de04259de9 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -295,6 +295,7 @@ void __init setup_arch(char **cmdline_p)
> #endif
>
> riscv_fill_hwcap();
> + riscv_tlbflush_init();
> }
>
> static int __init topology_init(void)
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 27a7db8eb2c4..800953f9121e 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -1,11 +1,14 @@
> // SPDX-License-Identifier: GPL-2.0
>
> +#define pr_fmt(fmt) "riscv: " fmt
> #include <linux/mm.h>
> #include <linux/smp.h>
> #include <linux/sched.h>
> #include <asm/sbi.h>
> #include <asm/mmu_context.h>
>
> +static unsigned long tlb_flush_all_threshold __read_mostly = PTRS_PER_PTE;
> +
> static inline void local_flush_tlb_all_asid(unsigned long asid)
> {
> __asm__ __volatile__ ("sfence.vma x0, %0"
> @@ -23,22 +26,110 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
> : "memory");
> }
>
> +static inline void riscv_sfence_inval_ir(void)
> +{
> + /*
> + * SFENCE.INVAL.IR
> + * 0001100 00001 00000 000 00000 1110011
> + */
> + asm volatile (".word 0x18100073" ::: "memory");
> +}
> +
> +static inline void riscv_sfence_w_inval(void)
> +{
> + /*
> + * SFENCE.W.INVAL
> + * 0001100 00000 00000 000 00000 1110011
> + */
> + asm volatile (".word 0x18000073" ::: "memory");
> +}
> +
> +static inline void riscv_sinval_vma_asid(unsigned long vma, unsigned long asid)
> +{
> + /*
> + * rs1 = a0 (VMA)
> + * rs2 = a1 (asid)
> + * SINVAL.VMA a0, a1
> + * 0001011 01011 01010 000 00000 1110011
> + */
> + asm volatile ("srli a0, %0, 2\n"
> + "add a1, %1, zero\n"
> + ".word 0x16B50073\n"
> + :: "r" (vma), "r" (asid)
> + : "a0", "a1", "memory");
> +}
> +
> +static inline void riscv_sinval_vma(unsigned long vma)
> +{
> + /*
> + * rs1 = a0 (VMA)
> + * rs2 = 0
> + * SINVAL.VMA a0
> + * 0001011 00000 01010 000 00000 1110011
> + */
> + asm volatile ("srli a0, %0, 2\n"
> + ".word 0x16050073\n"
> + :: "r" (vma) : "a0", "memory");
> +}
> +
> static inline void local_flush_tlb_range(unsigned long start,
> unsigned long size, unsigned long stride)
> {
> - if (size <= stride)
> - local_flush_tlb_page(start);
> - else
> + if ((size / stride) <= tlb_flush_all_threshold) {
> + if (riscv_use_flush_tlb_svinval()) {
> + riscv_sfence_w_inval();
> + while (size) {
> + riscv_sinval_vma(start);
> + start += stride;
> + if (size > stride)
> + size -= stride;
> + else
> + size = 0;
> + }
> + riscv_sfence_inval_ir();
> + } else {
> + while (size) {
> + local_flush_tlb_page(start);
> + start += stride;
> + if (size > stride)
> + size -= stride;
> + else
> + size = 0;
> + }
> + }
> + } else {
> local_flush_tlb_all();
> + }
> }
>
> static inline void local_flush_tlb_range_asid(unsigned long start,
> unsigned long size, unsigned long stride, unsigned long asid)
> {
> - if (size <= stride)
> - local_flush_tlb_page_asid(start, asid);
> - else
> + if ((size / stride) <= tlb_flush_all_threshold) {
> + if (riscv_use_flush_tlb_svinval()) {
> + riscv_sfence_w_inval();
> + while (size) {
> + riscv_sinval_vma_asid(start, asid);
> + start += stride;
> + if (size > stride)
> + size -= stride;
> + else
> + size = 0;
> + }
> + riscv_sfence_inval_ir();
> + } else {
> + while (size) {
> + local_flush_tlb_page_asid(start, asid);
> + start += stride;
> + if (size > stride)
> + size -= stride;
> + else
> + size = 0;
> + }
> + }
> + } else {
> local_flush_tlb_all_asid(asid);
> + }
Sorry to dig up an old post, but svinval came up and I started looking
at this again.
There are some other issues with the code, but at a higher level it's not
really clear this is a useful thing to do: splitting up sfence.vma makes
sense, but if there are no operations between the sfence.w.inval and
sinval.vma (or sinval.vma and sfence.inval.ir) then we're not giving the
HW anything else to do concurrently with the flushes and thus I don't
see how this would improve performance -- if anything we're just
emitting more instructions to do the same thing, which would be bad for
performance.
So I think if we're going to use these we really need to also split up
the TLB flush operations, so we can give the HW something to do
concurrently with the flushing. That'd be a pretty big change to our TLB
model, though: essentially we're going from two phases (write the page
tables, then flush) to at least three phases (write the page tables,
start the flush, finish the flush) or even four (if we want to insert
work between both sfence.w.inval->sinval.vma and
sinval.vma->sfence.inval.ir).
Not sure if I'm just misunderstanding the performance characteristics,
though. Maybe I'm over-thinking things and sfence.vma is heavy-weight
enough that implementations are flushing the pipeline for even the
page-based flavors and thus sfence.vma-based loops are very slow?
Do you have any benchmarks for the Svinval bits here (which are mixed in
with some non-global flushing; that's a different optimization that
needs to be split out)?
> }
>
> static void __ipi_flush_tlb_all(void *info)
> @@ -149,3 +240,16 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> __flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
> }
> #endif
> +
> +DEFINE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
> +EXPORT_SYMBOL_GPL(riscv_flush_tlb_svinval);
> +
> +void riscv_tlbflush_init(void)
> +{
> + if (riscv_isa_extension_available(NULL, SVINVAL)) {
> + pr_info("Svinval extension supported\n");
> + static_branch_enable(&riscv_flush_tlb_svinval);
> + } else {
> + static_branch_disable(&riscv_flush_tlb_svinval);
> + }
> +}
* Re: [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma
2022-12-08 19:03 ` Palmer Dabbelt
@ 2023-06-13 4:02 ` Mayuresh Chitale
2023-06-13 13:43 ` Alexandre Ghiti
0 siblings, 1 reply; 8+ messages in thread
From: Mayuresh Chitale @ 2023-06-13 4:02 UTC (permalink / raw)
To: Palmer Dabbelt; +Cc: Paul Walmsley, aou, atishp, anup, linux-riscv
On Fri, Dec 9, 2022 at 12:33 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Sun, 24 Apr 2022 02:02:16 PDT (-0700), mchitale@ventanamicro.com wrote:
> > When svinval is supported the local_flush_tlb_page*
> > functions would prefer to use the following sequence
> > to optimize the tlb flushes instead of a simple sfence.vma:
> >
> > sfence.w.inval
> > svinval.vma
> > .
> > .
> > svinval.vma
> > sfence.inval.ir
> >
> > The maximum number of consecutive svinval.vma instructions
> > that can be executed in local_flush_tlb_page* functions is
> > limited to PTRS_PER_PTE. This is required to avoid soft
> > lockups and the approach is similar to that used in arm64.
> >
> > Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> > ---
> > arch/riscv/include/asm/tlbflush.h | 12 ++++
> > arch/riscv/kernel/setup.c | 1 +
> > arch/riscv/mm/tlbflush.c | 116 ++++++++++++++++++++++++++++--
> > 3 files changed, 123 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> > index 801019381dea..b535467c99f0 100644
> > --- a/arch/riscv/include/asm/tlbflush.h
> > +++ b/arch/riscv/include/asm/tlbflush.h
> > @@ -22,6 +22,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
> > {
> > ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
> > }
> > +
> > +void riscv_tlbflush_init(void);
> > +void __riscv_sfence_w_inval(void);
> > +void __riscv_sfence_inval_ir(void);
> > +void __riscv_sinval_vma(unsigned long addr);
> > +void __riscv_sinval_vma_asid(unsigned long addr, unsigned long asid);
> > +
> > +/* Check if we can use sinval for tlb flush */
> > +DECLARE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
> > +#define riscv_use_flush_tlb_svinval() \
> > + static_branch_unlikely(&riscv_flush_tlb_svinval)
> > +
> > #else /* CONFIG_MMU */
> > #define local_flush_tlb_all() do { } while (0)
> > #define local_flush_tlb_page(addr) do { } while (0)
> > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> > index 834eb652a7b9..13de04259de9 100644
> > --- a/arch/riscv/kernel/setup.c
> > +++ b/arch/riscv/kernel/setup.c
> > @@ -295,6 +295,7 @@ void __init setup_arch(char **cmdline_p)
> > #endif
> >
> > riscv_fill_hwcap();
> > + riscv_tlbflush_init();
> > }
> >
> > static int __init topology_init(void)
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 27a7db8eb2c4..800953f9121e 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -1,11 +1,14 @@
> > // SPDX-License-Identifier: GPL-2.0
> >
> > +#define pr_fmt(fmt) "riscv: " fmt
> > #include <linux/mm.h>
> > #include <linux/smp.h>
> > #include <linux/sched.h>
> > #include <asm/sbi.h>
> > #include <asm/mmu_context.h>
> >
> > +static unsigned long tlb_flush_all_threshold __read_mostly = PTRS_PER_PTE;
> > +
> > static inline void local_flush_tlb_all_asid(unsigned long asid)
> > {
> > __asm__ __volatile__ ("sfence.vma x0, %0"
> > @@ -23,22 +26,110 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
> > : "memory");
> > }
> >
> > +static inline void riscv_sfence_inval_ir(void)
> > +{
> > + /*
> > + * SFENCE.INVAL.IR
> > + * 0001100 00001 00000 000 00000 1110011
> > + */
> > + asm volatile (".word 0x18100073" ::: "memory");
> > +}
> > +
> > +static inline void riscv_sfence_w_inval(void)
> > +{
> > + /*
> > + * SFENCE.W.INVAL
> > + * 0001100 00000 00000 000 00000 1110011
> > + */
> > + asm volatile (".word 0x18000073" ::: "memory");
> > +}
> > +
> > +static inline void riscv_sinval_vma_asid(unsigned long vma, unsigned long asid)
> > +{
> > + /*
> > + * rs1 = a0 (VMA)
> > + * rs2 = a1 (asid)
> > + * SINVAL.VMA a0, a1
> > + * 0001011 01011 01010 000 00000 1110011
> > + */
> > + asm volatile ("srli a0, %0, 2\n"
> > + "add a1, %1, zero\n"
> > + ".word 0x16B50073\n"
> > + :: "r" (vma), "r" (asid)
> > + : "a0", "a1", "memory");
> > +}
> > +
> > +static inline void riscv_sinval_vma(unsigned long vma)
> > +{
> > + /*
> > + * rs1 = a0 (VMA)
> > + * rs2 = 0
> > + * SINVAL.VMA a0
> > + * 0001011 00000 01010 000 00000 1110011
> > + */
> > + asm volatile ("srli a0, %0, 2\n"
> > + ".word 0x16050073\n"
> > + :: "r" (vma) : "a0", "memory");
> > +}
> > +
> > static inline void local_flush_tlb_range(unsigned long start,
> > unsigned long size, unsigned long stride)
> > {
> > - if (size <= stride)
> > - local_flush_tlb_page(start);
> > - else
> > + if ((size / stride) <= tlb_flush_all_threshold) {
> > + if (riscv_use_flush_tlb_svinval()) {
> > + riscv_sfence_w_inval();
> > + while (size) {
> > + riscv_sinval_vma(start);
> > + start += stride;
> > + if (size > stride)
> > + size -= stride;
> > + else
> > + size = 0;
> > + }
> > + riscv_sfence_inval_ir();
> > + } else {
> > + while (size) {
> > + local_flush_tlb_page(start);
> > + start += stride;
> > + if (size > stride)
> > + size -= stride;
> > + else
> > + size = 0;
> > + }
> > + }
> > + } else {
> > local_flush_tlb_all();
> > + }
> > }
> >
> > static inline void local_flush_tlb_range_asid(unsigned long start,
> > unsigned long size, unsigned long stride, unsigned long asid)
> > {
> > - if (size <= stride)
> > - local_flush_tlb_page_asid(start, asid);
> > - else
> > + if ((size / stride) <= tlb_flush_all_threshold) {
> > + if (riscv_use_flush_tlb_svinval()) {
> > + riscv_sfence_w_inval();
> > + while (size) {
> > + riscv_sinval_vma_asid(start, asid);
> > + start += stride;
> > + if (size > stride)
> > + size -= stride;
> > + else
> > + size = 0;
> > + }
> > + riscv_sfence_inval_ir();
> > + } else {
> > + while (size) {
> > + local_flush_tlb_page_asid(start, asid);
> > + start += stride;
> > + if (size > stride)
> > + size -= stride;
> > + else
> > + size = 0;
> > + }
> > + }
> > + } else {
> > local_flush_tlb_all_asid(asid);
> > + }
>
> Sorry to dig up an old post, but svinval came up and I started looking
> at this again.
>
> There's some other issues with the code, but at a higher level it's not
> really clear this is a useful thing to do: splitting up sfence.vma makes
> sense, but if there's no operations between the sfence.w.inval and
> sinval.vma (or sinval.vma and sfence.inval.ir) then we're not giving the
> HW anything else to do concurrently with the flushes and thus I don't
> see how this would improve performance -- if anything we're just
> emitting more instructions to do the samething, which would be bad for
> performance.
Actually, this approach is similar to Arm's, with the addition of only
two instructions (sfence.w.inval and sfence.inval.ir), and it is also
described in the non-normative text in chapter 7 of the privileged spec.
>
> So I think if we're going to use these we really need to also split up
> the TLB flush operations, so we can give something for the HW to do
> currently with the flushing. That's be a pretty big change to our TLB
> model, though: essentially we're going from two phases (write the page
> tables, then flush) to at least three phases (write the page tables,
> start the flush, finish the flush) or even four (if we want to insert
> work between both sfence.w.inval->sinval.vma and
> sinval.vma->sfence.inval.ir).
I think the extension just allows for a more efficient TLB flush
operation in the hardware; no change in the software TLB model is
expected.
Besides, the extension would be disabled by default and would only be
enabled on platforms that advertise it in the DT.
>
> Not sure if I'm just misunderstanding the performance characteristics,
> though. Maybe I'm over-thinking things and sfence.vma is heavy-weight
> enough that implementations are flushing the pipeline for even the
> page-based flavors and thus sfence.vma-based loops are very slow?
>
> Do you have any benchmarks for the Svinval bits here (which are mixed in
> with some non-global flushing, that's a different optimization that
> needs to be split out)?
>
>
> > }
> >
> > static void __ipi_flush_tlb_all(void *info)
> > @@ -149,3 +240,16 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> > __flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
> > }
> > #endif
> > +
> > +DEFINE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
> > +EXPORT_SYMBOL_GPL(riscv_flush_tlb_svinval);
> > +
> > +void riscv_tlbflush_init(void)
> > +{
> > + if (riscv_isa_extension_available(NULL, SVINVAL)) {
> > + pr_info("Svinval extension supported\n");
> > + static_branch_enable(&riscv_flush_tlb_svinval);
> > + } else {
> > + static_branch_disable(&riscv_flush_tlb_svinval);
> > + }
> > +}
* Re: [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma
2023-06-13 4:02 ` Mayuresh Chitale
@ 2023-06-13 13:43 ` Alexandre Ghiti
2023-06-15 5:51 ` Mayuresh Chitale
0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Ghiti @ 2023-06-13 13:43 UTC (permalink / raw)
To: Mayuresh Chitale, Palmer Dabbelt
Cc: Paul Walmsley, aou, atishp, anup, linux-riscv
On 13/06/2023 06:02, Mayuresh Chitale wrote:
> On Fri, Dec 9, 2022 at 12:33 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> On Sun, 24 Apr 2022 02:02:16 PDT (-0700), mchitale@ventanamicro.com wrote:
>>> When svinval is supported the local_flush_tlb_page*
>>> functions would prefer to use the following sequence
>>> to optimize the tlb flushes instead of a simple sfence.vma:
>>>
>>> sfence.w.inval
>>> svinval.vma
>>> .
>>> .
>>> svinval.vma
>>> sfence.inval.ir
>>>
>>> The maximum number of consecutive svinval.vma instructions
>>> that can be executed in local_flush_tlb_page* functions is
>>> limited to PTRS_PER_PTE. This is required to avoid soft
>>> lockups and the approach is similar to that used in arm64.
>>>
>>> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
>>> ---
>>> arch/riscv/include/asm/tlbflush.h | 12 ++++
>>> arch/riscv/kernel/setup.c | 1 +
>>> arch/riscv/mm/tlbflush.c | 116 ++++++++++++++++++++++++++++--
>>> 3 files changed, 123 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
>>> index 801019381dea..b535467c99f0 100644
>>> --- a/arch/riscv/include/asm/tlbflush.h
>>> +++ b/arch/riscv/include/asm/tlbflush.h
>>> @@ -22,6 +22,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
>>> {
>>> ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
>>> }
>>> +
>>> +void riscv_tlbflush_init(void);
>>> +void __riscv_sfence_w_inval(void);
>>> +void __riscv_sfence_inval_ir(void);
>>> +void __riscv_sinval_vma(unsigned long addr);
>>> +void __riscv_sinval_vma_asid(unsigned long addr, unsigned long asid);
>>> +
>>> +/* Check if we can use sinval for tlb flush */
>>> +DECLARE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
>>> +#define riscv_use_flush_tlb_svinval() \
>>> + static_branch_unlikely(&riscv_flush_tlb_svinval)
>>> +
>>> #else /* CONFIG_MMU */
>>> #define local_flush_tlb_all() do { } while (0)
>>> #define local_flush_tlb_page(addr) do { } while (0)
>>> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
>>> index 834eb652a7b9..13de04259de9 100644
>>> --- a/arch/riscv/kernel/setup.c
>>> +++ b/arch/riscv/kernel/setup.c
>>> @@ -295,6 +295,7 @@ void __init setup_arch(char **cmdline_p)
>>> #endif
>>>
>>> riscv_fill_hwcap();
>>> + riscv_tlbflush_init();
>>> }
>>>
>>> static int __init topology_init(void)
>>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
>>> index 27a7db8eb2c4..800953f9121e 100644
>>> --- a/arch/riscv/mm/tlbflush.c
>>> +++ b/arch/riscv/mm/tlbflush.c
>>> @@ -1,11 +1,14 @@
>>> // SPDX-License-Identifier: GPL-2.0
>>>
>>> +#define pr_fmt(fmt) "riscv: " fmt
>>> #include <linux/mm.h>
>>> #include <linux/smp.h>
>>> #include <linux/sched.h>
>>> #include <asm/sbi.h>
>>> #include <asm/mmu_context.h>
>>>
>>> +static unsigned long tlb_flush_all_threshold __read_mostly = PTRS_PER_PTE;
>>> +
>>> static inline void local_flush_tlb_all_asid(unsigned long asid)
>>> {
>>> __asm__ __volatile__ ("sfence.vma x0, %0"
>>> @@ -23,22 +26,110 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
>>> : "memory");
>>> }
>>>
>>> +static inline void riscv_sfence_inval_ir(void)
>>> +{
>>> + /*
>>> + * SFENCE.INVAL.IR
>>> + * 0001100 00001 00000 000 00000 1110011
>>> + */
>>> + asm volatile (".word 0x18100073" ::: "memory");
>>> +}
>>> +
>>> +static inline void riscv_sfence_w_inval(void)
>>> +{
>>> + /*
>>> + * SFENCE.W.INVAL
>>> + * 0001100 00000 00000 000 00000 1110011
>>> + */
>>> + asm volatile (".word 0x18000073" ::: "memory");
>>> +}
>>> +
>>> +static inline void riscv_sinval_vma_asid(unsigned long vma, unsigned long asid)
>>> +{
>>> + /*
>>> + * rs1 = a0 (VMA)
>>> + * rs2 = a1 (asid)
>>> + * SINVAL.VMA a0, a1
>>> + * 0001011 01011 01010 000 00000 1110011
>>> + */
>>> + asm volatile ("srli a0, %0, 2\n"
>>> + "add a1, %1, zero\n"
>>> + ".word 0x16B50073\n"
>>> + :: "r" (vma), "r" (asid)
>>> + : "a0", "a1", "memory");
>>> +}
>>> +
>>> +static inline void riscv_sinval_vma(unsigned long vma)
>>> +{
>>> + /*
>>> + * rs1 = a0 (VMA)
>>> + * rs2 = 0
>>> + * SINVAL.VMA a0
>>> + * 0001011 00000 01010 000 00000 1110011
>>> + */
>>> + asm volatile ("srli a0, %0, 2\n"
>>> + ".word 0x16050073\n"
>>> + :: "r" (vma) : "a0", "memory");
>>> +}
>>> +
>>> static inline void local_flush_tlb_range(unsigned long start,
>>> unsigned long size, unsigned long stride)
>>> {
>>> - if (size <= stride)
>>> - local_flush_tlb_page(start);
>>> - else
>>> + if ((size / stride) <= tlb_flush_all_threshold) {
>>> + if (riscv_use_flush_tlb_svinval()) {
>>> + riscv_sfence_w_inval();
>>> + while (size) {
>>> + riscv_sinval_vma(start);
>>> + start += stride;
>>> + if (size > stride)
>>> + size -= stride;
>>> + else
>>> + size = 0;
>>> + }
>>> + riscv_sfence_inval_ir();
>>> + } else {
>>> + while (size) {
>>> + local_flush_tlb_page(start);
>>> + start += stride;
>>> + if (size > stride)
>>> + size -= stride;
>>> + else
>>> + size = 0;
>>> + }
>>> + }
>>> + } else {
>>> local_flush_tlb_all();
>>> + }
>>> }
>>>
>>> static inline void local_flush_tlb_range_asid(unsigned long start,
>>> unsigned long size, unsigned long stride, unsigned long asid)
>>> {
>>> - if (size <= stride)
>>> - local_flush_tlb_page_asid(start, asid);
>>> - else
>>> + if ((size / stride) <= tlb_flush_all_threshold) {
>>> + if (riscv_use_flush_tlb_svinval()) {
>>> + riscv_sfence_w_inval();
>>> + while (size) {
>>> + riscv_sinval_vma_asid(start, asid);
>>> + start += stride;
>>> + if (size > stride)
>>> + size -= stride;
>>> + else
>>> + size = 0;
>>> + }
>>> + riscv_sfence_inval_ir();
>>> + } else {
>>> + while (size) {
>>> + local_flush_tlb_page_asid(start, asid);
>>> + start += stride;
>>> + if (size > stride)
>>> + size -= stride;
>>> + else
>>> + size = 0;
>>> + }
>>> + }
>>> + } else {
>>> local_flush_tlb_all_asid(asid);
>>> + }
>> Sorry to dig up an old post, but svinval came up and I started looking
>> at this again.
>>
>> There's some other issues with the code, but at a higher level it's not
>> really clear this is a useful thing to do: splitting up sfence.vma makes
>> sense, but if there's no operations between the sfence.w.inval and
>> sinval.vma (or sinval.vma and sfence.inval.ir) then we're not giving the
>> HW anything else to do concurrently with the flushes and thus I don't
>> see how this would improve performance -- if anything we're just
>> emitting more instructions to do the same thing, which would be bad for
>> performance.
> Actually this approach is similar to ARM's, with the addition of only two
> instructions, sfence.w.inval and sfence.inval.ir, and it is also
> described in the non-normative text in the priv spec, ch. 7.
>
>> So I think if we're going to use these we really need to also split up
>> the TLB flush operations, so we can give something for the HW to do
>> concurrently with the flushing. That'd be a pretty big change to our TLB
>> model, though: essentially we're going from two phases (write the page
>> tables, then flush) to at least three phases (write the page tables,
>> start the flush, finish the flush) or even four (if we want to insert
>> work between both sfence.w.inval->sinval.vma and
>> sinval.vma->sfence.inval.ir).
> I think the extension is just to allow for a more efficient TLB
> flushing operation in the hardware and no change in the software TLB
> model is expected.
> Besides, the extension would be disabled by default and only get
> enabled on platforms that enable it in DT.
>
>> Not sure if I'm just misunderstanding the performance characteristics,
>> though. Maybe I'm over-thinking things and sfence.vma is heavy-weight
>> enough that implementations are flushing the pipeline for even the
>> page-based flavors and thus sfence.vma-based loops are very slow?
>
>> Do you have any benchmarks for the Svinval bits here (which are mixed in
>> with some non-global flushing, that's a different optimization that
>> needs to be split out)?
Thanks for digging that up!
I have a patchset to optimize the TLB flushes (which also uses this
concept of threshold, also present in x86) and I don't want to propose
it upstream since I can't measure any performance benefits (which is
very weird actually): did you actually benchmark this? If yes, I'd like
to know how you did it :)
Thanks,
Alex
>>
>>> }
>>>
>>> static void __ipi_flush_tlb_all(void *info)
>>> @@ -149,3 +240,16 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
>>> __flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
>>> }
>>> #endif
>>> +
>>> +DEFINE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
>>> +EXPORT_SYMBOL_GPL(riscv_flush_tlb_svinval);
>>> +
>>> +void riscv_tlbflush_init(void)
>>> +{
>>> + if (riscv_isa_extension_available(NULL, SVINVAL)) {
>>> + pr_info("Svinval extension supported\n");
>>> + static_branch_enable(&riscv_flush_tlb_svinval);
>>> + } else {
>>> + static_branch_disable(&riscv_flush_tlb_svinval);
>>> + }
>>> +}
* Re: [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma
2023-06-13 13:43 ` Alexandre Ghiti
@ 2023-06-15 5:51 ` Mayuresh Chitale
0 siblings, 0 replies; 8+ messages in thread
From: Mayuresh Chitale @ 2023-06-15 5:51 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Palmer Dabbelt, Paul Walmsley, aou, atishp, anup, linux-riscv
On Tue, Jun 13, 2023 at 7:13 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
>
> On 13/06/2023 06:02, Mayuresh Chitale wrote:
> > On Fri, Dec 9, 2022 at 12:33 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >> On Sun, 24 Apr 2022 02:02:16 PDT (-0700), mchitale@ventanamicro.com wrote:
> >>> When svinval is supported the local_flush_tlb_page*
> >>> functions would prefer to use the following sequence
> >>> to optimize the tlb flushes instead of a simple sfence.vma:
> >>>
> >>> sfence.w.inval
> >>> sinval.vma
> >>> .
> >>> .
> >>> sinval.vma
> >>> sfence.inval.ir
> >>>
> >>> The maximum number of consecutive sinval.vma instructions
> >>> that can be executed in local_flush_tlb_page* functions is
> >>> limited to PTRS_PER_PTE. This is required to avoid soft
> >>> lockups and the approach is similar to that used in arm64.
> >>>
> >>> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
> >>> ---
> >>> arch/riscv/include/asm/tlbflush.h | 12 ++++
> >>> arch/riscv/kernel/setup.c | 1 +
> >>> arch/riscv/mm/tlbflush.c | 116 ++++++++++++++++++++++++++++--
> >>> 3 files changed, 123 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> >>> index 801019381dea..b535467c99f0 100644
> >>> --- a/arch/riscv/include/asm/tlbflush.h
> >>> +++ b/arch/riscv/include/asm/tlbflush.h
> >>> @@ -22,6 +22,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
> >>> {
> >>> ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
> >>> }
> >>> +
> >>> +void riscv_tlbflush_init(void);
> >>> +void __riscv_sfence_w_inval(void);
> >>> +void __riscv_sfence_inval_ir(void);
> >>> +void __riscv_sinval_vma(unsigned long addr);
> >>> +void __riscv_sinval_vma_asid(unsigned long addr, unsigned long asid);
> >>> +
> >>> +/* Check if we can use sinval for tlb flush */
> >>> +DECLARE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
> >>> +#define riscv_use_flush_tlb_svinval() \
> >>> + static_branch_unlikely(&riscv_flush_tlb_svinval)
> >>> +
> >>> #else /* CONFIG_MMU */
> >>> #define local_flush_tlb_all() do { } while (0)
> >>> #define local_flush_tlb_page(addr) do { } while (0)
> >>> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> >>> index 834eb652a7b9..13de04259de9 100644
> >>> --- a/arch/riscv/kernel/setup.c
> >>> +++ b/arch/riscv/kernel/setup.c
> >>> @@ -295,6 +295,7 @@ void __init setup_arch(char **cmdline_p)
> >>> #endif
> >>>
> >>> riscv_fill_hwcap();
> >>> + riscv_tlbflush_init();
> >>> }
> >>>
> >>> static int __init topology_init(void)
> >>> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> >>> index 27a7db8eb2c4..800953f9121e 100644
> >>> --- a/arch/riscv/mm/tlbflush.c
> >>> +++ b/arch/riscv/mm/tlbflush.c
> >>> @@ -1,11 +1,14 @@
> >>> // SPDX-License-Identifier: GPL-2.0
> >>>
> >>> +#define pr_fmt(fmt) "riscv: " fmt
> >>> #include <linux/mm.h>
> >>> #include <linux/smp.h>
> >>> #include <linux/sched.h>
> >>> #include <asm/sbi.h>
> >>> #include <asm/mmu_context.h>
> >>>
> >>> +static unsigned long tlb_flush_all_threshold __read_mostly = PTRS_PER_PTE;
> >>> +
> >>> static inline void local_flush_tlb_all_asid(unsigned long asid)
> >>> {
> >>> __asm__ __volatile__ ("sfence.vma x0, %0"
> >>> @@ -23,22 +26,110 @@ static inline void local_flush_tlb_page_asid(unsigned long addr,
> >>> : "memory");
> >>> }
> >>>
> >>> +static inline void riscv_sfence_inval_ir(void)
> >>> +{
> >>> + /*
> >>> + * SFENCE.INVAL.IR
> >>> + * 0001100 00001 00000 000 00000 1110011
> >>> + */
> >>> + asm volatile (".word 0x18100073" ::: "memory");
> >>> +}
> >>> +
> >>> +static inline void riscv_sfence_w_inval(void)
> >>> +{
> >>> + /*
> >>> + * SFENCE.W.INVAL
> >>> + * 0001100 00000 00000 000 00000 1110011
> >>> + */
> >>> + asm volatile (".word 0x18000073" ::: "memory");
> >>> +}
> >>> +
> >>> +static inline void riscv_sinval_vma_asid(unsigned long vma, unsigned long asid)
> >>> +{
> >>> + /*
> >>> + * rs1 = a0 (VMA)
> >>> + * rs2 = a1 (asid)
> >>> + * SINVAL.VMA a0, a1
> >>> + * 0001011 01011 01010 000 00000 1110011
> >>> + */
> >>> + asm volatile ("srli a0, %0, 2\n"
> >>> + "add a1, %1, zero\n"
> >>> + ".word 0x16B50073\n"
> >>> + :: "r" (vma), "r" (asid)
> >>> + : "a0", "a1", "memory");
> >>> +}
> >>> +
> >>> +static inline void riscv_sinval_vma(unsigned long vma)
> >>> +{
> >>> + /*
> >>> + * rs1 = a0 (VMA)
> >>> + * rs2 = 0
> >>> + * SINVAL.VMA a0
> >>> + * 0001011 00000 01010 000 00000 1110011
> >>> + */
> >>> + asm volatile ("srli a0, %0, 2\n"
> >>> + ".word 0x16050073\n"
> >>> + :: "r" (vma) : "a0", "memory");
> >>> +}
> >>> +
> >>> static inline void local_flush_tlb_range(unsigned long start,
> >>> unsigned long size, unsigned long stride)
> >>> {
> >>> - if (size <= stride)
> >>> - local_flush_tlb_page(start);
> >>> - else
> >>> + if ((size / stride) <= tlb_flush_all_threshold) {
> >>> + if (riscv_use_flush_tlb_svinval()) {
> >>> + riscv_sfence_w_inval();
> >>> + while (size) {
> >>> + riscv_sinval_vma(start);
> >>> + start += stride;
> >>> + if (size > stride)
> >>> + size -= stride;
> >>> + else
> >>> + size = 0;
> >>> + }
> >>> + riscv_sfence_inval_ir();
> >>> + } else {
> >>> + while (size) {
> >>> + local_flush_tlb_page(start);
> >>> + start += stride;
> >>> + if (size > stride)
> >>> + size -= stride;
> >>> + else
> >>> + size = 0;
> >>> + }
> >>> + }
> >>> + } else {
> >>> local_flush_tlb_all();
> >>> + }
> >>> }
> >>>
> >>> static inline void local_flush_tlb_range_asid(unsigned long start,
> >>> unsigned long size, unsigned long stride, unsigned long asid)
> >>> {
> >>> - if (size <= stride)
> >>> - local_flush_tlb_page_asid(start, asid);
> >>> - else
> >>> + if ((size / stride) <= tlb_flush_all_threshold) {
> >>> + if (riscv_use_flush_tlb_svinval()) {
> >>> + riscv_sfence_w_inval();
> >>> + while (size) {
> >>> + riscv_sinval_vma_asid(start, asid);
> >>> + start += stride;
> >>> + if (size > stride)
> >>> + size -= stride;
> >>> + else
> >>> + size = 0;
> >>> + }
> >>> + riscv_sfence_inval_ir();
> >>> + } else {
> >>> + while (size) {
> >>> + local_flush_tlb_page_asid(start, asid);
> >>> + start += stride;
> >>> + if (size > stride)
> >>> + size -= stride;
> >>> + else
> >>> + size = 0;
> >>> + }
> >>> + }
> >>> + } else {
> >>> local_flush_tlb_all_asid(asid);
> >>> + }
> >> Sorry to dig up an old post, but svinval came up and I started looking
> >> at this again.
> >>
> >> There's some other issues with the code, but at a higher level it's not
> >> really clear this is a useful thing to do: splitting up sfence.vma makes
> >> sense, but if there's no operations between the sfence.w.inval and
> >> sinval.vma (or sinval.vma and sfence.inval.ir) then we're not giving the
> >> HW anything else to do concurrently with the flushes and thus I don't
> >> see how this would improve performance -- if anything we're just
> >> emitting more instructions to do the same thing, which would be bad for
> >> performance.
> > Actually this approach is similar to ARM's, with the addition of only two
> > instructions, sfence.w.inval and sfence.inval.ir, and it is also
> > described in the non-normative text in the priv spec, ch. 7.
> >
> >> So I think if we're going to use these we really need to also split up
> >> the TLB flush operations, so we can give something for the HW to do
> >> concurrently with the flushing. That'd be a pretty big change to our TLB
> >> model, though: essentially we're going from two phases (write the page
> >> tables, then flush) to at least three phases (write the page tables,
> >> start the flush, finish the flush) or even four (if we want to insert
> >> work between both sfence.w.inval->sinval.vma and
> >> sinval.vma->sfence.inval.ir).
> > I think the extension is just to allow for a more efficient TLB
> > flushing operation in the hardware and no change in the software TLB
> > model is expected.
> > Besides, the extension would be disabled by default and only get
> > enabled on platforms that enable it in DT.
> >
> >> Not sure if I'm just misunderstanding the performance characteristics,
> >> though. Maybe I'm over-thinking things and sfence.vma is heavy-weight
> >> enough that implementations are flushing the pipeline for even the
> >> page-based flavors and thus sfence.vma-based loops are very slow?
> >
> >> Do you have any benchmarks for the Svinval bits here (which are mixed in
> >> with some non-global flushing, that's a different optimization that
> >> needs to be split out)?
>
>
> Thanks for digging that up!
>
> I have a patchset to optimize the TLB flushes (which also uses this
> concept of threshold, also present in x86) and I don't want to propose
> it upstream since I can't measure any performance benefits (which is
> very weird actually): did you actually benchmark this? If yes, I'd like
> to know how you did it :)
I couldn't run any benchmark as such. I just used a few tests from
stress-ng and the kernel selftests.
>
> Thanks,
>
> Alex
>
>
> >>
> >>> }
> >>>
> >>> static void __ipi_flush_tlb_all(void *info)
> >>> @@ -149,3 +240,16 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> >>> __flush_tlb_range(vma->vm_mm, start, end - start, PMD_SIZE);
> >>> }
> >>> #endif
> >>> +
> >>> +DEFINE_STATIC_KEY_FALSE(riscv_flush_tlb_svinval);
> >>> +EXPORT_SYMBOL_GPL(riscv_flush_tlb_svinval);
> >>> +
> >>> +void riscv_tlbflush_init(void)
> >>> +{
> >>> + if (riscv_isa_extension_available(NULL, SVINVAL)) {
> >>> + pr_info("Svinval extension supported\n");
> >>> + static_branch_enable(&riscv_flush_tlb_svinval);
> >>> + } else {
> >>> + static_branch_disable(&riscv_flush_tlb_svinval);
> >>> + }
> >>> +}
end of thread, other threads:[~2023-06-15 5:51 UTC | newest]
Thread overview: 8+ messages
-- links below jump to the message on this page --
2022-04-24 9:02 [RFC PATCH v2 0/2] Risc-V Svinval support Mayuresh Chitale
2022-04-24 9:02 ` [RFC PATCH v2 1/2] riscv: enum for svinval extension Mayuresh Chitale
2022-04-24 9:02 ` [RFC PATCH v2 2/2] riscv: mm: use svinval instructions instead of sfence.vma Mayuresh Chitale
2022-12-08 19:03 ` Palmer Dabbelt
2023-06-13 4:02 ` Mayuresh Chitale
2023-06-13 13:43 ` Alexandre Ghiti
2023-06-15 5:51 ` Mayuresh Chitale
2022-06-02 3:52 ` [RFC PATCH v2 0/2] Risc-V Svinval support Palmer Dabbelt