All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/8] RISC-V: Apply Zicboz to clear_page
@ 2023-02-09 15:26 ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

When the Zicboz extension is available we can more rapidly zero naturally
aligned Zicboz block sized chunks of memory. As pages are always page
aligned and are larger than any Zicboz block size will be, then
clear_page() appears to be a good candidate for the extension. While cycle
count and energy consumption should also be considered, we can be pretty
certain that implementing clear_page() with the Zicboz extension is a win
by comparing the new dynamic instruction count with its current count[1].
Doing so we see that the new count is just over a quarter of the old count
(see patch4's commit message for more details).

For those of you who reviewed v1[2], you may be looking for the memset()
patches. As pointed out in v1, and a couple follow-up emails, it's not
clear that patching memset() is a win yet. When I get a chance to test
on real hardware with a comprehensive benchmark collection then I can
post the memset() patches separately (assuming the benchmarks show it's
worthwhile).

Based on riscv-linux/for-next plus the dependencies listed below.

Dependencies:
https://lore.kernel.org/all/20230108163356.3063839-1-conor@kernel.org/
https://lore.kernel.org/all/20230105192610.1940841-1-heiko@sntech.de/

The patches are also available here
https://github.com/jones-drew/linux/commits/riscv/zicboz-v4

To test over QEMU this branch may be used to enable Zicboz
https://gitlab.com/jones-drew/qemu/-/commits/riscv/zicboz

To test running a KVM guest with Zicboz this kvmtool branch may be used
https://github.com/jones-drew/kvmtool/commits/riscv/zicboz

Thanks,
drew

[1] I ported the functions under test to userspace and linked them with
    a test program. Then, I ran them under gdb with a script[3] which
    counted instructions by single stepping.
[2] https://lore.kernel.org/all/20221027130247.31634-1-ajones@ventanamicro.com/
[3] https://gist.github.com/jones-drew/487791c956ceca8c18adc2847eec9c60

v4:
  - Rebased on latest for-next which allowed dropping one dependency
  - Added "RISC-V: alternatives: Support patching multiple insns in assembly"
    since I needed to use more than one instruction in an ALTERNATIVE call
    from assembly. I can post this patch separately as a fix if desired.
  - Improved the dt-binding patch commit message [Conor]
  - Picked up some tags from Conor and Rob (I kept Conor's a-b on the
    clear_page patch, even though there are several changes to it, because
    I interpreted the a-b as "OK by me to implement a Zicboz clear_page")
  - Added "RISC-V: cpufeature: Put vendor_id to work", which was necessary
    for how I wanted to use ALTERNATIVE in clear_page.
  - Fallback to memset when the Zicboz block size is too big for the
    Zicboz implementation of clear_page [Palmer]
  - Use lw without la in clear_page impl [Palmer]
  - Implement a Duff device for clear_page using the new 'vendor_id'
    alternatives support [drew]

v3:
  - CC'ed DT list
  - Improved commit message of DT bindings patch to point out relationship
    with cbom-block-size
  - Picked up an a-b from Conor

v2:
  - s/blksz/block_size/, improved commit message for "RISC-V: Add Zicboz
    detection and block size parsing", isa ext sorting [Conor]
  - Added dt binding patch [Heiko]
  - Picked up r-b's from Conor, Heiko, and Anup
  - Moved config symbol and CBO_zero() introduction to "RISC-V: Use Zicboz
    in clear_page when available" and improved its commit message and
    implementation (unrolled four times) [drew]
  - Dropped memset() patches [drew]
  - Rebased on ae4d39f75308 ("Merge patch "RISC-V: fix incorrect type of
    ARCH_CANAAN_K210_DTB_SOURCE"") plus the dependencies

Andrew Jones (8):
  RISC-V: alternatives: Support patching multiple insns in assembly
  RISC-V: Factor out body of riscv_init_cbom_blocksize loop
  dt-bindings: riscv: Document cboz-block-size
  RISC-V: Add Zicboz detection and block size parsing
  RISC-V: cpufeature: Put vendor_id to work
  RISC-V: Use Zicboz in clear_page when available
  RISC-V: KVM: Provide UAPI for Zicboz block size
  RISC-V: KVM: Expose Zicboz to the guest

 .../devicetree/bindings/riscv/cpus.yaml       |  5 ++
 arch/riscv/Kconfig                            | 13 ++++
 arch/riscv/include/asm/alternative-macros.h   |  6 +-
 arch/riscv/include/asm/cacheflush.h           |  3 +-
 arch/riscv/include/asm/hwcap.h                |  1 +
 arch/riscv/include/asm/insn-def.h             |  4 ++
 arch/riscv/include/asm/page.h                 |  6 +-
 arch/riscv/include/uapi/asm/kvm.h             |  2 +
 arch/riscv/kernel/cpu.c                       |  1 +
 arch/riscv/kernel/cpufeature.c                | 31 +++++++-
 arch/riscv/kernel/setup.c                     |  2 +-
 arch/riscv/kvm/vcpu.c                         | 11 +++
 arch/riscv/lib/Makefile                       |  1 +
 arch/riscv/lib/clear_page.S                   | 71 +++++++++++++++++++
 arch/riscv/mm/cacheflush.c                    | 64 ++++++++++-------
 15 files changed, 187 insertions(+), 34 deletions(-)
 create mode 100644 arch/riscv/lib/clear_page.S

-- 
2.39.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v4 0/8] RISC-V: Apply Zicboz to clear_page
@ 2023-02-09 15:26 ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

When the Zicboz extension is available we can more rapidly zero naturally
aligned Zicboz block sized chunks of memory. As pages are always page
aligned and are larger than any Zicboz block size will be, then
clear_page() appears to be a good candidate for the extension. While cycle
count and energy consumption should also be considered, we can be pretty
certain that implementing clear_page() with the Zicboz extension is a win
by comparing the new dynamic instruction count with its current count[1].
Doing so we see that the new count is just over a quarter of the old count
(see patch4's commit message for more details).

For those of you who reviewed v1[2], you may be looking for the memset()
patches. As pointed out in v1, and a couple follow-up emails, it's not
clear that patching memset() is a win yet. When I get a chance to test
on real hardware with a comprehensive benchmark collection then I can
post the memset() patches separately (assuming the benchmarks show it's
worthwhile).

Based on riscv-linux/for-next plus the dependencies listed below.

Dependencies:
https://lore.kernel.org/all/20230108163356.3063839-1-conor@kernel.org/
https://lore.kernel.org/all/20230105192610.1940841-1-heiko@sntech.de/

The patches are also available here
https://github.com/jones-drew/linux/commits/riscv/zicboz-v4

To test over QEMU this branch may be used to enable Zicboz
https://gitlab.com/jones-drew/qemu/-/commits/riscv/zicboz

To test running a KVM guest with Zicboz this kvmtool branch may be used
https://github.com/jones-drew/kvmtool/commits/riscv/zicboz

Thanks,
drew

[1] I ported the functions under test to userspace and linked them with
    a test program. Then, I ran them under gdb with a script[3] which
    counted instructions by single stepping.
[2] https://lore.kernel.org/all/20221027130247.31634-1-ajones@ventanamicro.com/
[3] https://gist.github.com/jones-drew/487791c956ceca8c18adc2847eec9c60

v4:
  - Rebased on latest for-next which allowed dropping one dependency
  - Added "RISC-V: alternatives: Support patching multiple insns in assembly"
    since I needed to use more than one instruction in an ALTERNATIVE call
    from assembly. I can post this patch separately as a fix if desired.
  - Improved the dt-binding patch commit message [Conor]
  - Picked up some tags from Conor and Rob (I kept Conor's a-b on the
    clear_page patch, even though there are several changes to it, because
    I interpreted the a-b as "OK by me to implement a Zicboz clear_page")
  - Added "RISC-V: cpufeature: Put vendor_id to work", which was necessary
    for how I wanted to use ALTERNATIVE in clear_page.
  - Fallback to memset when the Zicboz block size is too big for the
    Zicboz implementation of clear_page [Palmer]
  - Use lw without la in clear_page impl [Palmer]
  - Implement a Duff device for clear_page using the new 'vendor_id'
    alternatives support [drew]

v3:
  - CC'ed DT list
  - Improved commit message of DT bindings patch to point out relationship
    with cbom-block-size
  - Picked up an a-b from Conor

v2:
  - s/blksz/block_size/, improved commit message for "RISC-V: Add Zicboz
    detection and block size parsing", isa ext sorting [Conor]
  - Added dt binding patch [Heiko]
  - Picked up r-b's from Conor, Heiko, and Anup
  - Moved config symbol and CBO_zero() introduction to "RISC-V: Use Zicboz
    in clear_page when available" and improved its commit message and
    implementation (unrolled four times) [drew]
  - Dropped memset() patches [drew]
  - Rebased on ae4d39f75308 ("Merge patch "RISC-V: fix incorrect type of
    ARCH_CANAAN_K210_DTB_SOURCE"") plus the dependencies

Andrew Jones (8):
  RISC-V: alternatives: Support patching multiple insns in assembly
  RISC-V: Factor out body of riscv_init_cbom_blocksize loop
  dt-bindings: riscv: Document cboz-block-size
  RISC-V: Add Zicboz detection and block size parsing
  RISC-V: cpufeature: Put vendor_id to work
  RISC-V: Use Zicboz in clear_page when available
  RISC-V: KVM: Provide UAPI for Zicboz block size
  RISC-V: KVM: Expose Zicboz to the guest

 .../devicetree/bindings/riscv/cpus.yaml       |  5 ++
 arch/riscv/Kconfig                            | 13 ++++
 arch/riscv/include/asm/alternative-macros.h   |  6 +-
 arch/riscv/include/asm/cacheflush.h           |  3 +-
 arch/riscv/include/asm/hwcap.h                |  1 +
 arch/riscv/include/asm/insn-def.h             |  4 ++
 arch/riscv/include/asm/page.h                 |  6 +-
 arch/riscv/include/uapi/asm/kvm.h             |  2 +
 arch/riscv/kernel/cpu.c                       |  1 +
 arch/riscv/kernel/cpufeature.c                | 31 +++++++-
 arch/riscv/kernel/setup.c                     |  2 +-
 arch/riscv/kvm/vcpu.c                         | 11 +++
 arch/riscv/lib/Makefile                       |  1 +
 arch/riscv/lib/clear_page.S                   | 71 +++++++++++++++++++
 arch/riscv/mm/cacheflush.c                    | 64 ++++++++++-------
 15 files changed, 187 insertions(+), 34 deletions(-)
 create mode 100644 arch/riscv/lib/clear_page.S

-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v4 1/8] RISC-V: alternatives: Support patching multiple insns in assembly
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

As pointed out in commit d374a16539b1 ("RISC-V: fix compile error
from deduplicated __ALTERNATIVE_CFG_2"), we need quotes around
parameters passed to macros within macros to avoid spaces being
interpreted as separators. ALT_NEW_CONTENT was trying to handle
this by defining new_c has a vararg, but this isn't sufficient
for calling ALTERNATIVE() from assembly with multiple instructions
in the new/old sequences. Remove the vararg "hack" and use quotes.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/include/asm/alternative-macros.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
index 51c6867e02f3..7bc52f33f95f 100644
--- a/arch/riscv/include/asm/alternative-macros.h
+++ b/arch/riscv/include/asm/alternative-macros.h
@@ -14,7 +14,7 @@
 	.4byte \errata_id
 .endm
 
-.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c : vararg
+.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c
 	.if \enable
 	.pushsection .alternative, "a"
 	ALT_ENTRY 886b, 888f, \vendor_id, \errata_id, 889f - 888f
@@ -41,13 +41,13 @@
 	\old_c
 	.option pop
 887 :
-	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, \new_c
+	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, "\new_c"
 .endm
 
 .macro ALTERNATIVE_CFG_2 old_c, new_c_1, vendor_id_1, errata_id_1, enable_1,	\
 				new_c_2, vendor_id_2, errata_id_2, enable_2
 	ALTERNATIVE_CFG "\old_c", "\new_c_1", \vendor_id_1, \errata_id_1, \enable_1
-	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, \new_c_2
+	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, "\new_c_2"
 .endm
 
 #define __ALTERNATIVE_CFG(...)		ALTERNATIVE_CFG __VA_ARGS__
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 1/8] RISC-V: alternatives: Support patching multiple insns in assembly
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

As pointed out in commit d374a16539b1 ("RISC-V: fix compile error
from deduplicated __ALTERNATIVE_CFG_2"), we need quotes around
parameters passed to macros within macros to avoid spaces being
interpreted as separators. ALT_NEW_CONTENT was trying to handle
this by defining new_c has a vararg, but this isn't sufficient
for calling ALTERNATIVE() from assembly with multiple instructions
in the new/old sequences. Remove the vararg "hack" and use quotes.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/include/asm/alternative-macros.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
index 51c6867e02f3..7bc52f33f95f 100644
--- a/arch/riscv/include/asm/alternative-macros.h
+++ b/arch/riscv/include/asm/alternative-macros.h
@@ -14,7 +14,7 @@
 	.4byte \errata_id
 .endm
 
-.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c : vararg
+.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c
 	.if \enable
 	.pushsection .alternative, "a"
 	ALT_ENTRY 886b, 888f, \vendor_id, \errata_id, 889f - 888f
@@ -41,13 +41,13 @@
 	\old_c
 	.option pop
 887 :
-	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, \new_c
+	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, "\new_c"
 .endm
 
 .macro ALTERNATIVE_CFG_2 old_c, new_c_1, vendor_id_1, errata_id_1, enable_1,	\
 				new_c_2, vendor_id_2, errata_id_2, enable_2
 	ALTERNATIVE_CFG "\old_c", "\new_c_1", \vendor_id_1, \errata_id_1, \enable_1
-	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, \new_c_2
+	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, "\new_c_2"
 .endm
 
 #define __ALTERNATIVE_CFG(...)		ALTERNATIVE_CFG __VA_ARGS__
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 2/8] RISC-V: Factor out body of riscv_init_cbom_blocksize loop
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

Refactor riscv_init_cbom_blocksize() to prepare for it to be used
for both cbom block size and cboz block size.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
---
 arch/riscv/mm/cacheflush.c | 45 +++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 3cc07ed45aeb..eaf23fc14966 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -98,34 +98,39 @@ void flush_icache_pte(pte_t pte)
 unsigned int riscv_cbom_block_size;
 EXPORT_SYMBOL_GPL(riscv_cbom_block_size);
 
+static void cbo_get_block_size(struct device_node *node,
+			       const char *name, u32 *block_size,
+			       unsigned long *first_hartid)
+{
+	unsigned long hartid;
+	u32 val;
+
+	if (riscv_of_processor_hartid(node, &hartid))
+		return;
+
+	if (of_property_read_u32(node, name, &val))
+		return;
+
+	if (!*block_size) {
+		*block_size = val;
+		*first_hartid = hartid;
+	} else if (*block_size != val) {
+		pr_warn("%s mismatched between harts %lu and %lu\n",
+			name, *first_hartid, hartid);
+	}
+}
+
 void riscv_init_cbom_blocksize(void)
 {
 	struct device_node *node;
 	unsigned long cbom_hartid;
-	u32 val, probed_block_size;
-	int ret;
+	u32 probed_block_size;
 
 	probed_block_size = 0;
 	for_each_of_cpu_node(node) {
-		unsigned long hartid;
-
-		ret = riscv_of_processor_hartid(node, &hartid);
-		if (ret)
-			continue;
-
 		/* set block-size for cbom extension if available */
-		ret = of_property_read_u32(node, "riscv,cbom-block-size", &val);
-		if (ret)
-			continue;
-
-		if (!probed_block_size) {
-			probed_block_size = val;
-			cbom_hartid = hartid;
-		} else {
-			if (probed_block_size != val)
-				pr_warn("cbom-block-size mismatched between harts %lu and %lu\n",
-					cbom_hartid, hartid);
-		}
+		cbo_get_block_size(node, "riscv,cbom-block-size",
+				   &probed_block_size, &cbom_hartid);
 	}
 
 	if (probed_block_size)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 2/8] RISC-V: Factor out body of riscv_init_cbom_blocksize loop
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

Refactor riscv_init_cbom_blocksize() to prepare for it to be used
for both cbom block size and cboz block size.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
---
 arch/riscv/mm/cacheflush.c | 45 +++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 3cc07ed45aeb..eaf23fc14966 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -98,34 +98,39 @@ void flush_icache_pte(pte_t pte)
 unsigned int riscv_cbom_block_size;
 EXPORT_SYMBOL_GPL(riscv_cbom_block_size);
 
+static void cbo_get_block_size(struct device_node *node,
+			       const char *name, u32 *block_size,
+			       unsigned long *first_hartid)
+{
+	unsigned long hartid;
+	u32 val;
+
+	if (riscv_of_processor_hartid(node, &hartid))
+		return;
+
+	if (of_property_read_u32(node, name, &val))
+		return;
+
+	if (!*block_size) {
+		*block_size = val;
+		*first_hartid = hartid;
+	} else if (*block_size != val) {
+		pr_warn("%s mismatched between harts %lu and %lu\n",
+			name, *first_hartid, hartid);
+	}
+}
+
 void riscv_init_cbom_blocksize(void)
 {
 	struct device_node *node;
 	unsigned long cbom_hartid;
-	u32 val, probed_block_size;
-	int ret;
+	u32 probed_block_size;
 
 	probed_block_size = 0;
 	for_each_of_cpu_node(node) {
-		unsigned long hartid;
-
-		ret = riscv_of_processor_hartid(node, &hartid);
-		if (ret)
-			continue;
-
 		/* set block-size for cbom extension if available */
-		ret = of_property_read_u32(node, "riscv,cbom-block-size", &val);
-		if (ret)
-			continue;
-
-		if (!probed_block_size) {
-			probed_block_size = val;
-			cbom_hartid = hartid;
-		} else {
-			if (probed_block_size != val)
-				pr_warn("cbom-block-size mismatched between harts %lu and %lu\n",
-					cbom_hartid, hartid);
-		}
+		cbo_get_block_size(node, "riscv,cbom-block-size",
+				   &probed_block_size, &cbom_hartid);
 	}
 
 	if (probed_block_size)
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 3/8] dt-bindings: riscv: Document cboz-block-size
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

The Zicboz operation (cbo.zero) operates on a block-size defined
for the cpu-core. While we already have the riscv,cbom-block-size
property, it only provides the block size for Zicbom operations.
Even though it's likely Zicboz and Zicbom will use the same size,
that's not required by the specification. Create another property
specifically for Zicboz.

Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/riscv/cpus.yaml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
index c6720764e765..f4ee70f8e1cf 100644
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -72,6 +72,11 @@ properties:
     description:
       The blocksize in bytes for the Zicbom cache operations.
 
+  riscv,cboz-block-size:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description:
+      The blocksize in bytes for the Zicboz cache operations.
+
   riscv,isa:
     description:
       Identifies the specific RISC-V instruction set architecture
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 3/8] dt-bindings: riscv: Document cboz-block-size
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

The Zicboz operation (cbo.zero) operates on a block-size defined
for the cpu-core. While we already have the riscv,cbom-block-size
property, it only provides the block size for Zicbom operations.
Even though it's likely Zicboz and Zicbom will use the same size,
that's not required by the specification. Create another property
specifically for Zicboz.

Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/riscv/cpus.yaml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
index c6720764e765..f4ee70f8e1cf 100644
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -72,6 +72,11 @@ properties:
     description:
       The blocksize in bytes for the Zicbom cache operations.
 
+  riscv,cboz-block-size:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description:
+      The blocksize in bytes for the Zicboz cache operations.
+
   riscv,isa:
     description:
       Identifies the specific RISC-V instruction set architecture
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 4/8] RISC-V: Add Zicboz detection and block size parsing
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

Parse "riscv,cboz-block-size" from the DT by piggybacking on Zicbom's
riscv_init_cbom_blocksize(). Additionally check the DT for the presence
of the "zicboz" extension and, when it's present, validate the parsed
cboz block size as we do Zicbom's cbom block size with
riscv_isa_extension_check().

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
---
 arch/riscv/include/asm/cacheflush.h |  3 ++-
 arch/riscv/include/asm/hwcap.h      |  1 +
 arch/riscv/kernel/cpu.c             |  1 +
 arch/riscv/kernel/cpufeature.c      | 10 ++++++++++
 arch/riscv/kernel/setup.c           |  2 +-
 arch/riscv/mm/cacheflush.c          | 23 +++++++++++++++--------
 6 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 03e3b95ae6da..8091b8bf4883 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -50,7 +50,8 @@ void flush_icache_mm(struct mm_struct *mm, bool local);
 #endif /* CONFIG_SMP */
 
 extern unsigned int riscv_cbom_block_size;
-void riscv_init_cbom_blocksize(void);
+extern unsigned int riscv_cboz_block_size;
+void riscv_init_cbo_blocksizes(void);
 
 #ifdef CONFIG_RISCV_DMA_NONCOHERENT
 void riscv_noncoherent_supported(void);
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index ee9c80fe0062..bd025e918a08 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -47,6 +47,7 @@
 #define RISCV_ISA_EXT_ZBB              30
 #define RISCV_ISA_EXT_ZICBOM           31
 #define RISCV_ISA_EXT_ZIHINTPAUSE      32
+#define RISCV_ISA_EXT_ZICBOZ           33
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 420228e219f7..7a3065544dcb 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -187,6 +187,7 @@ arch_initcall(riscv_cpuinfo_init);
 static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
+	__RISCV_ISA_EXT_DATA(zicboz, RISCV_ISA_EXT_ZICBOZ),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 21fb567e1b22..0d2db03cf167 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -78,6 +78,15 @@ static bool riscv_isa_extension_check(int id)
 			return false;
 		}
 		return true;
+	case RISCV_ISA_EXT_ZICBOZ:
+		if (!riscv_cboz_block_size) {
+			pr_err("Zicboz detected in ISA string, but no cboz-block-size found\n");
+			return false;
+		} else if (!is_power_of_2(riscv_cboz_block_size)) {
+			pr_err("cboz-block-size present, but is not a power-of-2\n");
+			return false;
+		}
+		return true;
 	}
 
 	return true;
@@ -226,6 +235,7 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
 				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
+				SET_ISA_EXT_MAP("zicboz", RISCV_ISA_EXT_ZICBOZ);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 			}
 #undef SET_ISA_EXT_MAP
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 376d2827e736..5d3184cbf518 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -297,7 +297,7 @@ void __init setup_arch(char **cmdline_p)
 	setup_smp();
 #endif
 
-	riscv_init_cbom_blocksize();
+	riscv_init_cbo_blocksizes();
 	riscv_fill_hwcap();
 	apply_boot_alternatives();
 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM) &&
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index eaf23fc14966..ba4832bb949b 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -98,6 +98,9 @@ void flush_icache_pte(pte_t pte)
 unsigned int riscv_cbom_block_size;
 EXPORT_SYMBOL_GPL(riscv_cbom_block_size);
 
+unsigned int riscv_cboz_block_size;
+EXPORT_SYMBOL_GPL(riscv_cboz_block_size);
+
 static void cbo_get_block_size(struct device_node *node,
 			       const char *name, u32 *block_size,
 			       unsigned long *first_hartid)
@@ -120,19 +123,23 @@ static void cbo_get_block_size(struct device_node *node,
 	}
 }
 
-void riscv_init_cbom_blocksize(void)
+void riscv_init_cbo_blocksizes(void)
 {
+	unsigned long cbom_hartid, cboz_hartid;
+	u32 cbom_block_size = 0, cboz_block_size = 0;
 	struct device_node *node;
-	unsigned long cbom_hartid;
-	u32 probed_block_size;
 
-	probed_block_size = 0;
 	for_each_of_cpu_node(node) {
-		/* set block-size for cbom extension if available */
+		/* set block-size for cbom and/or cboz extension if available */
 		cbo_get_block_size(node, "riscv,cbom-block-size",
-				   &probed_block_size, &cbom_hartid);
+				   &cbom_block_size, &cbom_hartid);
+		cbo_get_block_size(node, "riscv,cboz-block-size",
+				   &cboz_block_size, &cboz_hartid);
 	}
 
-	if (probed_block_size)
-		riscv_cbom_block_size = probed_block_size;
+	if (cbom_block_size)
+		riscv_cbom_block_size = cbom_block_size;
+
+	if (cboz_block_size)
+		riscv_cboz_block_size = cboz_block_size;
 }
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 4/8] RISC-V: Add Zicboz detection and block size parsing
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

Parse "riscv,cboz-block-size" from the DT by piggybacking on Zicbom's
riscv_init_cbom_blocksize(). Additionally check the DT for the presence
of the "zicboz" extension and, when it's present, validate the parsed
cboz block size as we do Zicbom's cbom block size with
riscv_isa_extension_check().

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
---
 arch/riscv/include/asm/cacheflush.h |  3 ++-
 arch/riscv/include/asm/hwcap.h      |  1 +
 arch/riscv/kernel/cpu.c             |  1 +
 arch/riscv/kernel/cpufeature.c      | 10 ++++++++++
 arch/riscv/kernel/setup.c           |  2 +-
 arch/riscv/mm/cacheflush.c          | 23 +++++++++++++++--------
 6 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 03e3b95ae6da..8091b8bf4883 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -50,7 +50,8 @@ void flush_icache_mm(struct mm_struct *mm, bool local);
 #endif /* CONFIG_SMP */
 
 extern unsigned int riscv_cbom_block_size;
-void riscv_init_cbom_blocksize(void);
+extern unsigned int riscv_cboz_block_size;
+void riscv_init_cbo_blocksizes(void);
 
 #ifdef CONFIG_RISCV_DMA_NONCOHERENT
 void riscv_noncoherent_supported(void);
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index ee9c80fe0062..bd025e918a08 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -47,6 +47,7 @@
 #define RISCV_ISA_EXT_ZBB              30
 #define RISCV_ISA_EXT_ZICBOM           31
 #define RISCV_ISA_EXT_ZIHINTPAUSE      32
+#define RISCV_ISA_EXT_ZICBOZ           33
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 420228e219f7..7a3065544dcb 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -187,6 +187,7 @@ arch_initcall(riscv_cpuinfo_init);
 static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
+	__RISCV_ISA_EXT_DATA(zicboz, RISCV_ISA_EXT_ZICBOZ),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 21fb567e1b22..0d2db03cf167 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -78,6 +78,15 @@ static bool riscv_isa_extension_check(int id)
 			return false;
 		}
 		return true;
+	case RISCV_ISA_EXT_ZICBOZ:
+		if (!riscv_cboz_block_size) {
+			pr_err("Zicboz detected in ISA string, but no cboz-block-size found\n");
+			return false;
+		} else if (!is_power_of_2(riscv_cboz_block_size)) {
+			pr_err("cboz-block-size present, but is not a power-of-2\n");
+			return false;
+		}
+		return true;
 	}
 
 	return true;
@@ -226,6 +235,7 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
 				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
+				SET_ISA_EXT_MAP("zicboz", RISCV_ISA_EXT_ZICBOZ);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 			}
 #undef SET_ISA_EXT_MAP
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 376d2827e736..5d3184cbf518 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -297,7 +297,7 @@ void __init setup_arch(char **cmdline_p)
 	setup_smp();
 #endif
 
-	riscv_init_cbom_blocksize();
+	riscv_init_cbo_blocksizes();
 	riscv_fill_hwcap();
 	apply_boot_alternatives();
 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM) &&
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index eaf23fc14966..ba4832bb949b 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -98,6 +98,9 @@ void flush_icache_pte(pte_t pte)
 unsigned int riscv_cbom_block_size;
 EXPORT_SYMBOL_GPL(riscv_cbom_block_size);
 
+unsigned int riscv_cboz_block_size;
+EXPORT_SYMBOL_GPL(riscv_cboz_block_size);
+
 static void cbo_get_block_size(struct device_node *node,
 			       const char *name, u32 *block_size,
 			       unsigned long *first_hartid)
@@ -120,19 +123,23 @@ static void cbo_get_block_size(struct device_node *node,
 	}
 }
 
-void riscv_init_cbom_blocksize(void)
+void riscv_init_cbo_blocksizes(void)
 {
+	unsigned long cbom_hartid, cboz_hartid;
+	u32 cbom_block_size = 0, cboz_block_size = 0;
 	struct device_node *node;
-	unsigned long cbom_hartid;
-	u32 probed_block_size;
 
-	probed_block_size = 0;
 	for_each_of_cpu_node(node) {
-		/* set block-size for cbom extension if available */
+		/* set block-size for cbom and/or cboz extension if available */
 		cbo_get_block_size(node, "riscv,cbom-block-size",
-				   &probed_block_size, &cbom_hartid);
+				   &cbom_block_size, &cbom_hartid);
+		cbo_get_block_size(node, "riscv,cboz-block-size",
+				   &cboz_block_size, &cboz_hartid);
 	}
 
-	if (probed_block_size)
-		riscv_cbom_block_size = probed_block_size;
+	if (cbom_block_size)
+		riscv_cbom_block_size = cbom_block_size;
+
+	if (cboz_block_size)
+		riscv_cboz_block_size = cboz_block_size;
 }
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

When [ab]using alternatives as cpufeature "static keys", which can
be used in assembly, also put vendor_id to work as application-
specific data. This will be initially used in Zicboz's application to
clear_page(), as Zicboz's block size must also be considered. In that
case, vendor_id's role will be to convey the maximum block size which
the Zicboz clear_page() implementation supports.

cpufeature alternative applications which need to check for the
existence or absence of other cpufeatures may also be able to make
use of this.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/kernel/cpufeature.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 0d2db03cf167..74736b4f0624 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -278,6 +278,11 @@ void __init riscv_fill_hwcap(void)
 }
 
 #ifdef CONFIG_RISCV_ALTERNATIVE
+static bool riscv_cpufeature_application_check(u32 feature, u16 data)
+{
+	return data == 0;
+}
+
 void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 						  struct alt_entry *end,
 						  unsigned int stage)
@@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 		return;
 
 	for (alt = begin; alt < end; alt++) {
-		if (alt->vendor_id != 0)
-			continue;
 		if (alt->errata_id >= RISCV_ISA_EXT_MAX) {
 			WARN(1, "This extension id:%d is not in ISA extension list",
 				alt->errata_id);
@@ -300,6 +303,9 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 		if (!__riscv_isa_extension_available(NULL, alt->errata_id))
 			continue;
 
+		if (!riscv_cpufeature_application_check(alt->errata_id, alt->vendor_id))
+			continue;
+
 		oldptr = ALT_OLD_PTR(alt);
 		altptr = ALT_ALT_PTR(alt);
 		patch_text_nosync(oldptr, altptr, alt->alt_len);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

When [ab]using alternatives as cpufeature "static keys", which can
be used in assembly, also put vendor_id to work as application-
specific data. This will be initially used in Zicboz's application to
clear_page(), as Zicboz's block size must also be considered. In that
case, vendor_id's role will be to convey the maximum block size which
the Zicboz clear_page() implementation supports.

cpufeature alternative applications which need to check for the
existence or absence of other cpufeatures may also be able to make
use of this.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/kernel/cpufeature.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 0d2db03cf167..74736b4f0624 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -278,6 +278,11 @@ void __init riscv_fill_hwcap(void)
 }
 
 #ifdef CONFIG_RISCV_ALTERNATIVE
+static bool riscv_cpufeature_application_check(u32 feature, u16 data)
+{
+	return data == 0;
+}
+
 void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 						  struct alt_entry *end,
 						  unsigned int stage)
@@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 		return;
 
 	for (alt = begin; alt < end; alt++) {
-		if (alt->vendor_id != 0)
-			continue;
 		if (alt->errata_id >= RISCV_ISA_EXT_MAX) {
 			WARN(1, "This extension id:%d is not in ISA extension list",
 				alt->errata_id);
@@ -300,6 +303,9 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
 		if (!__riscv_isa_extension_available(NULL, alt->errata_id))
 			continue;
 
+		if (!riscv_cpufeature_application_check(alt->errata_id, alt->vendor_id))
+			continue;
+
 		oldptr = ALT_OLD_PTR(alt);
 		altptr = ALT_ALT_PTR(alt);
 		patch_text_nosync(oldptr, altptr, alt->alt_len);
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

Using memset() to zero a 4K page takes 563 total instructions, where
20 are branches. clear_page(), with Zicboz and a 64 byte block size,
takes 169 total instructions, where 4 are branches and 33 are nops.
Even though the block size is a variable, thanks to alternatives, we
can still implement a Duff device without having to do any preliminary
calculations. This is achieved by taking advantage of 'vendor_id'
being used as application-specific data for alternatives, enabling us
to stop patching / unrolling when 4K bytes have been zeroed (we would
loop and continue after 4K if the page size would be larger)

For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
only loop a few times and larger block sizes to not loop at all. Since
cbo.zero doesn't take an offset, we also need an 'add' after each
instruction, making the loop body 112 to 160 bytes. Hopefully this
is small enough to not cause icache misses.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
---
 arch/riscv/Kconfig                | 13 ++++++
 arch/riscv/include/asm/insn-def.h |  4 ++
 arch/riscv/include/asm/page.h     |  6 ++-
 arch/riscv/kernel/cpufeature.c    | 11 +++++
 arch/riscv/lib/Makefile           |  1 +
 arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
 6 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/clear_page.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 029d1d3b40bd..9590a1661caf 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
 
 	   If you don't know what to do here, say Y.
 
+config RISCV_ISA_ZICBOZ
+	bool "Zicboz extension support for faster zeroing of memory"
+	depends on !XIP_KERNEL && MMU
+	select RISCV_ALTERNATIVE
+	default y
+	help
+	   Enable the use of the ZICBOZ extension (cbo.zero instruction)
+	   when available.
+
+	   The Zicboz extension is used for faster zeroing of memory.
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZIHINTPAUSE
 	bool
 	default y
diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
index e01ab51f50d2..6960beb75f32 100644
--- a/arch/riscv/include/asm/insn-def.h
+++ b/arch/riscv/include/asm/insn-def.h
@@ -192,4 +192,8 @@
 	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
 	       RS1(base), SIMM12(2))
 
+#define CBO_zero(base)						\
+	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
+	       RS1(base), SIMM12(4))
+
 #endif /* __ASM_INSN_DEF_H */
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 9f432c1b5289..ccd168fe29d2 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -49,10 +49,14 @@
 
 #ifndef __ASSEMBLY__
 
+#ifdef CONFIG_RISCV_ISA_ZICBOZ
+void clear_page(void *page);
+#else
 #define clear_page(pgaddr)			memset((pgaddr), 0, PAGE_SIZE)
+#endif
 #define copy_page(to, from)			memcpy((to), (from), PAGE_SIZE)
 
-#define clear_user_page(pgaddr, vaddr, page)	memset((pgaddr), 0, PAGE_SIZE)
+#define clear_user_page(pgaddr, vaddr, page)	clear_page(pgaddr)
 #define copy_user_page(vto, vfrom, vaddr, topg) \
 			memcpy((vto), (vfrom), PAGE_SIZE)
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 74736b4f0624..42246bbfa532 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
 #ifdef CONFIG_RISCV_ALTERNATIVE
 static bool riscv_cpufeature_application_check(u32 feature, u16 data)
 {
+	switch (feature) {
+	case RISCV_ISA_EXT_ZICBOZ:
+		/*
+		 * Zicboz alternative applications provide the maximum
+		 * supported block size order, or zero when it doesn't
+		 * matter. If the current block size exceeds the maximum,
+		 * then the alternative cannot be applied.
+		 */
+		return data == 0 || riscv_cboz_block_size <= (1U << data);
+	}
+
 	return data == 0;
 }
 
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 6c74b0bedd60..26cb2502ecf8 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -8,5 +8,6 @@ lib-y			+= strlen.o
 lib-y			+= strncmp.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
+lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
new file mode 100644
index 000000000000..5b851e727f7c
--- /dev/null
+++ b/arch/riscv/lib/clear_page.S
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Ventana Micro Systems Inc.
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm/alternative-macros.h>
+#include <asm/hwcap.h>
+#include <asm/insn-def.h>
+#include <asm/page.h>
+
+#define CBOZ_ALT(order, old, new)	\
+	ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, CONFIG_RISCV_ISA_ZICBOZ)
+
+/* void clear_page(void *page) */
+ENTRY(__clear_page)
+WEAK(clear_page)
+	li	a2, PAGE_SIZE
+
+	/*
+	 * If Zicboz isn't present, or somehow has a block
+	 * size larger than 4K, then fallback to memset.
+	 */
+	CBOZ_ALT(12, "j .Lno_zicboz", "nop")
+
+	lw	a1, riscv_cboz_block_size
+	add	a2, a0, a2
+.Lzero_loop:
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(11, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(10, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(9, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(8, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	bltu	a0, a2, .Lzero_loop
+	ret
+.Lno_zicboz:
+	li	a1, 0
+	tail	__memset
+END(__clear_page)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

Using memset() to zero a 4K page takes 563 total instructions, where
20 are branches. clear_page(), with Zicboz and a 64 byte block size,
takes 169 total instructions, where 4 are branches and 33 are nops.
Even though the block size is a variable, thanks to alternatives, we
can still implement a Duff device without having to do any preliminary
calculations. This is achieved by taking advantage of 'vendor_id'
being used as application-specific data for alternatives, enabling us
to stop patching / unrolling when 4K bytes have been zeroed (we would
loop and continue after 4K if the page size would be larger)

For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
only loop a few times and larger block sizes to not loop at all. Since
cbo.zero doesn't take an offset, we also need an 'add' after each
instruction, making the loop body 112 to 160 bytes. Hopefully this
is small enough to not cause icache misses.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
---
 arch/riscv/Kconfig                | 13 ++++++
 arch/riscv/include/asm/insn-def.h |  4 ++
 arch/riscv/include/asm/page.h     |  6 ++-
 arch/riscv/kernel/cpufeature.c    | 11 +++++
 arch/riscv/lib/Makefile           |  1 +
 arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
 6 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/clear_page.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 029d1d3b40bd..9590a1661caf 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
 
 	   If you don't know what to do here, say Y.
 
+config RISCV_ISA_ZICBOZ
+	bool "Zicboz extension support for faster zeroing of memory"
+	depends on !XIP_KERNEL && MMU
+	select RISCV_ALTERNATIVE
+	default y
+	help
+	   Enable the use of the ZICBOZ extension (cbo.zero instruction)
+	   when available.
+
+	   The Zicboz extension is used for faster zeroing of memory.
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZIHINTPAUSE
 	bool
 	default y
diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
index e01ab51f50d2..6960beb75f32 100644
--- a/arch/riscv/include/asm/insn-def.h
+++ b/arch/riscv/include/asm/insn-def.h
@@ -192,4 +192,8 @@
 	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
 	       RS1(base), SIMM12(2))
 
+#define CBO_zero(base)						\
+	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
+	       RS1(base), SIMM12(4))
+
 #endif /* __ASM_INSN_DEF_H */
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 9f432c1b5289..ccd168fe29d2 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -49,10 +49,14 @@
 
 #ifndef __ASSEMBLY__
 
+#ifdef CONFIG_RISCV_ISA_ZICBOZ
+void clear_page(void *page);
+#else
 #define clear_page(pgaddr)			memset((pgaddr), 0, PAGE_SIZE)
+#endif
 #define copy_page(to, from)			memcpy((to), (from), PAGE_SIZE)
 
-#define clear_user_page(pgaddr, vaddr, page)	memset((pgaddr), 0, PAGE_SIZE)
+#define clear_user_page(pgaddr, vaddr, page)	clear_page(pgaddr)
 #define copy_user_page(vto, vfrom, vaddr, topg) \
 			memcpy((vto), (vfrom), PAGE_SIZE)
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 74736b4f0624..42246bbfa532 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
 #ifdef CONFIG_RISCV_ALTERNATIVE
 static bool riscv_cpufeature_application_check(u32 feature, u16 data)
 {
+	switch (feature) {
+	case RISCV_ISA_EXT_ZICBOZ:
+		/*
+		 * Zicboz alternative applications provide the maximum
+		 * supported block size order, or zero when it doesn't
+		 * matter. If the current block size exceeds the maximum,
+		 * then the alternative cannot be applied.
+		 */
+		return data == 0 || riscv_cboz_block_size <= (1U << data);
+	}
+
 	return data == 0;
 }
 
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 6c74b0bedd60..26cb2502ecf8 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -8,5 +8,6 @@ lib-y			+= strlen.o
 lib-y			+= strncmp.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
+lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
new file mode 100644
index 000000000000..5b851e727f7c
--- /dev/null
+++ b/arch/riscv/lib/clear_page.S
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Ventana Micro Systems Inc.
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm/alternative-macros.h>
+#include <asm/hwcap.h>
+#include <asm/insn-def.h>
+#include <asm/page.h>
+
+#define CBOZ_ALT(order, old, new)	\
+	ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, CONFIG_RISCV_ISA_ZICBOZ)
+
+/* void clear_page(void *page) */
+ENTRY(__clear_page)
+WEAK(clear_page)
+	li	a2, PAGE_SIZE
+
+	/*
+	 * If Zicboz isn't present, or somehow has a block
+	 * size larger than 4K, then fallback to memset.
+	 */
+	CBOZ_ALT(12, "j .Lno_zicboz", "nop")
+
+	lw	a1, riscv_cboz_block_size
+	add	a2, a0, a2
+.Lzero_loop:
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(11, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(10, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(9, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBOZ_ALT(8, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	CBO_zero(a0)
+	add	a0, a0, a1
+	bltu	a0, a2, .Lzero_loop
+	ret
+.Lno_zicboz:
+	li	a1, 0
+	tail	__memset
+END(__clear_page)
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 7/8] RISC-V: KVM: Provide UAPI for Zicboz block size
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley ',
	Anup Patel

We're about to allow guests to use the Zicboz extension. KVM
userspace needs to know the cache block size in order to
properly advertise it to the guest. Provide a virtual config
register for userspace to get it with the GET_ONE_REG API, but
setting it cannot be supported, so disallow SET_ONE_REG.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/include/uapi/asm/kvm.h | 1 +
 arch/riscv/kvm/vcpu.c             | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 92af6f3f057c..c1a1bb0fa91c 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -52,6 +52,7 @@ struct kvm_riscv_config {
 	unsigned long mvendorid;
 	unsigned long marchid;
 	unsigned long mimpid;
+	unsigned long zicboz_block_size;
 };
 
 /* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 7c08567097f0..e5126cefbc87 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -276,6 +276,11 @@ static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
 			return -EINVAL;
 		reg_val = riscv_cbom_block_size;
 		break;
+	case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size):
+		if (!riscv_isa_extension_available(vcpu->arch.isa, ZICBOZ))
+			return -EINVAL;
+		reg_val = riscv_cboz_block_size;
+		break;
 	case KVM_REG_RISCV_CONFIG_REG(mvendorid):
 		reg_val = vcpu->arch.mvendorid;
 		break;
@@ -347,6 +352,8 @@ static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
 		break;
 	case KVM_REG_RISCV_CONFIG_REG(zicbom_block_size):
 		return -EOPNOTSUPP;
+	case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size):
+		return -EOPNOTSUPP;
 	case KVM_REG_RISCV_CONFIG_REG(mvendorid):
 		if (!vcpu->arch.ran_atleast_once)
 			vcpu->arch.mvendorid = reg_val;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 7/8] RISC-V: KVM: Provide UAPI for Zicboz block size
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley ',
	Anup Patel

We're about to allow guests to use the Zicboz extension. KVM
userspace needs to know the cache block size in order to
properly advertise it to the guest. Provide a virtual config
register for userspace to get it with the GET_ONE_REG API, but
setting it cannot be supported, so disallow SET_ONE_REG.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/include/uapi/asm/kvm.h | 1 +
 arch/riscv/kvm/vcpu.c             | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 92af6f3f057c..c1a1bb0fa91c 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -52,6 +52,7 @@ struct kvm_riscv_config {
 	unsigned long mvendorid;
 	unsigned long marchid;
 	unsigned long mimpid;
+	unsigned long zicboz_block_size;
 };
 
 /* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 7c08567097f0..e5126cefbc87 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -276,6 +276,11 @@ static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
 			return -EINVAL;
 		reg_val = riscv_cbom_block_size;
 		break;
+	case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size):
+		if (!riscv_isa_extension_available(vcpu->arch.isa, ZICBOZ))
+			return -EINVAL;
+		reg_val = riscv_cboz_block_size;
+		break;
 	case KVM_REG_RISCV_CONFIG_REG(mvendorid):
 		reg_val = vcpu->arch.mvendorid;
 		break;
@@ -347,6 +352,8 @@ static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
 		break;
 	case KVM_REG_RISCV_CONFIG_REG(zicbom_block_size):
 		return -EOPNOTSUPP;
+	case KVM_REG_RISCV_CONFIG_REG(zicboz_block_size):
+		return -EOPNOTSUPP;
 	case KVM_REG_RISCV_CONFIG_REG(mvendorid):
 		if (!vcpu->arch.ran_atleast_once)
 			vcpu->arch.mvendorid = reg_val;
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 8/8] RISC-V: KVM: Expose Zicboz to the guest
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 15:26   ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley ',
	Anup Patel

Guests may use the cbo.zero instruction when the CPU has the Zicboz
extension and the hypervisor sets henvcfg.CBZE.

Add Zicboz support for KVM guests which may be enabled and
disabled from KVM userspace using the ISA extension ONE_REG API.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/include/uapi/asm/kvm.h | 1 +
 arch/riscv/kvm/vcpu.c             | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index c1a1bb0fa91c..e44c1e90eaa7 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_SVINVAL,
 	KVM_RISCV_ISA_EXT_ZIHINTPAUSE,
 	KVM_RISCV_ISA_EXT_ZICBOM,
+	KVM_RISCV_ISA_EXT_ZICBOZ,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index e5126cefbc87..198ee86cad38 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -63,6 +63,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
 	KVM_ISA_EXT_ARR(SVPBMT),
 	KVM_ISA_EXT_ARR(ZIHINTPAUSE),
 	KVM_ISA_EXT_ARR(ZICBOM),
+	KVM_ISA_EXT_ARR(ZICBOZ),
 };
 
 static unsigned long kvm_riscv_vcpu_base2isa_ext(unsigned long base_ext)
@@ -865,6 +866,9 @@ static void kvm_riscv_vcpu_update_config(const unsigned long *isa)
 	if (riscv_isa_extension_available(isa, ZICBOM))
 		henvcfg |= (ENVCFG_CBIE | ENVCFG_CBCFE);
 
+	if (riscv_isa_extension_available(isa, ZICBOZ))
+		henvcfg |= ENVCFG_CBZE;
+
 	csr_write(CSR_HENVCFG, henvcfg);
 #ifdef CONFIG_32BIT
 	csr_write(CSR_HENVCFGH, henvcfg >> 32);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 8/8] RISC-V: KVM: Expose Zicboz to the guest
@ 2023-02-09 15:26   ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-09 15:26 UTC (permalink / raw)
  To: linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley ',
	Anup Patel

Guests may use the cbo.zero instruction when the CPU has the Zicboz
extension and the hypervisor sets henvcfg.CBZE.

Add Zicboz support for KVM guests which may be enabled and
disabled from KVM userspace using the ISA extension ONE_REG API.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/include/uapi/asm/kvm.h | 1 +
 arch/riscv/kvm/vcpu.c             | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index c1a1bb0fa91c..e44c1e90eaa7 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_SVINVAL,
 	KVM_RISCV_ISA_EXT_ZIHINTPAUSE,
 	KVM_RISCV_ISA_EXT_ZICBOM,
+	KVM_RISCV_ISA_EXT_ZICBOZ,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index e5126cefbc87..198ee86cad38 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -63,6 +63,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
 	KVM_ISA_EXT_ARR(SVPBMT),
 	KVM_ISA_EXT_ARR(ZIHINTPAUSE),
 	KVM_ISA_EXT_ARR(ZICBOM),
+	KVM_ISA_EXT_ARR(ZICBOZ),
 };
 
 static unsigned long kvm_riscv_vcpu_base2isa_ext(unsigned long base_ext)
@@ -865,6 +866,9 @@ static void kvm_riscv_vcpu_update_config(const unsigned long *isa)
 	if (riscv_isa_extension_available(isa, ZICBOM))
 		henvcfg |= (ENVCFG_CBIE | ENVCFG_CBCFE);
 
+	if (riscv_isa_extension_available(isa, ZICBOZ))
+		henvcfg |= ENVCFG_CBZE;
+
 	csr_write(CSR_HENVCFG, henvcfg);
 #ifdef CONFIG_32BIT
 	csr_write(CSR_HENVCFGH, henvcfg >> 32);
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/8] RISC-V: Apply Zicboz to clear_page
  2023-02-09 15:26 ` Andrew Jones
@ 2023-02-09 17:45   ` Conor Dooley
  -1 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 17:45 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

[-- Attachment #1: Type: text/plain, Size: 3031 bytes --]

Hey Drew,

On Thu, Feb 09, 2023 at 04:26:20PM +0100, Andrew Jones wrote:
> When the Zicboz extension is available we can more rapidly zero naturally
> aligned Zicboz block sized chunks of memory. As pages are always page
> aligned and are larger than any Zicboz block size will be, then
> clear_page() appears to be a good candidate for the extension. While cycle
> count and energy consumption should also be considered, we can be pretty
> certain that implementing clear_page() with the Zicboz extension is a win
> by comparing the new dynamic instruction count with its current count[1].
> Doing so we see that the new count is just over a quarter of the old count
> (see patch4's commit message for more details).
> 
> For those of you who reviewed v1[2], you may be looking for the memset()
> patches. As pointed out in v1, and a couple follow-up emails, it's not
> clear that patching memset() is a win yet. When I get a chance to test
> on real hardware with a comprehensive benchmark collection then I can
> post the memset() patches separately (assuming the benchmarks show it's
> worthwhile).
> 
> Based on riscv-linux/for-next plus the dependencies listed below.
> 
> Dependencies:
> https://lore.kernel.org/all/20230108163356.3063839-1-conor@kernel.org/
> https://lore.kernel.org/all/20230105192610.1940841-1-heiko@sntech.de/

I've had a short (due to FOSDEM) & busy week since we discussed the
automagic dependency collection. I'll try to get to it in the next few
days.

> The patches are also available here
> https://github.com/jones-drew/linux/commits/riscv/zicboz-v4
> 
> To test over QEMU this branch may be used to enable Zicboz
> https://gitlab.com/jones-drew/qemu/-/commits/riscv/zicboz
> 
> To test running a KVM guest with Zicboz this kvmtool branch may be used
> https://github.com/jones-drew/kvmtool/commits/riscv/zicboz
> 
> Thanks,
> drew
> 
> [1] I ported the functions under test to userspace and linked them with
>     a test program. Then, I ran them under gdb with a script[3] which
>     counted instructions by single stepping.
> [2] https://lore.kernel.org/all/20221027130247.31634-1-ajones@ventanamicro.com/
> [3] https://gist.github.com/jones-drew/487791c956ceca8c18adc2847eec9c60
> 
> v4:
>   - Rebased on latest for-next which allowed dropping one dependency
>   - Added "RISC-V: alternatives: Support patching multiple insns in assembly"
>     since I needed to use more than one instruction in an ALTERNATIVE call
>     from assembly. I can post this patch separately as a fix if desired.
>   - Improved the dt-binding patch commit message [Conor]
>   - Picked up some tags from Conor and Rob (I kept Conor's a-b on the
>     clear_page patch, even though there are several changes to it, because
>     I interpreted the a-b as "OK by me to implement a Zicboz clear_page")

Yea, it was a "I am far from qualified to review your implementation,
but I am okay with it existing and the remainder of the patch".

Cheers,
Conor.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/8] RISC-V: Apply Zicboz to clear_page
@ 2023-02-09 17:45   ` Conor Dooley
  0 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 17:45 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '


[-- Attachment #1.1: Type: text/plain, Size: 3031 bytes --]

Hey Drew,

On Thu, Feb 09, 2023 at 04:26:20PM +0100, Andrew Jones wrote:
> When the Zicboz extension is available we can more rapidly zero naturally
> aligned Zicboz block sized chunks of memory. As pages are always page
> aligned and are larger than any Zicboz block size will be, then
> clear_page() appears to be a good candidate for the extension. While cycle
> count and energy consumption should also be considered, we can be pretty
> certain that implementing clear_page() with the Zicboz extension is a win
> by comparing the new dynamic instruction count with its current count[1].
> Doing so we see that the new count is just over a quarter of the old count
> (see patch4's commit message for more details).
> 
> For those of you who reviewed v1[2], you may be looking for the memset()
> patches. As pointed out in v1, and a couple follow-up emails, it's not
> clear that patching memset() is a win yet. When I get a chance to test
> on real hardware with a comprehensive benchmark collection then I can
> post the memset() patches separately (assuming the benchmarks show it's
> worthwhile).
> 
> Based on riscv-linux/for-next plus the dependencies listed below.
> 
> Dependencies:
> https://lore.kernel.org/all/20230108163356.3063839-1-conor@kernel.org/
> https://lore.kernel.org/all/20230105192610.1940841-1-heiko@sntech.de/

I've had a short (due to FOSDEM) & busy week since we discussed the
automagic dependency collection. I'll try to get to it in the next few
days.

> The patches are also available here
> https://github.com/jones-drew/linux/commits/riscv/zicboz-v4
> 
> To test over QEMU this branch may be used to enable Zicboz
> https://gitlab.com/jones-drew/qemu/-/commits/riscv/zicboz
> 
> To test running a KVM guest with Zicboz this kvmtool branch may be used
> https://github.com/jones-drew/kvmtool/commits/riscv/zicboz
> 
> Thanks,
> drew
> 
> [1] I ported the functions under test to userspace and linked them with
>     a test program. Then, I ran them under gdb with a script[3] which
>     counted instructions by single stepping.
> [2] https://lore.kernel.org/all/20221027130247.31634-1-ajones@ventanamicro.com/
> [3] https://gist.github.com/jones-drew/487791c956ceca8c18adc2847eec9c60
> 
> v4:
>   - Rebased on latest for-next which allowed dropping one dependency
>   - Added "RISC-V: alternatives: Support patching multiple insns in assembly"
>     since I needed to use more than one instruction in an ALTERNATIVE call
>     from assembly. I can post this patch separately as a fix if desired.
>   - Improved the dt-binding patch commit message [Conor]
>   - Picked up some tags from Conor and Rob (I kept Conor's a-b on the
>     clear_page patch, even though there are several changes to it, because
>     I interpreted the a-b as "OK by me to implement a Zicboz clear_page")

Yea, it was a "I am far from qualified to review your implementation,
but I am okay with it existing and the remainder of the patch".

Cheers,
Conor.


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/8] RISC-V: alternatives: Support patching multiple insns in assembly
  2023-02-09 15:26   ` Andrew Jones
@ 2023-02-09 18:02     ` Conor Dooley
  -1 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 18:02 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

[-- Attachment #1: Type: text/plain, Size: 2546 bytes --]

On Thu, Feb 09, 2023 at 04:26:21PM +0100, Andrew Jones wrote:
> As pointed out in commit d374a16539b1 ("RISC-V: fix compile error
> from deduplicated __ALTERNATIVE_CFG_2"), we need quotes around
> parameters passed to macros within macros to avoid spaces being
> interpreted as separators. ALT_NEW_CONTENT was trying to handle
> this by defining new_c has a vararg, but this isn't sufficient
> for calling ALTERNATIVE() from assembly with multiple instructions
> in the new/old sequences. Remove the vararg "hack" and use quotes.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/include/asm/alternative-macros.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
> index 51c6867e02f3..7bc52f33f95f 100644
> --- a/arch/riscv/include/asm/alternative-macros.h
> +++ b/arch/riscv/include/asm/alternative-macros.h
> @@ -14,7 +14,7 @@
>  	.4byte \errata_id
>  .endm
>  
> -.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c : vararg
> +.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c
>  	.if \enable
>  	.pushsection .alternative, "a"
>  	ALT_ENTRY 886b, 888f, \vendor_id, \errata_id, 889f - 888f
> @@ -41,13 +41,13 @@
>  	\old_c
>  	.option pop
>  887 :
> -	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, \new_c
> +	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, "\new_c"

The rationale above seems pretty reasonable to me.
My main thought is that vararg seems intentional, while the "s don't
really?
Given how much churn there is here at the moment, I think it's fairly
likely that the immediate blame will be lost quickly too.
Usually I'd err on the side of "try to explain the black magic parts of
the cosebase to mere mortals" (like me!), but this is going in with a
user & things will quickly blow up if it gets accidentally removed by
someone who doesn't see their value.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Cheers,
Conor.

>  .endm
>  
>  .macro ALTERNATIVE_CFG_2 old_c, new_c_1, vendor_id_1, errata_id_1, enable_1,	\
>  				new_c_2, vendor_id_2, errata_id_2, enable_2
>  	ALTERNATIVE_CFG "\old_c", "\new_c_1", \vendor_id_1, \errata_id_1, \enable_1
> -	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, \new_c_2
> +	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, "\new_c_2"
>  .endm
>  
>  #define __ALTERNATIVE_CFG(...)		ALTERNATIVE_CFG __VA_ARGS__
> -- 
> 2.39.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/8] RISC-V: alternatives: Support patching multiple insns in assembly
@ 2023-02-09 18:02     ` Conor Dooley
  0 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 18:02 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '


[-- Attachment #1.1: Type: text/plain, Size: 2546 bytes --]

On Thu, Feb 09, 2023 at 04:26:21PM +0100, Andrew Jones wrote:
> As pointed out in commit d374a16539b1 ("RISC-V: fix compile error
> from deduplicated __ALTERNATIVE_CFG_2"), we need quotes around
> parameters passed to macros within macros to avoid spaces being
> interpreted as separators. ALT_NEW_CONTENT was trying to handle
> this by defining new_c has a vararg, but this isn't sufficient
> for calling ALTERNATIVE() from assembly with multiple instructions
> in the new/old sequences. Remove the vararg "hack" and use quotes.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/include/asm/alternative-macros.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
> index 51c6867e02f3..7bc52f33f95f 100644
> --- a/arch/riscv/include/asm/alternative-macros.h
> +++ b/arch/riscv/include/asm/alternative-macros.h
> @@ -14,7 +14,7 @@
>  	.4byte \errata_id
>  .endm
>  
> -.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c : vararg
> +.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c
>  	.if \enable
>  	.pushsection .alternative, "a"
>  	ALT_ENTRY 886b, 888f, \vendor_id, \errata_id, 889f - 888f
> @@ -41,13 +41,13 @@
>  	\old_c
>  	.option pop
>  887 :
> -	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, \new_c
> +	ALT_NEW_CONTENT \vendor_id, \errata_id, \enable, "\new_c"

The rationale above seems pretty reasonable to me.
My main thought is that vararg seems intentional, while the "s don't
really?
Given how much churn there is here at the moment, I think it's fairly
likely that the immediate blame will be lost quickly too.
Usually I'd err on the side of "try to explain the black magic parts of
the cosebase to mere mortals" (like me!), but this is going in with a
user & things will quickly blow up if it gets accidentally removed by
someone who doesn't see their value.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Cheers,
Conor.

>  .endm
>  
>  .macro ALTERNATIVE_CFG_2 old_c, new_c_1, vendor_id_1, errata_id_1, enable_1,	\
>  				new_c_2, vendor_id_2, errata_id_2, enable_2
>  	ALTERNATIVE_CFG "\old_c", "\new_c_1", \vendor_id_1, \errata_id_1, \enable_1
> -	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, \new_c_2
> +	ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, "\new_c_2"
>  .endm
>  
>  #define __ALTERNATIVE_CFG(...)		ALTERNATIVE_CFG __VA_ARGS__
> -- 
> 2.39.1
> 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
  2023-02-09 15:26   ` Andrew Jones
@ 2023-02-09 19:04     ` Conor Dooley
  -1 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 19:04 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

[-- Attachment #1: Type: text/plain, Size: 3587 bytes --]

Hey Drew,

On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:
> When [ab]using alternatives as cpufeature "static keys", which can
> be used in assembly, also put vendor_id to work as application-
> specific data. This will be initially used in Zicboz's application to
> clear_page(), as Zicboz's block size must also be considered. In that
> case, vendor_id's role will be to convey the maximum block size which
> the Zicboz clear_page() implementation supports.
> 
> cpufeature alternative applications which need to check for the
> existence or absence of other cpufeatures may also be able to make
> use of this.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/kernel/cpufeature.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 0d2db03cf167..74736b4f0624 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -278,6 +278,11 @@ void __init riscv_fill_hwcap(void)
>  }
>  
>  #ifdef CONFIG_RISCV_ALTERNATIVE

I think a comment here about what "application check" means would be
nice.
That wording just feels clunky & the meaning is not immediately
graspable?

/*
 * riscv_cpufeature_application_check() - Check if a cpufeature applies.
 * The presence of a cpufeature does not mean it is necessarily
 * useable. This function is used to apply the alternative on a
 * case-by-case basis.
 */

Dunno, does something like that convey the intent?

> +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> +{
> +	return data == 0;
> +}
> +
>  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  						  struct alt_entry *end,
>  						  unsigned int stage)
> @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  		return;
>  
>  	for (alt = begin; alt < end; alt++) {
> -		if (alt->vendor_id != 0)
> -			continue;

Can you remind me what makes this "safe"?
My understanding was that a vendor_id of zero was safe, as zero is
reserved in JEDEC.
What is stopping someone stuffing this with a given value and
colliding with a real vendor's errata?

	for (alt = begin; alt < end; alt++) {
		if (alt->vendor_id != A_VENDOR_ID)
			continue;
		if (alt->errata_id >= ERRATA_A_NUMBER)
			continue;

		tmp = (1U << alt->errata_id);
		if (cpu_req_errata & tmp) {
			oldptr = ALT_OLD_PTR(alt);
			altptr = ALT_ALT_PTR(alt);

			/* On vm-alternatives, the mmu isn't running yet */
			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
				memcpy((void *)__pa_symbol(oldptr),
				       (void *)__pa_symbol(altptr),
				       alt->alt_len);
			else
				patch_text_nosync(oldptr, altptr, alt->alt_len);
		}
	}

I've probably just missing something, my brain swapped out alternatives
the other week. Hopefully whatever I missed isn't embarrassingly obvious!

Cheers,
Conor.

>  		if (alt->errata_id >= RISCV_ISA_EXT_MAX) {
>  			WARN(1, "This extension id:%d is not in ISA extension list",
>  				alt->errata_id);
> @@ -300,6 +303,9 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  		if (!__riscv_isa_extension_available(NULL, alt->errata_id))
>  			continue;
>  
> +		if (!riscv_cpufeature_application_check(alt->errata_id, alt->vendor_id))
> +			continue;
> +
>  		oldptr = ALT_OLD_PTR(alt);
>  		altptr = ALT_ALT_PTR(alt);
>  		patch_text_nosync(oldptr, altptr, alt->alt_len);
> -- 
> 2.39.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
@ 2023-02-09 19:04     ` Conor Dooley
  0 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 19:04 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '


[-- Attachment #1.1: Type: text/plain, Size: 3587 bytes --]

Hey Drew,

On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:
> When [ab]using alternatives as cpufeature "static keys", which can
> be used in assembly, also put vendor_id to work as application-
> specific data. This will be initially used in Zicboz's application to
> clear_page(), as Zicboz's block size must also be considered. In that
> case, vendor_id's role will be to convey the maximum block size which
> the Zicboz clear_page() implementation supports.
> 
> cpufeature alternative applications which need to check for the
> existence or absence of other cpufeatures may also be able to make
> use of this.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/kernel/cpufeature.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 0d2db03cf167..74736b4f0624 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -278,6 +278,11 @@ void __init riscv_fill_hwcap(void)
>  }
>  
>  #ifdef CONFIG_RISCV_ALTERNATIVE

I think a comment here about what "application check" means would be
nice.
That wording just feels clunky & the meaning is not immediately
graspable?

/*
 * riscv_cpufeature_application_check() - Check if a cpufeature applies.
 * The presence of a cpufeature does not mean it is necessarily
 * useable. This function is used to apply the alternative on a
 * case-by-case basis.
 */

Dunno, does something like that convey the intent?

> +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> +{
> +	return data == 0;
> +}
> +
>  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  						  struct alt_entry *end,
>  						  unsigned int stage)
> @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  		return;
>  
>  	for (alt = begin; alt < end; alt++) {
> -		if (alt->vendor_id != 0)
> -			continue;

Can you remind me what makes this "safe"?
My understanding was that a vendor_id of zero was safe, as zero is
reserved in JEDEC.
What is stopping someone stuffing this with a given value and
colliding with a real vendor's errata?

	for (alt = begin; alt < end; alt++) {
		if (alt->vendor_id != A_VENDOR_ID)
			continue;
		if (alt->errata_id >= ERRATA_A_NUMBER)
			continue;

		tmp = (1U << alt->errata_id);
		if (cpu_req_errata & tmp) {
			oldptr = ALT_OLD_PTR(alt);
			altptr = ALT_ALT_PTR(alt);

			/* On vm-alternatives, the mmu isn't running yet */
			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
				memcpy((void *)__pa_symbol(oldptr),
				       (void *)__pa_symbol(altptr),
				       alt->alt_len);
			else
				patch_text_nosync(oldptr, altptr, alt->alt_len);
		}
	}

I've probably just missing something, my brain swapped out alternatives
the other week. Hopefully whatever I missed isn't embarrassingly obvious!

Cheers,
Conor.

>  		if (alt->errata_id >= RISCV_ISA_EXT_MAX) {
>  			WARN(1, "This extension id:%d is not in ISA extension list",
>  				alt->errata_id);
> @@ -300,6 +303,9 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
>  		if (!__riscv_isa_extension_available(NULL, alt->errata_id))
>  			continue;
>  
> +		if (!riscv_cpufeature_application_check(alt->errata_id, alt->vendor_id))
> +			continue;
> +
>  		oldptr = ALT_OLD_PTR(alt);
>  		altptr = ALT_ALT_PTR(alt);
>  		patch_text_nosync(oldptr, altptr, alt->alt_len);
> -- 
> 2.39.1
> 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-09 15:26   ` Andrew Jones
@ 2023-02-09 19:09     ` Conor Dooley
  -1 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 19:09 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

On Thu, Feb 09, 2023 at 04:26:26PM +0100, Andrew Jones wrote:
> Using memset() to zero a 4K page takes 563 total instructions, where
> 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> takes 169 total instructions, where 4 are branches and 33 are nops.
> Even though the block size is a variable, thanks to alternatives, we
> can still implement a Duff device without having to do any preliminary
> calculations. This is achieved by taking advantage of 'vendor_id'
> being used as application-specific data for alternatives, enabling us
> to stop patching / unrolling when 4K bytes have been zeroed (we would
> loop and continue after 4K if the page size would be larger)
> 
> For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> only loop a few times and larger block sizes to not loop at all. Since
> cbo.zero doesn't take an offset, we also need an 'add' after each
> instruction, making the loop body 112 to 160 bytes. Hopefully this
> is small enough to not cause icache misses.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>

> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 74736b4f0624..42246bbfa532 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
>  #ifdef CONFIG_RISCV_ALTERNATIVE
>  static bool riscv_cpufeature_application_check(u32 feature, u16 data)
>  {
> +	switch (feature) {
> +	case RISCV_ISA_EXT_ZICBOZ:
> +		/*
> +		 * Zicboz alternative applications provide the maximum

I like the comment, rather than this being some wizardry.
I find the word "applications" to be a little unclear, perhaps, iff this
series needs a respin, this would work better as "Users of the Zicboz
alternative provide..." (or s/Users/Callers)?

> +		 * supported block size order, or zero when it doesn't
> +		 * matter. If the current block size exceeds the maximum,
> +		 * then the alternative cannot be applied.
> +		 */
> +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> +	}
> +
>  	return data == 0;
>  }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-09 19:09     ` Conor Dooley
  0 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-09 19:09 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '


[-- Attachment #1.1: Type: text/plain, Size: 2206 bytes --]

On Thu, Feb 09, 2023 at 04:26:26PM +0100, Andrew Jones wrote:
> Using memset() to zero a 4K page takes 563 total instructions, where
> 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> takes 169 total instructions, where 4 are branches and 33 are nops.
> Even though the block size is a variable, thanks to alternatives, we
> can still implement a Duff device without having to do any preliminary
> calculations. This is achieved by taking advantage of 'vendor_id'
> being used as application-specific data for alternatives, enabling us
> to stop patching / unrolling when 4K bytes have been zeroed (we would
> loop and continue after 4K if the page size would be larger)
> 
> For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> only loop a few times and larger block sizes to not loop at all. Since
> cbo.zero doesn't take an offset, we also need an 'add' after each
> instruction, making the loop body 112 to 160 bytes. Hopefully this
> is small enough to not cause icache misses.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>

> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 74736b4f0624..42246bbfa532 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
>  #ifdef CONFIG_RISCV_ALTERNATIVE
>  static bool riscv_cpufeature_application_check(u32 feature, u16 data)
>  {
> +	switch (feature) {
> +	case RISCV_ISA_EXT_ZICBOZ:
> +		/*
> +		 * Zicboz alternative applications provide the maximum

I like the comment, rather than this being some wizardry.
I find the word "applications" to be a little unclear, perhaps, iff this
series needs a respin, this would work better as "Users of the Zicboz
alternative provide..." (or s/Users/Callers)?

> +		 * supported block size order, or zero when it doesn't
> +		 * matter. If the current block size exceeds the maximum,
> +		 * then the alternative cannot be applied.
> +		 */
> +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> +	}
> +
>  	return data == 0;
>  }

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
  2023-02-09 19:04     ` Conor Dooley
@ 2023-02-10  7:58       ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-10  7:58 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Thu, Feb 09, 2023 at 07:04:59PM +0000, Conor Dooley wrote:
> Hey Drew,
> 
> On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:
> > When [ab]using alternatives as cpufeature "static keys", which can
> > be used in assembly, also put vendor_id to work as application-
> > specific data. This will be initially used in Zicboz's application to
> > clear_page(), as Zicboz's block size must also be considered. In that
> > case, vendor_id's role will be to convey the maximum block size which
> > the Zicboz clear_page() implementation supports.
> > 
> > cpufeature alternative applications which need to check for the
> > existence or absence of other cpufeatures may also be able to make
> > use of this.
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  arch/riscv/kernel/cpufeature.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 0d2db03cf167..74736b4f0624 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -278,6 +278,11 @@ void __init riscv_fill_hwcap(void)
> >  }
> >  
> >  #ifdef CONFIG_RISCV_ALTERNATIVE
> 
> I think a comment here about what "application check" means would be
> nice.
> That wording just feels clunky & the meaning is not immediately
> graspable?
> 
> /*
>  * riscv_cpufeature_application_check() - Check if a cpufeature applies.
>  * The presence of a cpufeature does not mean it is necessarily
>  * useable. This function is used to apply the alternative on a
>  * case-by-case basis.
>  */
> 
> Dunno, does something like that convey the intent?

Indeed, a comment would be helpful. I'll put something similar to what you
propose in the next version.

> 
> > +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > +{
> > +	return data == 0;
> > +}
> > +
> >  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> >  						  struct alt_entry *end,
> >  						  unsigned int stage)
> > @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> >  		return;
> >  
> >  	for (alt = begin; alt < end; alt++) {
> > -		if (alt->vendor_id != 0)
> > -			continue;
> 
> Can you remind me what makes this "safe"?
> My understanding was that a vendor_id of zero was safe, as zero is
> reserved in JEDEC.
> What is stopping someone stuffing this with a given value and
> colliding with a real vendor's errata?
> 
> 	for (alt = begin; alt < end; alt++) {
> 		if (alt->vendor_id != A_VENDOR_ID)
> 			continue;
> 		if (alt->errata_id >= ERRATA_A_NUMBER)
> 			continue;
> 
> 		tmp = (1U << alt->errata_id);
> 		if (cpu_req_errata & tmp) {
> 			oldptr = ALT_OLD_PTR(alt);
> 			altptr = ALT_ALT_PTR(alt);
> 
> 			/* On vm-alternatives, the mmu isn't running yet */
> 			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> 				memcpy((void *)__pa_symbol(oldptr),
> 				       (void *)__pa_symbol(altptr),
> 				       alt->alt_len);
> 			else
> 				patch_text_nosync(oldptr, altptr, alt->alt_len);
> 		}
> 	}
> 
> I've probably just missing something, my brain swapped out alternatives
> the other week. Hopefully whatever I missed isn't embarrassingly obvious!

You're right. I was assuming the errata_id space for errata didn't overlap
with the errata_id space for cpufeatures. It doesn't, atm, but by luck,
not design. I could try to ensure that somehow, but probably the better
approach would be to use the upper bits of errata_id for the application
data and to leave vendor_id alone. Thanks for catching my oversight!

Thanks,
drew


> 
> Cheers,
> Conor.
> 
> >  		if (alt->errata_id >= RISCV_ISA_EXT_MAX) {
> >  			WARN(1, "This extension id:%d is not in ISA extension list",
> >  				alt->errata_id);
> > @@ -300,6 +303,9 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> >  		if (!__riscv_isa_extension_available(NULL, alt->errata_id))
> >  			continue;
> >  
> > +		if (!riscv_cpufeature_application_check(alt->errata_id, alt->vendor_id))
> > +			continue;
> > +
> >  		oldptr = ALT_OLD_PTR(alt);
> >  		altptr = ALT_ALT_PTR(alt);
> >  		patch_text_nosync(oldptr, altptr, alt->alt_len);
> > -- 
> > 2.39.1
> > 



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
@ 2023-02-10  7:58       ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-10  7:58 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Thu, Feb 09, 2023 at 07:04:59PM +0000, Conor Dooley wrote:
> Hey Drew,
> 
> On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:
> > When [ab]using alternatives as cpufeature "static keys", which can
> > be used in assembly, also put vendor_id to work as application-
> > specific data. This will be initially used in Zicboz's application to
> > clear_page(), as Zicboz's block size must also be considered. In that
> > case, vendor_id's role will be to convey the maximum block size which
> > the Zicboz clear_page() implementation supports.
> > 
> > cpufeature alternative applications which need to check for the
> > existence or absence of other cpufeatures may also be able to make
> > use of this.
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  arch/riscv/kernel/cpufeature.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 0d2db03cf167..74736b4f0624 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -278,6 +278,11 @@ void __init riscv_fill_hwcap(void)
> >  }
> >  
> >  #ifdef CONFIG_RISCV_ALTERNATIVE
> 
> I think a comment here about what "application check" means would be
> nice.
> That wording just feels clunky & the meaning is not immediately
> graspable?
> 
> /*
>  * riscv_cpufeature_application_check() - Check if a cpufeature applies.
>  * The presence of a cpufeature does not mean it is necessarily
>  * useable. This function is used to apply the alternative on a
>  * case-by-case basis.
>  */
> 
> Dunno, does something like that convey the intent?

Indeed, a comment would be helpful. I'll put something similar to what you
propose in the next version.

> 
> > +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > +{
> > +	return data == 0;
> > +}
> > +
> >  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> >  						  struct alt_entry *end,
> >  						  unsigned int stage)
> > @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> >  		return;
> >  
> >  	for (alt = begin; alt < end; alt++) {
> > -		if (alt->vendor_id != 0)
> > -			continue;
> 
> Can you remind me what makes this "safe"?
> My understanding was that a vendor_id of zero was safe, as zero is
> reserved in JEDEC.
> What is stopping someone stuffing this with a given value and
> colliding with a real vendor's errata?
> 
> 	for (alt = begin; alt < end; alt++) {
> 		if (alt->vendor_id != A_VENDOR_ID)
> 			continue;
> 		if (alt->errata_id >= ERRATA_A_NUMBER)
> 			continue;
> 
> 		tmp = (1U << alt->errata_id);
> 		if (cpu_req_errata & tmp) {
> 			oldptr = ALT_OLD_PTR(alt);
> 			altptr = ALT_ALT_PTR(alt);
> 
> 			/* On vm-alternatives, the mmu isn't running yet */
> 			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> 				memcpy((void *)__pa_symbol(oldptr),
> 				       (void *)__pa_symbol(altptr),
> 				       alt->alt_len);
> 			else
> 				patch_text_nosync(oldptr, altptr, alt->alt_len);
> 		}
> 	}
> 
> I've probably just missing something, my brain swapped out alternatives
> the other week. Hopefully whatever I missed isn't embarrassingly obvious!

You're right. I was assuming the errata_id space for errata didn't overlap
with the errata_id space for cpufeatures. It doesn't, atm, but by luck,
not design. I could try to ensure that somehow, but probably the better
approach would be to use the upper bits of errata_id for the application
data and to leave vendor_id alone. Thanks for catching my oversight!

Thanks,
drew


> 
> Cheers,
> Conor.
> 
> >  		if (alt->errata_id >= RISCV_ISA_EXT_MAX) {
> >  			WARN(1, "This extension id:%d is not in ISA extension list",
> >  				alt->errata_id);
> > @@ -300,6 +303,9 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> >  		if (!__riscv_isa_extension_available(NULL, alt->errata_id))
> >  			continue;
> >  
> > +		if (!riscv_cpufeature_application_check(alt->errata_id, alt->vendor_id))
> > +			continue;
> > +
> >  		oldptr = ALT_OLD_PTR(alt);
> >  		altptr = ALT_ALT_PTR(alt);
> >  		patch_text_nosync(oldptr, altptr, alt->alt_len);
> > -- 
> > 2.39.1
> > 



_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-09 19:09     ` Conor Dooley
@ 2023-02-10  8:05       ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-10  8:05 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Thu, Feb 09, 2023 at 07:09:53PM +0000, Conor Dooley wrote:
> On Thu, Feb 09, 2023 at 04:26:26PM +0100, Andrew Jones wrote:
> > Using memset() to zero a 4K page takes 563 total instructions, where
> > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > takes 169 total instructions, where 4 are branches and 33 are nops.
> > Even though the block size is a variable, thanks to alternatives, we
> > can still implement a Duff device without having to do any preliminary
> > calculations. This is achieved by taking advantage of 'vendor_id'
> > being used as application-specific data for alternatives, enabling us
> > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > loop and continue after 4K if the page size would be larger)
> > 
> > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > only loop a few times and larger block sizes to not loop at all. Since
> > cbo.zero doesn't take an offset, we also need an 'add' after each
> > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > is small enough to not cause icache misses.
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> 
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 74736b4f0624..42246bbfa532 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> >  #ifdef CONFIG_RISCV_ALTERNATIVE
> >  static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> >  {
> > +	switch (feature) {
> > +	case RISCV_ISA_EXT_ZICBOZ:
> > +		/*
> > +		 * Zicboz alternative applications provide the maximum
> 
> I like the comment, rather than this being some wizardry.
> I find the word "applications" to be a little unclear, perhaps, iff this
> series needs a respin, this would work better as "Users of the Zicboz
> alternative provide..." (or s/Users/Callers)?

Right, "applications" is an overloaded word. "users" is probably a better
choice. "callers" isn't quite right, to me, since it's a code patching
"application" / "use". Do you think the function name should change as
well?

Thanks,
drew

> 
> > +		 * supported block size order, or zero when it doesn't
> > +		 * matter. If the current block size exceeds the maximum,
> > +		 * then the alternative cannot be applied.
> > +		 */
> > +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> > +	}
> > +
> >  	return data == 0;
> >  }



_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-10  8:05       ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-10  8:05 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Thu, Feb 09, 2023 at 07:09:53PM +0000, Conor Dooley wrote:
> On Thu, Feb 09, 2023 at 04:26:26PM +0100, Andrew Jones wrote:
> > Using memset() to zero a 4K page takes 563 total instructions, where
> > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > takes 169 total instructions, where 4 are branches and 33 are nops.
> > Even though the block size is a variable, thanks to alternatives, we
> > can still implement a Duff device without having to do any preliminary
> > calculations. This is achieved by taking advantage of 'vendor_id'
> > being used as application-specific data for alternatives, enabling us
> > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > loop and continue after 4K if the page size would be larger)
> > 
> > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > only loop a few times and larger block sizes to not loop at all. Since
> > cbo.zero doesn't take an offset, we also need an 'add' after each
> > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > is small enough to not cause icache misses.
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> 
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 74736b4f0624..42246bbfa532 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> >  #ifdef CONFIG_RISCV_ALTERNATIVE
> >  static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> >  {
> > +	switch (feature) {
> > +	case RISCV_ISA_EXT_ZICBOZ:
> > +		/*
> > +		 * Zicboz alternative applications provide the maximum
> 
> I like the comment, rather than this being some wizardry.
> I find the word "applications" to be a little unclear, perhaps, iff this
> series needs a respin, this would work better as "Users of the Zicboz
> alternative provide..." (or s/Users/Callers)?

Right, "applications" is an overloaded word. "users" is probably a better
choice. "callers" isn't quite right, to me, since it's a code patching
"application" / "use". Do you think the function name should change as
well?

Thanks,
drew

> 
> > +		 * supported block size order, or zero when it doesn't
> > +		 * matter. If the current block size exceeds the maximum,
> > +		 * then the alternative cannot be applied.
> > +		 */
> > +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> > +	}
> > +
> >  	return data == 0;
> >  }



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-10  8:05       ` Andrew Jones
@ 2023-02-10  9:04         ` Conor Dooley
  -1 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-10  9:04 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Conor Dooley, linux-riscv, kvm-riscv, devicetree,
	'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou '


[-- Attachment #1.1: Type: text/plain, Size: 2897 bytes --]

On Fri, Feb 10, 2023 at 09:05:15AM +0100, Andrew Jones wrote:
> On Thu, Feb 09, 2023 at 07:09:53PM +0000, Conor Dooley wrote:
> > On Thu, Feb 09, 2023 at 04:26:26PM +0100, Andrew Jones wrote:
> > > Using memset() to zero a 4K page takes 563 total instructions, where
> > > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > > takes 169 total instructions, where 4 are branches and 33 are nops.
> > > Even though the block size is a variable, thanks to alternatives, we
> > > can still implement a Duff device without having to do any preliminary
> > > calculations. This is achieved by taking advantage of 'vendor_id'
> > > being used as application-specific data for alternatives, enabling us
> > > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > > loop and continue after 4K if the page size would be larger)
> > > 
> > > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > > only loop a few times and larger block sizes to not loop at all. Since
> > > cbo.zero doesn't take an offset, we also need an 'add' after each
> > > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > > is small enough to not cause icache misses.
> > > 
> > > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> > 
> > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > > index 74736b4f0624..42246bbfa532 100644
> > > --- a/arch/riscv/kernel/cpufeature.c
> > > +++ b/arch/riscv/kernel/cpufeature.c
> > > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> > >  #ifdef CONFIG_RISCV_ALTERNATIVE
> > >  static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > >  {
> > > +	switch (feature) {
> > > +	case RISCV_ISA_EXT_ZICBOZ:
> > > +		/*
> > > +		 * Zicboz alternative applications provide the maximum
> > 
> > I like the comment, rather than this being some wizardry.
> > I find the word "applications" to be a little unclear, perhaps, iff this
> > series needs a respin, this would work better as "Users of the Zicboz
> > alternative provide..." (or s/Users/Callers)?
> 
> Right, "applications" is an overloaded word. "users" is probably a better
> choice. "callers" isn't quite right, to me, since it's a code patching
> "application" / "use". Do you think the function name should change as
> well?

I was initially going to suggest that too, but then couldn't really
think of something better. s/application_check/check_applies/ maybe?

> > > +		 * supported block size order, or zero when it doesn't
> > > +		 * matter. If the current block size exceeds the maximum,
> > > +		 * then the alternative cannot be applied.
> > > +		 */
> > > +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> > > +	}
> > > +
> > >  	return data == 0;
> > >  }

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-10  9:04         ` Conor Dooley
  0 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-10  9:04 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Conor Dooley, linux-riscv, kvm-riscv, devicetree,
	'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou '

[-- Attachment #1: Type: text/plain, Size: 2897 bytes --]

On Fri, Feb 10, 2023 at 09:05:15AM +0100, Andrew Jones wrote:
> On Thu, Feb 09, 2023 at 07:09:53PM +0000, Conor Dooley wrote:
> > On Thu, Feb 09, 2023 at 04:26:26PM +0100, Andrew Jones wrote:
> > > Using memset() to zero a 4K page takes 563 total instructions, where
> > > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > > takes 169 total instructions, where 4 are branches and 33 are nops.
> > > Even though the block size is a variable, thanks to alternatives, we
> > > can still implement a Duff device without having to do any preliminary
> > > calculations. This is achieved by taking advantage of 'vendor_id'
> > > being used as application-specific data for alternatives, enabling us
> > > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > > loop and continue after 4K if the page size would be larger)
> > > 
> > > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > > only loop a few times and larger block sizes to not loop at all. Since
> > > cbo.zero doesn't take an offset, we also need an 'add' after each
> > > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > > is small enough to not cause icache misses.
> > > 
> > > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> > 
> > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > > index 74736b4f0624..42246bbfa532 100644
> > > --- a/arch/riscv/kernel/cpufeature.c
> > > +++ b/arch/riscv/kernel/cpufeature.c
> > > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> > >  #ifdef CONFIG_RISCV_ALTERNATIVE
> > >  static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > >  {
> > > +	switch (feature) {
> > > +	case RISCV_ISA_EXT_ZICBOZ:
> > > +		/*
> > > +		 * Zicboz alternative applications provide the maximum
> > 
> > I like the comment, rather than this being some wizardry.
> > I find the word "applications" to be a little unclear, perhaps, iff this
> > series needs a respin, this would work better as "Users of the Zicboz
> > alternative provide..." (or s/Users/Callers)?
> 
> Right, "applications" is an overloaded word. "users" is probably a better
> choice. "callers" isn't quite right, to me, since it's a code patching
> "application" / "use". Do you think the function name should change as
> well?

I was initially going to suggest that too, but then couldn't really
think of something better. s/application_check/check_applies/ maybe?

> > > +		 * supported block size order, or zero when it doesn't
> > > +		 * matter. If the current block size exceeds the maximum,
> > > +		 * then the alternative cannot be applied.
> > > +		 */
> > > +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> > > +	}
> > > +
> > >  	return data == 0;
> > >  }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
  2023-02-10  7:58       ` Andrew Jones
@ 2023-02-10 20:29         ` Conor Dooley
  -1 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-10 20:29 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

[-- Attachment #1: Type: text/plain, Size: 2466 bytes --]

On Fri, Feb 10, 2023 at 08:58:18AM +0100, Andrew Jones wrote:
> On Thu, Feb 09, 2023 at 07:04:59PM +0000, Conor Dooley wrote:
> > On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:

> > > +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > > +{
> > > +	return data == 0;
> > > +}
> > > +
> > >  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > >  						  struct alt_entry *end,
> > >  						  unsigned int stage)
> > > @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > >  		return;
> > >  
> > >  	for (alt = begin; alt < end; alt++) {
> > > -		if (alt->vendor_id != 0)
> > > -			continue;
> > 
> > Can you remind me what makes this "safe"?
> > My understanding was that a vendor_id of zero was safe, as zero is
> > reserved in JEDEC.
> > What is stopping someone stuffing this with a given value and
> > colliding with a real vendor's errata?
> > 
> > 	for (alt = begin; alt < end; alt++) {
> > 		if (alt->vendor_id != A_VENDOR_ID)
> > 			continue;
> > 		if (alt->errata_id >= ERRATA_A_NUMBER)
> > 			continue;
> > 
> > 		tmp = (1U << alt->errata_id);
> > 		if (cpu_req_errata & tmp) {
> > 			oldptr = ALT_OLD_PTR(alt);
> > 			altptr = ALT_ALT_PTR(alt);
> > 
> > 			/* On vm-alternatives, the mmu isn't running yet */
> > 			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > 				memcpy((void *)__pa_symbol(oldptr),
> > 				       (void *)__pa_symbol(altptr),
> > 				       alt->alt_len);
> > 			else
> > 				patch_text_nosync(oldptr, altptr, alt->alt_len);
> > 		}
> > 	}
> > 
> > I've probably just missing something, my brain swapped out alternatives
> > the other week. Hopefully whatever I missed isn't embarrassingly obvious!
> 
> You're right. I was assuming the errata_id space for errata didn't overlap
> with the errata_id space for cpufeatures. It doesn't, atm, but by luck,
> not design. I could try to ensure that somehow, but probably the better
> approach would be to use the upper bits of errata_id for the application
> data and to leave vendor_id alone. Thanks for catching my oversight!

Sounds like a better idea at least.
I suppose the ideal would be to add another arg and not abuse things,
but maybe that's one for the future, idk.
Is this somewhere that an assertion should be used to make sure that
we don't grow the list of extensions such that the regions collide?


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
@ 2023-02-10 20:29         ` Conor Dooley
  0 siblings, 0 replies; 48+ messages in thread
From: Conor Dooley @ 2023-02-10 20:29 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '


[-- Attachment #1.1: Type: text/plain, Size: 2466 bytes --]

On Fri, Feb 10, 2023 at 08:58:18AM +0100, Andrew Jones wrote:
> On Thu, Feb 09, 2023 at 07:04:59PM +0000, Conor Dooley wrote:
> > On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:

> > > +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > > +{
> > > +	return data == 0;
> > > +}
> > > +
> > >  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > >  						  struct alt_entry *end,
> > >  						  unsigned int stage)
> > > @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > >  		return;
> > >  
> > >  	for (alt = begin; alt < end; alt++) {
> > > -		if (alt->vendor_id != 0)
> > > -			continue;
> > 
> > Can you remind me what makes this "safe"?
> > My understanding was that a vendor_id of zero was safe, as zero is
> > reserved in JEDEC.
> > What is stopping someone stuffing this with a given value and
> > colliding with a real vendor's errata?
> > 
> > 	for (alt = begin; alt < end; alt++) {
> > 		if (alt->vendor_id != A_VENDOR_ID)
> > 			continue;
> > 		if (alt->errata_id >= ERRATA_A_NUMBER)
> > 			continue;
> > 
> > 		tmp = (1U << alt->errata_id);
> > 		if (cpu_req_errata & tmp) {
> > 			oldptr = ALT_OLD_PTR(alt);
> > 			altptr = ALT_ALT_PTR(alt);
> > 
> > 			/* On vm-alternatives, the mmu isn't running yet */
> > 			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > 				memcpy((void *)__pa_symbol(oldptr),
> > 				       (void *)__pa_symbol(altptr),
> > 				       alt->alt_len);
> > 			else
> > 				patch_text_nosync(oldptr, altptr, alt->alt_len);
> > 		}
> > 	}
> > 
> > I've probably just missing something, my brain swapped out alternatives
> > the other week. Hopefully whatever I missed isn't embarrassingly obvious!
> 
> You're right. I was assuming the errata_id space for errata didn't overlap
> with the errata_id space for cpufeatures. It doesn't, atm, but by luck,
> not design. I could try to ensure that somehow, but probably the better
> approach would be to use the upper bits of errata_id for the application
> data and to leave vendor_id alone. Thanks for catching my oversight!

Sounds like a better idea at least.
I suppose the ideal would be to add another arg and not abuse things,
but maybe that's one for the future, idk.
Is this somewhere that an assertion should be used to make sure that
we don't grow the list of extensions such that the regions collide?


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
  2023-02-10 20:29         ` Conor Dooley
@ 2023-02-12 16:26           ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-12 16:26 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Fri, Feb 10, 2023 at 08:29:15PM +0000, Conor Dooley wrote:
> On Fri, Feb 10, 2023 at 08:58:18AM +0100, Andrew Jones wrote:
> > On Thu, Feb 09, 2023 at 07:04:59PM +0000, Conor Dooley wrote:
> > > On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:
> 
> > > > +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > > > +{
> > > > +	return data == 0;
> > > > +}
> > > > +
> > > >  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > > >  						  struct alt_entry *end,
> > > >  						  unsigned int stage)
> > > > @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > > >  		return;
> > > >  
> > > >  	for (alt = begin; alt < end; alt++) {
> > > > -		if (alt->vendor_id != 0)
> > > > -			continue;
> > > 
> > > Can you remind me what makes this "safe"?
> > > My understanding was that a vendor_id of zero was safe, as zero is
> > > reserved in JEDEC.
> > > What is stopping someone stuffing this with a given value and
> > > colliding with a real vendor's errata?
> > > 
> > > 	for (alt = begin; alt < end; alt++) {
> > > 		if (alt->vendor_id != A_VENDOR_ID)
> > > 			continue;
> > > 		if (alt->errata_id >= ERRATA_A_NUMBER)
> > > 			continue;
> > > 
> > > 		tmp = (1U << alt->errata_id);
> > > 		if (cpu_req_errata & tmp) {
> > > 			oldptr = ALT_OLD_PTR(alt);
> > > 			altptr = ALT_ALT_PTR(alt);
> > > 
> > > 			/* On vm-alternatives, the mmu isn't running yet */
> > > 			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > > 				memcpy((void *)__pa_symbol(oldptr),
> > > 				       (void *)__pa_symbol(altptr),
> > > 				       alt->alt_len);
> > > 			else
> > > 				patch_text_nosync(oldptr, altptr, alt->alt_len);
> > > 		}
> > > 	}
> > > 
> > > I've probably just missing something, my brain swapped out alternatives
> > > the other week. Hopefully whatever I missed isn't embarrassingly obvious!
> > 
> > You're right. I was assuming the errata_id space for errata didn't overlap
> > with the errata_id space for cpufeatures. It doesn't, atm, but by luck,
> > not design. I could try to ensure that somehow, but probably the better
> > approach would be to use the upper bits of errata_id for the application
> > data and to leave vendor_id alone. Thanks for catching my oversight!
> 
> Sounds like a better idea at least.
> I suppose the ideal would be to add another arg and not abuse things,
> but maybe that's one for the future, idk.

We could add another arg without too much trouble and even use some macro
trickier to only pass it when it's needed. OTOH, the errata_id is 32-bits,
which is overkill, since errata are assigned increasing integers from
zero, and I don't want to over-bloat this structure. Maybe I can split
errata_id into two 16-bit parameters to avoid completely adding yet
another.

> Is this somewhere that an assertion should be used to make sure that
> we don't grow the list of extensions such that the regions collide?

It's probably best to split errata_id for the application data and not
worry about collisions.

Thanks,
drew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work
@ 2023-02-12 16:26           ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-12 16:26 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Fri, Feb 10, 2023 at 08:29:15PM +0000, Conor Dooley wrote:
> On Fri, Feb 10, 2023 at 08:58:18AM +0100, Andrew Jones wrote:
> > On Thu, Feb 09, 2023 at 07:04:59PM +0000, Conor Dooley wrote:
> > > On Thu, Feb 09, 2023 at 04:26:25PM +0100, Andrew Jones wrote:
> 
> > > > +static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > > > +{
> > > > +	return data == 0;
> > > > +}
> > > > +
> > > >  void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > > >  						  struct alt_entry *end,
> > > >  						  unsigned int stage)
> > > > @@ -289,8 +294,6 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> > > >  		return;
> > > >  
> > > >  	for (alt = begin; alt < end; alt++) {
> > > > -		if (alt->vendor_id != 0)
> > > > -			continue;
> > > 
> > > Can you remind me what makes this "safe"?
> > > My understanding was that a vendor_id of zero was safe, as zero is
> > > reserved in JEDEC.
> > > What is stopping someone stuffing this with a given value and
> > > colliding with a real vendor's errata?
> > > 
> > > 	for (alt = begin; alt < end; alt++) {
> > > 		if (alt->vendor_id != A_VENDOR_ID)
> > > 			continue;
> > > 		if (alt->errata_id >= ERRATA_A_NUMBER)
> > > 			continue;
> > > 
> > > 		tmp = (1U << alt->errata_id);
> > > 		if (cpu_req_errata & tmp) {
> > > 			oldptr = ALT_OLD_PTR(alt);
> > > 			altptr = ALT_ALT_PTR(alt);
> > > 
> > > 			/* On vm-alternatives, the mmu isn't running yet */
> > > 			if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > > 				memcpy((void *)__pa_symbol(oldptr),
> > > 				       (void *)__pa_symbol(altptr),
> > > 				       alt->alt_len);
> > > 			else
> > > 				patch_text_nosync(oldptr, altptr, alt->alt_len);
> > > 		}
> > > 	}
> > > 
> > > I've probably just missing something, my brain swapped out alternatives
> > > the other week. Hopefully whatever I missed isn't embarrassingly obvious!
> > 
> > You're right. I was assuming the errata_id space for errata didn't overlap
> > with the errata_id space for cpufeatures. It doesn't, atm, but by luck,
> > not design. I could try to ensure that somehow, but probably the better
> > approach would be to use the upper bits of errata_id for the application
> > data and to leave vendor_id alone. Thanks for catching my oversight!
> 
> Sounds like a better idea at least.
> I suppose the ideal would be to add another arg and not abuse things,
> but maybe that's one for the future, idk.

We could add another arg without too much trouble and even use some macro
trickier to only pass it when it's needed. OTOH, the errata_id is 32-bits,
which is overkill, since errata are assigned increasing integers from
zero, and I don't want to over-bloat this structure. Maybe I can split
errata_id into two 16-bit parameters to avoid completely adding yet
another.

> Is this somewhere that an assertion should be used to make sure that
> we don't grow the list of extensions such that the regions collide?

It's probably best to split errata_id for the application data and not
worry about collisions.

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-09 15:26   ` Andrew Jones
@ 2023-02-17 10:18     ` Ben Dooks
  -1 siblings, 0 replies; 48+ messages in thread
From: Ben Dooks @ 2023-02-17 10:18 UTC (permalink / raw)
  To: Andrew Jones, linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

On 09/02/2023 15:26, Andrew Jones wrote:
> Using memset() to zero a 4K page takes 563 total instructions, where
> 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> takes 169 total instructions, where 4 are branches and 33 are nops.
> Even though the block size is a variable, thanks to alternatives, we
> can still implement a Duff device without having to do any preliminary
> calculations. This is achieved by taking advantage of 'vendor_id'
> being used as application-specific data for alternatives, enabling us
> to stop patching / unrolling when 4K bytes have been zeroed (we would
> loop and continue after 4K if the page size would be larger)
> 
> For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> only loop a few times and larger block sizes to not loop at all. Since
> cbo.zero doesn't take an offset, we also need an 'add' after each
> instruction, making the loop body 112 to 160 bytes. Hopefully this
> is small enough to not cause icache misses.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
> ---
>   arch/riscv/Kconfig                | 13 ++++++
>   arch/riscv/include/asm/insn-def.h |  4 ++
>   arch/riscv/include/asm/page.h     |  6 ++-
>   arch/riscv/kernel/cpufeature.c    | 11 +++++
>   arch/riscv/lib/Makefile           |  1 +
>   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
>   6 files changed, 105 insertions(+), 1 deletion(-)
>   create mode 100644 arch/riscv/lib/clear_page.S
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 029d1d3b40bd..9590a1661caf 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
>   
>   	   If you don't know what to do here, say Y.
>   
> +config RISCV_ISA_ZICBOZ
> +	bool "Zicboz extension support for faster zeroing of memory"
> +	depends on !XIP_KERNEL && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Enable the use of the ZICBOZ extension (cbo.zero instruction)
> +	   when available.
> +
> +	   The Zicboz extension is used for faster zeroing of memory.
> +
> +	   If you don't know what to do here, say Y.
> +
>   config TOOLCHAIN_HAS_ZIHINTPAUSE
>   	bool
>   	default y
> diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
> index e01ab51f50d2..6960beb75f32 100644
> --- a/arch/riscv/include/asm/insn-def.h
> +++ b/arch/riscv/include/asm/insn-def.h
> @@ -192,4 +192,8 @@
>   	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
>   	       RS1(base), SIMM12(2))
>   
> +#define CBO_zero(base)						\
> +	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
> +	       RS1(base), SIMM12(4))
> +
>   #endif /* __ASM_INSN_DEF_H */
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 9f432c1b5289..ccd168fe29d2 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -49,10 +49,14 @@
>   
>   #ifndef __ASSEMBLY__
>   
> +#ifdef CONFIG_RISCV_ISA_ZICBOZ
> +void clear_page(void *page);
> +#else
>   #define clear_page(pgaddr)			memset((pgaddr), 0, PAGE_SIZE)
> +#endif
>   #define copy_page(to, from)			memcpy((to), (from), PAGE_SIZE)
>   
> -#define clear_user_page(pgaddr, vaddr, page)	memset((pgaddr), 0, PAGE_SIZE)
> +#define clear_user_page(pgaddr, vaddr, page)	clear_page(pgaddr)
>   #define copy_user_page(vto, vfrom, vaddr, topg) \
>   			memcpy((vto), (vfrom), PAGE_SIZE)
>   
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 74736b4f0624..42246bbfa532 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
>   #ifdef CONFIG_RISCV_ALTERNATIVE
>   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
>   {
> +	switch (feature) {
> +	case RISCV_ISA_EXT_ZICBOZ:
> +		/*
> +		 * Zicboz alternative applications provide the maximum
> +		 * supported block size order, or zero when it doesn't
> +		 * matter. If the current block size exceeds the maximum,
> +		 * then the alternative cannot be applied.
> +		 */
> +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> +	}
> +
>   	return data == 0;
>   }
>   
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 6c74b0bedd60..26cb2502ecf8 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -8,5 +8,6 @@ lib-y			+= strlen.o
>   lib-y			+= strncmp.o
>   lib-$(CONFIG_MMU)	+= uaccess.o
>   lib-$(CONFIG_64BIT)	+= tishift.o
> +lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
>   
>   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
> new file mode 100644
> index 000000000000..5b851e727f7c
> --- /dev/null
> +++ b/arch/riscv/lib/clear_page.S
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2023 Ventana Micro Systems Inc.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/hwcap.h>
> +#include <asm/insn-def.h>
> +#include <asm/page.h>
> +
> +#define CBOZ_ALT(order, old, new)	\
> +	ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, CONFIG_RISCV_ISA_ZICBOZ)

I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
alternatives?

I may also be missing something, but when backporting this to 5.19
to test it on our system the "order" argument is in fact the vendor-id
so this doesn't work as the alternatives don't get patched in at-all.

> +
> +/* void clear_page(void *page) */
> +ENTRY(__clear_page)
> +WEAK(clear_page)
> +	li	a2, PAGE_SIZE
> +
> +	/*
> +	 * If Zicboz isn't present, or somehow has a block
> +	 * size larger than 4K, then fallback to memset.
> +	 */
> +	CBOZ_ALT(12, "j .Lno_zicboz", "nop")

I can't see how the CBOZ_ALT is checking for the CBOZ block
size being bigger than 4k... I guess we should have also
tested the block size is an exact multiple of page size?

It would also be better if we just didn't enable it and printed
an warn when we initialise and then never advertise the feature
in the first place?

> +
> +	lw	a1, riscv_cboz_block_size
> +	add	a2, a0, a2
> +.Lzero_loop:
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(11, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(10, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(9, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")

I'm also wondering why we are using CBOZ_ALT past the first
check. I don't see why it shouldn't just be a loop with a siz
check like:

Lzero_loop:
	CBO_zero(a0)
	add	a0, a0, a1
	blge	a0, a2, .Lzero_done
	....


If you wanted to do multiple CBO_zero(a0) then maybe testing
and branching would be easier to allow a certain amount of
loop unrolling.


> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(8, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	bltu	a0, a2, .Lzero_loop
> +	ret
> +.Lno_zicboz:
> +	li	a1, 0
> +	tail	__memset
> +END(__clear_page)

Whilst this seems to work, I am not sure why and it probably wants
more testing.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-17 10:18     ` Ben Dooks
  0 siblings, 0 replies; 48+ messages in thread
From: Ben Dooks @ 2023-02-17 10:18 UTC (permalink / raw)
  To: Andrew Jones, linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

On 09/02/2023 15:26, Andrew Jones wrote:
> Using memset() to zero a 4K page takes 563 total instructions, where
> 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> takes 169 total instructions, where 4 are branches and 33 are nops.
> Even though the block size is a variable, thanks to alternatives, we
> can still implement a Duff device without having to do any preliminary
> calculations. This is achieved by taking advantage of 'vendor_id'
> being used as application-specific data for alternatives, enabling us
> to stop patching / unrolling when 4K bytes have been zeroed (we would
> loop and continue after 4K if the page size would be larger)
> 
> For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> only loop a few times and larger block sizes to not loop at all. Since
> cbo.zero doesn't take an offset, we also need an 'add' after each
> instruction, making the loop body 112 to 160 bytes. Hopefully this
> is small enough to not cause icache misses.
> 
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
> ---
>   arch/riscv/Kconfig                | 13 ++++++
>   arch/riscv/include/asm/insn-def.h |  4 ++
>   arch/riscv/include/asm/page.h     |  6 ++-
>   arch/riscv/kernel/cpufeature.c    | 11 +++++
>   arch/riscv/lib/Makefile           |  1 +
>   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
>   6 files changed, 105 insertions(+), 1 deletion(-)
>   create mode 100644 arch/riscv/lib/clear_page.S
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 029d1d3b40bd..9590a1661caf 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
>   
>   	   If you don't know what to do here, say Y.
>   
> +config RISCV_ISA_ZICBOZ
> +	bool "Zicboz extension support for faster zeroing of memory"
> +	depends on !XIP_KERNEL && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Enable the use of the ZICBOZ extension (cbo.zero instruction)
> +	   when available.
> +
> +	   The Zicboz extension is used for faster zeroing of memory.
> +
> +	   If you don't know what to do here, say Y.
> +
>   config TOOLCHAIN_HAS_ZIHINTPAUSE
>   	bool
>   	default y
> diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
> index e01ab51f50d2..6960beb75f32 100644
> --- a/arch/riscv/include/asm/insn-def.h
> +++ b/arch/riscv/include/asm/insn-def.h
> @@ -192,4 +192,8 @@
>   	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
>   	       RS1(base), SIMM12(2))
>   
> +#define CBO_zero(base)						\
> +	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
> +	       RS1(base), SIMM12(4))
> +
>   #endif /* __ASM_INSN_DEF_H */
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 9f432c1b5289..ccd168fe29d2 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -49,10 +49,14 @@
>   
>   #ifndef __ASSEMBLY__
>   
> +#ifdef CONFIG_RISCV_ISA_ZICBOZ
> +void clear_page(void *page);
> +#else
>   #define clear_page(pgaddr)			memset((pgaddr), 0, PAGE_SIZE)
> +#endif
>   #define copy_page(to, from)			memcpy((to), (from), PAGE_SIZE)
>   
> -#define clear_user_page(pgaddr, vaddr, page)	memset((pgaddr), 0, PAGE_SIZE)
> +#define clear_user_page(pgaddr, vaddr, page)	clear_page(pgaddr)
>   #define copy_user_page(vto, vfrom, vaddr, topg) \
>   			memcpy((vto), (vfrom), PAGE_SIZE)
>   
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 74736b4f0624..42246bbfa532 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
>   #ifdef CONFIG_RISCV_ALTERNATIVE
>   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
>   {
> +	switch (feature) {
> +	case RISCV_ISA_EXT_ZICBOZ:
> +		/*
> +		 * Zicboz alternative applications provide the maximum
> +		 * supported block size order, or zero when it doesn't
> +		 * matter. If the current block size exceeds the maximum,
> +		 * then the alternative cannot be applied.
> +		 */
> +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> +	}
> +
>   	return data == 0;
>   }
>   
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 6c74b0bedd60..26cb2502ecf8 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -8,5 +8,6 @@ lib-y			+= strlen.o
>   lib-y			+= strncmp.o
>   lib-$(CONFIG_MMU)	+= uaccess.o
>   lib-$(CONFIG_64BIT)	+= tishift.o
> +lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
>   
>   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
> new file mode 100644
> index 000000000000..5b851e727f7c
> --- /dev/null
> +++ b/arch/riscv/lib/clear_page.S
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2023 Ventana Micro Systems Inc.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/hwcap.h>
> +#include <asm/insn-def.h>
> +#include <asm/page.h>
> +
> +#define CBOZ_ALT(order, old, new)	\
> +	ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, CONFIG_RISCV_ISA_ZICBOZ)

I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
alternatives?

I may also be missing something, but when backporting this to 5.19
to test it on our system the "order" argument is in fact the vendor-id
so this doesn't work as the alternatives don't get patched in at-all.

> +
> +/* void clear_page(void *page) */
> +ENTRY(__clear_page)
> +WEAK(clear_page)
> +	li	a2, PAGE_SIZE
> +
> +	/*
> +	 * If Zicboz isn't present, or somehow has a block
> +	 * size larger than 4K, then fallback to memset.
> +	 */
> +	CBOZ_ALT(12, "j .Lno_zicboz", "nop")

I can't see how the CBOZ_ALT is checking for the CBOZ block
size being bigger than 4k... I guess we should have also
tested the block size is an exact multiple of page size?

It would also be better if we just didn't enable it and printed
an warn when we initialise and then never advertise the feature
in the first place?

> +
> +	lw	a1, riscv_cboz_block_size
> +	add	a2, a0, a2
> +.Lzero_loop:
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(11, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(10, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(9, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")

I'm also wondering why we are using CBOZ_ALT past the first
check. I don't see why it shouldn't just be a loop with a siz
check like:

Lzero_loop:
	CBO_zero(a0)
	add	a0, a0, a1
	blge	a0, a2, .Lzero_done
	....


If you wanted to do multiple CBO_zero(a0) then maybe testing
and branching would be easier to allow a certain amount of
loop unrolling.


> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBOZ_ALT(8, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	CBO_zero(a0)
> +	add	a0, a0, a1
> +	bltu	a0, a2, .Lzero_loop
> +	ret
> +.Lno_zicboz:
> +	li	a1, 0
> +	tail	__memset
> +END(__clear_page)

Whilst this seems to work, I am not sure why and it probably wants
more testing.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-17 10:18     ` Ben Dooks
@ 2023-02-17 10:50       ` Ben Dooks
  -1 siblings, 0 replies; 48+ messages in thread
From: Ben Dooks @ 2023-02-17 10:50 UTC (permalink / raw)
  To: Andrew Jones, linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

On 17/02/2023 10:18, Ben Dooks wrote:
> On 09/02/2023 15:26, Andrew Jones wrote:
>> Using memset() to zero a 4K page takes 563 total instructions, where
>> 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
>> takes 169 total instructions, where 4 are branches and 33 are nops.
>> Even though the block size is a variable, thanks to alternatives, we
>> can still implement a Duff device without having to do any preliminary
>> calculations. This is achieved by taking advantage of 'vendor_id'
>> being used as application-specific data for alternatives, enabling us
>> to stop patching / unrolling when 4K bytes have been zeroed (we would
>> loop and continue after 4K if the page size would be larger)
>>
>> For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
>> only loop a few times and larger block sizes to not loop at all. Since
>> cbo.zero doesn't take an offset, we also need an 'add' after each
>> instruction, making the loop body 112 to 160 bytes. Hopefully this
>> is small enough to not cause icache misses.
>>
>> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
>> Acked-by: Conor Dooley <conor.dooley@microchip.com>
>> ---
>>   arch/riscv/Kconfig                | 13 ++++++
>>   arch/riscv/include/asm/insn-def.h |  4 ++
>>   arch/riscv/include/asm/page.h     |  6 ++-
>>   arch/riscv/kernel/cpufeature.c    | 11 +++++
>>   arch/riscv/lib/Makefile           |  1 +
>>   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
>>   6 files changed, 105 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/riscv/lib/clear_page.S
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 029d1d3b40bd..9590a1661caf 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
>>          If you don't know what to do here, say Y.
>> +config RISCV_ISA_ZICBOZ
>> +    bool "Zicboz extension support for faster zeroing of memory"
>> +    depends on !XIP_KERNEL && MMU
>> +    select RISCV_ALTERNATIVE
>> +    default y
>> +    help
>> +       Enable the use of the ZICBOZ extension (cbo.zero instruction)
>> +       when available.
>> +
>> +       The Zicboz extension is used for faster zeroing of memory.
>> +
>> +       If you don't know what to do here, say Y.
>> +
>>   config TOOLCHAIN_HAS_ZIHINTPAUSE
>>       bool
>>       default y
>> diff --git a/arch/riscv/include/asm/insn-def.h 
>> b/arch/riscv/include/asm/insn-def.h
>> index e01ab51f50d2..6960beb75f32 100644
>> --- a/arch/riscv/include/asm/insn-def.h
>> +++ b/arch/riscv/include/asm/insn-def.h
>> @@ -192,4 +192,8 @@
>>       INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
>>              RS1(base), SIMM12(2))
>> +#define CBO_zero(base)                        \
>> +    INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
>> +           RS1(base), SIMM12(4))
>> +
>>   #endif /* __ASM_INSN_DEF_H */
>> diff --git a/arch/riscv/include/asm/page.h 
>> b/arch/riscv/include/asm/page.h
>> index 9f432c1b5289..ccd168fe29d2 100644
>> --- a/arch/riscv/include/asm/page.h
>> +++ b/arch/riscv/include/asm/page.h
>> @@ -49,10 +49,14 @@
>>   #ifndef __ASSEMBLY__
>> +#ifdef CONFIG_RISCV_ISA_ZICBOZ
>> +void clear_page(void *page);
>> +#else
>>   #define clear_page(pgaddr)            memset((pgaddr), 0, PAGE_SIZE)
>> +#endif
>>   #define copy_page(to, from)            memcpy((to), (from), PAGE_SIZE)
>> -#define clear_user_page(pgaddr, vaddr, page)    memset((pgaddr), 0, 
>> PAGE_SIZE)
>> +#define clear_user_page(pgaddr, vaddr, page)    clear_page(pgaddr)
>>   #define copy_user_page(vto, vfrom, vaddr, topg) \
>>               memcpy((vto), (vfrom), PAGE_SIZE)
>> diff --git a/arch/riscv/kernel/cpufeature.c 
>> b/arch/riscv/kernel/cpufeature.c
>> index 74736b4f0624..42246bbfa532 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
>>   #ifdef CONFIG_RISCV_ALTERNATIVE
>>   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
>>   {
>> +    switch (feature) {
>> +    case RISCV_ISA_EXT_ZICBOZ:
>> +        /*
>> +         * Zicboz alternative applications provide the maximum
>> +         * supported block size order, or zero when it doesn't
>> +         * matter. If the current block size exceeds the maximum,
>> +         * then the alternative cannot be applied.
>> +         */
>> +        return data == 0 || riscv_cboz_block_size <= (1U << data);
>> +    }



>> +
>>       return data == 0;
>>   }
>> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
>> index 6c74b0bedd60..26cb2502ecf8 100644
>> --- a/arch/riscv/lib/Makefile
>> +++ b/arch/riscv/lib/Makefile
>> @@ -8,5 +8,6 @@ lib-y            += strlen.o
>>   lib-y            += strncmp.o
>>   lib-$(CONFIG_MMU)    += uaccess.o
>>   lib-$(CONFIG_64BIT)    += tishift.o
>> +lib-$(CONFIG_RISCV_ISA_ZICBOZ)    += clear_page.o
>>   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>> diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
>> new file mode 100644
>> index 000000000000..5b851e727f7c
>> --- /dev/null
>> +++ b/arch/riscv/lib/clear_page.S
>> @@ -0,0 +1,71 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (c) 2023 Ventana Micro Systems Inc.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +#include <asm/asm.h>
>> +#include <asm/alternative-macros.h>
>> +#include <asm/hwcap.h>
>> +#include <asm/insn-def.h>
>> +#include <asm/page.h>
>> +
>> +#define CBOZ_ALT(order, old, new)    \
>> +    ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, 
>> CONFIG_RISCV_ISA_ZICBOZ)
> 
> I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
> alternatives?
> 
> I may also be missing something, but when backporting this to 5.19
> to test it on our system the "order" argument is in fact the vendor-id
> so this doesn't work as the alternatives don't get patched in at-all.

So it looks like when backporting I missed the updated in
arch/riscv/kernel/cpufeature.c for testing the block size
which explains the issues I was seeing with the assembly
code.

I'm not sure why it wouldn't assemble for me with the
RISCV_ISA_EXT_ZICBOZ being undefined.

With using vendor-id with the RISCV_ISA_EXT_ZICBOZ as the
errata-id, would the RISCV_ISA_EXT space need to be reserved
for any vendor specific IDs?


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-17 10:50       ` Ben Dooks
  0 siblings, 0 replies; 48+ messages in thread
From: Ben Dooks @ 2023-02-17 10:50 UTC (permalink / raw)
  To: Andrew Jones, linux-riscv, kvm-riscv, devicetree
  Cc: 'Anup Patel ', 'Palmer Dabbelt ',
	'Paul Walmsley ', 'Krzysztof Kozlowski ',
	'Atish Patra ', 'Heiko Stuebner ',
	'Jisheng Zhang ', 'Rob Herring ',
	'Albert Ou ', 'Conor Dooley '

On 17/02/2023 10:18, Ben Dooks wrote:
> On 09/02/2023 15:26, Andrew Jones wrote:
>> Using memset() to zero a 4K page takes 563 total instructions, where
>> 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
>> takes 169 total instructions, where 4 are branches and 33 are nops.
>> Even though the block size is a variable, thanks to alternatives, we
>> can still implement a Duff device without having to do any preliminary
>> calculations. This is achieved by taking advantage of 'vendor_id'
>> being used as application-specific data for alternatives, enabling us
>> to stop patching / unrolling when 4K bytes have been zeroed (we would
>> loop and continue after 4K if the page size would be larger)
>>
>> For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
>> only loop a few times and larger block sizes to not loop at all. Since
>> cbo.zero doesn't take an offset, we also need an 'add' after each
>> instruction, making the loop body 112 to 160 bytes. Hopefully this
>> is small enough to not cause icache misses.
>>
>> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
>> Acked-by: Conor Dooley <conor.dooley@microchip.com>
>> ---
>>   arch/riscv/Kconfig                | 13 ++++++
>>   arch/riscv/include/asm/insn-def.h |  4 ++
>>   arch/riscv/include/asm/page.h     |  6 ++-
>>   arch/riscv/kernel/cpufeature.c    | 11 +++++
>>   arch/riscv/lib/Makefile           |  1 +
>>   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
>>   6 files changed, 105 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/riscv/lib/clear_page.S
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 029d1d3b40bd..9590a1661caf 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
>>          If you don't know what to do here, say Y.
>> +config RISCV_ISA_ZICBOZ
>> +    bool "Zicboz extension support for faster zeroing of memory"
>> +    depends on !XIP_KERNEL && MMU
>> +    select RISCV_ALTERNATIVE
>> +    default y
>> +    help
>> +       Enable the use of the ZICBOZ extension (cbo.zero instruction)
>> +       when available.
>> +
>> +       The Zicboz extension is used for faster zeroing of memory.
>> +
>> +       If you don't know what to do here, say Y.
>> +
>>   config TOOLCHAIN_HAS_ZIHINTPAUSE
>>       bool
>>       default y
>> diff --git a/arch/riscv/include/asm/insn-def.h 
>> b/arch/riscv/include/asm/insn-def.h
>> index e01ab51f50d2..6960beb75f32 100644
>> --- a/arch/riscv/include/asm/insn-def.h
>> +++ b/arch/riscv/include/asm/insn-def.h
>> @@ -192,4 +192,8 @@
>>       INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
>>              RS1(base), SIMM12(2))
>> +#define CBO_zero(base)                        \
>> +    INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
>> +           RS1(base), SIMM12(4))
>> +
>>   #endif /* __ASM_INSN_DEF_H */
>> diff --git a/arch/riscv/include/asm/page.h 
>> b/arch/riscv/include/asm/page.h
>> index 9f432c1b5289..ccd168fe29d2 100644
>> --- a/arch/riscv/include/asm/page.h
>> +++ b/arch/riscv/include/asm/page.h
>> @@ -49,10 +49,14 @@
>>   #ifndef __ASSEMBLY__
>> +#ifdef CONFIG_RISCV_ISA_ZICBOZ
>> +void clear_page(void *page);
>> +#else
>>   #define clear_page(pgaddr)            memset((pgaddr), 0, PAGE_SIZE)
>> +#endif
>>   #define copy_page(to, from)            memcpy((to), (from), PAGE_SIZE)
>> -#define clear_user_page(pgaddr, vaddr, page)    memset((pgaddr), 0, 
>> PAGE_SIZE)
>> +#define clear_user_page(pgaddr, vaddr, page)    clear_page(pgaddr)
>>   #define copy_user_page(vto, vfrom, vaddr, topg) \
>>               memcpy((vto), (vfrom), PAGE_SIZE)
>> diff --git a/arch/riscv/kernel/cpufeature.c 
>> b/arch/riscv/kernel/cpufeature.c
>> index 74736b4f0624..42246bbfa532 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
>>   #ifdef CONFIG_RISCV_ALTERNATIVE
>>   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
>>   {
>> +    switch (feature) {
>> +    case RISCV_ISA_EXT_ZICBOZ:
>> +        /*
>> +         * Zicboz alternative applications provide the maximum
>> +         * supported block size order, or zero when it doesn't
>> +         * matter. If the current block size exceeds the maximum,
>> +         * then the alternative cannot be applied.
>> +         */
>> +        return data == 0 || riscv_cboz_block_size <= (1U << data);
>> +    }



>> +
>>       return data == 0;
>>   }
>> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
>> index 6c74b0bedd60..26cb2502ecf8 100644
>> --- a/arch/riscv/lib/Makefile
>> +++ b/arch/riscv/lib/Makefile
>> @@ -8,5 +8,6 @@ lib-y            += strlen.o
>>   lib-y            += strncmp.o
>>   lib-$(CONFIG_MMU)    += uaccess.o
>>   lib-$(CONFIG_64BIT)    += tishift.o
>> +lib-$(CONFIG_RISCV_ISA_ZICBOZ)    += clear_page.o
>>   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
>> diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
>> new file mode 100644
>> index 000000000000..5b851e727f7c
>> --- /dev/null
>> +++ b/arch/riscv/lib/clear_page.S
>> @@ -0,0 +1,71 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (c) 2023 Ventana Micro Systems Inc.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +#include <asm/asm.h>
>> +#include <asm/alternative-macros.h>
>> +#include <asm/hwcap.h>
>> +#include <asm/insn-def.h>
>> +#include <asm/page.h>
>> +
>> +#define CBOZ_ALT(order, old, new)    \
>> +    ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, 
>> CONFIG_RISCV_ISA_ZICBOZ)
> 
> I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
> alternatives?
> 
> I may also be missing something, but when backporting this to 5.19
> to test it on our system the "order" argument is in fact the vendor-id
> so this doesn't work as the alternatives don't get patched in at-all.

So it looks like when backporting I missed the updated in
arch/riscv/kernel/cpufeature.c for testing the block size
which explains the issues I was seeing with the assembly
code.

I'm not sure why it wouldn't assemble for me with the
RISCV_ISA_EXT_ZICBOZ being undefined.

With using vendor-id with the RISCV_ISA_EXT_ZICBOZ as the
errata-id, would the RISCV_ISA_EXT space need to be reserved
for any vendor specific IDs?


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-17 10:50       ` Ben Dooks
@ 2023-02-17 12:29         ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-17 12:29 UTC (permalink / raw)
  To: Ben Dooks
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Fri, Feb 17, 2023 at 10:50:07AM +0000, Ben Dooks wrote:
> On 17/02/2023 10:18, Ben Dooks wrote:
> > On 09/02/2023 15:26, Andrew Jones wrote:
> > > Using memset() to zero a 4K page takes 563 total instructions, where
> > > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > > takes 169 total instructions, where 4 are branches and 33 are nops.
> > > Even though the block size is a variable, thanks to alternatives, we
> > > can still implement a Duff device without having to do any preliminary
> > > calculations. This is achieved by taking advantage of 'vendor_id'
> > > being used as application-specific data for alternatives, enabling us
> > > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > > loop and continue after 4K if the page size would be larger)
> > > 
> > > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > > only loop a few times and larger block sizes to not loop at all. Since
> > > cbo.zero doesn't take an offset, we also need an 'add' after each
> > > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > > is small enough to not cause icache misses.
> > > 
> > > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> > > ---
> > >   arch/riscv/Kconfig                | 13 ++++++
> > >   arch/riscv/include/asm/insn-def.h |  4 ++
> > >   arch/riscv/include/asm/page.h     |  6 ++-
> > >   arch/riscv/kernel/cpufeature.c    | 11 +++++
> > >   arch/riscv/lib/Makefile           |  1 +
> > >   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
> > >   6 files changed, 105 insertions(+), 1 deletion(-)
> > >   create mode 100644 arch/riscv/lib/clear_page.S
> > > 
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 029d1d3b40bd..9590a1661caf 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
> > >          If you don't know what to do here, say Y.
> > > +config RISCV_ISA_ZICBOZ
> > > +    bool "Zicboz extension support for faster zeroing of memory"
> > > +    depends on !XIP_KERNEL && MMU
> > > +    select RISCV_ALTERNATIVE
> > > +    default y
> > > +    help
> > > +       Enable the use of the ZICBOZ extension (cbo.zero instruction)
> > > +       when available.
> > > +
> > > +       The Zicboz extension is used for faster zeroing of memory.
> > > +
> > > +       If you don't know what to do here, say Y.
> > > +
> > >   config TOOLCHAIN_HAS_ZIHINTPAUSE
> > >       bool
> > >       default y
> > > diff --git a/arch/riscv/include/asm/insn-def.h
> > > b/arch/riscv/include/asm/insn-def.h
> > > index e01ab51f50d2..6960beb75f32 100644
> > > --- a/arch/riscv/include/asm/insn-def.h
> > > +++ b/arch/riscv/include/asm/insn-def.h
> > > @@ -192,4 +192,8 @@
> > >       INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
> > >              RS1(base), SIMM12(2))
> > > +#define CBO_zero(base)                        \
> > > +    INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
> > > +           RS1(base), SIMM12(4))
> > > +
> > >   #endif /* __ASM_INSN_DEF_H */
> > > diff --git a/arch/riscv/include/asm/page.h
> > > b/arch/riscv/include/asm/page.h
> > > index 9f432c1b5289..ccd168fe29d2 100644
> > > --- a/arch/riscv/include/asm/page.h
> > > +++ b/arch/riscv/include/asm/page.h
> > > @@ -49,10 +49,14 @@
> > >   #ifndef __ASSEMBLY__
> > > +#ifdef CONFIG_RISCV_ISA_ZICBOZ
> > > +void clear_page(void *page);
> > > +#else
> > >   #define clear_page(pgaddr)            memset((pgaddr), 0, PAGE_SIZE)
> > > +#endif
> > >   #define copy_page(to, from)            memcpy((to), (from), PAGE_SIZE)
> > > -#define clear_user_page(pgaddr, vaddr, page)    memset((pgaddr), 0,
> > > PAGE_SIZE)
> > > +#define clear_user_page(pgaddr, vaddr, page)    clear_page(pgaddr)
> > >   #define copy_user_page(vto, vfrom, vaddr, topg) \
> > >               memcpy((vto), (vfrom), PAGE_SIZE)
> > > diff --git a/arch/riscv/kernel/cpufeature.c
> > > b/arch/riscv/kernel/cpufeature.c
> > > index 74736b4f0624..42246bbfa532 100644
> > > --- a/arch/riscv/kernel/cpufeature.c
> > > +++ b/arch/riscv/kernel/cpufeature.c
> > > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> > >   #ifdef CONFIG_RISCV_ALTERNATIVE
> > >   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > >   {
> > > +    switch (feature) {
> > > +    case RISCV_ISA_EXT_ZICBOZ:
> > > +        /*
> > > +         * Zicboz alternative applications provide the maximum
> > > +         * supported block size order, or zero when it doesn't
> > > +         * matter. If the current block size exceeds the maximum,
> > > +         * then the alternative cannot be applied.
> > > +         */
> > > +        return data == 0 || riscv_cboz_block_size <= (1U << data);
> > > +    }
> 
> 
> 
> > > +
> > >       return data == 0;
> > >   }
> > > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > > index 6c74b0bedd60..26cb2502ecf8 100644
> > > --- a/arch/riscv/lib/Makefile
> > > +++ b/arch/riscv/lib/Makefile
> > > @@ -8,5 +8,6 @@ lib-y            += strlen.o
> > >   lib-y            += strncmp.o
> > >   lib-$(CONFIG_MMU)    += uaccess.o
> > >   lib-$(CONFIG_64BIT)    += tishift.o
> > > +lib-$(CONFIG_RISCV_ISA_ZICBOZ)    += clear_page.o
> > >   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> > > diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
> > > new file mode 100644
> > > index 000000000000..5b851e727f7c
> > > --- /dev/null
> > > +++ b/arch/riscv/lib/clear_page.S
> > > @@ -0,0 +1,71 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (c) 2023 Ventana Micro Systems Inc.
> > > + */
> > > +
> > > +#include <linux/linkage.h>
> > > +#include <asm/asm.h>
> > > +#include <asm/alternative-macros.h>
> > > +#include <asm/hwcap.h>
> > > +#include <asm/insn-def.h>
> > > +#include <asm/page.h>
> > > +
> > > +#define CBOZ_ALT(order, old, new)    \
> > > +    ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ,
> > > CONFIG_RISCV_ISA_ZICBOZ)
> > 
> > I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
> > alternatives?
> > 
> > I may also be missing something, but when backporting this to 5.19
> > to test it on our system the "order" argument is in fact the vendor-id
> > so this doesn't work as the alternatives don't get patched in at-all.
> 
> So it looks like when backporting I missed the updated in
> arch/riscv/kernel/cpufeature.c for testing the block size
> which explains the issues I was seeing with the assembly
> code.
> 
> I'm not sure why it wouldn't assemble for me with the
> RISCV_ISA_EXT_ZICBOZ being undefined.
> 
> With using vendor-id with the RISCV_ISA_EXT_ZICBOZ as the
> errata-id, would the RISCV_ISA_EXT space need to be reserved
> for any vendor specific IDs?

Hi Ben,

I'll be sending a new version where I don't touch vendor-id anymore, but
rather break errata_id into two parts: id and application-data.

Thanks,
drew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-17 12:29         ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-17 12:29 UTC (permalink / raw)
  To: Ben Dooks
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Fri, Feb 17, 2023 at 10:50:07AM +0000, Ben Dooks wrote:
> On 17/02/2023 10:18, Ben Dooks wrote:
> > On 09/02/2023 15:26, Andrew Jones wrote:
> > > Using memset() to zero a 4K page takes 563 total instructions, where
> > > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > > takes 169 total instructions, where 4 are branches and 33 are nops.
> > > Even though the block size is a variable, thanks to alternatives, we
> > > can still implement a Duff device without having to do any preliminary
> > > calculations. This is achieved by taking advantage of 'vendor_id'
> > > being used as application-specific data for alternatives, enabling us
> > > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > > loop and continue after 4K if the page size would be larger)
> > > 
> > > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > > only loop a few times and larger block sizes to not loop at all. Since
> > > cbo.zero doesn't take an offset, we also need an 'add' after each
> > > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > > is small enough to not cause icache misses.
> > > 
> > > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> > > ---
> > >   arch/riscv/Kconfig                | 13 ++++++
> > >   arch/riscv/include/asm/insn-def.h |  4 ++
> > >   arch/riscv/include/asm/page.h     |  6 ++-
> > >   arch/riscv/kernel/cpufeature.c    | 11 +++++
> > >   arch/riscv/lib/Makefile           |  1 +
> > >   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
> > >   6 files changed, 105 insertions(+), 1 deletion(-)
> > >   create mode 100644 arch/riscv/lib/clear_page.S
> > > 
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 029d1d3b40bd..9590a1661caf 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
> > >          If you don't know what to do here, say Y.
> > > +config RISCV_ISA_ZICBOZ
> > > +    bool "Zicboz extension support for faster zeroing of memory"
> > > +    depends on !XIP_KERNEL && MMU
> > > +    select RISCV_ALTERNATIVE
> > > +    default y
> > > +    help
> > > +       Enable the use of the ZICBOZ extension (cbo.zero instruction)
> > > +       when available.
> > > +
> > > +       The Zicboz extension is used for faster zeroing of memory.
> > > +
> > > +       If you don't know what to do here, say Y.
> > > +
> > >   config TOOLCHAIN_HAS_ZIHINTPAUSE
> > >       bool
> > >       default y
> > > diff --git a/arch/riscv/include/asm/insn-def.h
> > > b/arch/riscv/include/asm/insn-def.h
> > > index e01ab51f50d2..6960beb75f32 100644
> > > --- a/arch/riscv/include/asm/insn-def.h
> > > +++ b/arch/riscv/include/asm/insn-def.h
> > > @@ -192,4 +192,8 @@
> > >       INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
> > >              RS1(base), SIMM12(2))
> > > +#define CBO_zero(base)                        \
> > > +    INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),        \
> > > +           RS1(base), SIMM12(4))
> > > +
> > >   #endif /* __ASM_INSN_DEF_H */
> > > diff --git a/arch/riscv/include/asm/page.h
> > > b/arch/riscv/include/asm/page.h
> > > index 9f432c1b5289..ccd168fe29d2 100644
> > > --- a/arch/riscv/include/asm/page.h
> > > +++ b/arch/riscv/include/asm/page.h
> > > @@ -49,10 +49,14 @@
> > >   #ifndef __ASSEMBLY__
> > > +#ifdef CONFIG_RISCV_ISA_ZICBOZ
> > > +void clear_page(void *page);
> > > +#else
> > >   #define clear_page(pgaddr)            memset((pgaddr), 0, PAGE_SIZE)
> > > +#endif
> > >   #define copy_page(to, from)            memcpy((to), (from), PAGE_SIZE)
> > > -#define clear_user_page(pgaddr, vaddr, page)    memset((pgaddr), 0,
> > > PAGE_SIZE)
> > > +#define clear_user_page(pgaddr, vaddr, page)    clear_page(pgaddr)
> > >   #define copy_user_page(vto, vfrom, vaddr, topg) \
> > >               memcpy((vto), (vfrom), PAGE_SIZE)
> > > diff --git a/arch/riscv/kernel/cpufeature.c
> > > b/arch/riscv/kernel/cpufeature.c
> > > index 74736b4f0624..42246bbfa532 100644
> > > --- a/arch/riscv/kernel/cpufeature.c
> > > +++ b/arch/riscv/kernel/cpufeature.c
> > > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> > >   #ifdef CONFIG_RISCV_ALTERNATIVE
> > >   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> > >   {
> > > +    switch (feature) {
> > > +    case RISCV_ISA_EXT_ZICBOZ:
> > > +        /*
> > > +         * Zicboz alternative applications provide the maximum
> > > +         * supported block size order, or zero when it doesn't
> > > +         * matter. If the current block size exceeds the maximum,
> > > +         * then the alternative cannot be applied.
> > > +         */
> > > +        return data == 0 || riscv_cboz_block_size <= (1U << data);
> > > +    }
> 
> 
> 
> > > +
> > >       return data == 0;
> > >   }
> > > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > > index 6c74b0bedd60..26cb2502ecf8 100644
> > > --- a/arch/riscv/lib/Makefile
> > > +++ b/arch/riscv/lib/Makefile
> > > @@ -8,5 +8,6 @@ lib-y            += strlen.o
> > >   lib-y            += strncmp.o
> > >   lib-$(CONFIG_MMU)    += uaccess.o
> > >   lib-$(CONFIG_64BIT)    += tishift.o
> > > +lib-$(CONFIG_RISCV_ISA_ZICBOZ)    += clear_page.o
> > >   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> > > diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
> > > new file mode 100644
> > > index 000000000000..5b851e727f7c
> > > --- /dev/null
> > > +++ b/arch/riscv/lib/clear_page.S
> > > @@ -0,0 +1,71 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (c) 2023 Ventana Micro Systems Inc.
> > > + */
> > > +
> > > +#include <linux/linkage.h>
> > > +#include <asm/asm.h>
> > > +#include <asm/alternative-macros.h>
> > > +#include <asm/hwcap.h>
> > > +#include <asm/insn-def.h>
> > > +#include <asm/page.h>
> > > +
> > > +#define CBOZ_ALT(order, old, new)    \
> > > +    ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ,
> > > CONFIG_RISCV_ISA_ZICBOZ)
> > 
> > I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
> > alternatives?
> > 
> > I may also be missing something, but when backporting this to 5.19
> > to test it on our system the "order" argument is in fact the vendor-id
> > so this doesn't work as the alternatives don't get patched in at-all.
> 
> So it looks like when backporting I missed the updated in
> arch/riscv/kernel/cpufeature.c for testing the block size
> which explains the issues I was seeing with the assembly
> code.
> 
> I'm not sure why it wouldn't assemble for me with the
> RISCV_ISA_EXT_ZICBOZ being undefined.
> 
> With using vendor-id with the RISCV_ISA_EXT_ZICBOZ as the
> errata-id, would the RISCV_ISA_EXT space need to be reserved
> for any vendor specific IDs?

Hi Ben,

I'll be sending a new version where I don't touch vendor-id anymore, but
rather break errata_id into two parts: id and application-data.

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-17 10:18     ` Ben Dooks
@ 2023-02-17 12:44       ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-17 12:44 UTC (permalink / raw)
  To: Ben Dooks
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Fri, Feb 17, 2023 at 10:18:26AM +0000, Ben Dooks wrote:
> On 09/02/2023 15:26, Andrew Jones wrote:
> > Using memset() to zero a 4K page takes 563 total instructions, where
> > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > takes 169 total instructions, where 4 are branches and 33 are nops.
> > Even though the block size is a variable, thanks to alternatives, we
> > can still implement a Duff device without having to do any preliminary
> > calculations. This is achieved by taking advantage of 'vendor_id'
> > being used as application-specific data for alternatives, enabling us
> > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > loop and continue after 4K if the page size would be larger)
> > 
> > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > only loop a few times and larger block sizes to not loop at all. Since
> > cbo.zero doesn't take an offset, we also need an 'add' after each
> > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > is small enough to not cause icache misses.
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> > ---
> >   arch/riscv/Kconfig                | 13 ++++++
> >   arch/riscv/include/asm/insn-def.h |  4 ++
> >   arch/riscv/include/asm/page.h     |  6 ++-
> >   arch/riscv/kernel/cpufeature.c    | 11 +++++
> >   arch/riscv/lib/Makefile           |  1 +
> >   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
> >   6 files changed, 105 insertions(+), 1 deletion(-)
> >   create mode 100644 arch/riscv/lib/clear_page.S
> > 
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 029d1d3b40bd..9590a1661caf 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
> >   	   If you don't know what to do here, say Y.
> > +config RISCV_ISA_ZICBOZ
> > +	bool "Zicboz extension support for faster zeroing of memory"
> > +	depends on !XIP_KERNEL && MMU
> > +	select RISCV_ALTERNATIVE
> > +	default y
> > +	help
> > +	   Enable the use of the ZICBOZ extension (cbo.zero instruction)
> > +	   when available.
> > +
> > +	   The Zicboz extension is used for faster zeroing of memory.
> > +
> > +	   If you don't know what to do here, say Y.
> > +
> >   config TOOLCHAIN_HAS_ZIHINTPAUSE
> >   	bool
> >   	default y
> > diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
> > index e01ab51f50d2..6960beb75f32 100644
> > --- a/arch/riscv/include/asm/insn-def.h
> > +++ b/arch/riscv/include/asm/insn-def.h
> > @@ -192,4 +192,8 @@
> >   	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
> >   	       RS1(base), SIMM12(2))
> > +#define CBO_zero(base)						\
> > +	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
> > +	       RS1(base), SIMM12(4))
> > +
> >   #endif /* __ASM_INSN_DEF_H */
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index 9f432c1b5289..ccd168fe29d2 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -49,10 +49,14 @@
> >   #ifndef __ASSEMBLY__
> > +#ifdef CONFIG_RISCV_ISA_ZICBOZ
> > +void clear_page(void *page);
> > +#else
> >   #define clear_page(pgaddr)			memset((pgaddr), 0, PAGE_SIZE)
> > +#endif
> >   #define copy_page(to, from)			memcpy((to), (from), PAGE_SIZE)
> > -#define clear_user_page(pgaddr, vaddr, page)	memset((pgaddr), 0, PAGE_SIZE)
> > +#define clear_user_page(pgaddr, vaddr, page)	clear_page(pgaddr)
> >   #define copy_user_page(vto, vfrom, vaddr, topg) \
> >   			memcpy((vto), (vfrom), PAGE_SIZE)
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 74736b4f0624..42246bbfa532 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> >   #ifdef CONFIG_RISCV_ALTERNATIVE
> >   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> >   {
> > +	switch (feature) {
> > +	case RISCV_ISA_EXT_ZICBOZ:
> > +		/*
> > +		 * Zicboz alternative applications provide the maximum
> > +		 * supported block size order, or zero when it doesn't
> > +		 * matter. If the current block size exceeds the maximum,
> > +		 * then the alternative cannot be applied.
> > +		 */
> > +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> > +	}
> > +
> >   	return data == 0;
> >   }
> > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > index 6c74b0bedd60..26cb2502ecf8 100644
> > --- a/arch/riscv/lib/Makefile
> > +++ b/arch/riscv/lib/Makefile
> > @@ -8,5 +8,6 @@ lib-y			+= strlen.o
> >   lib-y			+= strncmp.o
> >   lib-$(CONFIG_MMU)	+= uaccess.o
> >   lib-$(CONFIG_64BIT)	+= tishift.o
> > +lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
> >   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> > diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
> > new file mode 100644
> > index 000000000000..5b851e727f7c
> > --- /dev/null
> > +++ b/arch/riscv/lib/clear_page.S
> > @@ -0,0 +1,71 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2023 Ventana Micro Systems Inc.
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/hwcap.h>
> > +#include <asm/insn-def.h>
> > +#include <asm/page.h>
> > +
> > +#define CBOZ_ALT(order, old, new)	\
> > +	ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, CONFIG_RISCV_ISA_ZICBOZ)
> 
> I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
> alternatives?
> 
> I may also be missing something, but when backporting this to 5.19
> to test it on our system the "order" argument is in fact the vendor-id
> so this doesn't work as the alternatives don't get patched in at-all.

If I understood your follow-up message correctly, then you've already
determined that the 5.19 base needs more patches for this to be applied,
and it's now resolved.

> 
> > +
> > +/* void clear_page(void *page) */
> > +ENTRY(__clear_page)
> > +WEAK(clear_page)
> > +	li	a2, PAGE_SIZE
> > +
> > +	/*
> > +	 * If Zicboz isn't present, or somehow has a block
> > +	 * size larger than 4K, then fallback to memset.
> > +	 */
> > +	CBOZ_ALT(12, "j .Lno_zicboz", "nop")
> 
> I can't see how the CBOZ_ALT is checking for the CBOZ block
> size being bigger than 4k... I guess we should have also
> tested the block size is an exact multiple of page size?

I think you've resolved this as well with more patches in your backport.
But just to follow-up here, both checks, >= size and power-of-two, are
done elsewhere by other patches.

> 
> It would also be better if we just didn't enable it and printed
> an warn when we initialise and then never advertise the feature
> in the first place?

I'm not sure I understand this suggestion. We won't advertise the
feature when it isn't present, but if we compile a kernel that supports
it, then we need to have a fallback for when it isn't present. That's
what this does. It shouldn't warn, as it's not an error to boot a
zicboz capable kernel on a platform that doesn't have zicboz.

> 
> > +
> > +	lw	a1, riscv_cboz_block_size
> > +	add	a2, a0, a2
> > +.Lzero_loop:
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(11, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(10, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(9, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> 
> I'm also wondering why we are using CBOZ_ALT past the first
> check. I don't see why it shouldn't just be a loop with a siz
> check like:
> 
> Lzero_loop:
> 	CBO_zero(a0)
> 	add	a0, a0, a1
> 	blge	a0, a2, .Lzero_done
> 	....
> 

Because then we wouldn't be implementing an unrolled loop :-)

> 
> If you wanted to do multiple CBO_zero(a0) then maybe testing
> and branching would be easier to allow a certain amount of
> loop unrolling.

I want to eliminate as many branches as possible. Alternatives
give me the ability to do that.

> 
> 
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(8, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	bltu	a0, a2, .Lzero_loop
> > +	ret
> > +.Lno_zicboz:
> > +	li	a1, 0
> > +	tail	__memset
> > +END(__clear_page)
> 
> Whilst this seems to work, I am not sure why and it probably wants
> more testing.

More testing is always a good idea, but I do know how it supposed to
work :-) Maybe I should add some more comments to make it more clear?

Thanks,
drew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-17 12:44       ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-17 12:44 UTC (permalink / raw)
  To: Ben Dooks
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Fri, Feb 17, 2023 at 10:18:26AM +0000, Ben Dooks wrote:
> On 09/02/2023 15:26, Andrew Jones wrote:
> > Using memset() to zero a 4K page takes 563 total instructions, where
> > 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> > takes 169 total instructions, where 4 are branches and 33 are nops.
> > Even though the block size is a variable, thanks to alternatives, we
> > can still implement a Duff device without having to do any preliminary
> > calculations. This is achieved by taking advantage of 'vendor_id'
> > being used as application-specific data for alternatives, enabling us
> > to stop patching / unrolling when 4K bytes have been zeroed (we would
> > loop and continue after 4K if the page size would be larger)
> > 
> > For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> > only loop a few times and larger block sizes to not loop at all. Since
> > cbo.zero doesn't take an offset, we also need an 'add' after each
> > instruction, making the loop body 112 to 160 bytes. Hopefully this
> > is small enough to not cause icache misses.
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > Acked-by: Conor Dooley <conor.dooley@microchip.com>
> > ---
> >   arch/riscv/Kconfig                | 13 ++++++
> >   arch/riscv/include/asm/insn-def.h |  4 ++
> >   arch/riscv/include/asm/page.h     |  6 ++-
> >   arch/riscv/kernel/cpufeature.c    | 11 +++++
> >   arch/riscv/lib/Makefile           |  1 +
> >   arch/riscv/lib/clear_page.S       | 71 +++++++++++++++++++++++++++++++
> >   6 files changed, 105 insertions(+), 1 deletion(-)
> >   create mode 100644 arch/riscv/lib/clear_page.S
> > 
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 029d1d3b40bd..9590a1661caf 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -456,6 +456,19 @@ config RISCV_ISA_ZICBOM
> >   	   If you don't know what to do here, say Y.
> > +config RISCV_ISA_ZICBOZ
> > +	bool "Zicboz extension support for faster zeroing of memory"
> > +	depends on !XIP_KERNEL && MMU
> > +	select RISCV_ALTERNATIVE
> > +	default y
> > +	help
> > +	   Enable the use of the ZICBOZ extension (cbo.zero instruction)
> > +	   when available.
> > +
> > +	   The Zicboz extension is used for faster zeroing of memory.
> > +
> > +	   If you don't know what to do here, say Y.
> > +
> >   config TOOLCHAIN_HAS_ZIHINTPAUSE
> >   	bool
> >   	default y
> > diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
> > index e01ab51f50d2..6960beb75f32 100644
> > --- a/arch/riscv/include/asm/insn-def.h
> > +++ b/arch/riscv/include/asm/insn-def.h
> > @@ -192,4 +192,8 @@
> >   	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
> >   	       RS1(base), SIMM12(2))
> > +#define CBO_zero(base)						\
> > +	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
> > +	       RS1(base), SIMM12(4))
> > +
> >   #endif /* __ASM_INSN_DEF_H */
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index 9f432c1b5289..ccd168fe29d2 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -49,10 +49,14 @@
> >   #ifndef __ASSEMBLY__
> > +#ifdef CONFIG_RISCV_ISA_ZICBOZ
> > +void clear_page(void *page);
> > +#else
> >   #define clear_page(pgaddr)			memset((pgaddr), 0, PAGE_SIZE)
> > +#endif
> >   #define copy_page(to, from)			memcpy((to), (from), PAGE_SIZE)
> > -#define clear_user_page(pgaddr, vaddr, page)	memset((pgaddr), 0, PAGE_SIZE)
> > +#define clear_user_page(pgaddr, vaddr, page)	clear_page(pgaddr)
> >   #define copy_user_page(vto, vfrom, vaddr, topg) \
> >   			memcpy((vto), (vfrom), PAGE_SIZE)
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 74736b4f0624..42246bbfa532 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
> >   #ifdef CONFIG_RISCV_ALTERNATIVE
> >   static bool riscv_cpufeature_application_check(u32 feature, u16 data)
> >   {
> > +	switch (feature) {
> > +	case RISCV_ISA_EXT_ZICBOZ:
> > +		/*
> > +		 * Zicboz alternative applications provide the maximum
> > +		 * supported block size order, or zero when it doesn't
> > +		 * matter. If the current block size exceeds the maximum,
> > +		 * then the alternative cannot be applied.
> > +		 */
> > +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> > +	}
> > +
> >   	return data == 0;
> >   }
> > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > index 6c74b0bedd60..26cb2502ecf8 100644
> > --- a/arch/riscv/lib/Makefile
> > +++ b/arch/riscv/lib/Makefile
> > @@ -8,5 +8,6 @@ lib-y			+= strlen.o
> >   lib-y			+= strncmp.o
> >   lib-$(CONFIG_MMU)	+= uaccess.o
> >   lib-$(CONFIG_64BIT)	+= tishift.o
> > +lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
> >   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> > diff --git a/arch/riscv/lib/clear_page.S b/arch/riscv/lib/clear_page.S
> > new file mode 100644
> > index 000000000000..5b851e727f7c
> > --- /dev/null
> > +++ b/arch/riscv/lib/clear_page.S
> > @@ -0,0 +1,71 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2023 Ventana Micro Systems Inc.
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/hwcap.h>
> > +#include <asm/insn-def.h>
> > +#include <asm/page.h>
> > +
> > +#define CBOZ_ALT(order, old, new)	\
> > +	ALTERNATIVE(old, new, order, RISCV_ISA_EXT_ZICBOZ, CONFIG_RISCV_ISA_ZICBOZ)
> 
> I thought this was meant to be a CPUFEATURE_ZICBOZ thing for the
> alternatives?
> 
> I may also be missing something, but when backporting this to 5.19
> to test it on our system the "order" argument is in fact the vendor-id
> so this doesn't work as the alternatives don't get patched in at-all.

If I understood your follow-up message correctly, then you've already
determined that the 5.19 base needs more patches for this to be applied,
and it's now resolved.

> 
> > +
> > +/* void clear_page(void *page) */
> > +ENTRY(__clear_page)
> > +WEAK(clear_page)
> > +	li	a2, PAGE_SIZE
> > +
> > +	/*
> > +	 * If Zicboz isn't present, or somehow has a block
> > +	 * size larger than 4K, then fallback to memset.
> > +	 */
> > +	CBOZ_ALT(12, "j .Lno_zicboz", "nop")
> 
> I can't see how the CBOZ_ALT is checking for the CBOZ block
> size being bigger than 4k... I guess we should have also
> tested the block size is an exact multiple of page size?

I think you've resolved this as well with more patches in your backport.
But just to follow-up here, both checks, >= size and power-of-two, are
done elsewhere by other patches.

> 
> It would also be better if we just didn't enable it and printed
> an warn when we initialise and then never advertise the feature
> in the first place?

I'm not sure I understand this suggestion. We won't advertise the
feature when it isn't present, but if we compile a kernel that supports
it, then we need to have a fallback for when it isn't present. That's
what this does. It shouldn't warn, as it's not an error to boot a
zicboz capable kernel on a platform that doesn't have zicboz.

> 
> > +
> > +	lw	a1, riscv_cboz_block_size
> > +	add	a2, a0, a2
> > +.Lzero_loop:
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(11, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(10, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(9, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> 
> I'm also wondering why we are using CBOZ_ALT past the first
> check. I don't see why it shouldn't just be a loop with a siz
> check like:
> 
> Lzero_loop:
> 	CBO_zero(a0)
> 	add	a0, a0, a1
> 	blge	a0, a2, .Lzero_done
> 	....
> 

Because then we wouldn't be implementing an unrolled loop :-)

> 
> If you wanted to do multiple CBO_zero(a0) then maybe testing
> and branching would be easier to allow a certain amount of
> loop unrolling.

I want to eliminate as many branches as possible. Alternatives
give me the ability to do that.

> 
> 
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBOZ_ALT(8, "bltu a0, a2, .Lzero_loop; ret", "nop; nop")
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	CBO_zero(a0)
> > +	add	a0, a0, a1
> > +	bltu	a0, a2, .Lzero_loop
> > +	ret
> > +.Lno_zicboz:
> > +	li	a1, 0
> > +	tail	__memset
> > +END(__clear_page)
> 
> Whilst this seems to work, I am not sure why and it probably wants
> more testing.

More testing is always a good idea, but I do know how it supposed to
work :-) Maybe I should add some more comments to make it more clear?

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-17 12:29         ` Andrew Jones
@ 2023-02-20 18:43           ` Ben Dooks
  -1 siblings, 0 replies; 48+ messages in thread
From: Ben Dooks @ 2023-02-20 18:43 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On 17/02/2023 12:29, Andrew Jones wrote:
> On Fri, Feb 17, 2023 at 10:50:07AM +0000, Ben Dooks wrote:
>> On 17/02/2023 10:18, Ben Dooks wrote:
>>> On 09/02/2023 15:26, Andrew Jones wrote:
>

[snip]

> Hi Ben,
> 
> I'll be sending a new version where I don't touch vendor-id anymore, but
> rather break errata_id into two parts: id and application-data.

Do you have an idea when v5 will be out, if it is this week I will
hold-off our internal tree rebase to change to v5.

Thanks!

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-20 18:43           ` Ben Dooks
  0 siblings, 0 replies; 48+ messages in thread
From: Ben Dooks @ 2023-02-20 18:43 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On 17/02/2023 12:29, Andrew Jones wrote:
> On Fri, Feb 17, 2023 at 10:50:07AM +0000, Ben Dooks wrote:
>> On 17/02/2023 10:18, Ben Dooks wrote:
>>> On 09/02/2023 15:26, Andrew Jones wrote:
>

[snip]

> Hi Ben,
> 
> I'll be sending a new version where I don't touch vendor-id anymore, but
> rather break errata_id into two parts: id and application-data.

Do you have an idea when v5 will be out, if it is this week I will
hold-off our internal tree rebase to change to v5.

Thanks!

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
  2023-02-20 18:43           ` Ben Dooks
@ 2023-02-20 19:24             ` Andrew Jones
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-20 19:24 UTC (permalink / raw)
  To: Ben Dooks
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Mon, Feb 20, 2023 at 06:43:40PM +0000, Ben Dooks wrote:
> On 17/02/2023 12:29, Andrew Jones wrote:
> > On Fri, Feb 17, 2023 at 10:50:07AM +0000, Ben Dooks wrote:
> > > On 17/02/2023 10:18, Ben Dooks wrote:
> > > > On 09/02/2023 15:26, Andrew Jones wrote:
> > 
> 
> [snip]
> 
> > Hi Ben,
> > 
> > I'll be sending a new version where I don't touch vendor-id anymore, but
> > rather break errata_id into two parts: id and application-data.
> 
> Do you have an idea when v5 will be out, if it is this week I will
> hold-off our internal tree rebase to change to v5.

I'll bump it up in the priority queue and try to send something tomorrow
or Wednesday.

Thanks,
drew

> 
> Thanks!
> 
> -- 
> Ben Dooks				http://www.codethink.co.uk/
> Senior Engineer				Codethink - Providing Genius
> 
> https://www.codethink.co.uk/privacy.html
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
@ 2023-02-20 19:24             ` Andrew Jones
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Jones @ 2023-02-20 19:24 UTC (permalink / raw)
  To: Ben Dooks
  Cc: linux-riscv, kvm-riscv, devicetree, 'Anup Patel ',
	'Palmer Dabbelt ', 'Paul Walmsley ',
	'Krzysztof Kozlowski ', 'Atish Patra ',
	'Heiko Stuebner ', 'Jisheng Zhang ',
	'Rob Herring ', 'Albert Ou ',
	'Conor Dooley '

On Mon, Feb 20, 2023 at 06:43:40PM +0000, Ben Dooks wrote:
> On 17/02/2023 12:29, Andrew Jones wrote:
> > On Fri, Feb 17, 2023 at 10:50:07AM +0000, Ben Dooks wrote:
> > > On 17/02/2023 10:18, Ben Dooks wrote:
> > > > On 09/02/2023 15:26, Andrew Jones wrote:
> > 
> 
> [snip]
> 
> > Hi Ben,
> > 
> > I'll be sending a new version where I don't touch vendor-id anymore, but
> > rather break errata_id into two parts: id and application-data.
> 
> Do you have an idea when v5 will be out, if it is this week I will
> hold-off our internal tree rebase to change to v5.

I'll bump it up in the priority queue and try to send something tomorrow
or Wednesday.

Thanks,
drew

> 
> Thanks!
> 
> -- 
> Ben Dooks				http://www.codethink.co.uk/
> Senior Engineer				Codethink - Providing Genius
> 
> https://www.codethink.co.uk/privacy.html
> 

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2023-02-20 19:24 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 15:26 [PATCH v4 0/8] RISC-V: Apply Zicboz to clear_page Andrew Jones
2023-02-09 15:26 ` Andrew Jones
2023-02-09 15:26 ` [PATCH v4 1/8] RISC-V: alternatives: Support patching multiple insns in assembly Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 18:02   ` Conor Dooley
2023-02-09 18:02     ` Conor Dooley
2023-02-09 15:26 ` [PATCH v4 2/8] RISC-V: Factor out body of riscv_init_cbom_blocksize loop Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 15:26 ` [PATCH v4 3/8] dt-bindings: riscv: Document cboz-block-size Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 15:26 ` [PATCH v4 4/8] RISC-V: Add Zicboz detection and block size parsing Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 15:26 ` [PATCH v4 5/8] RISC-V: cpufeature: Put vendor_id to work Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 19:04   ` Conor Dooley
2023-02-09 19:04     ` Conor Dooley
2023-02-10  7:58     ` Andrew Jones
2023-02-10  7:58       ` Andrew Jones
2023-02-10 20:29       ` Conor Dooley
2023-02-10 20:29         ` Conor Dooley
2023-02-12 16:26         ` Andrew Jones
2023-02-12 16:26           ` Andrew Jones
2023-02-09 15:26 ` [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 19:09   ` Conor Dooley
2023-02-09 19:09     ` Conor Dooley
2023-02-10  8:05     ` Andrew Jones
2023-02-10  8:05       ` Andrew Jones
2023-02-10  9:04       ` Conor Dooley
2023-02-10  9:04         ` Conor Dooley
2023-02-17 10:18   ` Ben Dooks
2023-02-17 10:18     ` Ben Dooks
2023-02-17 10:50     ` Ben Dooks
2023-02-17 10:50       ` Ben Dooks
2023-02-17 12:29       ` Andrew Jones
2023-02-17 12:29         ` Andrew Jones
2023-02-20 18:43         ` Ben Dooks
2023-02-20 18:43           ` Ben Dooks
2023-02-20 19:24           ` Andrew Jones
2023-02-20 19:24             ` Andrew Jones
2023-02-17 12:44     ` Andrew Jones
2023-02-17 12:44       ` Andrew Jones
2023-02-09 15:26 ` [PATCH v4 7/8] RISC-V: KVM: Provide UAPI for Zicboz block size Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 15:26 ` [PATCH v4 8/8] RISC-V: KVM: Expose Zicboz to the guest Andrew Jones
2023-02-09 15:26   ` Andrew Jones
2023-02-09 17:45 ` [PATCH v4 0/8] RISC-V: Apply Zicboz to clear_page Conor Dooley
2023-02-09 17:45   ` Conor Dooley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.