All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] Zbb string optimizations
@ 2023-01-13 21:22 Heiko Stuebner
  2023-01-13 21:23 ` [PATCH v5 1/2] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Heiko Stuebner @ 2023-01-13 21:22 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This series still tries to allow optimized string functions for specific
extensions. The last approach of using an inline base function to hold
the alternative calls did cause some issues in a number of places

So instead of that we're now just using an alternative j at the beginning
of the generic function to jump to a separate place inside the function
itself.

This of course needs a fixup for "j" instructions in alternative blocks,
so that is provided here as well.

Technically patch4 got a review from Andrew, but that was still with
the inline approach, so I didn't bring it over to v4.

Dependencies:
- 6.2-rc1
- the series adding the call address fixes to alternatives (in Palmer's next)
  https://lore.kernel.org/r/20221223221332.4127602-1-heiko@sntech.de
- the patch adding the j address fix to alternatives
  https://lore.kernel.org/r/20230113212205.3534622-1-heiko@sntech.de
- Conor's isa extension reordering series
  https://lore.kernel.org/r/20221205144525.2148448-1-conor.dooley@microchip.com


changes since v4:
- split out the now shared alternatives j-address-fixup
- split out the somewhat unrelated isn-func move patch
- follow Andrew's great suggestions on making the assembly nicer
- put the Zbb extension into the correct place after Conor's reordering

changes since v3:
- rebase on top of 6.2-rc1 + the applied alternative-call series
- add alternative fixup for jal instructions
- drop the inline functions and instead just jump

changes since v2:
- add patch fixing the c.jalr funct4 value
- reword some commit messages
- fix position of auipc addition patch (earlier)
- fix compile errors from patch-reordering gone wrong
  (worked at the end of v2, but compiling individual patches
   caused issues) - patches are now tested individually
- limit Zbb variants for GNU as for now
  (LLVM support for .option arch is still under review)
- prevent str-functions from getting optimized to builtin-variants

changes since v1:
- a number of generalizations/cleanups for instruction parsing
- use accessor function to access instructions (Emil)
- actually patch the correct location when having more than one
  instruction in an alternative block
- string function cleanups (comments etc) (Conor)
- move zbb extension above s* extensions in cpu.c lists

changes since rfc:
- make Zbb code actually work
- drop some unneeded patches
- a lot of cleanups

Heiko Stuebner (2):
  RISC-V: add infrastructure to allow different str* implementations
  RISC-V: add zbb support to string functions

 arch/riscv/Kconfig                   |  24 +++++
 arch/riscv/include/asm/errata_list.h |   3 +-
 arch/riscv/include/asm/hwcap.h       |   1 +
 arch/riscv/include/asm/string.h      |  10 ++
 arch/riscv/kernel/cpu.c              |   1 +
 arch/riscv/kernel/cpufeature.c       |  18 ++++
 arch/riscv/kernel/riscv_ksyms.c      |   3 +
 arch/riscv/lib/Makefile              |   3 +
 arch/riscv/lib/strcmp.S              | 121 +++++++++++++++++++++++
 arch/riscv/lib/strlen.S              | 133 +++++++++++++++++++++++++
 arch/riscv/lib/strncmp.S             | 139 +++++++++++++++++++++++++++
 arch/riscv/purgatory/Makefile        |  13 +++
 12 files changed, 468 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/strcmp.S
 create mode 100644 arch/riscv/lib/strlen.S
 create mode 100644 arch/riscv/lib/strncmp.S

-- 
2.35.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v5 1/2] RISC-V: add infrastructure to allow different str* implementations
  2023-01-13 21:22 [PATCH v5 0/2] Zbb string optimizations Heiko Stuebner
@ 2023-01-13 21:23 ` Heiko Stuebner
  2023-01-13 21:59   ` Conor Dooley
  2023-01-13 21:23 ` [PATCH v5 2/2] RISC-V: add zbb support to string functions Heiko Stuebner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Heiko Stuebner @ 2023-01-13 21:23 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Depending on supported extensions on specific RISC-V cores,
optimized str* functions might make sense.

This adds basic infrastructure to allow patching the function calls
via alternatives later on.

The Linux kernel provides standard implementations for string functions
but when architectures want to extend them, they need to provide their
own.

The added generic string functions are done in assembler (taken from
disassembling the main-kernel functions for now) to allow us to control
the used registers and extend them with optimized variants.

This doesn't override the compiler's use of builtin replacements. So still
first of all the compiler will select if a builtin will be better suitable
i.e. for known strings. For all regular cases we will want to later
select possible optimized variants and in the worst case fall back to the
generic implemention added with this change.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/string.h | 10 ++++++++
 arch/riscv/kernel/riscv_ksyms.c |  3 +++
 arch/riscv/lib/Makefile         |  3 +++
 arch/riscv/lib/strcmp.S         | 36 +++++++++++++++++++++++++++++
 arch/riscv/lib/strlen.S         | 28 ++++++++++++++++++++++
 arch/riscv/lib/strncmp.S        | 41 +++++++++++++++++++++++++++++++++
 arch/riscv/purgatory/Makefile   | 13 +++++++++++
 7 files changed, 134 insertions(+)
 create mode 100644 arch/riscv/lib/strcmp.S
 create mode 100644 arch/riscv/lib/strlen.S
 create mode 100644 arch/riscv/lib/strncmp.S

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 909049366555..a96b1fea24fe 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -18,6 +18,16 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
 #define __HAVE_ARCH_MEMMOVE
 extern asmlinkage void *memmove(void *, const void *, size_t);
 extern asmlinkage void *__memmove(void *, const void *, size_t);
+
+#define __HAVE_ARCH_STRCMP
+extern asmlinkage int strcmp(const char *cs, const char *ct);
+
+#define __HAVE_ARCH_STRLEN
+extern asmlinkage __kernel_size_t strlen(const char *);
+
+#define __HAVE_ARCH_STRNCMP
+extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
+
 /* For those files which don't want to check by kasan. */
 #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
index 5ab1c7e1a6ed..a72879b4249a 100644
--- a/arch/riscv/kernel/riscv_ksyms.c
+++ b/arch/riscv/kernel/riscv_ksyms.c
@@ -12,6 +12,9 @@
 EXPORT_SYMBOL(memset);
 EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(memmove);
+EXPORT_SYMBOL(strcmp);
+EXPORT_SYMBOL(strlen);
+EXPORT_SYMBOL(strncmp);
 EXPORT_SYMBOL(__memset);
 EXPORT_SYMBOL(__memcpy);
 EXPORT_SYMBOL(__memmove);
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 25d5c9664e57..6c74b0bedd60 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -3,6 +3,9 @@ lib-y			+= delay.o
 lib-y			+= memcpy.o
 lib-y			+= memset.o
 lib-y			+= memmove.o
+lib-y			+= strcmp.o
+lib-y			+= strlen.o
+lib-y			+= strncmp.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 
diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
new file mode 100644
index 000000000000..8babd712b958
--- /dev/null
+++ b/arch/riscv/lib/strcmp.S
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int strcmp(const char *cs, const char *ct) */
+SYM_FUNC_START(strcmp)
+	/*
+	 * Returns
+	 *   a0 - comparison result, value like strcmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *
+	 * Clobbers
+	 *   t0, t1
+	 */
+1:
+	lbu	t0, 0(a0)
+	lbu	t1, 0(a1)
+	addi	a0, a0, 1
+	addi	a1, a1, 1
+	bne	t0, t1, 2f
+	bnez	t0, 1b
+	li	a0, 0
+	ret
+2:
+	/*
+	 * strcmp only needs to return (< 0, 0, > 0) values
+	 * not necessarily -1, 0, +1
+	 */
+	sub	a0, t0, t1
+	ret
+SYM_FUNC_END(strcmp)
diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
new file mode 100644
index 000000000000..0a3b11853efd
--- /dev/null
+++ b/arch/riscv/lib/strlen.S
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int strlen(const char *s) */
+SYM_FUNC_START(strlen)
+	/*
+	 * Returns
+	 *   a0 - string length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *
+	 * Clobbers:
+	 *   t0, t1
+	 */
+	mv	t1, a0
+1:
+	lbu	t0, 0(t1)
+	beqz	t0, 2f
+	addi	t1, t1, 1
+	j	1b
+2:
+	sub	a0, t1, a0
+	ret
+SYM_FUNC_END(strlen)
diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
new file mode 100644
index 000000000000..1f644d0a93f6
--- /dev/null
+++ b/arch/riscv/lib/strncmp.S
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int strncmp(const char *cs, const char *ct, size_t count) */
+SYM_FUNC_START(strncmp)
+	/*
+	 * Returns
+	 *   a0 - comparison result, value like strncmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2
+	 */
+	li	t2, 0
+1:
+	beq	a2, t2, 2f
+	lbu	t0, 0(a0)
+	lbu	t1, 0(a1)
+	addi	a0, a0, 1
+	addi	a1, a1, 1
+	bne	t0, t1, 3f
+	addi	t2, t2, 1
+	bnez	t0, 1b
+2:
+	li	a0, 0
+	ret
+3:
+	/*
+	 * strncmp only needs to return (< 0, 0, > 0) values
+	 * not necessarily -1, 0, +1
+	 */
+	sub	a0, t0, t1
+	ret
+SYM_FUNC_END(strncmp)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index dd58e1d99397..d16bf715a586 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD := y
 
 purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
+purgatory-y += strcmp.o strlen.o strncmp.o
 
 targets += $(purgatory-y)
 PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
@@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
 $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
 	$(call if_changed_rule,as_o_S)
 
+$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
 $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
@@ -77,6 +87,9 @@ CFLAGS_ctype.o			+= $(PURGATORY_CFLAGS)
 AFLAGS_REMOVE_entry.o		+= -Wa,-gdwarf-2
 AFLAGS_REMOVE_memcpy.o		+= -Wa,-gdwarf-2
 AFLAGS_REMOVE_memset.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strcmp.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strlen.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strncmp.o		+= -Wa,-gdwarf-2
 
 $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
 		$(call if_changed,ld)
-- 
2.35.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v5 2/2] RISC-V: add zbb support to string functions
  2023-01-13 21:22 [PATCH v5 0/2] Zbb string optimizations Heiko Stuebner
  2023-01-13 21:23 ` [PATCH v5 1/2] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
@ 2023-01-13 21:23 ` Heiko Stuebner
  2023-01-13 22:33   ` Conor Dooley
                     ` (3 more replies)
  2023-02-02 23:39 ` [PATCH v5 0/2] Zbb string optimizations Palmer Dabbelt
  2023-02-02 23:40 ` patchwork-bot+linux-riscv
  3 siblings, 4 replies; 11+ messages in thread
From: Heiko Stuebner @ 2023-01-13 21:23 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add handling for ZBB extension and add support for using it as a
variant for optimized string functions.

Support for the Zbb-str-variants is limited to the GNU-assembler
for now, as LLVM has not yet acquired the functionality to
selectively change the arch option in assembler code.
This is still under review at
    https://reviews.llvm.org/D123515

Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/Kconfig                   |  24 ++++++
 arch/riscv/include/asm/errata_list.h |   3 +-
 arch/riscv/include/asm/hwcap.h       |   1 +
 arch/riscv/kernel/cpu.c              |   1 +
 arch/riscv/kernel/cpufeature.c       |  18 +++++
 arch/riscv/lib/strcmp.S              |  85 ++++++++++++++++++++++
 arch/riscv/lib/strlen.S              | 105 +++++++++++++++++++++++++++
 arch/riscv/lib/strncmp.S             |  98 +++++++++++++++++++++++++
 8 files changed, 334 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..7c814fbf9527 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
 
 	   If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBB
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
+	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
+	depends on AS_IS_GNU
+
+config RISCV_ISA_ZBB
+	bool "Zbb extension support for bit manipulation instructions"
+	depends on TOOLCHAIN_HAS_ZBB
+	depends on !XIP_KERNEL && MMU
+	select RISCV_ALTERNATIVE
+	default y
+	help
+	   Adds support to dynamically detect the presence of the ZBB
+	   extension (basic bit manipulation) and enable its usage.
+
+	   The Zbb extension provides instructions to accelerate a number
+	   of bit-specific operations (count bit population, sign extending,
+	   bitrotation, etc).
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZICBOM
 	bool
 	default y
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 4180312d2a70..95e626b7281e 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -24,7 +24,8 @@
 
 #define	CPUFEATURE_SVPBMT 0
 #define	CPUFEATURE_ZICBOM 1
-#define	CPUFEATURE_NUMBER 2
+#define	CPUFEATURE_ZBB 2
+#define	CPUFEATURE_NUMBER 3
 
 #ifdef __ASSEMBLY__
 
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 57439da71c77..462d6cde9bac 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -58,6 +58,7 @@ enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_SSTC,
 	RISCV_ISA_EXT_SVINVAL,
 	RISCV_ISA_EXT_SVPBMT,
+	RISCV_ISA_EXT_ZBB,
 	RISCV_ISA_EXT_ZICBOM,
 	RISCV_ISA_EXT_ZIHINTPAUSE,
 	RISCV_ISA_EXT_ID_MAX
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 0bf1c7f663fc..420228e219f7 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -185,6 +185,7 @@ arch_initcall(riscv_cpuinfo_init);
  * New entries to this struct should follow the ordering rules described above.
  */
 static struct riscv_isa_ext_data isa_ext_arr[] = {
+	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index dde0e91d7668..9899806cef29 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -227,6 +227,7 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
 				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
+				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 			}
@@ -302,6 +303,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
 	return true;
 }
 
+static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
+{
+	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
+		return false;
+
+	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
+		return false;
+
+	if (!riscv_isa_extension_available(NULL, ZBB))
+		return false;
+
+	return true;
+}
+
 /*
  * Probe presence of individual extensions.
  *
@@ -320,6 +335,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
 	if (cpufeature_probe_zicbom(stage))
 		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
 
+	if (cpufeature_probe_zbb(stage))
+		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
+
 	return cpu_req_feature;
 }
 
diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
index 8babd712b958..8148b6418f61 100644
--- a/arch/riscv/lib/strcmp.S
+++ b/arch/riscv/lib/strcmp.S
@@ -3,9 +3,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm-generic/export.h>
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 
 /* int strcmp(const char *cs, const char *ct) */
 SYM_FUNC_START(strcmp)
+
+	ALTERNATIVE("nop", "j strcmp_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
+
 	/*
 	 * Returns
 	 *   a0 - comparison result, value like strcmp
@@ -33,4 +38,84 @@ SYM_FUNC_START(strcmp)
 	 */
 	sub	a0, t0, t1
 	ret
+
+/*
+ * Variant of strcmp using the ZBB extension if available
+ */
+#ifdef CONFIG_RISCV_ISA_ZBB
+strcmp_zbb:
+
+.option push
+.option arch,+zbb
+
+	/*
+	 * Returns
+	 *   a0 - comparison result, value like strcmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4, t5
+	 */
+
+	or	t2, a0, a1
+	li	t4, -1
+	and	t2, t2, SZREG-1
+	bnez	t2, 3f
+
+	/* Main loop for aligned string.  */
+	.p2align 3
+1:
+	REG_L	t0, 0(a0)
+	REG_L	t1, 0(a1)
+	orc.b	t3, t0
+	bne	t3, t4, 2f
+	addi	a0, a0, SZREG
+	addi	a1, a1, SZREG
+	beq	t0, t1, 1b
+
+	/*
+	 * Words don't match, and no null byte in the first
+	 * word. Get bytes in big-endian order and compare.
+	 */
+#ifndef CONFIG_CPU_BIG_ENDIAN
+	rev8	t0, t0
+	rev8	t1, t1
+#endif
+
+	/* Synthesize (t0 >= t1) ? 1 : -1 in a branchless sequence. */
+	sltu	a0, t0, t1
+	neg	a0, a0
+	ori	a0, a0, 1
+	ret
+
+2:
+	/*
+	 * Found a null byte.
+	 * If words don't match, fall back to simple loop.
+	 */
+	bne	t0, t1, 3f
+
+	/* Otherwise, strings are equal. */
+	li	a0, 0
+	ret
+
+	/* Simple loop for misaligned strings. */
+	.p2align 3
+3:
+	lbu	t0, 0(a0)
+	lbu	t1, 0(a1)
+	addi	a0, a0, 1
+	addi	a1, a1, 1
+	bne	t0, t1, 4f
+	bnez	t0, 3b
+
+4:
+	sub	a0, t0, t1
+	ret
+
+.option pop
+#endif
 SYM_FUNC_END(strcmp)
diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
index 0a3b11853efd..0f9dbf93301a 100644
--- a/arch/riscv/lib/strlen.S
+++ b/arch/riscv/lib/strlen.S
@@ -3,9 +3,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm-generic/export.h>
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 
 /* int strlen(const char *s) */
 SYM_FUNC_START(strlen)
+
+	ALTERNATIVE("nop", "j strlen_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
+
 	/*
 	 * Returns
 	 *   a0 - string length
@@ -25,4 +30,104 @@ SYM_FUNC_START(strlen)
 2:
 	sub	a0, t1, a0
 	ret
+
+/*
+ * Variant of strlen using the ZBB extension if available
+ */
+#ifdef CONFIG_RISCV_ISA_ZBB
+strlen_zbb:
+
+#ifdef CONFIG_CPU_BIG_ENDIAN
+# define CZ	clz
+# define SHIFT	sll
+#else
+# define CZ	ctz
+# define SHIFT	srl
+#endif
+
+.option push
+.option arch,+zbb
+
+	/*
+	 * Returns
+	 *   a0 - string length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3
+	 */
+
+	/* Number of irrelevant bytes in the first word. */
+	andi	t2, a0, SZREG-1
+
+	/* Align pointer. */
+	andi	t0, a0, -SZREG
+
+	li	t3, SZREG
+	sub	t3, t3, t2
+	slli	t2, t2, 3
+
+	/* Get the first word.  */
+	REG_L	t1, 0(t0)
+
+	/*
+	 * Shift away the partial data we loaded to remove the irrelevant bytes
+	 * preceding the string with the effect of adding NUL bytes at the
+	 * end of the string's first word.
+	 */
+	SHIFT	t1, t1, t2
+
+	/* Convert non-NUL into 0xff and NUL into 0x00. */
+	orc.b	t1, t1
+
+	/* Convert non-NUL into 0x00 and NUL into 0xff. */
+	not	t1, t1
+
+	/*
+	 * Search for the first set bit (corresponding to a NUL byte in the
+	 * original chunk).
+	 */
+	CZ	t1, t1
+
+	/*
+	 * The first chunk is special: compare against the number
+	 * of valid bytes in this chunk.
+	 */
+	srli	a0, t1, 3
+	bgtu	t3, a0, 3f
+
+	/* Prepare for the word comparison loop. */
+	addi	t2, t0, SZREG
+	li	t3, -1
+
+	/*
+	 * Our critical loop is 4 instructions and processes data in
+	 * 4 byte or 8 byte chunks.
+	 */
+	.p2align 3
+1:
+	REG_L	t1, SZREG(t0)
+	addi	t0, t0, SZREG
+	orc.b	t1, t1
+	beq	t1, t3, 1b
+2:
+	not	t1, t1
+	CZ	t1, t1
+
+	/* Get number of processed words.  */
+	sub	t2, t0, t2
+
+	/* Add number of characters in the first word.  */
+	add	a0, a0, t2
+	srli	t1, t1, 3
+
+	/* Add number of characters in the last word.  */
+	add	a0, a0, t1
+3:
+	ret
+
+.option pop
+#endif
 SYM_FUNC_END(strlen)
diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
index 1f644d0a93f6..7940ddab2d48 100644
--- a/arch/riscv/lib/strncmp.S
+++ b/arch/riscv/lib/strncmp.S
@@ -3,9 +3,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm-generic/export.h>
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 
 /* int strncmp(const char *cs, const char *ct, size_t count) */
 SYM_FUNC_START(strncmp)
+
+	ALTERNATIVE("nop", "j strncmp_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
+
 	/*
 	 * Returns
 	 *   a0 - comparison result, value like strncmp
@@ -38,4 +43,97 @@ SYM_FUNC_START(strncmp)
 	 */
 	sub	a0, t0, t1
 	ret
+
+/*
+ * Variant of strncmp using the ZBB extension if available
+ */
+#ifdef CONFIG_RISCV_ISA_ZBB
+strncmp_zbb:
+
+.option push
+.option arch,+zbb
+
+	/*
+	 * Returns
+	 *   a0 - comparison result, like strncmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4, t5, t6
+	 */
+
+	or	t2, a0, a1
+	li	t5, -1
+	and	t2, t2, SZREG-1
+	add	t4, a0, a2
+	bnez	t2, 4f
+
+	/* Adjust limit for fast-path.  */
+	andi	t6, t4, -SZREG
+
+	/* Main loop for aligned string.  */
+	.p2align 3
+1:
+	bgt	a0, t6, 3f
+	REG_L	t0, 0(a0)
+	REG_L	t1, 0(a1)
+	orc.b	t3, t0
+	bne	t3, t5, 2f
+	addi	a0, a0, SZREG
+	addi	a1, a1, SZREG
+	beq	t0, t1, 1b
+
+	/*
+	 * Words don't match, and no null byte in the first
+	 * word. Get bytes in big-endian order and compare.
+	 */
+#ifndef CONFIG_CPU_BIG_ENDIAN
+	rev8	t0, t0
+	rev8	t1, t1
+#endif
+
+	/* Synthesize (t0 >= t1) ? 1 : -1 in a branchless sequence.  */
+	sltu	a0, t0, t1
+	neg	a0, a0
+	ori	a0, a0, 1
+	ret
+
+2:
+	/*
+	 * Found a null byte.
+	 * If words don't match, fall back to simple loop.
+	 */
+	bne	t0, t1, 3f
+
+	/* Otherwise, strings are equal.  */
+	li	a0, 0
+	ret
+
+	/* Simple loop for misaligned strings.  */
+3:
+	/* Restore limit for slow-path.  */
+	.p2align 3
+4:
+	bge	a0, t4, 6f
+	lbu	t0, 0(a0)
+	lbu	t1, 0(a1)
+	addi	a0, a0, 1
+	addi	a1, a1, 1
+	bne	t0, t1, 5f
+	bnez	t0, 4b
+
+5:
+	sub	a0, t0, t1
+	ret
+
+6:
+	li	a0, 0
+	ret
+
+.option pop
+#endif
 SYM_FUNC_END(strncmp)
-- 
2.35.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 1/2] RISC-V: add infrastructure to allow different str* implementations
  2023-01-13 21:23 ` [PATCH v5 1/2] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
@ 2023-01-13 21:59   ` Conor Dooley
  0 siblings, 0 replies; 11+ messages in thread
From: Conor Dooley @ 2023-01-13 21:59 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang, Heiko Stuebner


[-- Attachment #1.1: Type: text/plain, Size: 1961 bytes --]

Hey Heiko,

On Fri, Jan 13, 2023 at 10:23:00PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Depending on supported extensions on specific RISC-V cores,
> optimized str* functions might make sense.
> 
> This adds basic infrastructure to allow patching the function calls
> via alternatives later on.
> 
> The Linux kernel provides standard implementations for string functions
> but when architectures want to extend them, they need to provide their
> own.
> 
> The added generic string functions are done in assembler (taken from
> disassembling the main-kernel functions for now) to allow us to control
> the used registers and extend them with optimized variants.
> 


> This doesn't override the compiler's use of builtin replacements. So still
> first of all the compiler will select if a builtin will be better suitable
> i.e. for known strings. For all regular cases we will want to later
  ^
i.e. suggests that this will only happen for known strings.

> select possible optimized variants and in the worst case fall back to the
> generic implemention added with this change.
          ^ typo btw

I'm not expecting a resend for this, that'd be stupid, but does the
following say the same thing? (What you wrote was a bit -ENOPARSE)

"This doesn't override the compiler's use of builtin replacements. The
compiler will still select a builtin if it will be better suited, e.g.
for known strings. For all regular cases, we may want to select possibly
optimized variants, and in the worst case fall back to the generic
implementation added with this change."

I'm a philistine who was okay with previous version of the asm, so you
could've kept the:
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
I had a look and it seemed to match what Drew suggested.
Looking at the b4 diff output and seeing both side by side, the new one
does look nicer..

Thanks,
Conor.


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] RISC-V: add zbb support to string functions
  2023-01-13 21:23 ` [PATCH v5 2/2] RISC-V: add zbb support to string functions Heiko Stuebner
@ 2023-01-13 22:33   ` Conor Dooley
  2023-01-17 12:24   ` Andrew Jones
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Conor Dooley @ 2023-01-13 22:33 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang, Heiko Stuebner


[-- Attachment #1.1: Type: text/plain, Size: 946 bytes --]

On Fri, Jan 13, 2023 at 10:23:01PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add handling for ZBB extension and add support for using it as a
> variant for optimized string functions.
> 
> Support for the Zbb-str-variants is limited to the GNU-assembler
> for now, as LLVM has not yet acquired the functionality to
> selectively change the arch option in assembler code.
> This is still under review at
>     https://reviews.llvm.org/D123515
> 
> Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>

Had a look through, again mostly examining b4 diff...
The changes look to match the suggestions on v4, but I was happy with
the un-optimised version so I have no taste.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Thanks,
Conor.


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] RISC-V: add zbb support to string functions
  2023-01-13 21:23 ` [PATCH v5 2/2] RISC-V: add zbb support to string functions Heiko Stuebner
  2023-01-13 22:33   ` Conor Dooley
@ 2023-01-17 12:24   ` Andrew Jones
  2023-01-20 15:02   ` Andrew Jones
  2023-02-12 15:47   ` Guenter Roeck
  3 siblings, 0 replies; 11+ messages in thread
From: Andrew Jones @ 2023-01-17 12:24 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Fri, Jan 13, 2023 at 10:23:01PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add handling for ZBB extension and add support for using it as a
> variant for optimized string functions.
> 
> Support for the Zbb-str-variants is limited to the GNU-assembler
> for now, as LLVM has not yet acquired the functionality to
> selectively change the arch option in assembler code.
> This is still under review at
>     https://reviews.llvm.org/D123515
> 
> Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/Kconfig                   |  24 ++++++
>  arch/riscv/include/asm/errata_list.h |   3 +-
>  arch/riscv/include/asm/hwcap.h       |   1 +
>  arch/riscv/kernel/cpu.c              |   1 +
>  arch/riscv/kernel/cpufeature.c       |  18 +++++
>  arch/riscv/lib/strcmp.S              |  85 ++++++++++++++++++++++
>  arch/riscv/lib/strlen.S              | 105 +++++++++++++++++++++++++++
>  arch/riscv/lib/strncmp.S             |  98 +++++++++++++++++++++++++
>  8 files changed, 334 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..7c814fbf9527 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
>  
>  	   If you don't know what to do here, say Y.
>  
> +config TOOLCHAIN_HAS_ZBB
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> +	depends on AS_IS_GNU
> +
> +config RISCV_ISA_ZBB
> +	bool "Zbb extension support for bit manipulation instructions"
> +	depends on TOOLCHAIN_HAS_ZBB
> +	depends on !XIP_KERNEL && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Adds support to dynamically detect the presence of the ZBB
> +	   extension (basic bit manipulation) and enable its usage.
> +
> +	   The Zbb extension provides instructions to accelerate a number
> +	   of bit-specific operations (count bit population, sign extending,
> +	   bitrotation, etc).
> +
> +	   If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZICBOM
>  	bool
>  	default y
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4180312d2a70..95e626b7281e 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -24,7 +24,8 @@
>  
>  #define	CPUFEATURE_SVPBMT 0
>  #define	CPUFEATURE_ZICBOM 1
> -#define	CPUFEATURE_NUMBER 2
> +#define	CPUFEATURE_ZBB 2
> +#define	CPUFEATURE_NUMBER 3
>  
>  #ifdef __ASSEMBLY__
>  
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index 57439da71c77..462d6cde9bac 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -58,6 +58,7 @@ enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_SSTC,
>  	RISCV_ISA_EXT_SVINVAL,
>  	RISCV_ISA_EXT_SVPBMT,
> +	RISCV_ISA_EXT_ZBB,
>  	RISCV_ISA_EXT_ZICBOM,
>  	RISCV_ISA_EXT_ZIHINTPAUSE,
>  	RISCV_ISA_EXT_ID_MAX
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 0bf1c7f663fc..420228e219f7 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -185,6 +185,7 @@ arch_initcall(riscv_cpuinfo_init);
>   * New entries to this struct should follow the ordering rules described above.
>   */
>  static struct riscv_isa_ext_data isa_ext_arr[] = {
> +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),

Zb* comes after Zi* according 27.11 "Subset Naming Convention". I can hear
you groaning!

The other lists are OK, because we decided to just keep those in
alphabetical order.

>  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index dde0e91d7668..9899806cef29 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -227,6 +227,7 @@ void __init riscv_fill_hwcap(void)
>  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
>  				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
>  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
>  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
>  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
>  			}
> @@ -302,6 +303,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
>  	return true;
>  }
>  
> +static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
> +{
> +	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
> +		return false;
> +
> +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> +		return false;
> +
> +	if (!riscv_isa_extension_available(NULL, ZBB))
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * Probe presence of individual extensions.
>   *
> @@ -320,6 +335,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>  	if (cpufeature_probe_zicbom(stage))
>  		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
>  
> +	if (cpufeature_probe_zbb(stage))
> +		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
> +
>  	return cpu_req_feature;
>  }
>  
> diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> index 8babd712b958..8148b6418f61 100644
> --- a/arch/riscv/lib/strcmp.S
> +++ b/arch/riscv/lib/strcmp.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strcmp(const char *cs, const char *ct) */
>  SYM_FUNC_START(strcmp)
> +
> +	ALTERNATIVE("nop", "j strcmp_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - comparison result, value like strcmp
> @@ -33,4 +38,84 @@ SYM_FUNC_START(strcmp)
>  	 */
>  	sub	a0, t0, t1
>  	ret
> +
> +/*
> + * Variant of strcmp using the ZBB extension if available

Could call out the source of this being Appendix A of the bitman manual.
Same suggestion for the other variant headers comments as well.

> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +strcmp_zbb:
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, value like strcmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5

t5 isn't used

> +	 */
> +
> +	or	t2, a0, a1
> +	li	t4, -1
> +	and	t2, t2, SZREG-1
> +	bnez	t2, 3f
> +
> +	/* Main loop for aligned string.  */
> +	.p2align 3
> +1:
> +	REG_L	t0, 0(a0)
> +	REG_L	t1, 0(a1)
> +	orc.b	t3, t0
> +	bne	t3, t4, 2f
> +	addi	a0, a0, SZREG
> +	addi	a1, a1, SZREG
> +	beq	t0, t1, 1b
> +
> +	/*
> +	 * Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.
> +	 */
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8	t0, t0
> +	rev8	t1, t1
> +#endif
> +
> +	/* Synthesize (t0 >= t1) ? 1 : -1 in a branchless sequence. */
> +	sltu	a0, t0, t1
> +	neg	a0, a0
> +	ori	a0, a0, 1
> +	ret
> +
> +2:
> +	/*
> +	 * Found a null byte.
> +	 * If words don't match, fall back to simple loop.
> +	 */
> +	bne	t0, t1, 3f
> +
> +	/* Otherwise, strings are equal. */
> +	li	a0, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings. */
> +	.p2align 3
> +3:
> +	lbu	t0, 0(a0)
> +	lbu	t1, 0(a1)
> +	addi	a0, a0, 1
> +	addi	a1, a1, 1
> +	bne	t0, t1, 4f
> +	bnez	t0, 3b
> +
> +4:
> +	sub	a0, t0, t1
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strcmp)
> diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> index 0a3b11853efd..0f9dbf93301a 100644
> --- a/arch/riscv/lib/strlen.S
> +++ b/arch/riscv/lib/strlen.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strlen(const char *s) */
>  SYM_FUNC_START(strlen)
> +
> +	ALTERNATIVE("nop", "j strlen_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - string length
> @@ -25,4 +30,104 @@ SYM_FUNC_START(strlen)
>  2:
>  	sub	a0, t1, a0
>  	ret
> +
> +/*
> + * Variant of strlen using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +strlen_zbb:
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +# define CZ	clz
> +# define SHIFT	sll
> +#else
> +# define CZ	ctz
> +# define SHIFT	srl
> +#endif
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - string length
> +	 *
> +	 * Parameters
> +	 *   a0 - String to measure
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3
> +	 */
> +
> +	/* Number of irrelevant bytes in the first word. */
> +	andi	t2, a0, SZREG-1
> +
> +	/* Align pointer. */
> +	andi	t0, a0, -SZREG
> +
> +	li	t3, SZREG
> +	sub	t3, t3, t2
> +	slli	t2, t2, 3
> +
> +	/* Get the first word.  */
> +	REG_L	t1, 0(t0)
> +
> +	/*
> +	 * Shift away the partial data we loaded to remove the irrelevant bytes
> +	 * preceding the string with the effect of adding NUL bytes at the
> +	 * end of the string's first word.
> +	 */
> +	SHIFT	t1, t1, t2
> +
> +	/* Convert non-NUL into 0xff and NUL into 0x00. */
> +	orc.b	t1, t1
> +
> +	/* Convert non-NUL into 0x00 and NUL into 0xff. */
> +	not	t1, t1
> +
> +	/*
> +	 * Search for the first set bit (corresponding to a NUL byte in the
> +	 * original chunk).
> +	 */
> +	CZ	t1, t1
> +
> +	/*
> +	 * The first chunk is special: compare against the number
> +	 * of valid bytes in this chunk.
> +	 */
> +	srli	a0, t1, 3
> +	bgtu	t3, a0, 3f
> +
> +	/* Prepare for the word comparison loop. */
> +	addi	t2, t0, SZREG
> +	li	t3, -1
> +
> +	/*
> +	 * Our critical loop is 4 instructions and processes data in
> +	 * 4 byte or 8 byte chunks.
> +	 */
> +	.p2align 3
> +1:
> +	REG_L	t1, SZREG(t0)
> +	addi	t0, t0, SZREG
> +	orc.b	t1, t1
> +	beq	t1, t3, 1b
> +2:

This 2 label is never used so it can be removed. I see the appendix also
has an unused label here, .Lepilogue, which could also be removed.

> +	not	t1, t1
> +	CZ	t1, t1
> +
> +	/* Get number of processed words.  */
                                   ^ bytes

> +	sub	t2, t0, t2
> +
> +	/* Add number of characters in the first word.  */
> +	add	a0, a0, t2
> +	srli	t1, t1, 3

I'd move this shift up under the CZ a few lines above or down under the
next comment since it's not related to what the comment above it says.

> +
> +	/* Add number of characters in the last word.  */
> +	add	a0, a0, t1
> +3:
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strlen)
> diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> index 1f644d0a93f6..7940ddab2d48 100644
> --- a/arch/riscv/lib/strncmp.S
> +++ b/arch/riscv/lib/strncmp.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strncmp(const char *cs, const char *ct, size_t count) */
>  SYM_FUNC_START(strncmp)
> +
> +	ALTERNATIVE("nop", "j strncmp_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - comparison result, value like strncmp
> @@ -38,4 +43,97 @@ SYM_FUNC_START(strncmp)
>  	 */
>  	sub	a0, t0, t1
>  	ret
> +
> +/*
> + * Variant of strncmp using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +strncmp_zbb:
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5, t6
> +	 */
> +
> +	or	t2, a0, a1
> +	li	t5, -1
> +	and	t2, t2, SZREG-1
> +	add	t4, a0, a2
> +	bnez	t2, 4f
> +
> +	/* Adjust limit for fast-path.  */
> +	andi	t6, t4, -SZREG
> +
> +	/* Main loop for aligned string.  */
> +	.p2align 3
> +1:
> +	bgt	a0, t6, 3f
> +	REG_L	t0, 0(a0)
> +	REG_L	t1, 0(a1)
> +	orc.b	t3, t0
> +	bne	t3, t5, 2f
> +	addi	a0, a0, SZREG
> +	addi	a1, a1, SZREG
> +	beq	t0, t1, 1b
> +
> +	/*
> +	 * Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.
> +	 */
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8	t0, t0
> +	rev8	t1, t1
> +#endif
> +
> +	/* Synthesize (t0 >= t1) ? 1 : -1 in a branchless sequence.  */
> +	sltu	a0, t0, t1
> +	neg	a0, a0
> +	ori	a0, a0, 1
> +	ret
> +
> +2:
> +	/*
> +	 * Found a null byte.
> +	 * If words don't match, fall back to simple loop.
> +	 */
> +	bne	t0, t1, 3f
> +
> +	/* Otherwise, strings are equal.  */
> +	li	a0, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings.  */
> +3:

This label isn't any different than the next, label 4, so it should be
removed.


> +	/* Restore limit for slow-path.  */

nit: We're not really "restoring" anything, but rather using limit which
is saved in a different register.

> +	.p2align 3
> +4:
> +	bge	a0, t4, 6f
> +	lbu	t0, 0(a0)
> +	lbu	t1, 0(a1)
> +	addi	a0, a0, 1
> +	addi	a1, a1, 1
> +	bne	t0, t1, 5f
> +	bnez	t0, 4b
> +
> +5:
> +	sub	a0, t0, t1
> +	ret
> +
> +6:
> +	li	a0, 0
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strncmp)
> -- 
> 2.35.1
>

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] RISC-V: add zbb support to string functions
  2023-01-13 21:23 ` [PATCH v5 2/2] RISC-V: add zbb support to string functions Heiko Stuebner
  2023-01-13 22:33   ` Conor Dooley
  2023-01-17 12:24   ` Andrew Jones
@ 2023-01-20 15:02   ` Andrew Jones
  2023-02-12 15:47   ` Guenter Roeck
  3 siblings, 0 replies; 11+ messages in thread
From: Andrew Jones @ 2023-01-20 15:02 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Fri, Jan 13, 2023 at 10:23:01PM +0100, Heiko Stuebner wrote:
...
> diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> index 8babd712b958..8148b6418f61 100644
> --- a/arch/riscv/lib/strcmp.S
> +++ b/arch/riscv/lib/strcmp.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strcmp(const char *cs, const char *ct) */
>  SYM_FUNC_START(strcmp)
> +
> +	ALTERNATIVE("nop", "j strcmp_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +

I have something similar for Zicboz (which I've just reread as I'm
preparing v2). The difference is that I opted to penalize the non-
optimized version with the unconditional jump and give the optimized
version the nop. To do that here, it'd just need the label changed to
strcmp_basic or whatever, push the "basic" code down into it the new
label, and then put the new zbb code here.

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 0/2] Zbb string optimizations
  2023-01-13 21:22 [PATCH v5 0/2] Zbb string optimizations Heiko Stuebner
  2023-01-13 21:23 ` [PATCH v5 1/2] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
  2023-01-13 21:23 ` [PATCH v5 2/2] RISC-V: add zbb support to string functions Heiko Stuebner
@ 2023-02-02 23:39 ` Palmer Dabbelt
  2023-02-03  9:53   ` Andrew Jones
  2023-02-02 23:40 ` patchwork-bot+linux-riscv
  3 siblings, 1 reply; 11+ messages in thread
From: Palmer Dabbelt @ 2023-02-02 23:39 UTC (permalink / raw)
  To: Heiko Stuebner, linux-riscv, Palmer Dabbelt
  Cc: ajones, jszhang, Conor Dooley, Heiko Stuebner,
	christoph.muellner, philipp.tomsich

On Fri, 13 Jan 2023 22:22:59 +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> This series still tries to allow optimized string functions for specific
> extensions. The last approach of using an inline base function to hold
> the alternative calls did cause some issues in a number of places
> 
> So instead of that we're now just using an alternative j at the beginning
> of the generic function to jump to a separate place inside the function
> itself.
> 
> [...]

Applied, thanks!

[1/2] RISC-V: add infrastructure to allow different str* implementations
      https://git.kernel.org/palmer/c/56e0790c7f9e
[2/2] RISC-V: add zbb support to string functions
      https://git.kernel.org/palmer/c/b6fcdb191e36

Best regards,
-- 
Palmer Dabbelt <palmer@rivosinc.com>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 0/2] Zbb string optimizations
  2023-01-13 21:22 [PATCH v5 0/2] Zbb string optimizations Heiko Stuebner
                   ` (2 preceding siblings ...)
  2023-02-02 23:39 ` [PATCH v5 0/2] Zbb string optimizations Palmer Dabbelt
@ 2023-02-02 23:40 ` patchwork-bot+linux-riscv
  3 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+linux-riscv @ 2023-02-02 23:40 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, jszhang, heiko.stuebner

Hello:

This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Fri, 13 Jan 2023 22:22:59 +0100 you wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> This series still tries to allow optimized string functions for specific
> extensions. The last approach of using an inline base function to hold
> the alternative calls did cause some issues in a number of places
> 
> So instead of that we're now just using an alternative j at the beginning
> of the generic function to jump to a separate place inside the function
> itself.
> 
> [...]

Here is the summary with links:
  - [v5,1/2] RISC-V: add infrastructure to allow different str* implementations
    https://git.kernel.org/riscv/c/56e0790c7f9e
  - [v5,2/2] RISC-V: add zbb support to string functions
    https://git.kernel.org/riscv/c/b6fcdb191e36

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 0/2] Zbb string optimizations
  2023-02-02 23:39 ` [PATCH v5 0/2] Zbb string optimizations Palmer Dabbelt
@ 2023-02-03  9:53   ` Andrew Jones
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Jones @ 2023-02-03  9:53 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Heiko Stuebner, linux-riscv, Palmer Dabbelt, jszhang,
	Conor Dooley, Heiko Stuebner, christoph.muellner,
	philipp.tomsich

On Thu, Feb 02, 2023 at 03:39:09PM -0800, Palmer Dabbelt wrote:
> On Fri, 13 Jan 2023 22:22:59 +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > This series still tries to allow optimized string functions for specific
> > extensions. The last approach of using an inline base function to hold
> > the alternative calls did cause some issues in a number of places
> > 
> > So instead of that we're now just using an alternative j at the beginning
> > of the generic function to jump to a separate place inside the function
> > itself.
> > 
> > [...]
> 
> Applied, thanks!
> 
> [1/2] RISC-V: add infrastructure to allow different str* implementations
>       https://git.kernel.org/palmer/c/56e0790c7f9e
> [2/2] RISC-V: add zbb support to string functions
>       https://git.kernel.org/palmer/c/b6fcdb191e36

I had a few more comments on this version which I was hoping Heiko would
respin for. Some where just nits, but the extra labels (which don't hurt,
but are confusing) and the s/words/bytes/ comment change, which also
doesn't affect functionality, but causes confusion, would be good pick
up. With those changes, I would have given it an r-b.

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5 2/2] RISC-V: add zbb support to string functions
  2023-01-13 21:23 ` [PATCH v5 2/2] RISC-V: add zbb support to string functions Heiko Stuebner
                     ` (2 preceding siblings ...)
  2023-01-20 15:02   ` Andrew Jones
@ 2023-02-12 15:47   ` Guenter Roeck
  3 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2023-02-12 15:47 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, jszhang, Heiko Stuebner

Hi,

On Fri, Jan 13, 2023 at 10:23:01PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add handling for ZBB extension and add support for using it as a
> variant for optimized string functions.
> 
> Support for the Zbb-str-variants is limited to the GNU-assembler
> for now, as LLVM has not yet acquired the functionality to
> selectively change the arch option in assembler code.
> This is still under review at
>     https://reviews.llvm.org/D123515
> 
> Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>

With this patch in linux-next, the qemu "virt" emulation no longer finds
a root filesystem. I don't know how it is related, but the problem is
seen for all types of devices I tried to boot from. Booting from initrd
still works.

There are also repeated

WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x222/0x2f6

in the log, but that appears to be unrelated.

Guenter

---
bisect log:

# bad: [6ba8a227fd19d19779005fb66ad7562608e1df83] Add linux-next specific files for 20230210
# good: [4ec5183ec48656cec489c49f989c508b68b518e3] Linux 6.2-rc7
git bisect start 'HEAD' 'v6.2-rc7'
# bad: [94613f0efc69ed41f9229ef5c294db3ec37145da] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect bad 94613f0efc69ed41f9229ef5c294db3ec37145da
# bad: [8928ece68de4371dc6e1d3336d3029c1e18ae3b4] Merge branch 'for_next' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git
git bisect bad 8928ece68de4371dc6e1d3336d3029c1e18ae3b4
# good: [78a9f460e33d103256f3abb38f02f4759710c7dc] soc: document merges
git bisect good 78a9f460e33d103256f3abb38f02f4759710c7dc
# good: [b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git
git bisect good b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619
# bad: [57a87a64b520c37dd49f5fde84d09a4adb391180] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
git bisect bad 57a87a64b520c37dd49f5fde84d09a4adb391180
# good: [cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581] Merge branch 'clk-next' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
git bisect good cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581
# good: [6acecfa485d3de955c35a18730c106ddf1e7600e] powerpc/kcsan: Add KCSAN Support
git bisect good 6acecfa485d3de955c35a18730c106ddf1e7600e
# good: [8a16dea453dbc3e834b162640028e505882cd11e] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git bisect good 8a16dea453dbc3e834b162640028e505882cd11e
# good: [6be1ff430dab9fc047762b10b2c9669399ea1f37] riscv: pgtable: Fixup comment for KERN_VIRT_SIZE
git bisect good 6be1ff430dab9fc047762b10b2c9669399ea1f37
# good: [e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f] riscv: module: move find_section to module.h
git bisect good e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f
# good: [e8ad17d2b5f38e595d597a3e2419d6d7cc727b17] riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
git bisect good e8ad17d2b5f38e595d597a3e2419d6d7cc727b17
# bad: [75ab93a244a516d1d3c03c4e27d5d0deff76ebfb] Merge patch series "Zbb string optimizations"
git bisect bad 75ab93a244a516d1d3c03c4e27d5d0deff76ebfb
# bad: [b6fcdb191e36f82336f9b5e126d51c02e7323480] RISC-V: add zbb support to string functions
git bisect bad b6fcdb191e36f82336f9b5e126d51c02e7323480
# good: [56e0790c7f9e59ba6a0f4b59981d1d6fbf43efb0] RISC-V: add infrastructure to allow different str* implementations
git bisect good 56e0790c7f9e59ba6a0f4b59981d1d6fbf43efb0
# first bad commit: [b6fcdb191e36f82336f9b5e126d51c02e7323480] RISC-V: add zbb support to string functions

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-02-12 15:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-13 21:22 [PATCH v5 0/2] Zbb string optimizations Heiko Stuebner
2023-01-13 21:23 ` [PATCH v5 1/2] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
2023-01-13 21:59   ` Conor Dooley
2023-01-13 21:23 ` [PATCH v5 2/2] RISC-V: add zbb support to string functions Heiko Stuebner
2023-01-13 22:33   ` Conor Dooley
2023-01-17 12:24   ` Andrew Jones
2023-01-20 15:02   ` Andrew Jones
2023-02-12 15:47   ` Guenter Roeck
2023-02-02 23:39 ` [PATCH v5 0/2] Zbb string optimizations Palmer Dabbelt
2023-02-03  9:53   ` Andrew Jones
2023-02-02 23:40 ` patchwork-bot+linux-riscv

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.