* [PATCH v4 0/5] Zbb string optimizations and call support in alternatives
@ 2023-01-09 18:17 Heiko Stuebner
  2023-01-09 18:17 ` [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes Heiko Stuebner
                   ` (5 more replies)
  0 siblings, 6 replies; 42+ messages in thread
From: Heiko Stuebner @ 2023-01-09 18:17 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This series still tries to allow optimized string functions for specific
extensions. The last approach, using an inline base function to hold the
alternative calls, caused issues in a number of places.

So instead of that we now just use an alternative "j" at the beginning
of the generic function to jump to a separate place inside the function
itself.

This of course needs a fixup for "j" instructions in alternative blocks,
so that is provided here as well.

Technically patch 4 got a review from Andrew, but that was still with
the inline approach, so I didn't carry it over to v4.


changes since v3:
- rebase on top of 6.2-rc1 + the applied alternative-call series
- add alternative fixup for jal instructions
- drop the inline functions and instead just jump

changes since v2:
- add patch fixing the c.jalr funct4 value
- reword some commit messages
- fix position of auipc addition patch (earlier)
- fix compile errors from patch-reordering gone wrong
  (worked at the end of v2, but compiling individual patches
   caused issues) - patches are now tested individually
- limit Zbb variants for GNU as for now
  (LLVM support for .option arch is still under review)
- prevent str-functions from getting optimized to builtin-variants

changes since v1:
- a number of generalizations/cleanups for instruction parsing
- use accessor function to access instructions (Emil)
- actually patch the correct location when having more than one
  instruction in an alternative block
- string function cleanups (comments etc) (Conor)
- move zbb extension above s* extensions in cpu.c lists

changes since rfc:
- make Zbb code actually work
- drop some unneeded patches
- a lot of cleanups

Heiko Stuebner (5):
  RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
  RISC-V: add helpers for J-type immediate handling
  RISC-V: fix jal addresses in patched alternatives
  RISC-V: add infrastructure to allow different str* implementations
  RISC-V: add zbb support to string functions

 arch/riscv/Kconfig                       |  24 ++++
 arch/riscv/include/asm/errata_list.h     |   3 +-
 arch/riscv/include/asm/hwcap.h           |   1 +
 arch/riscv/include/asm/insn.h            |  36 ++++++
 arch/riscv/include/asm/string.h          |  12 ++
 arch/riscv/kernel/alternative.c          |  27 ++++
 arch/riscv/kernel/cpu.c                  |   1 +
 arch/riscv/kernel/cpufeature.c           |  18 +++
 arch/riscv/kernel/probes/simulate-insn.h |   3 -
 arch/riscv/kernel/riscv_ksyms.c          |   3 +
 arch/riscv/lib/Makefile                  |   3 +
 arch/riscv/lib/strcmp.S                  | 131 ++++++++++++++++++++
 arch/riscv/lib/strlen.S                  | 142 +++++++++++++++++++++
 arch/riscv/lib/strncmp.S                 | 151 +++++++++++++++++++++++
 arch/riscv/purgatory/Makefile            |  13 ++
 15 files changed, 564 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/lib/strcmp.S
 create mode 100644 arch/riscv/lib/strlen.S
 create mode 100644 arch/riscv/lib/strncmp.S

-- 
2.35.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
  2023-01-09 18:17 [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Heiko Stuebner
@ 2023-01-09 18:17 ` Heiko Stuebner
  2023-01-09 20:53   ` Conor Dooley
  2023-01-10  8:32   ` Andrew Jones
  2023-01-09 18:17 ` [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling Heiko Stuebner
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 42+ messages in thread
From: Heiko Stuebner @ 2023-01-09 18:17 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside
the kprobes implementation. This got moved into a central header in
commit ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header").

Though it looks like I overlooked two of them, so fix that. FENCE itself is
an instruction defined directly by its own opcode, while the created
riscv_insn_is_system function covers all instructions defined under the
SYSTEM opcode.

Fixes: ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header")
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/insn.h            | 10 ++++++++++
 arch/riscv/kernel/probes/simulate-insn.h |  3 ---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/insn.h b/arch/riscv/include/asm/insn.h
index 98453535324a..0455b4dcb0a7 100644
--- a/arch/riscv/include/asm/insn.h
+++ b/arch/riscv/include/asm/insn.h
@@ -128,6 +128,7 @@
 #define RVC_C2_RD_OPOFF		7
 
 /* parts of opcode for RVG*/
+#define RVG_OPCODE_FENCE	0x0f
 #define RVG_OPCODE_AUIPC	0x17
 #define RVG_OPCODE_BRANCH	0x63
 #define RVG_OPCODE_JALR		0x67
@@ -163,6 +164,7 @@
 #define RVG_MATCH_AUIPC		(RVG_OPCODE_AUIPC)
 #define RVG_MATCH_JALR		(RV_ENCODE_FUNCT3(JALR) | RVG_OPCODE_JALR)
 #define RVG_MATCH_JAL		(RVG_OPCODE_JAL)
+#define RVG_MATCH_FENCE		(RVG_OPCODE_FENCE)
 #define RVG_MATCH_BEQ		(RV_ENCODE_FUNCT3(BEQ) | RVG_OPCODE_BRANCH)
 #define RVG_MATCH_BNE		(RV_ENCODE_FUNCT3(BNE) | RVG_OPCODE_BRANCH)
 #define RVG_MATCH_BLT		(RV_ENCODE_FUNCT3(BLT) | RVG_OPCODE_BRANCH)
@@ -182,6 +184,7 @@
 #define RVG_MASK_AUIPC		(RV_INSN_OPCODE_MASK)
 #define RVG_MASK_JALR		(RV_INSN_FUNCT3_MASK | RV_INSN_OPCODE_MASK)
 #define RVG_MASK_JAL		(RV_INSN_OPCODE_MASK)
+#define RVG_MASK_FENCE		(RV_INSN_OPCODE_MASK)
 #define RVC_MASK_C_JALR		(RVC_INSN_FUNCT4_MASK | RVC_INSN_J_RS2_MASK | RVC_INSN_OPCODE_MASK)
 #define RVC_MASK_C_JR		(RVC_INSN_FUNCT4_MASK | RVC_INSN_J_RS2_MASK | RVC_INSN_OPCODE_MASK)
 #define RVC_MASK_C_JAL		(RVC_INSN_FUNCT3_MASK | RVC_INSN_OPCODE_MASK)
@@ -233,6 +236,13 @@ __RISCV_INSN_FUNCS(c_bnez, RVC_MASK_C_BNEZ, RVC_MATCH_C_BNEZ)
 __RISCV_INSN_FUNCS(c_ebreak, RVC_MASK_C_EBREAK, RVC_MATCH_C_EBREAK)
 __RISCV_INSN_FUNCS(ebreak, RVG_MASK_EBREAK, RVG_MATCH_EBREAK)
 __RISCV_INSN_FUNCS(sret, RVG_MASK_SRET, RVG_MATCH_SRET)
+__RISCV_INSN_FUNCS(fence, RVG_MASK_FENCE, RVG_MATCH_FENCE);
+
+/* special case to catch _any_ system instruction */
+static __always_inline bool riscv_insn_is_system(u32 code)
+{
+	return (code & RV_INSN_OPCODE_MASK) == RVG_OPCODE_SYSTEM;
+}
 
 /* special case to catch _any_ branch instruction */
 static __always_inline bool riscv_insn_is_branch(u32 code)
diff --git a/arch/riscv/kernel/probes/simulate-insn.h b/arch/riscv/kernel/probes/simulate-insn.h
index a19aaa0feb44..61e35db31001 100644
--- a/arch/riscv/kernel/probes/simulate-insn.h
+++ b/arch/riscv/kernel/probes/simulate-insn.h
@@ -12,9 +12,6 @@
 		}							\
 	} while (0)
 
-__RISCV_INSN_FUNCS(system,	0x7f, 0x73);
-__RISCV_INSN_FUNCS(fence,	0x7f, 0x0f);
-
 #define RISCV_INSN_SET_SIMULATE(name, code)				\
 	do {								\
 		if (riscv_insn_is_##name(code)) {			\
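As an aside for readers less familiar with the mask/match pattern: each
__RISCV_INSN_FUNCS(name, mask, match) entry expands to a
riscv_insn_is_<name>() helper that simply tests (code & mask) == match.
A minimal standalone C sketch of the two helpers touched here (opcode
values from the RISC-V spec; the function and macro names here are
illustrative, not the kernel's):

```c
#include <stdbool.h>
#include <stdint.h>

/* The low 7 bits of every 32-bit RISC-V instruction are the major opcode. */
#define INSN_OPCODE_MASK 0x7fu
#define OPCODE_FENCE     0x0fu  /* MISC-MEM */
#define OPCODE_SYSTEM    0x73u

/* FENCE is defined directly by its own opcode, so mask == opcode mask. */
static bool insn_is_fence(uint32_t code)
{
	return (code & INSN_OPCODE_MASK) == OPCODE_FENCE;
}

/* Catch-all for _any_ instruction under the SYSTEM opcode (ecall, wfi, csr*). */
static bool insn_is_system(uint32_t code)
{
	return (code & INSN_OPCODE_MASK) == OPCODE_SYSTEM;
}
```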
-- 
2.35.1




* [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling
  2023-01-09 18:17 [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Heiko Stuebner
  2023-01-09 18:17 ` [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes Heiko Stuebner
@ 2023-01-09 18:17 ` Heiko Stuebner
  2023-01-09 22:22   ` Conor Dooley
  2023-01-10  8:44   ` Andrew Jones
  2023-01-09 18:17 ` [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives Heiko Stuebner
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 42+ messages in thread
From: Heiko Stuebner @ 2023-01-09 18:17 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Similar to the helper for the u-type + i-type imm handling, add
helper functions for j-type immediates.

While it would also be possible to open-code that bit of imm wiggling
using the macros already provided in insn.h, it's way better to have
consistency in how we handle this across all instruction
encoding/decoding.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/insn.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/riscv/include/asm/insn.h b/arch/riscv/include/asm/insn.h
index 0455b4dcb0a7..9eea61a3028f 100644
--- a/arch/riscv/include/asm/insn.h
+++ b/arch/riscv/include/asm/insn.h
@@ -301,6 +301,32 @@ static __always_inline bool riscv_insn_is_branch(u32 code)
 	(RVC_X(x_, RVC_B_IMM_7_6_OPOFF, RVC_B_IMM_7_6_MASK) << RVC_B_IMM_7_6_OFF) | \
 	(RVC_IMM_SIGN(x_) << RVC_B_IMM_SIGN_OFF); })
 
+/*
+ * Get the immediate from a J-type instruction.
+ *
+ * @insn: instruction to process
+ * Return: immediate
+ */
+static inline s32 riscv_insn_extract_jtype_imm(u32 insn)
+{
+	return RV_EXTRACT_JTYPE_IMM(insn);
+}
+
+/*
+ * Update a J-type instruction with an immediate value.
+ *
+ * @insn: pointer to the jtype instruction
+ * @imm: the immediate to insert into the instruction
+ */
+static inline void riscv_insn_insert_jtype_imm(u32 *insn, s32 imm)
+{
+	*insn &= ~GENMASK(31, 12);
+	*insn |= (RV_X(imm, RV_J_IMM_10_1_OFF, RV_J_IMM_10_1_MASK) << RV_J_IMM_10_1_OPOFF) |
+		 (RV_X(imm, RV_J_IMM_11_OFF, RV_J_IMM_11_MASK) << RV_J_IMM_11_OPOFF) |
+		 (RV_X(imm, RV_J_IMM_19_12_OFF, RV_J_IMM_19_12_MASK) << RV_J_IMM_19_12_OPOFF) |
+		 (RV_X(imm, RV_J_IMM_SIGN_OFF, 1) << RV_J_IMM_SIGN_OPOFF);
+}
+
 /*
  * Put together one immediate from a U-type and I-type instruction pair.
  *
-- 
2.35.1




* [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives
  2023-01-09 18:17 [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Heiko Stuebner
  2023-01-09 18:17 ` [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes Heiko Stuebner
  2023-01-09 18:17 ` [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling Heiko Stuebner
@ 2023-01-09 18:17 ` Heiko Stuebner
  2023-01-10  9:28   ` Andrew Jones
  2023-01-11 13:18   ` Jisheng Zhang
  2023-01-09 18:17 ` [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 42+ messages in thread
From: Heiko Stuebner @ 2023-01-09 18:17 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Alternatives live in a different section, so the addresses used by jal
instructions will point to the wrong location after the patch has been
applied.

Similar to arm64, adjust the location to take that offset into account.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/kernel/alternative.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/riscv/kernel/alternative.c b/arch/riscv/kernel/alternative.c
index 6212ea0eed72..985c284fe9f4 100644
--- a/arch/riscv/kernel/alternative.c
+++ b/arch/riscv/kernel/alternative.c
@@ -79,6 +79,21 @@ static void riscv_alternative_fix_auipc_jalr(void *ptr, u32 auipc_insn,
 	patch_text_nosync(ptr, call, sizeof(u32) * 2);
 }
 
+static void riscv_alternative_fix_jal(void *ptr, u32 jal_insn, int patch_offset)
+{
+	s32 imm;
+
+	/* get and adjust new target address */
+	imm = riscv_insn_extract_jtype_imm(jal_insn);
+	imm -= patch_offset;
+
+	/* update instructions */
+	riscv_insn_insert_jtype_imm(&jal_insn, imm);
+
+	/* patch the call place again */
+	patch_text_nosync(ptr, &jal_insn, sizeof(u32));
+}
+
 void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
 				      int patch_offset)
 {
@@ -106,6 +121,18 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
 			riscv_alternative_fix_auipc_jalr(alt_ptr + i * sizeof(u32),
 							 insn, insn2, patch_offset);
 		}
+
+		if (riscv_insn_is_jal(insn)) {
+			s32 imm = riscv_insn_extract_jtype_imm(insn);
+
+			/* don't modify jumps inside the alternative block */
+			if ((alt_ptr + i * sizeof(u32) + imm) >= alt_ptr &&
+			    (alt_ptr + i * sizeof(u32) + imm) < (alt_ptr + len))
+				continue;
+
+			riscv_alternative_fix_jal(alt_ptr + i * sizeof(u32),
+						  insn, patch_offset);
+		}
 	}
 }
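The arithmetic of the fixup itself is small enough to sketch in C.
Assuming patch_offset is the distance the instruction moved (address it
runs at minus address it was assembled at), subtracting it from the
immediate keeps the absolute jump target unchanged; jumps that land
inside the copied block are left alone:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * old_pc + imm == target, and the instruction now runs at
 * new_pc = old_pc + patch_offset, so the immediate that still
 * reaches the same target is imm - patch_offset.
 */
static int32_t fix_jal_imm(int32_t imm, int32_t patch_offset)
{
	return imm - patch_offset;
}

/* A jal whose target lands inside [0, len) of the block needs no fixup. */
static bool target_inside_block(int32_t insn_off, int32_t imm, int32_t len)
{
	int32_t target = insn_off + imm;

	return target >= 0 && target < len;
}
```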
 
-- 
2.35.1




* [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-09 18:17 [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Heiko Stuebner
                   ` (2 preceding siblings ...)
  2023-01-09 18:17 ` [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives Heiko Stuebner
@ 2023-01-09 18:17 ` Heiko Stuebner
  2023-01-09 22:37   ` Conor Dooley
                     ` (2 more replies)
  2023-01-09 18:17 ` [PATCH v4 5/5] RISC-V: add zbb support to string functions Heiko Stuebner
  2023-01-11 13:24 ` [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Jisheng Zhang
  5 siblings, 3 replies; 42+ messages in thread
From: Heiko Stuebner @ 2023-01-09 18:17 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Depending on supported extensions on specific RISC-V cores,
optimized str* functions might make sense.

This adds basic infrastructure to allow patching the function calls
via alternatives later on.

The Linux kernel provides standard implementations for string functions
but when architectures want to extend them, they need to provide their
own.

The added generic string functions are written in assembly (taken from
disassembling the main-kernel functions for now) to allow us to control
the used registers and to extend them with optimized variants.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/string.h | 10 +++++++++
 arch/riscv/kernel/riscv_ksyms.c |  3 +++
 arch/riscv/lib/Makefile         |  3 +++
 arch/riscv/lib/strcmp.S         | 37 ++++++++++++++++++++++++++++++
 arch/riscv/lib/strlen.S         | 28 +++++++++++++++++++++++
 arch/riscv/lib/strncmp.S        | 40 +++++++++++++++++++++++++++++++++
 arch/riscv/purgatory/Makefile   | 13 +++++++++++
 7 files changed, 134 insertions(+)
 create mode 100644 arch/riscv/lib/strcmp.S
 create mode 100644 arch/riscv/lib/strlen.S
 create mode 100644 arch/riscv/lib/strncmp.S

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 909049366555..a96b1fea24fe 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -18,6 +18,16 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
 #define __HAVE_ARCH_MEMMOVE
 extern asmlinkage void *memmove(void *, const void *, size_t);
 extern asmlinkage void *__memmove(void *, const void *, size_t);
+
+#define __HAVE_ARCH_STRCMP
+extern asmlinkage int strcmp(const char *cs, const char *ct);
+
+#define __HAVE_ARCH_STRLEN
+extern asmlinkage __kernel_size_t strlen(const char *);
+
+#define __HAVE_ARCH_STRNCMP
+extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
+
 /* For those files which don't want to check by kasan. */
 #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
index 5ab1c7e1a6ed..a72879b4249a 100644
--- a/arch/riscv/kernel/riscv_ksyms.c
+++ b/arch/riscv/kernel/riscv_ksyms.c
@@ -12,6 +12,9 @@
 EXPORT_SYMBOL(memset);
 EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(memmove);
+EXPORT_SYMBOL(strcmp);
+EXPORT_SYMBOL(strlen);
+EXPORT_SYMBOL(strncmp);
 EXPORT_SYMBOL(__memset);
 EXPORT_SYMBOL(__memcpy);
 EXPORT_SYMBOL(__memmove);
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 25d5c9664e57..6c74b0bedd60 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -3,6 +3,9 @@ lib-y			+= delay.o
 lib-y			+= memcpy.o
 lib-y			+= memset.o
 lib-y			+= memmove.o
+lib-y			+= strcmp.o
+lib-y			+= strlen.o
+lib-y			+= strncmp.o
 lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 
diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
new file mode 100644
index 000000000000..94440fb8390c
--- /dev/null
+++ b/arch/riscv/lib/strcmp.S
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int strcmp(const char *cs, const char *ct) */
+SYM_FUNC_START(strcmp)
+	/*
+	 * Returns
+	 *   a0 - comparison result, value like strcmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *
+	 * Clobbers
+	 *   t0, t1, t2
+	 */
+	mv	t2, a1
+1:
+	lbu	t1, 0(a0)
+	lbu	t0, 0(a1)
+	addi	a0, a0, 1
+	addi	a1, a1, 1
+	beq	t1, t0, 3f
+	li	a0, 1
+	bgeu	t1, t0, 2f
+	li	a0, -1
+2:
+	mv	a1, t2
+	ret
+3:
+	bnez	t1, 1b
+	li	a0, 0
+	j	2b
+SYM_FUNC_END(strcmp)
diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
new file mode 100644
index 000000000000..09a7aaff26c8
--- /dev/null
+++ b/arch/riscv/lib/strlen.S
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int strlen(const char *s) */
+SYM_FUNC_START(strlen)
+	/*
+	 * Returns
+	 *   a0 - string length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *
+	 * Clobbers:
+	 *   t0, t1
+	 */
+	mv	t1, a0
+1:
+	lbu	t0, 0(t1)
+	bnez	t0, 2f
+	sub	a0, t1, a0
+	ret
+2:
+	addi	t1, t1, 1
+	j	1b
+SYM_FUNC_END(strlen)
diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
new file mode 100644
index 000000000000..493ab6febcb2
--- /dev/null
+++ b/arch/riscv/lib/strncmp.S
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+
+/* int strncmp(const char *cs, const char *ct, size_t count) */
+SYM_FUNC_START(strncmp)
+	/*
+	 * Returns
+	 *   a0 - comparison result, value like strncmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2
+	 */
+	li	t0, 0
+1:
+	beq	a2, t0, 4f
+	add	t1, a0, t0
+	add	t2, a1, t0
+	lbu	t1, 0(t1)
+	lbu	t2, 0(t2)
+	beq	t1, t2, 3f
+	li	a0, 1
+	bgeu	t1, t2, 2f
+	li	a0, -1
+2:
+	ret
+3:
+	addi	t0, t0, 1
+	bnez	t1, 1b
+4:
+	li	a0, 0
+	j	2b
+SYM_FUNC_END(strncmp)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index dd58e1d99397..d16bf715a586 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD := y
 
 purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
+purgatory-y += strcmp.o strlen.o strncmp.o
 
 targets += $(purgatory-y)
 PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
@@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
 $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
 	$(call if_changed_rule,as_o_S)
 
+$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
 $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
@@ -77,6 +87,9 @@ CFLAGS_ctype.o			+= $(PURGATORY_CFLAGS)
 AFLAGS_REMOVE_entry.o		+= -Wa,-gdwarf-2
 AFLAGS_REMOVE_memcpy.o		+= -Wa,-gdwarf-2
 AFLAGS_REMOVE_memset.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strcmp.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strlen.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strncmp.o		+= -Wa,-gdwarf-2
 
 $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
 		$(call if_changed,ld)
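To make the calling conventions above concrete, here is a plain C
rendering of what the generic assembly computes (a reference sketch,
not the kernel sources). Note that the assembly's strcmp/strncmp return
exactly -1, 0, or 1 based on an unsigned byte comparison, which the
sketch mirrors:

```c
#include <stddef.h>

static int ref_strcmp(const char *cs, const char *ct)
{
	for (;; cs++, ct++) {
		unsigned char c1 = *cs, c2 = *ct;

		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (!c1)
			return 0;
	}
}

static size_t ref_strlen(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return p - s;
}

static int ref_strncmp(const char *cs, const char *ct, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		unsigned char c1 = cs[i], c2 = ct[i];

		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (!c1)
			return 0;
	}
	return 0;
}
```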
-- 
2.35.1




* [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-09 18:17 [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Heiko Stuebner
                   ` (3 preceding siblings ...)
  2023-01-09 18:17 ` [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
@ 2023-01-09 18:17 ` Heiko Stuebner
  2023-01-09 20:39   ` Conor Dooley
                     ` (2 more replies)
  2023-01-11 13:24 ` [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Jisheng Zhang
  5 siblings, 3 replies; 42+ messages in thread
From: Heiko Stuebner @ 2023-01-09 18:17 UTC (permalink / raw)
  To: linux-riscv, palmer
  Cc: christoph.muellner, conor, philipp.tomsich, ajones, heiko,
	jszhang, Heiko Stuebner

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add handling for the Zbb extension and add support for using it as a
variant for optimized string functions.

Support for the Zbb str* variants is limited to the GNU assembler
for now, as LLVM has not yet gained the functionality to
selectively change the arch option in assembly code.
This is still under review at
    https://reviews.llvm.org/D123515

Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/Kconfig                   |  24 ++++++
 arch/riscv/include/asm/errata_list.h |   3 +-
 arch/riscv/include/asm/hwcap.h       |   1 +
 arch/riscv/include/asm/string.h      |   2 +
 arch/riscv/kernel/cpu.c              |   1 +
 arch/riscv/kernel/cpufeature.c       |  18 +++++
 arch/riscv/lib/strcmp.S              |  94 ++++++++++++++++++++++
 arch/riscv/lib/strlen.S              | 114 +++++++++++++++++++++++++++
 arch/riscv/lib/strncmp.S             | 111 ++++++++++++++++++++++++++
 9 files changed, 367 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..7c814fbf9527 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
 
 	   If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBB
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
+	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
+	depends on AS_IS_GNU
+
+config RISCV_ISA_ZBB
+	bool "Zbb extension support for bit manipulation instructions"
+	depends on TOOLCHAIN_HAS_ZBB
+	depends on !XIP_KERNEL && MMU
+	select RISCV_ALTERNATIVE
+	default y
+	help
+	   Adds support to dynamically detect the presence of the Zbb
+	   extension (basic bit manipulation) and enable its usage.
+
+	   The Zbb extension provides instructions to accelerate a number
+	   of bit-specific operations (count bit population, sign extension,
+	   bit rotation, etc.).
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZICBOM
 	bool
 	default y
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 4180312d2a70..95e626b7281e 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -24,7 +24,8 @@
 
 #define	CPUFEATURE_SVPBMT 0
 #define	CPUFEATURE_ZICBOM 1
-#define	CPUFEATURE_NUMBER 2
+#define	CPUFEATURE_ZBB 2
+#define	CPUFEATURE_NUMBER 3
 
 #ifdef __ASSEMBLY__
 
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 86328e3acb02..b727491fb100 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_ZIHINTPAUSE,
 	RISCV_ISA_EXT_SSTC,
 	RISCV_ISA_EXT_SVINVAL,
+	RISCV_ISA_EXT_ZBB,
 	RISCV_ISA_EXT_ID_MAX
 };
 static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index a96b1fea24fe..17dfc4ab4c80 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -6,6 +6,8 @@
 #ifndef _ASM_RISCV_STRING_H
 #define _ASM_RISCV_STRING_H
 
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 #include <linux/types.h>
 #include <linux/linkage.h>
 
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 1b9a5a66e55a..c4d1aa166f8b 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
  *    extensions by an underscore.
  */
 static struct riscv_isa_ext_data isa_ext_arr[] = {
+	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 205bbd6b1fce..bf3a791d7110 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
 					set_bit(nr, this_isa);
 				}
 			} else {
+				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
 				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
@@ -301,6 +302,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
 	return true;
 }
 
+static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
+{
+	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
+		return false;
+
+	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
+		return false;
+
+	if (!riscv_isa_extension_available(NULL, ZBB))
+		return false;
+
+	return true;
+}
+
 /*
  * Probe presence of individual extensions.
  *
@@ -318,6 +333,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
 	if (cpufeature_probe_zicbom(stage))
 		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
 
+	if (cpufeature_probe_zbb(stage))
+		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
+
 	return cpu_req_feature;
 }
 
diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
index 94440fb8390c..5428a8f2eb84 100644
--- a/arch/riscv/lib/strcmp.S
+++ b/arch/riscv/lib/strcmp.S
@@ -3,9 +3,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm-generic/export.h>
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 
 /* int strcmp(const char *cs, const char *ct) */
 SYM_FUNC_START(strcmp)
+
+	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
+
 	/*
 	 * Returns
 	 *   a0 - comparison result, value like strcmp
@@ -34,4 +39,93 @@ SYM_FUNC_START(strcmp)
 	bnez	t1, 1b
 	li	a0, 0
 	j	2b
+
+/*
+ * Variant of strcmp using the ZBB extension if available
+ */
+#ifdef CONFIG_RISCV_ISA_ZBB
+variant_zbb:
+#define src1		a0
+#define result		a0
+#define src2		t5
+#define data1		t0
+#define data2		t1
+#define align		t2
+#define data1_orcb	t3
+#define m1		t4
+
+.option push
+.option arch,+zbb
+
+	/*
+	 * Returns
+	 *   a0 - comparison result, value like strcmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4, t5
+	 */
+	mv	src2, a1
+
+	or	align, src1, src2
+	li	m1, -1
+	and	align, align, SZREG-1
+	bnez	align, 3f
+
+	/* Main loop for aligned string.  */
+	.p2align 3
+1:
+	REG_L	data1, 0(src1)
+	REG_L	data2, 0(src2)
+	orc.b	data1_orcb, data1
+	bne	data1_orcb, m1, 2f
+	addi	src1, src1, SZREG
+	addi	src2, src2, SZREG
+	beq	data1, data2, 1b
+
+	/*
+	 * Words don't match, and no null byte in the first
+	 * word. Get bytes in big-endian order and compare.
+	 */
+#ifndef CONFIG_CPU_BIG_ENDIAN
+	rev8	data1, data1
+	rev8	data2, data2
+#endif
+
+	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence. */
+	sltu	result, data1, data2
+	neg	result, result
+	ori	result, result, 1
+	ret
+
+2:
+	/*
+	 * Found a null byte.
+	 * If words don't match, fall back to simple loop.
+	 */
+	bne	data1, data2, 3f
+
+	/* Otherwise, strings are equal. */
+	li	result, 0
+	ret
+
+	/* Simple loop for misaligned strings. */
+	.p2align 3
+3:
+	lbu	data1, 0(src1)
+	lbu	data2, 0(src2)
+	addi	src1, src1, 1
+	addi	src2, src2, 1
+	bne	data1, data2, 4f
+	bnez	data1, 3b
+
+4:
+	sub	result, data1, data2
+	ret
+
+.option pop
+#endif
 SYM_FUNC_END(strcmp)
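The sltu/neg/ori triple in the aligned path above is a branchless way
to turn an unsigned comparison into +1 or -1. A C sketch of just that
idiom (illustrative only):

```c
/*
 * Mirrors the assembly sequence:
 *   sltu result, data1, data2   -> result = (a < b), i.e. 0 or 1
 *   neg  result, result         -> 0 or -1 (all-ones)
 *   ori  result, result, 1      -> 1 or -1
 */
static long branchless_ge_sign(unsigned long a, unsigned long b)
{
	long r = -(long)(a < b);

	return r | 1;
}
```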
diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
index 09a7aaff26c8..738efb04307d 100644
--- a/arch/riscv/lib/strlen.S
+++ b/arch/riscv/lib/strlen.S
@@ -3,9 +3,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm-generic/export.h>
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 
 /* int strlen(const char *s) */
 SYM_FUNC_START(strlen)
+
+	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
+
 	/*
 	 * Returns
 	 *   a0 - string length
@@ -25,4 +30,113 @@ SYM_FUNC_START(strlen)
 2:
 	addi	t1, t1, 1
 	j	1b
+
+/*
+ * Variant of strlen using the ZBB extension if available
+ */
+#ifdef CONFIG_RISCV_ISA_ZBB
+variant_zbb:
+
+#define src		a0
+#define result		a0
+#define addr		t0
+#define data		t1
+#define offset		t2
+#define offset_bits	t2
+#define valid_bytes	t3
+#define m1		t3
+
+#ifdef CONFIG_CPU_BIG_ENDIAN
+# define CZ	clz
+# define SHIFT	sll
+#else
+# define CZ	ctz
+# define SHIFT	srl
+#endif
+
+.option push
+.option arch,+zbb
+
+	/*
+	 * Returns
+	 *   a0 - string length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3
+	 */
+
+	/* Number of irrelevant bytes in the first word. */
+	andi	offset, src, SZREG-1
+
+	/* Align pointer. */
+	andi	addr, src, -SZREG
+
+	li	valid_bytes, SZREG
+	sub	valid_bytes, valid_bytes, offset
+	slli	offset_bits, offset, RISCV_LGPTR
+
+	/* Get the first word.  */
+	REG_L	data, 0(addr)
+
+	/*
+	 * Shift away the partial data we loaded to remove the irrelevant bytes
+	 * preceding the string with the effect of adding NUL bytes at the
+	 * end of the string.
+	 */
+	SHIFT	data, data, offset_bits
+
+	/* Convert non-NUL into 0xff and NUL into 0x00. */
+	orc.b	data, data
+
+	/* Convert non-NUL into 0x00 and NUL into 0xff. */
+	not	data, data
+
+	/*
+	 * Search for the first set bit (corresponding to a NUL byte in the
+	 * original chunk).
+	 */
+	CZ	data, data
+
+	/*
+	 * The first chunk is special: compare against the number
+	 * of valid bytes in this chunk.
+	 */
+	srli	result, data, 3
+	bgtu	valid_bytes, result, 3f
+
+	/* Prepare for the word comparison loop. */
+	addi	offset, addr, SZREG
+	li	m1, -1
+
+	/*
+	 * Our critical loop is 4 instructions and processes data in
+	 * 4 byte or 8 byte chunks.
+	 */
+	.p2align 3
+1:
+	REG_L	data, SZREG(addr)
+	addi	addr, addr, SZREG
+	orc.b	data, data
+	beq	data, m1, 1b
+2:
+	not	data, data
+	CZ	data, data
+
+	/* Get number of processed words.  */
+	sub	offset, addr, offset
+
+	/* Add number of characters in the first word.  */
+	add	result, result, offset
+	srli	data, data, 3
+
+	/* Add number of characters in the last word.  */
+	add	result, result, data
+3:
+	ret
+
+.option pop
+#endif
 SYM_FUNC_END(strlen)
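
For reviewers without Zbb hardware handy, the word-at-a-time scan above can be
modelled in plain C (an illustrative little-endian sketch, not kernel code:
orcb() emulates the orc.b instruction in software, and the first-word
alignment handling that the asm does with SHIFT is elided):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulate orc.b: each byte becomes 0xff if non-zero, 0x00 if zero. */
static uint64_t orcb(uint64_t x)
{
	uint64_t r = 0;
	for (int i = 0; i < 8; i++)
		if ((x >> (8 * i)) & 0xff)
			r |= 0xffull << (8 * i);
	return r;
}

/* Little-endian model of the Zbb strlen loop above. */
static size_t zbb_strlen(const char *s)
{
	const char *p = s;
	uint64_t w;

	/* (Alignment handling elided; the asm shifts away leading bytes.) */
	for (;; p += 8) {
		memcpy(&w, p, 8);	/* REG_L  data, 0(addr) */
		w = ~orcb(w);		/* orc.b + not: NUL bytes -> 0xff */
		if (w)			/* found a NUL in this word */
			return (size_t)(p - s) +
			       (size_t)(__builtin_ctzll(w) >> 3);
	}
}
```

The ctz result is a bit index, hence the `>> 3` to convert to a byte index,
mirroring the `srli data, data, 3` in the asm.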
diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
index 493ab6febcb2..851428b439dc 100644
--- a/arch/riscv/lib/strncmp.S
+++ b/arch/riscv/lib/strncmp.S
@@ -3,9 +3,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm-generic/export.h>
+#include <asm/alternative-macros.h>
+#include <asm/errata_list.h>
 
 /* int strncmp(const char *cs, const char *ct, size_t count) */
 SYM_FUNC_START(strncmp)
+
+	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
+
 	/*
 	 * Returns
 	 *   a0 - comparison result, value like strncmp
@@ -37,4 +42,110 @@ SYM_FUNC_START(strncmp)
 4:
 	li	a0, 0
 	j	2b
+
+/*
+ * Variant of strncmp using the ZBB extension if available
+ */
+#ifdef CONFIG_RISCV_ISA_ZBB
+variant_zbb:
+
+#define src1		a0
+#define result		a0
+#define src2		t6
+#define len		a2
+#define data1		t0
+#define data2		t1
+#define align		t2
+#define data1_orcb	t3
+#define limit		t4
+#define m1		t5
+
+.option push
+.option arch,+zbb
+
+	/*
+	 * Returns
+	 *   a0 - comparison result, like strncmp
+	 *
+	 * Parameters
+	 *   a0 - string1
+	 *   a1 - string2
+	 *   a2 - number of characters to compare
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4, t5, t6
+	 */
+	mv	src2, a1
+
+	or	align, src1, src2
+	li	m1, -1
+	and	align, align, SZREG-1
+	add	limit, src1, len
+	bnez	align, 4f
+
+	/* Adjust limit for fast-path.  */
+	addi	limit, limit, -SZREG
+
+	/* Main loop for aligned string.  */
+	.p2align 3
+1:
+	bgt	src1, limit, 3f
+	REG_L	data1, 0(src1)
+	REG_L	data2, 0(src2)
+	orc.b	data1_orcb, data1
+	bne	data1_orcb, m1, 2f
+	addi	src1, src1, SZREG
+	addi	src2, src2, SZREG
+	beq	data1, data2, 1b
+
+	/*
+	 * Words don't match, and no null byte in the first
+	 * word. Get bytes in big-endian order and compare.
+	 */
+#ifndef CONFIG_CPU_BIG_ENDIAN
+	rev8	data1, data1
+	rev8	data2, data2
+#endif
+
+	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+	sltu	result, data1, data2
+	neg	result, result
+	ori	result, result, 1
+	ret
+
+2:
+	/*
+	 * Found a null byte.
+	 * If words don't match, fall back to simple loop.
+	 */
+	bne	data1, data2, 3f
+
+	/* Otherwise, strings are equal.  */
+	li	result, 0
+	ret
+
+	/* Simple loop for misaligned strings.  */
+3:
+	/* Restore limit for slow-path.  */
+	addi	limit, limit, SZREG
+	.p2align 3
+4:
+	bge	src1, limit, 6f
+	lbu	data1, 0(src1)
+	lbu	data2, 0(src2)
+	addi	src1, src1, 1
+	addi	src2, src2, 1
+	bne	data1, data2, 5f
+	bnez	data1, 4b
+
+5:
+	sub	result, data1, data2
+	ret
+
+6:
+	li	result, 0
+	ret
+
+.option pop
+#endif
 SYM_FUNC_END(strncmp)
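
The rev8 + sltu/neg/ori tail used by both comparison variants computes the
return value without a branch; a C sketch of just that step (illustrative
names, with __builtin_bswap64 standing in for rev8):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Given two unequal words loaded from the strings, with no NUL byte in
 * the first word: byte-swap to big-endian order so the first differing
 * byte becomes the most significant, then synthesize
 * (data1 >= data2) ? 1 : -1 without branching.
 */
static int word_cmp_result(uint64_t data1, uint64_t data2)
{
	data1 = __builtin_bswap64(data1);	/* rev8 data1, data1 */
	data2 = __builtin_bswap64(data2);	/* rev8 data2, data2 */

	int result = (data1 < data2);	/* sltu result, data1, data2 */
	result = -result;		/* neg  result, result */
	result |= 1;			/* ori  result, result, 1 */
	return result;			/* 1 if data1 >= data2, else -1 */
}
```

The sltu/neg pair maps "less than" to -1 and "greater or equal" to 0, and the
ori then turns 0 into 1 while leaving -1 unchanged.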
-- 
2.35.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-09 18:17 ` [PATCH v4 5/5] RISC-V: add zbb support to string functions Heiko Stuebner
@ 2023-01-09 20:39   ` Conor Dooley
  2023-01-10  9:57   ` Andrew Jones
  2023-01-11 12:24   ` Andrew Jones
  2 siblings, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-01-09 20:39 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang, Heiko Stuebner


[-- Attachment #1.1: Type: text/plain, Size: 14569 bytes --]

On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add handling for ZBB extension and add support for using it as a
> variant for optimized string functions.
> 
> Support for the Zbb-str-variants is limited to the GNU-assembler
> for now, as LLVM has not yet acquired the functionality to
> selectively change the arch option in assembler code.
> This is still under review at
>     https://reviews.llvm.org/D123515

Shame, soon hopefully!

> Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>

If you've not changed the asm bits since vN-1, then:
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

> ---
>  arch/riscv/Kconfig                   |  24 ++++++
>  arch/riscv/include/asm/errata_list.h |   3 +-
>  arch/riscv/include/asm/hwcap.h       |   1 +
>  arch/riscv/include/asm/string.h      |   2 +
>  arch/riscv/kernel/cpu.c              |   1 +
>  arch/riscv/kernel/cpufeature.c       |  18 +++++
>  arch/riscv/lib/strcmp.S              |  94 ++++++++++++++++++++++
>  arch/riscv/lib/strlen.S              | 114 +++++++++++++++++++++++++++
>  arch/riscv/lib/strncmp.S             | 111 ++++++++++++++++++++++++++
>  9 files changed, 367 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..7c814fbf9527 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
>  
>  	   If you don't know what to do here, say Y.
>  
> +config TOOLCHAIN_HAS_ZBB
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> +	depends on AS_IS_GNU
> +
> +config RISCV_ISA_ZBB
> +	bool "Zbb extension support for bit manipulation instructions"
> +	depends on TOOLCHAIN_HAS_ZBB
> +	depends on !XIP_KERNEL && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Adds support to dynamically detect the presence of the ZBB
> +	   extension (basic bit manipulation) and enable its usage.
> +
> +	   The Zbb extension provides instructions to accelerate a number
> +	   of bit-specific operations (count bit population, sign extending,
> +	   bitrotation, etc).
> +
> +	   If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZICBOM
>  	bool
>  	default y
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4180312d2a70..95e626b7281e 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -24,7 +24,8 @@
>  
>  #define	CPUFEATURE_SVPBMT 0
>  #define	CPUFEATURE_ZICBOM 1
> -#define	CPUFEATURE_NUMBER 2
> +#define	CPUFEATURE_ZBB 2
> +#define	CPUFEATURE_NUMBER 3
>  
>  #ifdef __ASSEMBLY__
>  
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index 86328e3acb02..b727491fb100 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_ZIHINTPAUSE,
>  	RISCV_ISA_EXT_SSTC,
>  	RISCV_ISA_EXT_SVINVAL,
> +	RISCV_ISA_EXT_ZBB,
>  	RISCV_ISA_EXT_ID_MAX
>  };
>  static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
> diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> index a96b1fea24fe..17dfc4ab4c80 100644
> --- a/arch/riscv/include/asm/string.h
> +++ b/arch/riscv/include/asm/string.h
> @@ -6,6 +6,8 @@
>  #ifndef _ASM_RISCV_STRING_H
>  #define _ASM_RISCV_STRING_H
>  
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  #include <linux/types.h>
>  #include <linux/linkage.h>
>  
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 1b9a5a66e55a..c4d1aa166f8b 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
>   *    extensions by an underscore.
>   */
>  static struct riscv_isa_ext_data isa_ext_arr[] = {
> +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
>  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
>  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 205bbd6b1fce..bf3a791d7110 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
>  					set_bit(nr, this_isa);
>  				}
>  			} else {
> +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
>  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
>  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
>  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> @@ -301,6 +302,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
>  	return true;
>  }
>  
> +static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
> +{
> +	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
> +		return false;
> +
> +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> +		return false;
> +
> +	if (!riscv_isa_extension_available(NULL, ZBB))
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * Probe presence of individual extensions.
>   *
> @@ -318,6 +333,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>  	if (cpufeature_probe_zicbom(stage))
>  		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
>  
> +	if (cpufeature_probe_zbb(stage))
> +		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
> +
>  	return cpu_req_feature;
>  }
>  
> diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> index 94440fb8390c..5428a8f2eb84 100644
> --- a/arch/riscv/lib/strcmp.S
> +++ b/arch/riscv/lib/strcmp.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strcmp(const char *cs, const char *ct) */
>  SYM_FUNC_START(strcmp)
> +
> +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - comparison result, value like strcmp
> @@ -34,4 +39,93 @@ SYM_FUNC_START(strcmp)
>  	bnez	t1, 1b
>  	li	a0, 0
>  	j	2b
> +
> +/*
> + * Variant of strcmp using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +variant_zbb:
> +#define src1		a0
> +#define result		a0
> +#define src2		t5
> +#define data1		t0
> +#define data2		t1
> +#define align		t2
> +#define data1_orcb	t3
> +#define m1		t4
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, value like strcmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5
> +	 */
> +	mv	src2, a1
> +
> +	or	align, src1, src2
> +	li	m1, -1
> +	and	align, align, SZREG-1
> +	bnez	align, 3f
> +
> +	/* Main loop for aligned string.  */
> +	.p2align 3
> +1:
> +	REG_L	data1, 0(src1)
> +	REG_L	data2, 0(src2)
> +	orc.b	data1_orcb, data1
> +	bne	data1_orcb, m1, 2f
> +	addi	src1, src1, SZREG
> +	addi	src2, src2, SZREG
> +	beq	data1, data2, 1b
> +
> +	/*
> +	 * Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.
> +	 */
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8	data1, data1
> +	rev8	data2, data2
> +#endif
> +
> +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence. */
> +	sltu	result, data1, data2
> +	neg	result, result
> +	ori	result, result, 1
> +	ret
> +
> +2:
> +	/*
> +	 * Found a null byte.
> +	 * If words don't match, fall back to simple loop.
> +	 */
> +	bne	data1, data2, 3f
> +
> +	/* Otherwise, strings are equal. */
> +	li	result, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings. */
> +	.p2align 3
> +3:
> +	lbu	data1, 0(src1)
> +	lbu	data2, 0(src2)
> +	addi	src1, src1, 1
> +	addi	src2, src2, 1
> +	bne	data1, data2, 4f
> +	bnez	data1, 3b
> +
> +4:
> +	sub	result, data1, data2
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strcmp)
> diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> index 09a7aaff26c8..738efb04307d 100644
> --- a/arch/riscv/lib/strlen.S
> +++ b/arch/riscv/lib/strlen.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strlen(const char *s) */
>  SYM_FUNC_START(strlen)
> +
> +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - string length
> @@ -25,4 +30,113 @@ SYM_FUNC_START(strlen)
>  2:
>  	addi	t1, t1, 1
>  	j	1b
> +
> +/*
> + * Variant of strlen using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +variant_zbb:
> +
> +#define src		a0
> +#define result		a0
> +#define addr		t0
> +#define data		t1
> +#define offset		t2
> +#define offset_bits	t2
> +#define valid_bytes	t3
> +#define m1		t3
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +# define CZ	clz
> +# define SHIFT	sll
> +#else
> +# define CZ	ctz
> +# define SHIFT	srl
> +#endif
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - string length
> +	 *
> +	 * Parameters
> +	 *   a0 - String to measure
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3
> +	 */
> +
> +	/* Number of irrelevant bytes in the first word. */
> +	andi	offset, src, SZREG-1
> +
> +	/* Align pointer. */
> +	andi	addr, src, -SZREG
> +
> +	li	valid_bytes, SZREG
> +	sub	valid_bytes, valid_bytes, offset
> +	slli	offset_bits, offset, RISCV_LGPTR
> +
> +	/* Get the first word.  */
> +	REG_L	data, 0(addr)
> +
> +	/*
> +	 * Shift away the partial data we loaded to remove the irrelevant bytes
> +	 * preceding the string with the effect of adding NUL bytes at the
> +	 * end of the string.
> +	 */
> +	SHIFT	data, data, offset_bits
> +
> +	/* Convert non-NUL into 0xff and NUL into 0x00. */
> +	orc.b	data, data
> +
> +	/* Convert non-NUL into 0x00 and NUL into 0xff. */
> +	not	data, data
> +
> +	/*
> +	 * Search for the first set bit (corresponding to a NUL byte in the
> +	 * original chunk).
> +	 */
> +	CZ	data, data
> +
> +	/*
> +	 * The first chunk is special: compare against the number
> +	 * of valid bytes in this chunk.
> +	 */
> +	srli	result, data, 3
> +	bgtu	valid_bytes, result, 3f
> +
> +	/* Prepare for the word comparison loop. */
> +	addi	offset, addr, SZREG
> +	li	m1, -1
> +
> +	/*
> +	 * Our critical loop is 4 instructions and processes data in
> +	 * 4 byte or 8 byte chunks.
> +	 */
> +	.p2align 3
> +1:
> +	REG_L	data, SZREG(addr)
> +	addi	addr, addr, SZREG
> +	orc.b	data, data
> +	beq	data, m1, 1b
> +2:
> +	not	data, data
> +	CZ	data, data
> +
> +	/* Get number of processed words.  */
> +	sub	offset, addr, offset
> +
> +	/* Add number of characters in the first word.  */
> +	add	result, result, offset
> +	srli	data, data, 3
> +
> +	/* Add number of characters in the last word.  */
> +	add	result, result, data
> +3:
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strlen)
> diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> index 493ab6febcb2..851428b439dc 100644
> --- a/arch/riscv/lib/strncmp.S
> +++ b/arch/riscv/lib/strncmp.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strncmp(const char *cs, const char *ct, size_t count) */
>  SYM_FUNC_START(strncmp)
> +
> +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - comparison result, value like strncmp
> @@ -37,4 +42,110 @@ SYM_FUNC_START(strncmp)
>  4:
>  	li	a0, 0
>  	j	2b
> +
> +/*
> + * Variant of strncmp using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +variant_zbb:
> +
> +#define src1		a0
> +#define result		a0
> +#define src2		t6
> +#define len		a2
> +#define data1		t0
> +#define data2		t1
> +#define align		t2
> +#define data1_orcb	t3
> +#define limit		t4
> +#define m1		t5
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5, t6
> +	 */
> +	mv	src2, a1
> +
> +	or	align, src1, src2
> +	li	m1, -1
> +	and	align, align, SZREG-1
> +	add	limit, src1, len
> +	bnez	align, 4f
> +
> +	/* Adjust limit for fast-path.  */
> +	addi	limit, limit, -SZREG
> +
> +	/* Main loop for aligned string.  */
> +	.p2align 3
> +1:
> +	bgt	src1, limit, 3f
> +	REG_L	data1, 0(src1)
> +	REG_L	data2, 0(src2)
> +	orc.b	data1_orcb, data1
> +	bne	data1_orcb, m1, 2f
> +	addi	src1, src1, SZREG
> +	addi	src2, src2, SZREG
> +	beq	data1, data2, 1b
> +
> +	/*
> +	 * Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.
> +	 */
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8	data1, data1
> +	rev8	data2, data2
> +#endif
> +
> +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> +	sltu	result, data1, data2
> +	neg	result, result
> +	ori	result, result, 1
> +	ret
> +
> +2:
> +	/*
> +	 * Found a null byte.
> +	 * If words don't match, fall back to simple loop.
> +	 */
> +	bne	data1, data2, 3f
> +
> +	/* Otherwise, strings are equal.  */
> +	li	result, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings.  */
> +3:
> +	/* Restore limit for slow-path.  */
> +	addi	limit, limit, SZREG
> +	.p2align 3
> +4:
> +	bge	src1, limit, 6f
> +	lbu	data1, 0(src1)
> +	lbu	data2, 0(src2)
> +	addi	src1, src1, 1
> +	addi	src2, src2, 1
> +	bne	data1, data2, 5f
> +	bnez	data1, 4b
> +
> +5:
> +	sub	result, data1, data2
> +	ret
> +
> +6:
> +	li	result, 0
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strncmp)
> -- 
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
  2023-01-09 18:17 ` [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes Heiko Stuebner
@ 2023-01-09 20:53   ` Conor Dooley
  2023-01-11 15:14     ` Heiko Stübner
  2023-01-10  8:32   ` Andrew Jones
  1 sibling, 1 reply; 42+ messages in thread
From: Conor Dooley @ 2023-01-09 20:53 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang, Heiko Stuebner


[-- Attachment #1.1: Type: text/plain, Size: 3787 bytes --]

On Mon, Jan 09, 2023 at 07:17:51PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside
> the kprobes implementation. This got moved into a central header in
> commit ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header").
> 
> Though it looks like I overlooked two of them, so fix that. FENCE itself is
> an instruction defined directly by its own opcode, while the created
> riscv_isn_is_system function covers all instructions defined under the SYSTEM
> opcode.
> 
> Fixes: ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header")

Not quite sure why it needs a fixes tag, nothing was actually broken
previously, right?

Not that I really care, but Sasha is likely gonna pick it up for
backporting as a result.

Either way, looks grand.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/insn.h            | 10 ++++++++++
>  arch/riscv/kernel/probes/simulate-insn.h |  3 ---
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/insn.h b/arch/riscv/include/asm/insn.h
> index 98453535324a..0455b4dcb0a7 100644
> --- a/arch/riscv/include/asm/insn.h
> +++ b/arch/riscv/include/asm/insn.h
> @@ -128,6 +128,7 @@
>  #define RVC_C2_RD_OPOFF		7
>  
>  /* parts of opcode for RVG*/
> +#define RVG_OPCODE_FENCE	0x0f
>  #define RVG_OPCODE_AUIPC	0x17
>  #define RVG_OPCODE_BRANCH	0x63
>  #define RVG_OPCODE_JALR		0x67
> @@ -163,6 +164,7 @@
>  #define RVG_MATCH_AUIPC		(RVG_OPCODE_AUIPC)
>  #define RVG_MATCH_JALR		(RV_ENCODE_FUNCT3(JALR) | RVG_OPCODE_JALR)
>  #define RVG_MATCH_JAL		(RVG_OPCODE_JAL)
> +#define RVG_MATCH_FENCE		(RVG_OPCODE_FENCE)
>  #define RVG_MATCH_BEQ		(RV_ENCODE_FUNCT3(BEQ) | RVG_OPCODE_BRANCH)
>  #define RVG_MATCH_BNE		(RV_ENCODE_FUNCT3(BNE) | RVG_OPCODE_BRANCH)
>  #define RVG_MATCH_BLT		(RV_ENCODE_FUNCT3(BLT) | RVG_OPCODE_BRANCH)
> @@ -182,6 +184,7 @@
>  #define RVG_MASK_AUIPC		(RV_INSN_OPCODE_MASK)
>  #define RVG_MASK_JALR		(RV_INSN_FUNCT3_MASK | RV_INSN_OPCODE_MASK)
>  #define RVG_MASK_JAL		(RV_INSN_OPCODE_MASK)
> +#define RVG_MASK_FENCE		(RV_INSN_OPCODE_MASK)
>  #define RVC_MASK_C_JALR		(RVC_INSN_FUNCT4_MASK | RVC_INSN_J_RS2_MASK | RVC_INSN_OPCODE_MASK)
>  #define RVC_MASK_C_JR		(RVC_INSN_FUNCT4_MASK | RVC_INSN_J_RS2_MASK | RVC_INSN_OPCODE_MASK)
>  #define RVC_MASK_C_JAL		(RVC_INSN_FUNCT3_MASK | RVC_INSN_OPCODE_MASK)
> @@ -233,6 +236,13 @@ __RISCV_INSN_FUNCS(c_bnez, RVC_MASK_C_BNEZ, RVC_MATCH_C_BNEZ)
>  __RISCV_INSN_FUNCS(c_ebreak, RVC_MASK_C_EBREAK, RVC_MATCH_C_EBREAK)
>  __RISCV_INSN_FUNCS(ebreak, RVG_MASK_EBREAK, RVG_MATCH_EBREAK)
>  __RISCV_INSN_FUNCS(sret, RVG_MASK_SRET, RVG_MATCH_SRET)
> +__RISCV_INSN_FUNCS(fence, RVG_MASK_FENCE, RVG_MATCH_FENCE);
> +
> +/* special case to catch _any_ system instruction */
> +static __always_inline bool riscv_insn_is_system(u32 code)
> +{
> +	return (code & RV_INSN_OPCODE_MASK) == RVG_OPCODE_SYSTEM;
> +}
>  
>  /* special case to catch _any_ branch instruction */
>  static __always_inline bool riscv_insn_is_branch(u32 code)
> diff --git a/arch/riscv/kernel/probes/simulate-insn.h b/arch/riscv/kernel/probes/simulate-insn.h
> index a19aaa0feb44..61e35db31001 100644
> --- a/arch/riscv/kernel/probes/simulate-insn.h
> +++ b/arch/riscv/kernel/probes/simulate-insn.h
> @@ -12,9 +12,6 @@
>  		}							\
>  	} while (0)
>  
> -__RISCV_INSN_FUNCS(system,	0x7f, 0x73);
> -__RISCV_INSN_FUNCS(fence,	0x7f, 0x0f);
> -
>  #define RISCV_INSN_SET_SIMULATE(name, code)				\
>  	do {								\
>  		if (riscv_insn_is_##name(code)) {			\
> -- 
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling
  2023-01-09 18:17 ` [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling Heiko Stuebner
@ 2023-01-09 22:22   ` Conor Dooley
  2023-01-10  8:44   ` Andrew Jones
  1 sibling, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-01-09 22:22 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang, Heiko Stuebner


[-- Attachment #1.1: Type: text/plain, Size: 2208 bytes --]

On Mon, Jan 09, 2023 at 07:17:52PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Similar to the helper for the u-type + i-type imm handling, add helper-
> functions for j-type immediates.
> 
> While it also would be possible to open-code that bit of imm wiggling
> using the macros already provided in insn.h, it's way better to have
> consistency on how we handle this across all function encoding/decoding.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/insn.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/insn.h b/arch/riscv/include/asm/insn.h
> index 0455b4dcb0a7..9eea61a3028f 100644
> --- a/arch/riscv/include/asm/insn.h
> +++ b/arch/riscv/include/asm/insn.h
> @@ -301,6 +301,32 @@ static __always_inline bool riscv_insn_is_branch(u32 code)
>  	(RVC_X(x_, RVC_B_IMM_7_6_OPOFF, RVC_B_IMM_7_6_MASK) << RVC_B_IMM_7_6_OFF) | \
>  	(RVC_IMM_SIGN(x_) << RVC_B_IMM_SIGN_OFF); })
>  
> +/*
> + * Get the immediate from a J-type instruction.
> + *
> + * @insn: instruction to process
> + * Return: immediate
> + */
> +static inline s32 riscv_insn_extract_jtype_imm(u32 insn)
> +{
> +	return RV_EXTRACT_JTYPE_IMM(insn);
> +}
> +
> +/*
> + * Update a J-type instruction with an immediate value.
> + *
> + * @insn: pointer to the jtype instruction
> + * @imm: the immediate to insert into the instruction
> + */
> +static inline void riscv_insn_insert_jtype_imm(u32 *insn, s32 imm)
> +{
> +	*insn &= ~GENMASK(31, 12);
> +	*insn |= (((imm & (RV_J_IMM_10_1_MASK << RV_J_IMM_10_1_OFF)) << RV_I_IMM_11_0_OPOFF) |
                                                                           ^
RV_I? That should be RV_J_IMM_10_1_OPOFF, no?

> +		  ((imm & (RV_J_IMM_11_MASK << RV_J_IMM_11_OFF)) << RV_J_IMM_11_OPOFF) |
> +		  ((imm & (RV_J_IMM_19_12_OPOFF << RV_J_IMM_19_12_OFF)) << RV_J_IMM_19_12_OPOFF) |
> +		  ((imm & (1 << RV_J_IMM_SIGN_OFF)) << RV_J_IMM_SIGN_OPOFF));
> +}

Otherwise, seems fine?
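
For what it's worth, a host-side round-trip check makes this sort of
constant slip easy to catch. A sketch in plain C, with the J-type layout
imm[20|10:1|11|19:12] written out by hand rather than via the kernel macros
(all names here are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Decode the J-type immediate from insn bits 31..12. */
static int32_t jtype_imm_decode(uint32_t insn)
{
	uint32_t imm = (((insn >> 31) & 0x1)   << 20) |
		       (((insn >> 21) & 0x3ff) << 1)  |
		       (((insn >> 20) & 0x1)   << 11) |
		       (((insn >> 12) & 0xff)  << 12);

	/* Sign-extend the 21-bit immediate. */
	return (int32_t)(imm << 11) >> 11;
}

/* Insert a J-type immediate, preserving rd and the opcode. */
static uint32_t jtype_imm_encode(uint32_t insn, int32_t imm)
{
	insn &= ~0xfffff000u;	/* clear bits 31..12 */
	insn |= ((((uint32_t)imm >> 20) & 0x1)   << 31) |
		((((uint32_t)imm >> 1)  & 0x3ff) << 21) |
		((((uint32_t)imm >> 11) & 0x1)   << 20) |
		((((uint32_t)imm >> 12) & 0xff)  << 12);
	return insn;
}
```

If one of the shifts uses a constant for the wrong field, the encode/decode
round trip fails immediately for any offset that exercises that field.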

> +
>  /*
>   * Put together one immediate from a U-type and I-type instruction pair.
>   *
> -- 
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-09 18:17 ` [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
@ 2023-01-09 22:37   ` Conor Dooley
  2023-01-09 23:31     ` Heiko Stübner
  2023-01-10  9:39   ` Andrew Jones
  2023-01-10 12:13   ` Andrew Jones
  2 siblings, 1 reply; 42+ messages in thread
From: Conor Dooley @ 2023-01-09 22:37 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang, Heiko Stuebner


[-- Attachment #1.1: Type: text/plain, Size: 7599 bytes --]

On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Depending on supported extensions on specific RISC-V cores,
> optimized str* functions might make sense.
> 
> This adds basic infrastructure to allow patching the function calls
> via alternatives later on.
> 
> The Linux kernel provides standard implementations for string functions
> but when architectures want to extend them, they need to provide their
> own.
> 
> The added generic string functions are done in assembler (taken from
> disassembling the main-kernel functions for now) to allow us to control
> the used registers and extend them with optimized variants.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>

asm looks the same as what I previously provided a review for, but I
assume you dropped cos of the EFI stub stuff? You can have it back I
suppose..
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

> ---
>  arch/riscv/include/asm/string.h | 10 +++++++++
>  arch/riscv/kernel/riscv_ksyms.c |  3 +++
>  arch/riscv/lib/Makefile         |  3 +++
>  arch/riscv/lib/strcmp.S         | 37 ++++++++++++++++++++++++++++++
>  arch/riscv/lib/strlen.S         | 28 +++++++++++++++++++++++
>  arch/riscv/lib/strncmp.S        | 40 +++++++++++++++++++++++++++++++++
>  arch/riscv/purgatory/Makefile   | 13 +++++++++++
>  7 files changed, 134 insertions(+)
>  create mode 100644 arch/riscv/lib/strcmp.S
>  create mode 100644 arch/riscv/lib/strlen.S
>  create mode 100644 arch/riscv/lib/strncmp.S
> 
> diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> index 909049366555..a96b1fea24fe 100644
> --- a/arch/riscv/include/asm/string.h
> +++ b/arch/riscv/include/asm/string.h
> @@ -18,6 +18,16 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
>  #define __HAVE_ARCH_MEMMOVE
>  extern asmlinkage void *memmove(void *, const void *, size_t);
>  extern asmlinkage void *__memmove(void *, const void *, size_t);
> +
> +#define __HAVE_ARCH_STRCMP
> +extern asmlinkage int strcmp(const char *cs, const char *ct);
> +
> +#define __HAVE_ARCH_STRLEN
> +extern asmlinkage __kernel_size_t strlen(const char *);
> +
> +#define __HAVE_ARCH_STRNCMP
> +extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
> +
>  /* For those files which don't want to check by kasan. */
>  #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
> diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
> index 5ab1c7e1a6ed..a72879b4249a 100644
> --- a/arch/riscv/kernel/riscv_ksyms.c
> +++ b/arch/riscv/kernel/riscv_ksyms.c
> @@ -12,6 +12,9 @@
>  EXPORT_SYMBOL(memset);
>  EXPORT_SYMBOL(memcpy);
>  EXPORT_SYMBOL(memmove);
> +EXPORT_SYMBOL(strcmp);
> +EXPORT_SYMBOL(strlen);
> +EXPORT_SYMBOL(strncmp);
>  EXPORT_SYMBOL(__memset);
>  EXPORT_SYMBOL(__memcpy);
>  EXPORT_SYMBOL(__memmove);
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 25d5c9664e57..6c74b0bedd60 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -3,6 +3,9 @@ lib-y			+= delay.o
>  lib-y			+= memcpy.o
>  lib-y			+= memset.o
>  lib-y			+= memmove.o
> +lib-y			+= strcmp.o
> +lib-y			+= strlen.o
> +lib-y			+= strncmp.o
>  lib-$(CONFIG_MMU)	+= uaccess.o
>  lib-$(CONFIG_64BIT)	+= tishift.o
>  
> diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> new file mode 100644
> index 000000000000..94440fb8390c
> --- /dev/null
> +++ b/arch/riscv/lib/strcmp.S
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int strcmp(const char *cs, const char *ct) */
> +SYM_FUNC_START(strcmp)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, value like strcmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2
> +	 */
> +	mv	t2, a1
> +1:
> +	lbu	t1, 0(a0)
> +	lbu	t0, 0(a1)
> +	addi	a0, a0, 1
> +	addi	a1, a1, 1
> +	beq	t1, t0, 3f
> +	li	a0, 1
> +	bgeu	t1, t0, 2f
> +	li	a0, -1
> +2:
> +	mv	a1, t2
> +	ret
> +3:
> +	bnez	t1, 1b
> +	li	a0, 0
> +	j	2b
> +SYM_FUNC_END(strcmp)
> diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> new file mode 100644
> index 000000000000..09a7aaff26c8
> --- /dev/null
> +++ b/arch/riscv/lib/strlen.S
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int strlen(const char *s) */
> +SYM_FUNC_START(strlen)
> +	/*
> +	 * Returns
> +	 *   a0 - string length
> +	 *
> +	 * Parameters
> +	 *   a0 - String to measure
> +	 *
> +	 * Clobbers:
> +	 *   t0, t1
> +	 */
> +	mv	t1, a0
> +1:
> +	lbu	t0, 0(t1)
> +	bnez	t0, 2f
> +	sub	a0, t1, a0
> +	ret
> +2:
> +	addi	t1, t1, 1
> +	j	1b
> +SYM_FUNC_END(strlen)
> diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> new file mode 100644
> index 000000000000..493ab6febcb2
> --- /dev/null
> +++ b/arch/riscv/lib/strncmp.S
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int strncmp(const char *cs, const char *ct, size_t count) */
> +SYM_FUNC_START(strncmp)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, value like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2
> +	 */
> +	li	t0, 0
> +1:
> +	beq	a2, t0, 4f
> +	add	t1, a0, t0
> +	add	t2, a1, t0
> +	lbu	t1, 0(t1)
> +	lbu	t2, 0(t2)
> +	beq	t1, t2, 3f
> +	li	a0, 1
> +	bgeu	t1, t2, 2f
> +	li	a0, -1
> +2:
> +	ret
> +3:
> +	addi	t0, t0, 1
> +	bnez	t1, 1b
> +4:
> +	li	a0, 0
> +	j	2b
> +SYM_FUNC_END(strncmp)
> diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
> index dd58e1d99397..d16bf715a586 100644
> --- a/arch/riscv/purgatory/Makefile
> +++ b/arch/riscv/purgatory/Makefile
> @@ -2,6 +2,7 @@
>  OBJECT_FILES_NON_STANDARD := y
>  
>  purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
> +purgatory-y += strcmp.o strlen.o strncmp.o
>  
>  targets += $(purgatory-y)
>  PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
> @@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
>  $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
>  	$(call if_changed_rule,as_o_S)
>  
> +$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
> +	$(call if_changed_rule,as_o_S)
> +
> +$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
> +	$(call if_changed_rule,as_o_S)
> +
> +$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
> +	$(call if_changed_rule,as_o_S)
> +
>  $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
>  	$(call if_changed_rule,cc_o_c)
>  
> @@ -77,6 +87,9 @@ CFLAGS_ctype.o			+= $(PURGATORY_CFLAGS)
>  AFLAGS_REMOVE_entry.o		+= -Wa,-gdwarf-2
>  AFLAGS_REMOVE_memcpy.o		+= -Wa,-gdwarf-2
>  AFLAGS_REMOVE_memset.o		+= -Wa,-gdwarf-2
> +AFLAGS_REMOVE_strcmp.o		+= -Wa,-gdwarf-2
> +AFLAGS_REMOVE_strlen.o		+= -Wa,-gdwarf-2
> +AFLAGS_REMOVE_strncmp.o		+= -Wa,-gdwarf-2
>  
>  $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
>  		$(call if_changed,ld)
> -- 
> 2.35.1
> 
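For readers cross-checking the assembler above: all three routines implement plain byte-at-a-time semantics. A C sketch of the same logic (function names are illustrative, not part of the patch) looks like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte-wise strcmp, as in the asm: lbu/bgeu means the comparison is on
 * unsigned byte values, and the result is exactly -1, 0 or 1.
 */
static int strcmp_bytewise(const char *cs, const char *ct)
{
	unsigned char c1, c2;

	for (;;) {
		c1 = *cs++;
		c2 = *ct++;
		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (!c1)
			return 0;
	}
}

/* Byte-wise strlen: walk to the NUL byte, return the distance. */
static size_t strlen_bytewise(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return p - s;
}

/* Byte-wise strncmp: stop at a mismatch, a NUL byte or after count bytes. */
static int strncmp_bytewise(const char *cs, const char *ct, size_t count)
{
	unsigned char c1, c2;
	size_t i;

	for (i = 0; i < count; i++) {
		c1 = cs[i];
		c2 = ct[i];
		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (!c1)
			break;
	}
	return 0;
}
```

Note that the asm returns the fixed values -1/0/1 (materialized with li a0, 1 and li a0, -1) rather than an arbitrary byte difference.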

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-09 22:37   ` Conor Dooley
@ 2023-01-09 23:31     ` Heiko Stübner
  0 siblings, 0 replies; 42+ messages in thread
From: Heiko Stübner @ 2023-01-09 23:31 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang

Am Montag, 9. Januar 2023, 23:37:45 CET schrieb Conor Dooley:
> On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Depending on supported extensions on specific RISC-V cores,
> > optimized str* functions might make sense.
> > 
> > This adds basic infrastructure to allow patching the function calls
> > via alternatives later on.
> > 
> > The Linux kernel provides standard implementations for string functions
> > but when architectures want to extend them, they need to provide their
> > own.
> > 
> > The added generic string functions are done in assembler (taken from
> > disassembling the main-kernel functions for now) to allow us to control
> > the used registers and extend them with optimized variants.
> > 
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> asm looks the same as what I previously provided a review for, but I
> assume you dropped it because of the EFI stub stuff? You can have it back
> I suppose...
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

That was the

| Technically patch4 got a review from Andrew, but that was still with
| the inline approach, so I didn't bring it over to v4.

part in the cover-letter, thanks for bringing the rb back :-)


Heiko

> > ---
> >  arch/riscv/include/asm/string.h | 10 +++++++++
> >  arch/riscv/kernel/riscv_ksyms.c |  3 +++
> >  arch/riscv/lib/Makefile         |  3 +++
> >  arch/riscv/lib/strcmp.S         | 37 ++++++++++++++++++++++++++++++
> >  arch/riscv/lib/strlen.S         | 28 +++++++++++++++++++++++
> >  arch/riscv/lib/strncmp.S        | 40 +++++++++++++++++++++++++++++++++
> >  arch/riscv/purgatory/Makefile   | 13 +++++++++++
> >  7 files changed, 134 insertions(+)
> >  create mode 100644 arch/riscv/lib/strcmp.S
> >  create mode 100644 arch/riscv/lib/strlen.S
> >  create mode 100644 arch/riscv/lib/strncmp.S
> > 
> > diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> > index 909049366555..a96b1fea24fe 100644
> > --- a/arch/riscv/include/asm/string.h
> > +++ b/arch/riscv/include/asm/string.h
> > @@ -18,6 +18,16 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
> >  #define __HAVE_ARCH_MEMMOVE
> >  extern asmlinkage void *memmove(void *, const void *, size_t);
> >  extern asmlinkage void *__memmove(void *, const void *, size_t);
> > +
> > +#define __HAVE_ARCH_STRCMP
> > +extern asmlinkage int strcmp(const char *cs, const char *ct);
> > +
> > +#define __HAVE_ARCH_STRLEN
> > +extern asmlinkage __kernel_size_t strlen(const char *);
> > +
> > +#define __HAVE_ARCH_STRNCMP
> > +extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
> > +
> >  /* For those files which don't want to check by kasan. */
> >  #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
> >  #define memcpy(dst, src, len) __memcpy(dst, src, len)
> > diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
> > index 5ab1c7e1a6ed..a72879b4249a 100644
> > --- a/arch/riscv/kernel/riscv_ksyms.c
> > +++ b/arch/riscv/kernel/riscv_ksyms.c
> > @@ -12,6 +12,9 @@
> >  EXPORT_SYMBOL(memset);
> >  EXPORT_SYMBOL(memcpy);
> >  EXPORT_SYMBOL(memmove);
> > +EXPORT_SYMBOL(strcmp);
> > +EXPORT_SYMBOL(strlen);
> > +EXPORT_SYMBOL(strncmp);
> >  EXPORT_SYMBOL(__memset);
> >  EXPORT_SYMBOL(__memcpy);
> >  EXPORT_SYMBOL(__memmove);
> > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > index 25d5c9664e57..6c74b0bedd60 100644
> > --- a/arch/riscv/lib/Makefile
> > +++ b/arch/riscv/lib/Makefile
> > @@ -3,6 +3,9 @@ lib-y			+= delay.o
> >  lib-y			+= memcpy.o
> >  lib-y			+= memset.o
> >  lib-y			+= memmove.o
> > +lib-y			+= strcmp.o
> > +lib-y			+= strlen.o
> > +lib-y			+= strncmp.o
> >  lib-$(CONFIG_MMU)	+= uaccess.o
> >  lib-$(CONFIG_64BIT)	+= tishift.o
> >  
> > diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> > new file mode 100644
> > index 000000000000..94440fb8390c
> > --- /dev/null
> > +++ b/arch/riscv/lib/strcmp.S
> > @@ -0,0 +1,37 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +/* int strcmp(const char *cs, const char *ct) */
> > +SYM_FUNC_START(strcmp)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, value like strcmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2
> > +	 */
> > +	mv	t2, a1
> > +1:
> > +	lbu	t1, 0(a0)
> > +	lbu	t0, 0(a1)
> > +	addi	a0, a0, 1
> > +	addi	a1, a1, 1
> > +	beq	t1, t0, 3f
> > +	li	a0, 1
> > +	bgeu	t1, t0, 2f
> > +	li	a0, -1
> > +2:
> > +	mv	a1, t2
> > +	ret
> > +3:
> > +	bnez	t1, 1b
> > +	li	a0, 0
> > +	j	2b
> > +SYM_FUNC_END(strcmp)
> > diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> > new file mode 100644
> > index 000000000000..09a7aaff26c8
> > --- /dev/null
> > +++ b/arch/riscv/lib/strlen.S
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +/* int strlen(const char *s) */
> > +SYM_FUNC_START(strlen)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - string length
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - String to measure
> > +	 *
> > +	 * Clobbers:
> > +	 *   t0, t1
> > +	 */
> > +	mv	t1, a0
> > +1:
> > +	lbu	t0, 0(t1)
> > +	bnez	t0, 2f
> > +	sub	a0, t1, a0
> > +	ret
> > +2:
> > +	addi	t1, t1, 1
> > +	j	1b
> > +SYM_FUNC_END(strlen)
> > diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> > new file mode 100644
> > index 000000000000..493ab6febcb2
> > --- /dev/null
> > +++ b/arch/riscv/lib/strncmp.S
> > @@ -0,0 +1,40 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +/* int strncmp(const char *cs, const char *ct, size_t count) */
> > +SYM_FUNC_START(strncmp)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, value like strncmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *   a2 - number of characters to compare
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2
> > +	 */
> > +	li	t0, 0
> > +1:
> > +	beq	a2, t0, 4f
> > +	add	t1, a0, t0
> > +	add	t2, a1, t0
> > +	lbu	t1, 0(t1)
> > +	lbu	t2, 0(t2)
> > +	beq	t1, t2, 3f
> > +	li	a0, 1
> > +	bgeu	t1, t2, 2f
> > +	li	a0, -1
> > +2:
> > +	ret
> > +3:
> > +	addi	t0, t0, 1
> > +	bnez	t1, 1b
> > +4:
> > +	li	a0, 0
> > +	j	2b
> > +SYM_FUNC_END(strncmp)
> > diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
> > index dd58e1d99397..d16bf715a586 100644
> > --- a/arch/riscv/purgatory/Makefile
> > +++ b/arch/riscv/purgatory/Makefile
> > @@ -2,6 +2,7 @@
> >  OBJECT_FILES_NON_STANDARD := y
> >  
> >  purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
> > +purgatory-y += strcmp.o strlen.o strncmp.o
> >  
> >  targets += $(purgatory-y)
> >  PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
> > @@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
> >  $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
> >  	$(call if_changed_rule,as_o_S)
> >  
> > +$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
> > +	$(call if_changed_rule,as_o_S)
> > +
> > +$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
> > +	$(call if_changed_rule,as_o_S)
> > +
> > +$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
> > +	$(call if_changed_rule,as_o_S)
> > +
> >  $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
> >  	$(call if_changed_rule,cc_o_c)
> >  
> > @@ -77,6 +87,9 @@ CFLAGS_ctype.o			+= $(PURGATORY_CFLAGS)
> >  AFLAGS_REMOVE_entry.o		+= -Wa,-gdwarf-2
> >  AFLAGS_REMOVE_memcpy.o		+= -Wa,-gdwarf-2
> >  AFLAGS_REMOVE_memset.o		+= -Wa,-gdwarf-2
> > +AFLAGS_REMOVE_strcmp.o		+= -Wa,-gdwarf-2
> > +AFLAGS_REMOVE_strlen.o		+= -Wa,-gdwarf-2
> > +AFLAGS_REMOVE_strncmp.o		+= -Wa,-gdwarf-2
> >  
> >  $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> >  		$(call if_changed,ld)
> 





* Re: [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
  2023-01-09 18:17 ` [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes Heiko Stuebner
  2023-01-09 20:53   ` Conor Dooley
@ 2023-01-10  8:32   ` Andrew Jones
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Jones @ 2023-01-10  8:32 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:51PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside
> the kprobes implementation. This got moved into a central header in
> commit ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header").
> 
> Though it looks like I overlooked two of them, so fix that. FENCE itself is
> an instruction defined directly by its own opcode, while the created
> riscv_insn_is_system function covers all instructions defined under the SYSTEM
> opcode.
> 
> Fixes: ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header")
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/insn.h            | 10 ++++++++++
>  arch/riscv/kernel/probes/simulate-insn.h |  3 ---
>  2 files changed, 10 insertions(+), 3 deletions(-)
>

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

* Re: [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling
  2023-01-09 18:17 ` [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling Heiko Stuebner
  2023-01-09 22:22   ` Conor Dooley
@ 2023-01-10  8:44   ` Andrew Jones
  2023-01-10  8:54     ` Conor Dooley
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2023-01-10  8:44 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:52PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Similar to the helper for the u-type + i-type imm handling, add helper
> functions for J-type immediates.
> 
> While it also would be possible to open-code that bit of imm wiggling
> using the macros already provided in insn.h, it's way better to have
> consistency on how we handle this across all function encoding/decoding.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/insn.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/insn.h b/arch/riscv/include/asm/insn.h
> index 0455b4dcb0a7..9eea61a3028f 100644
> --- a/arch/riscv/include/asm/insn.h
> +++ b/arch/riscv/include/asm/insn.h
> @@ -301,6 +301,32 @@ static __always_inline bool riscv_insn_is_branch(u32 code)
>  	(RVC_X(x_, RVC_B_IMM_7_6_OPOFF, RVC_B_IMM_7_6_MASK) << RVC_B_IMM_7_6_OFF) | \
>  	(RVC_IMM_SIGN(x_) << RVC_B_IMM_SIGN_OFF); })
>  
> +/*
> + * Get the immediate from a J-type instruction.
> + *
> + * @insn: instruction to process
> + * Return: immediate
> + */
> +static inline s32 riscv_insn_extract_jtype_imm(u32 insn)
> +{
> +	return RV_EXTRACT_JTYPE_IMM(insn);
> +}
> +
> +/*
> + * Update a J-type instruction with an immediate value.
> + *
> + * @insn: pointer to the jtype instruction
> + * @imm: the immediate to insert into the instruction
> + */
> +static inline void riscv_insn_insert_jtype_imm(u32 *insn, s32 imm)
> +{
> +	*insn &= ~GENMASK(31, 12);
> +	*insn |= (((imm & (RV_J_IMM_10_1_MASK << RV_J_IMM_10_1_OFF)) << RV_I_IMM_11_0_OPOFF) |
                                                                        ^ RV_J_IMM_10_1_OPOFF
									(as pointed out by Conor)

> +		  ((imm & (RV_J_IMM_11_MASK << RV_J_IMM_11_OFF)) << RV_J_IMM_11_OPOFF) |
> +		  ((imm & (RV_J_IMM_19_12_OPOFF << RV_J_IMM_19_12_OFF)) << RV_J_IMM_19_12_OPOFF) |
                           ^ RV_J_IMM_19_12_MASK


> +		  ((imm & (1 << RV_J_IMM_SIGN_OFF)) << RV_J_IMM_SIGN_OPOFF));

nit: can drop the outermost ()

> +}
> +
>  /*
>   * Put together one immediate from a U-type and I-type instruction pair.
>   *
> -- 
> 2.35.1
> 

Thanks,
drew
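To make the mask mix-ups above easier to spot: a JAL immediate is scattered over instruction bits 31:12 as imm[20|10:1|11|19:12]. The following round-trip sketch (helper names illustrative, written from the encoding rather than the kernel macros) shows what the extract/insert pair is supposed to do:

```c
#include <assert.h>
#include <stdint.h>

/* Pull the signed immediate out of a J-type instruction. */
static int32_t jtype_extract_imm(uint32_t insn)
{
	uint32_t imm = ((insn >> 31) & 0x1) << 20;	/* imm[20], sign */

	imm |= ((insn >> 21) & 0x3ff) << 1;		/* imm[10:1]     */
	imm |= ((insn >> 20) & 0x1) << 11;		/* imm[11]       */
	imm |= ((insn >> 12) & 0xff) << 12;		/* imm[19:12]    */

	/* sign-extend the 21-bit value from bit 20 */
	return (int32_t)(imm << 11) >> 11;
}

/* Replace instruction bits 31:12 with the given immediate. */
static uint32_t jtype_insert_imm(uint32_t insn, int32_t simm)
{
	uint32_t imm = (uint32_t)simm;

	insn &= ~0xfffff000u;				/* clear the old immediate */
	insn |= ((imm >> 20) & 0x1) << 31;
	insn |= ((imm >> 1) & 0x3ff) << 21;
	insn |= ((imm >> 11) & 0x1) << 20;
	insn |= ((imm >> 12) & 0xff) << 12;
	return insn;
}
```

Inserting an immediate and extracting it again must be the identity for every even value in the +/-1 MiB range, which is exactly the property the wrong RV_I_IMM_11_0_OPOFF / RV_J_IMM_19_12_OPOFF operands would break.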

* Re: [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling
  2023-01-10  8:44   ` Andrew Jones
@ 2023-01-10  8:54     ` Conor Dooley
  2023-01-11 14:43       ` Jisheng Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Conor Dooley @ 2023-01-10  8:54 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stuebner, linux-riscv, palmer, christoph.muellner, conor,
	philipp.tomsich, jszhang, Heiko Stuebner


On Tue, Jan 10, 2023 at 09:44:25AM +0100, Andrew Jones wrote:
> > +	*insn |= (((imm & (RV_J_IMM_10_1_MASK << RV_J_IMM_10_1_OFF)) << RV_I_IMM_11_0_OPOFF) |
>                                                                         ^ RV_J_IMM_10_1_OPOFF
> 									(as pointed out by Conor)

> > +		  ((imm & (RV_J_IMM_11_MASK << RV_J_IMM_11_OFF)) << RV_J_IMM_11_OPOFF) |
> > +		  ((imm & (RV_J_IMM_19_12_OPOFF << RV_J_IMM_19_12_OFF)) << RV_J_IMM_19_12_OPOFF) |
>                            ^ RV_J_IMM_19_12_MASK

You'd think that having seen one I'd have picked up on the other one
being wrong too...


* Re: [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives
  2023-01-09 18:17 ` [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives Heiko Stuebner
@ 2023-01-10  9:28   ` Andrew Jones
  2023-01-11 17:15     ` Jisheng Zhang
  2023-01-11 13:18   ` Jisheng Zhang
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2023-01-10  9:28 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:53PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Alternatives live in a different section, so addresses used by jal
> instructions will point to wrong locations after the patch got applied.
> 
> Similar to arm64, adjust the location to consider that offset.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/kernel/alternative.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/arch/riscv/kernel/alternative.c b/arch/riscv/kernel/alternative.c
> index 6212ea0eed72..985c284fe9f4 100644
> --- a/arch/riscv/kernel/alternative.c
> +++ b/arch/riscv/kernel/alternative.c
> @@ -79,6 +79,21 @@ static void riscv_alternative_fix_auipc_jalr(void *ptr, u32 auipc_insn,
>  	patch_text_nosync(ptr, call, sizeof(u32) * 2);
>  }
>  
> +static void riscv_alternative_fix_jal(void *ptr, u32 jal_insn, int patch_offset)
> +{
> +	s32 imm;
> +
> +	/* get and adjust new target address */
> +	imm = riscv_insn_extract_jtype_imm(jal_insn);
> +	imm -= patch_offset;
> +
> +	/* update instructions */

instruction

> +	riscv_insn_insert_jtype_imm(&jal_insn, imm);
> +
> +	/* patch the call place again */
> +	patch_text_nosync(ptr, &jal_insn, sizeof(u32));
> +}
> +
>  void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
>  				      int patch_offset)
>  {
> @@ -106,6 +121,18 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
>  			riscv_alternative_fix_auipc_jalr(alt_ptr + i * sizeof(u32),
>  							 insn, insn2, patch_offset);

I think we should add an i++ here. There's not a problem now, since we're
only adding a fixup for jal, not jalr, but we should future-proof this and
there's no reason to revisit an already fixed-up instruction anyway.

>  		}
> +
> +		if (riscv_insn_is_jal(insn)) {
> +			s32 imm = riscv_insn_extract_jtype_imm(insn);
> +
> +			/* don't modify jumps inside the alternative block */

Don't

> +			if ((alt_ptr + i * sizeof(u32) + imm) >= alt_ptr &&
> +			    (alt_ptr + i * sizeof(u32) + imm) < (alt_ptr + len))
> +				continue;
> +
> +			riscv_alternative_fix_jal(alt_ptr + i * sizeof(u32),
> +						  insn, patch_offset);
> +		}
>  	}
>  }
>  
> -- 
> 2.35.1
>

Otherwise

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Thanks,
drew
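The fixup rule under review, i.e. shift every jal target by the patch offset but leave jumps that stay inside the alternative block untouched, can be modeled with pseudo-instructions. This is a sketch of the idea, not the kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pseudo-instruction: either a jal with a byte offset, or anything else. */
struct pinsn {
	int is_jal;
	int32_t imm;	/* jump offset in bytes, only meaningful for jal */
};

/*
 * Sketch of the jal handling in riscv_alternative_fix_offsets(): jumps
 * that land inside the alternative block keep their offset, everything
 * else is shifted back by patch_offset.
 */
static void fix_jal_offsets(struct pinsn *block, size_t n, int32_t patch_offset)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int32_t target;

		if (!block[i].is_jal)
			continue;

		/* target address relative to the start of the block */
		target = (int32_t)(i * 4) + block[i].imm;
		if (target >= 0 && target < (int32_t)(n * 4))
			continue;	/* jump stays inside the block */

		block[i].imm -= patch_offset;
	}
}
```

This mirrors the in-block check `(alt_ptr + i * sizeof(u32) + imm) >= alt_ptr && ... < (alt_ptr + len)` from the patch, with alt_ptr normalized to 0.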

* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-09 18:17 ` [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
  2023-01-09 22:37   ` Conor Dooley
@ 2023-01-10  9:39   ` Andrew Jones
  2023-01-10 10:46     ` Heiko Stübner
  2023-01-10 12:13   ` Andrew Jones
  2 siblings, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2023-01-10  9:39 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Depending on supported extensions on specific RISC-V cores,
> optimized str* functions might make sense.
> 
> This adds basic infrastructure to allow patching the function calls
> via alternatives later on.
> 
> The Linux kernel provides standard implementations for string functions
> but when architectures want to extend them, they need to provide their
> own.

And the compiler provides builtins. In the previous series it appeared
to be a bad idea to compile the kernel with the compiler's builtins
disabled. How will the optimized string functions which will be based
on this patch be selected?

Thanks,
drew

* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-09 18:17 ` [PATCH v4 5/5] RISC-V: add zbb support to string functions Heiko Stuebner
  2023-01-09 20:39   ` Conor Dooley
@ 2023-01-10  9:57   ` Andrew Jones
  2023-01-10 10:14     ` Conor Dooley
  2023-01-11 12:24   ` Andrew Jones
  2 siblings, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2023-01-10  9:57 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add handling for ZBB extension and add support for using it as a
> variant for optimized string functions.
> 
> Support for the Zbb-str-variants is limited to the GNU-assembler
> for now, as LLVM has not yet acquired the functionality to
> selectively change the arch option in assembler code.
> This is still under review at
>     https://reviews.llvm.org/D123515
> 
> Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/Kconfig                   |  24 ++++++
>  arch/riscv/include/asm/errata_list.h |   3 +-
>  arch/riscv/include/asm/hwcap.h       |   1 +
>  arch/riscv/include/asm/string.h      |   2 +
>  arch/riscv/kernel/cpu.c              |   1 +
>  arch/riscv/kernel/cpufeature.c       |  18 +++++
>  arch/riscv/lib/strcmp.S              |  94 ++++++++++++++++++++++
>  arch/riscv/lib/strlen.S              | 114 +++++++++++++++++++++++++++
>  arch/riscv/lib/strncmp.S             | 111 ++++++++++++++++++++++++++
>  9 files changed, 367 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..7c814fbf9527 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
>  
>  	   If you don't know what to do here, say Y.
>  
> +config TOOLCHAIN_HAS_ZBB
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> +	depends on AS_IS_GNU
> +
> +config RISCV_ISA_ZBB
> +	bool "Zbb extension support for bit manipulation instructions"
> +	depends on TOOLCHAIN_HAS_ZBB
> +	depends on !XIP_KERNEL && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Adds support to dynamically detect the presence of the ZBB
> +	   extension (basic bit manipulation) and enable its usage.
> +
> +	   The Zbb extension provides instructions to accelerate a number
> +	   of bit-specific operations (count bit population, sign extending,
> +	   bitrotation, etc).
> +
> +	   If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZICBOM
>  	bool
>  	default y
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4180312d2a70..95e626b7281e 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -24,7 +24,8 @@
>  
>  #define	CPUFEATURE_SVPBMT 0
>  #define	CPUFEATURE_ZICBOM 1
> -#define	CPUFEATURE_NUMBER 2
> +#define	CPUFEATURE_ZBB 2
> +#define	CPUFEATURE_NUMBER 3
>  
>  #ifdef __ASSEMBLY__
>  
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index 86328e3acb02..b727491fb100 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_ZIHINTPAUSE,
>  	RISCV_ISA_EXT_SSTC,
>  	RISCV_ISA_EXT_SVINVAL,
> +	RISCV_ISA_EXT_ZBB,
>  	RISCV_ISA_EXT_ID_MAX
>  };
>  static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
> diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> index a96b1fea24fe..17dfc4ab4c80 100644
> --- a/arch/riscv/include/asm/string.h
> +++ b/arch/riscv/include/asm/string.h
> @@ -6,6 +6,8 @@
>  #ifndef _ASM_RISCV_STRING_H
>  #define _ASM_RISCV_STRING_H
>  
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>

Why are these includes getting added? Shouldn't they be getting directly
added to whatever is including asm/string.h instead?

>  #include <linux/types.h>
>  #include <linux/linkage.h>
>  
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 1b9a5a66e55a..c4d1aa166f8b 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
>   *    extensions by an underscore.
>   */
>  static struct riscv_isa_ext_data isa_ext_arr[] = {
> +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
>  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
>  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),

Huh, this array still doesn't appear to be in order... Zbb should
be getting inserted between the Zi* extensions (which should be first)
and the S* extensions and each of those three categories should be
in alphabetical order.

> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 205bbd6b1fce..bf3a791d7110 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
>  					set_bit(nr, this_isa);
>  				}
>  			} else {
> +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
>  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
>  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
>  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);

I think we wanted this one in alphabetical order...

> @@ -301,6 +302,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
>  	return true;
>  }
>  
> +static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
> +{
> +	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
> +		return false;
> +
> +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> +		return false;
> +
> +	if (!riscv_isa_extension_available(NULL, ZBB))
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * Probe presence of individual extensions.
>   *
> @@ -318,6 +333,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>  	if (cpufeature_probe_zicbom(stage))
>  		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
>  
> +	if (cpufeature_probe_zbb(stage))
> +		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
> +
>  	return cpu_req_feature;
>  }
>

I'm skipping the asm review for now since I'm still not clear on the plan
for non-optimized string functions vs. optimized vs. compiler builtins.

Thanks,
drew

* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-10  9:57   ` Andrew Jones
@ 2023-01-10 10:14     ` Conor Dooley
  2023-01-12 11:21       ` Heiko Stübner
  0 siblings, 1 reply; 42+ messages in thread
From: Conor Dooley @ 2023-01-10 10:14 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stuebner, linux-riscv, palmer, christoph.muellner, conor,
	philipp.tomsich, jszhang, Heiko Stuebner


On Tue, Jan 10, 2023 at 10:57:20AM +0100, Andrew Jones wrote:
> On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:

> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index 1b9a5a66e55a..c4d1aa166f8b 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
> >   *    extensions by an underscore.
> >   */
> >  static struct riscv_isa_ext_data isa_ext_arr[] = {
> > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> >  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
> >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> 
> Huh, this array still doesn't appear to be in order... Zbb should
> be getting inserted between the Zi* extensions (which should be first)
> and the S* extensions and each of those three categories should be
> in alphabetical order.

Correct. The new entry was at least added in the right place, reordering
existing entries aside.

> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 205bbd6b1fce..bf3a791d7110 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
> >  					set_bit(nr, this_isa);
> >  				}
> >  			} else {
> > +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> >  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> >  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> >  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> 
> I think we wanted this one in alphabetical order...

Correct again.

I've been avoiding mentioning this stuff in reviews though as Palmer has
not yet picked up the patches [0] putting these arrays into those orders
in the first place.

I tried to harass him about them last night, but he didn't get around to
them. Perhaps worth mentioning tomorrow if we're gonna keep having to
discuss these lists in reviews?

Thanks,
Conor.

0 - https://lore.kernel.org/all/20221205144525.2148448-1-conor.dooley@microchip.com/
(I tried applying it yesterday, worked with `git am -3`)

* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-10  9:39   ` Andrew Jones
@ 2023-01-10 10:46     ` Heiko Stübner
  2023-01-10 11:16       ` Andrew Jones
  0 siblings, 1 reply; 42+ messages in thread
From: Heiko Stübner @ 2023-01-10 10:46 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich, jszhang

Hi Andrew,

On Tuesday, 10 January 2023 at 10:39:36 CET, Andrew Jones wrote:
> On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Depending on supported extensions on specific RISC-V cores,
> > optimized str* functions might make sense.
> > 
> > This adds basic infrastructure to allow patching the function calls
> > via alternatives later on.
> > 
> > The Linux kernel provides standard implementations for string functions
> > but when architectures want to extend them, they need to provide their
> > own.
> 
> And the compiler provides builtins. In the previous series it appeared
> to be a bad idea to compile the kernel with the compiler's builtins
> disabled. How will the optimized string functions which will be based
> on this patch be selected?

yep, the consensus seemingly was that the compiler knows best when
to use builtins for some cases (which is probably correct), hence the move
away from the inline bases.

So I guess the first decision is the compiler's: whether to use a builtin or
the kernel string function (same as for mem*).

In my tests, I did see both getting used - so it's definitely not lost work :-).

After that, once we land in these functions, we want to select the best
variant for the host system the kernel runs on.

I.e. this one as baseline or for example using zbb as an "alternative".

As for the "more" variants, I currently have more patches on top that
then use an ALTERNATIVE_2:

ALTERNATIVE_2("nop",
      "j variant_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
      "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)

with the "errata_id" being used as a bitfield to match extension combinations,
and a "not" field, so I can express has-zbb + has-not-fast-unaligned.
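A rough Python model of that matching scheme (illustrative only, not kernel code; CPUFEATURE_FAST_UNALIGNED is the hypothetical flag from the example above, not an existing kernel symbol):

```python
# Placeholder feature bits modeled on the ALTERNATIVE_2 example above.
CPUFEATURE_ZBB = 1 << 2
CPUFEATURE_FAST_UNALIGNED = 1 << 3

def alternative_applies(cpu_features, required, forbidden):
    # Patch in this variant only if every required feature is present
    # and no forbidden ("not") feature is.
    return (cpu_features & required) == required and \
           (cpu_features & forbidden) == 0

cpu = CPUFEATURE_ZBB  # has Zbb, but unaligned access is slow

# "j variant_zbb_unaligned": needs ZBB and FAST_UNALIGNED -> skipped
print(alternative_applies(cpu, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0))  # False

# "j variant_zbb": needs ZBB, must NOT have FAST_UNALIGNED -> applied
print(alternative_applies(cpu, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED))  # True
```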


Heiko




* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-10 10:46     ` Heiko Stübner
@ 2023-01-10 11:16       ` Andrew Jones
  2023-01-11 12:34         ` Andrew Jones
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2023-01-10 11:16 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich, jszhang

On Tue, Jan 10, 2023 at 11:46:40AM +0100, Heiko Stübner wrote:
> Hi Andrew,
> 
> On Tuesday, 10 January 2023 at 10:36:36 CET, Andrew Jones wrote:
> > On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > 
> > > Depending on supported extensions on specific RISC-V cores,
> > > optimized str* functions might make sense.
> > > 
> > > This adds basic infrastructure to allow patching the function calls
> > > via alternatives later on.
> > > 
> > > The Linux kernel provides standard implementations for string functions
> > > but when architectures want to extend them, they need to provide their
> > > own.
> > 
> > And the compiler provides builtins. In the previous series it appeared
> > to be a bad idea to compile the kernel with the compiler's builtins
> > disabled. How will the optimized string functions which will be based
> > on this patch be selected?
> 
> yep, the consensus seemingly was that the compiler knows best when
> to use builtins for some cases (which is probably correct), hence the move
> away from the inline bases.
> 
> So I guess the first decision is the compiler's: whether to use a builtin or
> the kernel string function (same as for mem*).
> 
> In my tests, I did see both getting used - so it's definitely not lost work :-).
> 
> After that, once we land in these functions, we want to select the best
> variant for the host system the kernel runs on.
> 
> I.e. this one as baseline or for example using zbb as an "alternative".
> 
> As for the "more" variants, I currently have more patches on top that
> then use an ALTERNATIVE_2:
> 
> ALTERNATIVE_2("nop",
>       "j variant_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
>       "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
> 
> with the "errata_id" being used as a bitfield to match extension combinations,
> and a "not" field, so I can express has-zbb + has-not-fast-unaligned.

Thanks, Heiko. This was the info I needed. It might be nice to put some of
it in the commit message too.

drew


* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-09 18:17 ` [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
  2023-01-09 22:37   ` Conor Dooley
  2023-01-10  9:39   ` Andrew Jones
@ 2023-01-10 12:13   ` Andrew Jones
  2023-01-11 12:30     ` Andrew Jones
  2023-01-12 16:05     ` Heiko Stübner
  2 siblings, 2 replies; 42+ messages in thread
From: Andrew Jones @ 2023-01-10 12:13 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Depending on supported extensions on specific RISC-V cores,
> optimized str* functions might make sense.
> 
> This adds basic infrastructure to allow patching the function calls
> via alternatives later on.
> 
> The Linux kernel provides standard implementations for string functions
> but when architectures want to extend them, they need to provide their
> own.
> 
> The added generic string functions are done in assembler (taken from
> disassembling the main-kernel functions for now) to allow us to control
> the used registers and extend them with optimized variants.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/include/asm/string.h | 10 +++++++++
>  arch/riscv/kernel/riscv_ksyms.c |  3 +++
>  arch/riscv/lib/Makefile         |  3 +++
>  arch/riscv/lib/strcmp.S         | 37 ++++++++++++++++++++++++++++++
>  arch/riscv/lib/strlen.S         | 28 +++++++++++++++++++++++
>  arch/riscv/lib/strncmp.S        | 40 +++++++++++++++++++++++++++++++++
>  arch/riscv/purgatory/Makefile   | 13 +++++++++++
>  7 files changed, 134 insertions(+)
>  create mode 100644 arch/riscv/lib/strcmp.S
>  create mode 100644 arch/riscv/lib/strlen.S
>  create mode 100644 arch/riscv/lib/strncmp.S
> 
> diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> index 909049366555..a96b1fea24fe 100644
> --- a/arch/riscv/include/asm/string.h
> +++ b/arch/riscv/include/asm/string.h
> @@ -18,6 +18,16 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
>  #define __HAVE_ARCH_MEMMOVE
>  extern asmlinkage void *memmove(void *, const void *, size_t);
>  extern asmlinkage void *__memmove(void *, const void *, size_t);
> +
> +#define __HAVE_ARCH_STRCMP
> +extern asmlinkage int strcmp(const char *cs, const char *ct);
> +
> +#define __HAVE_ARCH_STRLEN
> +extern asmlinkage __kernel_size_t strlen(const char *);
> +
> +#define __HAVE_ARCH_STRNCMP
> +extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
> +
>  /* For those files which don't want to check by kasan. */
>  #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
> diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
> index 5ab1c7e1a6ed..a72879b4249a 100644
> --- a/arch/riscv/kernel/riscv_ksyms.c
> +++ b/arch/riscv/kernel/riscv_ksyms.c
> @@ -12,6 +12,9 @@
>  EXPORT_SYMBOL(memset);
>  EXPORT_SYMBOL(memcpy);
>  EXPORT_SYMBOL(memmove);
> +EXPORT_SYMBOL(strcmp);
> +EXPORT_SYMBOL(strlen);
> +EXPORT_SYMBOL(strncmp);
>  EXPORT_SYMBOL(__memset);
>  EXPORT_SYMBOL(__memcpy);
>  EXPORT_SYMBOL(__memmove);
> diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> index 25d5c9664e57..6c74b0bedd60 100644
> --- a/arch/riscv/lib/Makefile
> +++ b/arch/riscv/lib/Makefile
> @@ -3,6 +3,9 @@ lib-y			+= delay.o
>  lib-y			+= memcpy.o
>  lib-y			+= memset.o
>  lib-y			+= memmove.o
> +lib-y			+= strcmp.o
> +lib-y			+= strlen.o
> +lib-y			+= strncmp.o
>  lib-$(CONFIG_MMU)	+= uaccess.o
>  lib-$(CONFIG_64BIT)	+= tishift.o
>  
> diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> new file mode 100644
> index 000000000000..94440fb8390c
> --- /dev/null
> +++ b/arch/riscv/lib/strcmp.S
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int strcmp(const char *cs, const char *ct) */
> +SYM_FUNC_START(strcmp)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, value like strcmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2
> +	 */
> +	mv	t2, a1

The above instruction and the 'mv a1, t2' below appear to be attempting
to preserve a1, but that shouldn't be necessary.

> +1:
> +	lbu	t1, 0(a0)
> +	lbu	t0, 0(a1)

I'd rather have t0 be 0(a0) and t1 be 0(a1)

> +	addi	a0, a0, 1
> +	addi	a1, a1, 1
> +	beq	t1, t0, 3f
> +	li	a0, 1
> +	bgeu	t1, t0, 2f
> +	li	a0, -1
> +2:
> +	mv	a1, t2
> +	ret
> +3:
> +	bnez	t1, 1b
> +	li	a0, 0
> +	j	2b

For fun I removed one conditional and one unconditional branch (untested)

1:
     lbu     t0, 0(a0)
     lbu     t1, 0(a1)
     addi    a0, a0, 1
     addi    a1, a1, 1
     bne     t0, t1, 2f
     bnez    t0, 1b
     li      a0, 0
     ret
2:
     slt     a1, t1, t0
     slli    a1, a1, 1
     li      a0, -1
     add     a0, a0, a1
     ret
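
The tail of that suggestion computes (t0 > t1) ? 1 : -1 without a compare-and-branch. A small Python model of those four instructions, just to check the arithmetic (this path is only reached when the bytes differ, so the equal case never arises):

```python
def branchless_result(t0, t1):
    # slt  a1, t1, t0   -> a1 = 1 if t1 < t0 else 0
    a1 = 1 if t1 < t0 else 0
    # slli a1, a1, 1    -> a1 *= 2
    a1 <<= 1
    # li a0, -1 ; add a0, a0, a1  -> -1 turns into 1 when t0 > t1
    return -1 + a1

print(branchless_result(ord('b'), ord('a')))   # 1
print(branchless_result(ord('a'), ord('b')))   # -1
```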


> +SYM_FUNC_END(strcmp)
> diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> new file mode 100644
> index 000000000000..09a7aaff26c8
> --- /dev/null
> +++ b/arch/riscv/lib/strlen.S
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int strlen(const char *s) */
> +SYM_FUNC_START(strlen)
> +	/*
> +	 * Returns
> +	 *   a0 - string length
> +	 *
> +	 * Parameters
> +	 *   a0 - String to measure
> +	 *
> +	 * Clobbers:
> +	 *   t0, t1
> +	 */
> +	mv	t1, a0
> +1:
> +	lbu	t0, 0(t1)
> +	bnez	t0, 2f
> +	sub	a0, t1, a0
> +	ret
> +2:
> +	addi	t1, t1, 1
> +	j	1b

Slightly reorganizing looks better (to me)

   mv    t1, a0
1:
   lbu   t0, 0(t1)
   beqz  t0, 2f
   addi  t1, t1, 1
   j     1b
2:
   sub a0, t1, a0
   ret

> +SYM_FUNC_END(strlen)
> diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> new file mode 100644
> index 000000000000..493ab6febcb2
> --- /dev/null
> +++ b/arch/riscv/lib/strncmp.S
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <linux/linkage.h>
> +#include <asm/asm.h>
> +#include <asm-generic/export.h>
> +
> +/* int strncmp(const char *cs, const char *ct, size_t count) */
> +SYM_FUNC_START(strncmp)
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, value like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2
> +	 */
> +	li	t0, 0
> +1:
> +	beq	a2, t0, 4f
> +	add	t1, a0, t0
> +	add	t2, a1, t0
> +	lbu	t1, 0(t1)
> +	lbu	t2, 0(t2)
> +	beq	t1, t2, 3f
> +	li	a0, 1
> +	bgeu	t1, t2, 2f
> +	li	a0, -1
> +2:
> +	ret
> +3:
> +	addi	t0, t0, 1
> +	bnez	t1, 1b
> +4:
> +	li	a0, 0
> +	j	2b

(untested)

     li      t2, 0
1:
     beq     a2, t2, 2f
     lbu     t0, 0(a0)
     lbu     t1, 0(a1)
     addi    a0, a0, 1
     addi    a1, a1, 1
     bne     t0, t1, 3f
     addi    t2, t2, 1
     bnez    t0, 1b
2:
     li      a0, 0
     ret
3:
     slt     a1, t1, t0
     slli    a1, a1, 1
     li      a0, -1
     add     a0, a0, a1
     ret

> +SYM_FUNC_END(strncmp)
> diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
> index dd58e1d99397..d16bf715a586 100644
> --- a/arch/riscv/purgatory/Makefile
> +++ b/arch/riscv/purgatory/Makefile
> @@ -2,6 +2,7 @@
>  OBJECT_FILES_NON_STANDARD := y
>  
>  purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
> +purgatory-y += strcmp.o strlen.o strncmp.o
>  
>  targets += $(purgatory-y)
>  PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
> @@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
>  $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
>  	$(call if_changed_rule,as_o_S)
>  
> +$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
> +	$(call if_changed_rule,as_o_S)
> +
> +$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
> +	$(call if_changed_rule,as_o_S)
> +
> +$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
> +	$(call if_changed_rule,as_o_S)
> +
>  $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
>  	$(call if_changed_rule,cc_o_c)
>  
> @@ -77,6 +87,9 @@ CFLAGS_ctype.o			+= $(PURGATORY_CFLAGS)
>  AFLAGS_REMOVE_entry.o		+= -Wa,-gdwarf-2
>  AFLAGS_REMOVE_memcpy.o		+= -Wa,-gdwarf-2
>  AFLAGS_REMOVE_memset.o		+= -Wa,-gdwarf-2
> +AFLAGS_REMOVE_strcmp.o		+= -Wa,-gdwarf-2
> +AFLAGS_REMOVE_strlen.o		+= -Wa,-gdwarf-2
> +AFLAGS_REMOVE_strncmp.o		+= -Wa,-gdwarf-2
>  
>  $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
>  		$(call if_changed,ld)
> -- 
> 2.35.1
>

With at least the unnecessary preserving of a1 in strcmp removed,
it looks correct to me, so

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

But I think there's room for making it more readable, and maybe even
optimized, as I've tried to do.

Thanks,
drew


* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-09 18:17 ` [PATCH v4 5/5] RISC-V: add zbb support to string functions Heiko Stuebner
  2023-01-09 20:39   ` Conor Dooley
  2023-01-10  9:57   ` Andrew Jones
@ 2023-01-11 12:24   ` Andrew Jones
  2023-01-11 14:27     ` Christoph Müllner
  2023-01-12 22:05     ` Heiko Stübner
  2 siblings, 2 replies; 42+ messages in thread
From: Andrew Jones @ 2023-01-11 12:24 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Add handling for ZBB extension and add support for using it as a
> variant for optimized string functions.
> 
> Support for the Zbb-str-variants is limited to the GNU-assembler
> for now, as LLVM has not yet acquired the functionality to
> selectively change the arch option in assembler code.
> This is still under review at
>     https://reviews.llvm.org/D123515
> 
> Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/Kconfig                   |  24 ++++++
>  arch/riscv/include/asm/errata_list.h |   3 +-
>  arch/riscv/include/asm/hwcap.h       |   1 +
>  arch/riscv/include/asm/string.h      |   2 +
>  arch/riscv/kernel/cpu.c              |   1 +
>  arch/riscv/kernel/cpufeature.c       |  18 +++++
>  arch/riscv/lib/strcmp.S              |  94 ++++++++++++++++++++++
>  arch/riscv/lib/strlen.S              | 114 +++++++++++++++++++++++++++
>  arch/riscv/lib/strncmp.S             | 111 ++++++++++++++++++++++++++
>  9 files changed, 367 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..7c814fbf9527 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
>  
>  	   If you don't know what to do here, say Y.
>  
> +config TOOLCHAIN_HAS_ZBB
> +	bool
> +	default y
> +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> +	depends on AS_IS_GNU
> +
> +config RISCV_ISA_ZBB
> +	bool "Zbb extension support for bit manipulation instructions"
> +	depends on TOOLCHAIN_HAS_ZBB
> +	depends on !XIP_KERNEL && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	   Adds support to dynamically detect the presence of the ZBB
> +	   extension (basic bit manipulation) and enable its usage.
> +
> +	   The Zbb extension provides instructions to accelerate a number
> +	   of bit-specific operations (count bit population, sign extending,
> +	   bitrotation, etc).
> +
> +	   If you don't know what to do here, say Y.
> +
>  config TOOLCHAIN_HAS_ZICBOM
>  	bool
>  	default y
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4180312d2a70..95e626b7281e 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -24,7 +24,8 @@
>  
>  #define	CPUFEATURE_SVPBMT 0
>  #define	CPUFEATURE_ZICBOM 1
> -#define	CPUFEATURE_NUMBER 2
> +#define	CPUFEATURE_ZBB 2
> +#define	CPUFEATURE_NUMBER 3
>  
>  #ifdef __ASSEMBLY__
>  
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index 86328e3acb02..b727491fb100 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_ZIHINTPAUSE,
>  	RISCV_ISA_EXT_SSTC,
>  	RISCV_ISA_EXT_SVINVAL,
> +	RISCV_ISA_EXT_ZBB,
>  	RISCV_ISA_EXT_ID_MAX
>  };
>  static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
> diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> index a96b1fea24fe..17dfc4ab4c80 100644
> --- a/arch/riscv/include/asm/string.h
> +++ b/arch/riscv/include/asm/string.h
> @@ -6,6 +6,8 @@
>  #ifndef _ASM_RISCV_STRING_H
>  #define _ASM_RISCV_STRING_H
>  
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  #include <linux/types.h>
>  #include <linux/linkage.h>
>  
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 1b9a5a66e55a..c4d1aa166f8b 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
>   *    extensions by an underscore.
>   */
>  static struct riscv_isa_ext_data isa_ext_arr[] = {
> +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
>  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
>  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 205bbd6b1fce..bf3a791d7110 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
>  					set_bit(nr, this_isa);
>  				}
>  			} else {
> +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
>  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
>  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
>  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> @@ -301,6 +302,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
>  	return true;
>  }
>  
> +static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
> +{
> +	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
> +		return false;
> +
> +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> +		return false;
> +
> +	if (!riscv_isa_extension_available(NULL, ZBB))
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * Probe presence of individual extensions.
>   *
> @@ -318,6 +333,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>  	if (cpufeature_probe_zicbom(stage))
>  		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
>  
> +	if (cpufeature_probe_zbb(stage))
> +		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
> +
>  	return cpu_req_feature;
>  }
>  
> diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> index 94440fb8390c..5428a8f2eb84 100644
> --- a/arch/riscv/lib/strcmp.S
> +++ b/arch/riscv/lib/strcmp.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strcmp(const char *cs, const char *ct) */
>  SYM_FUNC_START(strcmp)
> +
> +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)

How about strcmp_zbb and similar for the other functions for the labels?

> +
>  	/*
>  	 * Returns
>  	 *   a0 - comparison result, value like strcmp
> @@ -34,4 +39,93 @@ SYM_FUNC_START(strcmp)
>  	bnez	t1, 1b
>  	li	a0, 0
>  	j	2b
> +
> +/*
> + * Variant of strcmp using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +variant_zbb:
> +#define src1		a0
> +#define result		a0
> +#define src2		t5
> +#define data1		t0
> +#define data2		t1
> +#define align		t2
> +#define data1_orcb	t3
> +#define m1		t4

nit: It's probably just me, but I'm not a big fan of creating defines
for registers, particularly when the meaning of the registers' contents
gets changed, because then the name, at least temporarily, no longer
accurately describes the content.

And, in this case, it looks like the code matches Appendix A, except the
register and label naming. I'd prefer to have those match too so I could
"review" with the diff tool.

> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, value like strcmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5
> +	 */
> +	mv	src2, a1
> +
> +	or	align, src1, src2
> +	li	m1, -1
> +	and	align, align, SZREG-1
> +	bnez	align, 3f
> +
> +	/* Main loop for aligned string.  */
> +	.p2align 3

Why are the starts of the loops aligned to 8 byte boundaries?

> +1:
> +	REG_L	data1, 0(src1)
> +	REG_L	data2, 0(src2)
> +	orc.b	data1_orcb, data1
> +	bne	data1_orcb, m1, 2f
> +	addi	src1, src1, SZREG
> +	addi	src2, src2, SZREG
> +	beq	data1, data2, 1b
> +
> +	/*
> +	 * Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.
> +	 */
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8	data1, data1
> +	rev8	data2, data2
> +#endif
> +
> +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence. */
> +	sltu	result, data1, data2
> +	neg	result, result
> +	ori	result, result, 1
> +	ret
> +
> +2:
> +	/*
> +	 * Found a null byte.
> +	 * If words don't match, fall back to simple loop.
> +	 */
> +	bne	data1, data2, 3f
> +
> +	/* Otherwise, strings are equal. */
> +	li	result, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings. */
> +	.p2align 3
> +3:
> +	lbu	data1, 0(src1)
> +	lbu	data2, 0(src2)
> +	addi	src1, src1, 1
> +	addi	src2, src2, 1
> +	bne	data1, data2, 4f
> +	bnez	data1, 3b
> +
> +4:
> +	sub	result, data1, data2

The non-optimized version returns explicitly -1, 0, 1, whereas this
returns < 0, 0, > 0. Assuming we don't need the explicit -1 / 1 then we
can change the non-optimized version to do this subtraction for its
return value too, saving several instructions.

> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strcmp)
> diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> index 09a7aaff26c8..738efb04307d 100644
> --- a/arch/riscv/lib/strlen.S
> +++ b/arch/riscv/lib/strlen.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strlen(const char *s) */
>  SYM_FUNC_START(strlen)
> +
> +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - string length
> @@ -25,4 +30,113 @@ SYM_FUNC_START(strlen)
>  2:
>  	addi	t1, t1, 1
>  	j	1b
> +
> +/*
> + * Variant of strlen using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +variant_zbb:
> +
> +#define src		a0
> +#define result		a0
> +#define addr		t0
> +#define data		t1
> +#define offset		t2
> +#define offset_bits	t2
> +#define valid_bytes	t3
> +#define m1		t3
> +
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +# define CZ	clz
> +# define SHIFT	sll
> +#else
> +# define CZ	ctz
> +# define SHIFT	srl
> +#endif
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - string length
> +	 *
> +	 * Parameters
> +	 *   a0 - String to measure
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3
> +	 */
> +
> +	/* Number of irrelevant bytes in the first word. */

I see that this comment doesn't just apply to the next instruction,
but to several instructions. It threw me off at first. Again, I
think I'd prefer we just copy+paste the Appendix, comments and all.
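
For readers unfamiliar with the word-at-a-time NUL search the patch builds on (orc.b to get a byte-granular mask, invert, count trailing zeros, shift by 3), a rough Python model of the little-endian case - an illustration only, not RISC-V code:

```python
SZREG = 8  # bytes per register (rv64)

def orc_b(word):
    # Zbb orc.b: each byte of the result is 0xff if the corresponding
    # input byte is non-zero, 0x00 otherwise.
    out = 0
    for i in range(SZREG):
        if (word >> (8 * i)) & 0xff:
            out |= 0xff << (8 * i)
    return out

def ctz(x):
    # Count trailing zero bits (x must be non-zero here).
    n = 0
    while not (x >> n) & 1:
        n += 1
    return n

def nul_byte_index(word):
    # Mirrors the patch: orc.b, then 'not', then ctz, then >>3 to turn
    # a bit index into a byte index. Assumes the word has a NUL byte.
    return ctz(~orc_b(word) & (2**64 - 1)) >> 3

w = int.from_bytes(b"abc\x00abcd", "little")
print(nul_byte_index(w))  # 3
```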

> +	andi	offset, src, SZREG-1
> +
> +	/* Align pointer. */
> +	andi	addr, src, -SZREG
> +
> +	li	valid_bytes, SZREG
> +	sub	valid_bytes, valid_bytes, offset
> +	slli	offset_bits, offset, RISCV_LGPTR

The shift immediate should be 3, even for rv32, since we want
bits-per-byte, not bytes-per-word. The spec example uses PTRLOG, which
does imply bytes-per-word to me as well, so that seems like a spec bug.
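
A quick sketch of the point: converting a byte offset into a bit offset needs a shift by 3 (bits per byte) on both rv32 and rv64, whereas RISCV_LGPTR is 2 on rv32:

```python
def offset_bits_patch(offset, lgptr):
    # What the patch does: 'slli offset_bits, offset, RISCV_LGPTR'
    return offset << lgptr

def offset_bits_expected(offset):
    # What is needed: bytes -> bits is always a shift by 3
    return offset << 3

# rv64 (RISCV_LGPTR == 3): both agree
print(offset_bits_patch(3, 3), offset_bits_expected(3))   # 24 24
# rv32 (RISCV_LGPTR == 2): the patch shifts too little
print(offset_bits_patch(3, 2), offset_bits_expected(3))   # 12 24
```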

> +
> +	/* Get the first word.  */
> +	REG_L	data, 0(addr)
> +
> +	/*
> +	 * Shift away the partial data we loaded to remove the irrelevant bytes
> +	 * preceding the string with the effect of adding NUL bytes at the
> +	 * end of the string.

the end of the string's first word.

> +	 */
> +	SHIFT	data, data, offset_bits
> +
> +	/* Convert non-NUL into 0xff and NUL into 0x00. */
> +	orc.b	data, data
> +
> +	/* Convert non-NUL into 0x00 and NUL into 0xff. */
> +	not	data, data
> +
> +	/*
> +	 * Search for the first set bit (corresponding to a NUL byte in the
> +	 * original chunk).
> +	 */
> +	CZ	data, data
> +
> +	/*
> +	 * The first chunk is special: commpare against the number

compare

> +	 * of valid bytes in this chunk.
> +	 */
> +	srli	result, data, 3
> +	bgtu	valid_bytes, result, 3f
> +
> +	/* Prepare for the word comparison loop. */
> +	addi	offset, addr, SZREG
> +	li	m1, -1
> +
> +	/*
> +	 * Our critical loop is 4 instructions and processes data in
> +	 * 4 byte or 8 byte chunks.
> +	 */
> +	.p2align 3
> +1:
> +	REG_L	data, SZREG(addr)
> +	addi	addr, addr, SZREG
> +	orc.b	data, data
> +	beq	data, m1, 1b
> +2:
> +	not	data, data
> +	CZ	data, data
> +
> +	/* Get number of processed words.  */
> +	sub	offset, addr, offset
> +
> +	/* Add number of characters in the first word.  */
> +	add	result, result, offset
> +	srli	data, data, 3
> +
> +	/* Add number of characters in the last word.  */
> +	add	result, result, data
> +3:
> +	ret
> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strlen)
> diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> index 493ab6febcb2..851428b439dc 100644
> --- a/arch/riscv/lib/strncmp.S
> +++ b/arch/riscv/lib/strncmp.S
> @@ -3,9 +3,14 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm-generic/export.h>
> +#include <asm/alternative-macros.h>
> +#include <asm/errata_list.h>
>  
>  /* int strncmp(const char *cs, const char *ct, size_t count) */
>  SYM_FUNC_START(strncmp)
> +
> +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> +
>  	/*
>  	 * Returns
>  	 *   a0 - comparison result, value like strncmp
> @@ -37,4 +42,110 @@ SYM_FUNC_START(strncmp)
>  4:
>  	li	a0, 0
>  	j	2b
> +
> +/*
> + * Variant of strncmp using the ZBB extension if available
> + */
> +#ifdef CONFIG_RISCV_ISA_ZBB
> +variant_zbb:
> +
> +#define src1		a0
> +#define result		a0
> +#define src2		t6
> +#define len		a2
> +#define data1		t0
> +#define data2		t1
> +#define align		t2
> +#define data1_orcb	t3
> +#define limit		t4
> +#define m1		t5
> +
> +.option push
> +.option arch,+zbb
> +
> +	/*
> +	 * Returns
> +	 *   a0 - comparison result, like strncmp
> +	 *
> +	 * Parameters
> +	 *   a0 - string1
> +	 *   a1 - string2
> +	 *   a2 - number of characters to compare
> +	 *
> +	 * Clobbers
> +	 *   t0, t1, t2, t3, t4, t5, t6
> +	 */
> +	mv	src2, a1
> +
> +	or	align, src1, src2
> +	li	m1, -1
> +	and	align, align, SZREG-1
> +	add	limit, src1, len
> +	bnez	align, 4f
> +
> +	/* Adjust limit for fast-path.  */
> +	addi	limit, limit, -SZREG

To avoid unnecessarily comparing the last SZREG bytes one at a time when
'len' is SZREG aligned, we should be using the explicitly aligned limit
here (limit & ~(SZREG - 1)) rather than just subtracting SZREG. Then, down
in the slow-path we use the original limit to finish off if len wasn't
aligned. When it was aligned, we'll just immediately branch to the ret.
(I suspect many times 'len' will be SZREG aligned, since string buffers
are typically allocated with power-of-two sizes).
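
The masking expression mentioned above rounds the limit down to a SZREG boundary; a minimal sketch of just that expression (illustrative, assuming SZREG = 8 as on rv64):

```python
SZREG = 8  # bytes per register on rv64

def align_down(limit):
    # 'limit & ~(SZREG - 1)': round down to a SZREG boundary
    return limit & ~(SZREG - 1)

print(align_down(24))  # 24 (already aligned: fast path covers everything)
print(align_down(29))  # 24 (5 tail bytes left for the byte-at-a-time loop)
```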

> +
> +	/* Main loop for aligned string.  */
> +	.p2align 3
> +1:
> +	bgt	src1, limit, 3f
> +	REG_L	data1, 0(src1)
> +	REG_L	data2, 0(src2)
> +	orc.b	data1_orcb, data1
> +	bne	data1_orcb, m1, 2f
> +	addi	src1, src1, SZREG
> +	addi	src2, src2, SZREG
> +	beq	data1, data2, 1b
> +
> +	/*
> +	 * Words don't match, and no null byte in the first
> +	 * word. Get bytes in big-endian order and compare.
> +	 */
> +#ifndef CONFIG_CPU_BIG_ENDIAN
> +	rev8	data1, data1
> +	rev8	data2, data2
> +#endif
> +
> +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> +	sltu	result, data1, data2
> +	neg	result, result
> +	ori	result, result, 1
> +	ret
> +
> +2:
> +	/*
> +	 * Found a null byte.
> +	 * If words don't match, fall back to simple loop.
> +	 */
> +	bne	data1, data2, 3f
> +
> +	/* Otherwise, strings are equal.  */
> +	li	result, 0
> +	ret
> +
> +	/* Simple loop for misaligned strings.  */
> +3:
> +	/* Restore limit for slow-path.  */
> +	addi	limit, limit, SZREG
> +	.p2align 3
> +4:
> +	bge	src1, limit, 6f
> +	lbu	data1, 0(src1)
> +	lbu	data2, 0(src2)
> +	addi	src1, src1, 1
> +	addi	src2, src2, 1
> +	bne	data1, data2, 5f
> +	bnez	data1, 4b
> +
> +5:
> +	sub	result, data1, data2
> +	ret
> +
> +6:
> +	li	result, 0
> +	ret

As strcmp and strncmp have so much in common it's tempting to try and
merge them somehow. Maybe with a handful of macros shared between them?

> +
> +.option pop
> +#endif
>  SYM_FUNC_END(strncmp)
> -- 
> 2.35.1
> 

Thanks,
drew


* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-10 12:13   ` Andrew Jones
@ 2023-01-11 12:30     ` Andrew Jones
  2023-01-12 16:05     ` Heiko Stübner
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Jones @ 2023-01-11 12:30 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Tue, Jan 10, 2023 at 01:13:20PM +0100, Andrew Jones wrote:
> On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
...
> > +
> > +/* int strcmp(const char *cs, const char *ct) */
> > +SYM_FUNC_START(strcmp)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, value like strcmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2
> > +	 */
> > +	mv	t2, a1
> 
> The above instruction and the 'mv a1, t2' below appear to be attempting
> to preserve a1, but that shouldn't be necessary.
> 
> > +1:
> > +	lbu	t1, 0(a0)
> > +	lbu	t0, 0(a1)
> 
> I'd rather have t0 be 0(a0) and t1 be 0(a1)
> 
> > +	addi	a0, a0, 1
> > +	addi	a1, a1, 1
> > +	beq	t1, t0, 3f
> > +	li	a0, 1
> > +	bgeu	t1, t0, 2f
> > +	li	a0, -1
> > +2:
> > +	mv	a1, t2
> > +	ret
> > +3:
> > +	bnez	t1, 1b
> > +	li	a0, 0
> > +	j	2b
> 
> For fun I removed one conditional and one unconditional branch (untested)
> 
> 1:
>      lbu     t0, 0(a0)
>      lbu     t1, 0(a1)
>      addi    a0, a0, 1
>      addi    a1, a1, 1
>      bne     t0, t1, 2f
>      bnez    t0, 1b
>      li      a0, 0
>      ret
> 2:
>      slt     a1, t1, t0
>      slli    a1, a1, 1
>      li      a0, -1
>      add     a0, a0, a1

If we just need < 0 and > 0 then the above four instructions can become

       sub     a0, t0, t1

>      ret
>

Similar comment for strncmp.
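For what it's worth, the two result sequences can be cross-checked with a small C model (hypothetical helper names; t0/t1 stand for the mismatching bytes, so the equal case never reaches these sequences):

```c
#include <assert.h>
#include <stdint.h>

/* slt a1, t1, t0; slli a1, a1, 1; li a0, -1; add a0, a0, a1 */
static long sign_branchless(uint8_t t0, uint8_t t1)
{
	long a1 = (t1 < t0);	/* slt: 1 if t0 > t1 */
	a1 <<= 1;		/* slli: 0 or 2 */
	return -1 + a1;		/* li + add: -1 or +1 */
}

/* sub a0, t0, t1 -- valid when only the sign of the result matters */
static long sign_sub(uint8_t t0, uint8_t t1)
{
	return (long)t0 - (long)t1;
}
```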

Thanks,
drew


* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-10 11:16       ` Andrew Jones
@ 2023-01-11 12:34         ` Andrew Jones
       [not found]           ` <CAEg0e7gJgpoiGjfLeedba0-r=dCE1Z_qkU53w_+-cVjsuqaC3A@mail.gmail.com>
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2023-01-11 12:34 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich, jszhang

On Tue, Jan 10, 2023 at 12:16:47PM +0100, Andrew Jones wrote:
> On Tue, Jan 10, 2023 at 11:46:40AM +0100, Heiko Stübner wrote:
> > Hi Andrew,
> > 
> > Am Dienstag, 10. Januar 2023, 10:39:36 CET schrieb Andrew Jones:
> > > On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> > > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > > 
> > > > Depending on supported extensions on specific RISC-V cores,
> > > > optimized str* functions might make sense.
> > > > 
> > > > This adds basic infrastructure to allow patching the function calls
> > > > via alternatives later on.
> > > > 
> > > > The Linux kernel provides standard implementations for string functions
> > > > but when architectures want to extend them, they need to provide their
> > > > own.
> > > 
> > > And the compiler provides builtins. In the previous series it appeared
> > > to be a bad idea to compile the kernel with the compiler's builtins
> > > disabled. How will the optimized string functions which will be based
> > > on this patch be selected?
> > 
> > yep, the consensus seemingly was that the compiler knows best when
> > to use builtins for some cases (which is probably correct), hence the move
> > away from the inline bases.
> > 
> > So I guess the first decision is the compiler's, whether to use a builtin or
> > the kernel string function (same as for mem*) .

Hi Heiko,

Thinking about this some more, I'd still like to understand how/when the
compiler makes this choice. Are compiler flags involved? Or do some
heuristics dictate the choice?

Thanks,
drew


* Re: [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives
  2023-01-09 18:17 ` [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives Heiko Stuebner
  2023-01-10  9:28   ` Andrew Jones
@ 2023-01-11 13:18   ` Jisheng Zhang
  2023-01-11 13:53     ` Heiko Stübner
  2023-01-11 14:15     ` Andrew Jones
  1 sibling, 2 replies; 42+ messages in thread
From: Jisheng Zhang @ 2023-01-11 13:18 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:53PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> Alternatives live in a different section, so addresses used by jal
> instructions will point to wrong locations after the patch got applied.
> 
> Similar to arm64, adjust the location to consider that offset.

I already submitted this a month ago.
https://lore.kernel.org/linux-riscv/20221204174632.3677-2-jszhang@kernel.org/

> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> ---
>  arch/riscv/kernel/alternative.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/arch/riscv/kernel/alternative.c b/arch/riscv/kernel/alternative.c
> index 6212ea0eed72..985c284fe9f4 100644
> --- a/arch/riscv/kernel/alternative.c
> +++ b/arch/riscv/kernel/alternative.c
> @@ -79,6 +79,21 @@ static void riscv_alternative_fix_auipc_jalr(void *ptr, u32 auipc_insn,
>  	patch_text_nosync(ptr, call, sizeof(u32) * 2);
>  }
>  
> +static void riscv_alternative_fix_jal(void *ptr, u32 jal_insn, int patch_offset)
> +{
> +	s32 imm;
> +
> +	/* get and adjust new target address */
> +	imm = riscv_insn_extract_jtype_imm(jal_insn);
> +	imm -= patch_offset;
> +
> +	/* update instructions */
> +	riscv_insn_insert_jtype_imm(&jal_insn, imm);
> +
> +	/* patch the call place again */
> +	patch_text_nosync(ptr, &jal_insn, sizeof(u32));
> +}
> +
>  void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
>  				      int patch_offset)
>  {
> @@ -106,6 +121,18 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
>  			riscv_alternative_fix_auipc_jalr(alt_ptr + i * sizeof(u32),
>  							 insn, insn2, patch_offset);
>  		}
> +
> +		if (riscv_insn_is_jal(insn)) {
> +			s32 imm = riscv_insn_extract_jtype_imm(insn);
> +
> +			/* don't modify jumps inside the alternative block */
> +			if ((alt_ptr + i * sizeof(u32) + imm) >= alt_ptr &&
> +			    (alt_ptr + i * sizeof(u32) + imm) < (alt_ptr + len))
> +				continue;
> +
> +			riscv_alternative_fix_jal(alt_ptr + i * sizeof(u32),
> +						  insn, patch_offset);
> +		}
>  	}
>  }
>  
> -- 
> 2.35.1
> 
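The J-type immediate that riscv_insn_extract_jtype_imm()/riscv_insn_insert_jtype_imm() handle is scattered across instruction bits 31:12 as imm[20|10:1|11|19:12]. As a cross-check, here is a standalone C model of that encoding based on the RISC-V unprivileged spec (illustrative only, not the kernel helpers):

```c
#include <assert.h>
#include <stdint.h>

/* Unpack the J-type immediate: imm[20|10:1|11|19:12] in bits 31:12. */
static int32_t jtype_extract(uint32_t insn)
{
	uint32_t imm;

	imm  = ((insn >> 31) & 0x1)   << 20;	/* imm[20]    <- bit 31     */
	imm |= ((insn >> 12) & 0xff)  << 12;	/* imm[19:12] <- bits 19:12 */
	imm |= ((insn >> 20) & 0x1)   << 11;	/* imm[11]    <- bit 20     */
	imm |= ((insn >> 21) & 0x3ff) << 1;	/* imm[10:1]  <- bits 30:21 */

	/* sign-extend the 21-bit immediate */
	if (imm & (1u << 20))
		imm |= 0xffe00000u;
	return (int32_t)imm;
}

/* Pack a new immediate, keeping opcode and rd (bits 11:0) intact. */
static void jtype_insert(uint32_t *insn, int32_t off)
{
	uint32_t imm = (uint32_t)off;
	uint32_t i = *insn & 0xfffu;

	i |= ((imm >> 20) & 0x1)   << 31;
	i |= ((imm >> 12) & 0xff)  << 12;
	i |= ((imm >> 11) & 0x1)   << 20;
	i |= ((imm >> 1)  & 0x3ff) << 21;
	*insn = i;
}

/* Round-trip helper for testing: patch a "jal x0, 0" and read it back. */
static int32_t jtype_roundtrip(int32_t off)
{
	uint32_t insn = 0x0000006f;	/* jal x0, 0 */

	jtype_insert(&insn, off);
	return jtype_extract(insn);
}
```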


* Re: [PATCH v4 0/5] Zbb string optimizations and call support in alternatives
  2023-01-09 18:17 [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Heiko Stuebner
                   ` (4 preceding siblings ...)
  2023-01-09 18:17 ` [PATCH v4 5/5] RISC-V: add zbb support to string functions Heiko Stuebner
@ 2023-01-11 13:24 ` Jisheng Zhang
  5 siblings, 0 replies; 42+ messages in thread
From: Jisheng Zhang @ 2023-01-11 13:24 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich,
	ajones, Heiko Stuebner

On Mon, Jan 09, 2023 at 07:17:50PM +0100, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> This series still tries to allow optimized string functions for specific
> extensions. The last approach of using an inline base function to hold
> the alternative calls did cause some issues in a number of places
> 
> So instead of that we're now just using an alternative j at the beginning
> of the generic function to jump to a separate place inside the function
> itself.
> 
> This of course needs a fixup for "j" instructions in alternative blocks,
> so that is provided here as well.
> 
> Technically patch4 got a review from Andrew, but that was still with
> the inline approach, so I didn't bring it over to v4.
> 
> 
> changes since v3:
> - rebase on top of 6.2-rc1 + the applied alternative-call series
> - add alternative fixup for jal instructions
> - drop the inline functions and instead just jump

Hi Heiko,

I think I sent out the jal fixup a month ago. And as with v3, the Zbb
series needs to be split from the general alternative improvements.

Thanks

> 
> changes since v2:
> - add patch fixing the c.jalr funct4 value
> - reword some commit messages
> - fix position of auipc addition patch (earlier)
> - fix compile errors from patch-reordering gone wrong
>   (worked at the end of v2, but compiling individual patches
>    caused issues) - patches are now tested individually
> - limit Zbb variants for GNU as for now
>   (LLVM support for .option arch is still under review)
> - prevent str-functions from getting optimized to builtin-variants
> 
> changes since v1:
> - a number of generalizations/cleanups for instruction parsing
> - use accessor function to access instructions (Emil)
> - actually patch the correct location when having more than one
>   instruction in an alternative block
> - string function cleanups (comments etc) (Conor)
> - move zbb extension above s* extensions in cpu.c lists
> 
> changes since rfc:
> - make Zbb code actually work
> - drop some unneeded patches
> - a lot of cleanups
> 
> Heiko Stuebner (5):
>   RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
>   RISC-V: add helpers for J-type immediate handling
>   RISC-V: fix jal addresses in patched alternatives
>   RISC-V: add infrastructure to allow different str* implementations
>   RISC-V: add zbb support to string functions
> 
>  arch/riscv/Kconfig                       |  24 ++++
>  arch/riscv/include/asm/errata_list.h     |   3 +-
>  arch/riscv/include/asm/hwcap.h           |   1 +
>  arch/riscv/include/asm/insn.h            |  36 ++++++
>  arch/riscv/include/asm/string.h          |  12 ++
>  arch/riscv/kernel/alternative.c          |  27 ++++
>  arch/riscv/kernel/cpu.c                  |   1 +
>  arch/riscv/kernel/cpufeature.c           |  18 +++
>  arch/riscv/kernel/probes/simulate-insn.h |   3 -
>  arch/riscv/kernel/riscv_ksyms.c          |   3 +
>  arch/riscv/lib/Makefile                  |   3 +
>  arch/riscv/lib/strcmp.S                  | 131 ++++++++++++++++++++
>  arch/riscv/lib/strlen.S                  | 142 +++++++++++++++++++++
>  arch/riscv/lib/strncmp.S                 | 151 +++++++++++++++++++++++
>  arch/riscv/purgatory/Makefile            |  13 ++
>  15 files changed, 564 insertions(+), 4 deletions(-)
>  create mode 100644 arch/riscv/lib/strcmp.S
>  create mode 100644 arch/riscv/lib/strlen.S
>  create mode 100644 arch/riscv/lib/strncmp.S
> 
> -- 
> 2.35.1
> 


* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
       [not found]           ` <CAEg0e7gJgpoiGjfLeedba0-r=dCE1Z_qkU53w_+-cVjsuqaC3A@mail.gmail.com>
@ 2023-01-11 13:42             ` Philipp Tomsich
  2023-01-11 13:47             ` Andrew Jones
  1 sibling, 0 replies; 42+ messages in thread
From: Philipp Tomsich @ 2023-01-11 13:42 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: Andrew Jones, Heiko Stübner, linux-riscv, palmer, conor, jszhang

Drew,

the reason why alignment is critical for correctness of optimisations
in userspace applications is that exception semantics must be retained
when crossing page boundaries.

Philipp.

On Wed, 11 Jan 2023 at 14:39, Christoph Müllner
<christoph.muellner@vrull.eu> wrote:
>
>
>
> On Wed, Jan 11, 2023 at 1:34 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>>
>> On Tue, Jan 10, 2023 at 12:16:47PM +0100, Andrew Jones wrote:
>> > On Tue, Jan 10, 2023 at 11:46:40AM +0100, Heiko Stübner wrote:
>> > > Hi Andrew,
>> > >
>> > > Am Dienstag, 10. Januar 2023, 10:39:36 CET schrieb Andrew Jones:
>> > > > On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
>> > > > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
>> > > > >
>> > > > > Depending on supported extensions on specific RISC-V cores,
>> > > > > optimized str* functions might make sense.
>> > > > >
>> > > > > This adds basic infrastructure to allow patching the function calls
>> > > > > via alternatives later on.
>> > > > >
>> > > > > The Linux kernel provides standard implementations for string functions
>> > > > > but when architectures want to extend them, they need to provide their
>> > > > > own.
>> > > >
>> > > > And the compiler provides builtins. In the previous series it appeared
>> > > > to be a bad idea to compile the kernel with the compiler's builtins
>> > > > disabled. How will the optimized string functions which will be based
>> > > > on this patch be selected?
>> > >
>> > > yep, the consensus seemingly was that the compiler knows best when
>> > > to use builtins for some cases (which is probably correct), hence the move
>> > > away from the inline bases.
>> > >
>> > > So I guess the first decision is the compiler's wether to use a builtin or
>> > > the kernel string function (same as for mem*) .
>>
>> Hi Heiko,
>>
>> Thinking about this some more, I'd still like to understand how/when the
>> compiler makes this choice. Are compiler flags involved? Or do some
>> heuristics dictate the choice?
>
>
> 1) Arch-independent expansion is used e.g. for calculating the result for statically known strings.
> 2) Arch-specific expansion (not mainline, but my series can be found here: [1]) can do whatever is reasonable.
> My series emits a bounded unrolled comparison loop for str(n)cmp with a bound that can be adjusted via flag.
> But it only triggers if both strings are known to be aligned.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605993.html
>
> BR
> Christoph
>
>>
>> Thanks,
>> drew


* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
       [not found]           ` <CAEg0e7gJgpoiGjfLeedba0-r=dCE1Z_qkU53w_+-cVjsuqaC3A@mail.gmail.com>
  2023-01-11 13:42             ` Philipp Tomsich
@ 2023-01-11 13:47             ` Andrew Jones
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Jones @ 2023-01-11 13:47 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: Heiko Stübner, linux-riscv, palmer, conor, philipp.tomsich, jszhang

On Wed, Jan 11, 2023 at 02:39:36PM +0100, Christoph Müllner wrote:
> On Wed, Jan 11, 2023 at 1:34 PM Andrew Jones <ajones@ventanamicro.com>
> wrote:
> 
> > On Tue, Jan 10, 2023 at 12:16:47PM +0100, Andrew Jones wrote:
> > > On Tue, Jan 10, 2023 at 11:46:40AM +0100, Heiko Stübner wrote:
> > > > Hi Andrew,
> > > >
> > > > Am Dienstag, 10. Januar 2023, 10:39:36 CET schrieb Andrew Jones:
> > > > > On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> > > > > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > > > >
> > > > > > Depending on supported extensions on specific RISC-V cores,
> > > > > > optimized str* functions might make sense.
> > > > > >
> > > > > > This adds basic infrastructure to allow patching the function calls
> > > > > > via alternatives later on.
> > > > > >
> > > > > > The Linux kernel provides standard implementations for string
> > > > > > functions but when architectures want to extend them, they need
> > > > > > to provide their own.
> > > > >
> > > > > And the compiler provides builtins. In the previous series it appeared
> > > > > to be a bad idea to compile the kernel with the compiler's builtins
> > > > > disabled. How will the optimized string functions which will be based
> > > > > on this patch be selected?
> > > >
> > > > yep, the consensus seemingly was that the compiler knows best when
> > > > to use builtins for some cases (which is probably correct), hence
> > > > the move away from the inline bases.
> > > >
> > > > So I guess the first decision is the compiler's, whether to use a
> > > > builtin or the kernel string function (same as for mem*).
> >
> > Hi Heiko,
> >
> > Thinking about this some more, I'd still like to understand how/when the
> > compiler makes this choice. Are compiler flags involved? Or do some
> > heuristics dictate the choice?
> >
> 
> 1) Arch-independent expansion is used e.g. for calculating the result for
> statically known strings.
> 2) Arch-specific expansion (not mainline, but my series can be found here:
> [1]) can do whatever is reasonable.
> My series emits a bounded unrolled comparison loop for str(n)cmp with a
> bound that can be adjusted via flag.
> But it only triggers if both strings are known to be aligned.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605993.html
> 

Thanks Christoph!

drew


* Re: [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives
  2023-01-11 13:18   ` Jisheng Zhang
@ 2023-01-11 13:53     ` Heiko Stübner
  2023-01-11 14:15     ` Andrew Jones
  1 sibling, 0 replies; 42+ messages in thread
From: Heiko Stübner @ 2023-01-11 13:53 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich, ajones

Hi,

Am Mittwoch, 11. Januar 2023, 14:18:14 CET schrieb Jisheng Zhang:
> On Mon, Jan 09, 2023 at 07:17:53PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Alternatives live in a different section, so addresses used by jal
> > instructions will point to wrong locations after the patch got applied.
> > 
> > Similar to arm64, adjust the location to consider that offset.
> 
> I already submitted this a month ago.
> https://lore.kernel.org/linux-riscv/20221204174632.3677-2-jszhang@kernel.org/

Yep, though the now-merged base series for the call fixups has changed a
lot since then, including changes we discussed in your series.

As the change for the jal fixup itself was so small and I needed it for
my work, I just went ahead and implemented it so I could move forward.


I'm not that attached to the authorship, so if you want we can set you as
author or add a Co-developed-by tag; either is fine with me.


Heiko


> > 
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/kernel/alternative.c | 27 +++++++++++++++++++++++++++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/arch/riscv/kernel/alternative.c b/arch/riscv/kernel/alternative.c
> > index 6212ea0eed72..985c284fe9f4 100644
> > --- a/arch/riscv/kernel/alternative.c
> > +++ b/arch/riscv/kernel/alternative.c
> > @@ -79,6 +79,21 @@ static void riscv_alternative_fix_auipc_jalr(void *ptr, u32 auipc_insn,
> >  	patch_text_nosync(ptr, call, sizeof(u32) * 2);
> >  }
> >  
> > +static void riscv_alternative_fix_jal(void *ptr, u32 jal_insn, int patch_offset)
> > +{
> > +	s32 imm;
> > +
> > +	/* get and adjust new target address */
> > +	imm = riscv_insn_extract_jtype_imm(jal_insn);
> > +	imm -= patch_offset;
> > +
> > +	/* update instructions */
> > +	riscv_insn_insert_jtype_imm(&jal_insn, imm);
> > +
> > +	/* patch the call place again */
> > +	patch_text_nosync(ptr, &jal_insn, sizeof(u32));
> > +}
> > +
> >  void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
> >  				      int patch_offset)
> >  {
> > @@ -106,6 +121,18 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
> >  			riscv_alternative_fix_auipc_jalr(alt_ptr + i * sizeof(u32),
> >  							 insn, insn2, patch_offset);
> >  		}
> > +
> > +		if (riscv_insn_is_jal(insn)) {
> > +			s32 imm = riscv_insn_extract_jtype_imm(insn);
> > +
> > +			/* don't modify jumps inside the alternative block */
> > +			if ((alt_ptr + i * sizeof(u32) + imm) >= alt_ptr &&
> > +			    (alt_ptr + i * sizeof(u32) + imm) < (alt_ptr + len))
> > +				continue;
> > +
> > +			riscv_alternative_fix_jal(alt_ptr + i * sizeof(u32),
> > +						  insn, patch_offset);
> > +		}
> >  	}
> >  }
> >  
> 






* Re: [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives
  2023-01-11 13:18   ` Jisheng Zhang
  2023-01-11 13:53     ` Heiko Stübner
@ 2023-01-11 14:15     ` Andrew Jones
  2023-01-11 14:44       ` Jisheng Zhang
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Jones @ 2023-01-11 14:15 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Heiko Stuebner, linux-riscv, palmer, christoph.muellner, conor,
	philipp.tomsich, Heiko Stuebner

On Wed, Jan 11, 2023 at 09:18:14PM +0800, Jisheng Zhang wrote:
> On Mon, Jan 09, 2023 at 07:17:53PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Alternatives live in a different section, so addresses used by jal
> > instructions will point to wrong locations after the patch got applied.
> > 
> > Similar to arm64, adjust the location to consider that offset.
> 
> I already submitted this a month ago.
> https://lore.kernel.org/linux-riscv/20221204174632.3677-2-jszhang@kernel.org/
>

Hi Jisheng,

It'd be great to see a v3 of your series on the list, since I'd like to
base zicboz work on it.

Thanks,
drew


* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-11 12:24   ` Andrew Jones
@ 2023-01-11 14:27     ` Christoph Müllner
  2023-01-11 15:16       ` Andrew Jones
  2023-01-11 15:22       ` Jeff Law
  2023-01-12 22:05     ` Heiko Stübner
  1 sibling, 2 replies; 42+ messages in thread
From: Christoph Müllner @ 2023-01-11 14:27 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stuebner, linux-riscv, palmer, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Wed, Jan 11, 2023 at 1:24 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> >
> > Add handling for ZBB extension and add support for using it as a
> > variant for optimized string functions.
> >
> > Support for the Zbb-str-variants is limited to the GNU-assembler
> > for now, as LLVM has not yet acquired the functionality to
> > selectively change the arch option in assembler code.
> > This is still under review at
> >     https://reviews.llvm.org/D123515
> >
> > Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> > Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/Kconfig                   |  24 ++++++
> >  arch/riscv/include/asm/errata_list.h |   3 +-
> >  arch/riscv/include/asm/hwcap.h       |   1 +
> >  arch/riscv/include/asm/string.h      |   2 +
> >  arch/riscv/kernel/cpu.c              |   1 +
> >  arch/riscv/kernel/cpufeature.c       |  18 +++++
> >  arch/riscv/lib/strcmp.S              |  94 ++++++++++++++++++++++
> >  arch/riscv/lib/strlen.S              | 114 +++++++++++++++++++++++++++
> >  arch/riscv/lib/strncmp.S             | 111 ++++++++++++++++++++++++++
> >  9 files changed, 367 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..7c814fbf9527 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
> >
> >          If you don't know what to do here, say Y.
> >
> > +config TOOLCHAIN_HAS_ZBB
> > +     bool
> > +     default y
> > +     depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> > +     depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> > +     depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > +     depends on AS_IS_GNU
> > +
> > +config RISCV_ISA_ZBB
> > +     bool "Zbb extension support for bit manipulation instructions"
> > +     depends on TOOLCHAIN_HAS_ZBB
> > +     depends on !XIP_KERNEL && MMU
> > +     select RISCV_ALTERNATIVE
> > +     default y
> > +     help
> > +        Adds support to dynamically detect the presence of the ZBB
> > +        extension (basic bit manipulation) and enable its usage.
> > +
> > +        The Zbb extension provides instructions to accelerate a number
> > +        of bit-specific operations (count bit population, sign extending,
> > +        bitrotation, etc).
> > +
> > +        If you don't know what to do here, say Y.
> > +
> >  config TOOLCHAIN_HAS_ZICBOM
> >       bool
> >       default y
> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> > index 4180312d2a70..95e626b7281e 100644
> > --- a/arch/riscv/include/asm/errata_list.h
> > +++ b/arch/riscv/include/asm/errata_list.h
> > @@ -24,7 +24,8 @@
> >
> >  #define      CPUFEATURE_SVPBMT 0
> >  #define      CPUFEATURE_ZICBOM 1
> > -#define      CPUFEATURE_NUMBER 2
> > +#define      CPUFEATURE_ZBB 2
> > +#define      CPUFEATURE_NUMBER 3
> >
> >  #ifdef __ASSEMBLY__
> >
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index 86328e3acb02..b727491fb100 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
> >       RISCV_ISA_EXT_ZIHINTPAUSE,
> >       RISCV_ISA_EXT_SSTC,
> >       RISCV_ISA_EXT_SVINVAL,
> > +     RISCV_ISA_EXT_ZBB,
> >       RISCV_ISA_EXT_ID_MAX
> >  };
> >  static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
> > diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> > index a96b1fea24fe..17dfc4ab4c80 100644
> > --- a/arch/riscv/include/asm/string.h
> > +++ b/arch/riscv/include/asm/string.h
> > @@ -6,6 +6,8 @@
> >  #ifndef _ASM_RISCV_STRING_H
> >  #define _ASM_RISCV_STRING_H
> >
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >  #include <linux/types.h>
> >  #include <linux/linkage.h>
> >
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index 1b9a5a66e55a..c4d1aa166f8b 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
> >   *    extensions by an underscore.
> >   */
> >  static struct riscv_isa_ext_data isa_ext_arr[] = {
> > +     __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> >       __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
> >       __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> >       __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 205bbd6b1fce..bf3a791d7110 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
> >                                       set_bit(nr, this_isa);
> >                               }
> >                       } else {
> > +                             SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> >                               SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> >                               SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> >                               SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> > @@ -301,6 +302,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
> >       return true;
> >  }
> >
> > +static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
> > +{
> > +     if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
> > +             return false;
> > +
> > +     if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > +             return false;
> > +
> > +     if (!riscv_isa_extension_available(NULL, ZBB))
> > +             return false;
> > +
> > +     return true;
> > +}
> > +
> >  /*
> >   * Probe presence of individual extensions.
> >   *
> > @@ -318,6 +333,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
> >       if (cpufeature_probe_zicbom(stage))
> >               cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
> >
> > +     if (cpufeature_probe_zbb(stage))
> > +             cpu_req_feature |= BIT(CPUFEATURE_ZBB);
> > +
> >       return cpu_req_feature;
> >  }
> >
> > diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> > index 94440fb8390c..5428a8f2eb84 100644
> > --- a/arch/riscv/lib/strcmp.S
> > +++ b/arch/riscv/lib/strcmp.S
> > @@ -3,9 +3,14 @@
> >  #include <linux/linkage.h>
> >  #include <asm/asm.h>
> >  #include <asm-generic/export.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >
> >  /* int strcmp(const char *cs, const char *ct) */
> >  SYM_FUNC_START(strcmp)
> > +
> > +     ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
>
> How about strcmp_zbb for the label, and similarly for the other functions?

Fully agree!

>
> > +
> >       /*
> >        * Returns
> >        *   a0 - comparison result, value like strcmp
> > @@ -34,4 +39,93 @@ SYM_FUNC_START(strcmp)
> >       bnez    t1, 1b
> >       li      a0, 0
> >       j       2b
> > +
> > +/*
> > + * Variant of strcmp using the ZBB extension if available
> > + */
> > +#ifdef CONFIG_RISCV_ISA_ZBB
> > +variant_zbb:
> > +#define src1         a0
> > +#define result               a0
> > +#define src2         t5
> > +#define data1                t0
> > +#define data2                t1
> > +#define align                t2
> > +#define data1_orcb   t3
> > +#define m1           t4
>
> nit: It's probably just me, but I'm not a big fan of creating defines
> for registers, particularly when the meaning of the registers' contents
> gets changed, because then the name, at least temporarily, no longer
> accurately describes the content.

Without any strong feelings about this: I think this is common
practice when writing asm code (helps to write and understand the
code).
See random other sources where this is also done:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/strcmp.S;h=054ce15ef8273073de4c821e3f415f1f70541554;hb=HEAD#l35
https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-x86_64.pl#L58
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/lib/memcmp.S?h=v6.2-rc3#n20

> And, in this case, it looks like the code matches Appendix A, except the
> register and label naming. I'd prefer to have those match too so I could
> "review" with the diff tool.

Good that you did not "review" the code with diff.
Otherwise you would not have found the LGPTR issue (below).

>
>
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +     /*
> > +      * Returns
> > +      *   a0 - comparison result, value like strcmp
> > +      *
> > +      * Parameters
> > +      *   a0 - string1
> > +      *   a1 - string2
> > +      *
> > +      * Clobbers
> > +      *   t0, t1, t2, t3, t4, t5
> > +      */
> > +     mv      src2, a1
> > +
> > +     or      align, src1, src2
> > +     li      m1, -1
> > +     and     align, align, SZREG-1
> > +     bnez    align, 3f
> > +
> > +     /* Main loop for aligned string.  */
> > +     .p2align 3
>
> Why are the starts of the loops aligned to 8 byte boundaries?

Loop alignment seems to be a valid optimization method for other architectures.
I don't see anything specific in RISC-V that would make loop alignment
irrelevant.
AArch64 code seems to align loops (or other jump targets) to 8 or 16 bytes.
This was considered not harmful, and potentially beneficial.
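For readers unfamiliar with Zbb: orc.b, used in the main loop quoted below, sets each destination byte to 0xff when the corresponding source byte is non-zero and to 0x00 otherwise, so a result different from all-ones signals a null byte in the word. A C model of that check (illustrative only, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the Zbb orc.b instruction on a 64-bit register. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i += 8)
		if ((x >> i) & 0xff)
			r |= (uint64_t)0xff << i;
	return r;
}

/* A word contains a null byte iff orc.b does not yield all-ones. */
static int has_null_byte(uint64_t word)
{
	return orc_b(word) != UINT64_MAX;
}
```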

>
>
> > +1:
> > +     REG_L   data1, 0(src1)
> > +     REG_L   data2, 0(src2)
> > +     orc.b   data1_orcb, data1
> > +     bne     data1_orcb, m1, 2f
> > +     addi    src1, src1, SZREG
> > +     addi    src2, src2, SZREG
> > +     beq     data1, data2, 1b
> > +
> > +     /*
> > +      * Words don't match, and no null byte in the first
> > +      * word. Get bytes in big-endian order and compare.
> > +      */
> > +#ifndef CONFIG_CPU_BIG_ENDIAN
> > +     rev8    data1, data1
> > +     rev8    data2, data2
> > +#endif
> > +
> > +     /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence. */
> > +     sltu    result, data1, data2
> > +     neg     result, result
> > +     ori     result, result, 1
> > +     ret
> > +
> > +2:
> > +     /*
> > +      * Found a null byte.
> > +      * If words don't match, fall back to simple loop.
> > +      */
> > +     bne     data1, data2, 3f
> > +
> > +     /* Otherwise, strings are equal. */
> > +     li      result, 0
> > +     ret
> > +
> > +     /* Simple loop for misaligned strings. */
> > +     .p2align 3
> > +3:
> > +     lbu     data1, 0(src1)
> > +     lbu     data2, 0(src2)
> > +     addi    src1, src1, 1
> > +     addi    src2, src2, 1
> > +     bne     data1, data2, 4f
> > +     bnez    data1, 3b
> > +
> > +4:
> > +     sub     result, data1, data2
>
> The non-optimized version returns explicitly -1, 0, 1, whereas this
> returns < 0, 0, > 0. Assuming we don't need the explicit -1 / 1 then we
> can change the non-optimized version to do this subtraction for its
> return value too, saving several instructions.

Regarding "we don't need":
"The strcmp() and strncmp() functions return an integer less than, equal
 to, or greater than zero if s1 (or the first n bytes thereof) is found,
 respectively, to be less than, to match, or be greater than s2."

So, yes, the values -1/+1 are just some of the allowed return values.
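To make the contract concrete, here is a hedged C model of the byte-loop tail (function name and structure are illustrative, not taken from the patch): the plain difference of the first mismatching bytes, compared as unsigned, already satisfies the "less than / equal / greater than zero" requirement without forcing -1/0/1.

```c
/* Byte-at-a-time strcmp model.  Returns the difference of the first
 * mismatching bytes (compared as unsigned char, like lbu zero-extends),
 * which is negative, zero, or positive -- exactly what POSIX requires. */
static int strcmp_model(const char *s1, const char *s2)
{
	const unsigned char *u1 = (const unsigned char *)s1;
	const unsigned char *u2 = (const unsigned char *)s2;

	while (*u1 && *u1 == *u2) {
		u1++;
		u2++;
	}
	/* Mirrors "sub result, data1, data2" after the lbu loads. */
	return *u1 - *u2;
}
```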

>
>
> > +     ret
> > +
> > +.option pop
> > +#endif
> >  SYM_FUNC_END(strcmp)
> > diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> > index 09a7aaff26c8..738efb04307d 100644
> > --- a/arch/riscv/lib/strlen.S
> > +++ b/arch/riscv/lib/strlen.S
> > @@ -3,9 +3,14 @@
> >  #include <linux/linkage.h>
> >  #include <asm/asm.h>
> >  #include <asm-generic/export.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >
> >  /* int strlen(const char *s) */
> >  SYM_FUNC_START(strlen)
> > +
> > +     ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> > +
> >       /*
> >        * Returns
> >        *   a0 - string length
> > @@ -25,4 +30,113 @@ SYM_FUNC_START(strlen)
> >  2:
> >       addi    t1, t1, 1
> >       j       1b
> > +
> > +/*
> > + * Variant of strlen using the ZBB extension if available
> > + */
> > +#ifdef CONFIG_RISCV_ISA_ZBB
> > +variant_zbb:
> > +
> > +#define src          a0
> > +#define result               a0
> > +#define addr         t0
> > +#define data         t1
> > +#define offset               t2
> > +#define offset_bits  t2
> > +#define valid_bytes  t3
> > +#define m1           t3
> > +
> > +#ifdef CONFIG_CPU_BIG_ENDIAN
> > +# define CZ  clz
> > +# define SHIFT       sll
> > +#else
> > +# define CZ  ctz
> > +# define SHIFT       srl
> > +#endif
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +     /*
> > +      * Returns
> > +      *   a0 - string length
> > +      *
> > +      * Parameters
> > +      *   a0 - String to measure
> > +      *
> > +      * Clobbers
> > +      *   t0, t1, t2, t3
> > +      */
> > +
> > +     /* Number of irrelevant bytes in the first word. */
>
> I see that this comment doesn't just apply to the next instruction,
> but to several instructions. It threw me off at first. Again, I
> think I'd prefer we just copy+paste the Appendix, comments an all.

I think this comment only applies to this line.
The loop processes word-aligned chunks and only the first chunk can
be misaligned.
The andi instruction calculates the number of irrelevant bytes in this
first word.

The Appendix states "// offset" here, so I think this comment is better.

But I don't mind if people want to stick with the comments from the Appendix.
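For readers following along, the first-word step being discussed can be modeled in C (a sketch under assumptions: little-endian rv64, SZREG = 8, orc.b emulated in software; names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

#define SZREG 8	/* rv64 assumed; 4 on rv32 */

/* Software model of Zbb orc.b: each byte of x becomes 0xff if it is
 * non-zero and 0x00 if it is zero. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++)
		if ((x >> (8 * i)) & 0xff)
			r |= (uint64_t)0xff << (8 * i);
	return r;
}

/* First-word step of the Zbb strlen (little-endian): shift away the
 * 'offset' irrelevant leading bytes (which shifts NUL bytes in at the
 * top), then find the first NUL byte via ~orc.b and a bit scan.
 * Returns the number of valid string bytes seen in this word. */
static size_t first_word_len(uint64_t word, unsigned int offset)
{
	unsigned int valid_bytes = SZREG - offset;
	uint64_t data = word >> (offset * 8);	/* offset * 8 bits per byte */
	uint64_t m = ~orc_b(data);
	unsigned int n = 0;

	while (n < 64 && !((m >> n) & 1))	/* models ctz */
		n++;
	n /= 8;					/* srli data, data, 3 */
	return n < valid_bytes ? n : valid_bytes;
}
```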

>
>
> > +     andi    offset, src, SZREG-1
> > +
> > +     /* Align pointer. */
> > +     andi    addr, src, -SZREG
> > +
> > +     li      valid_bytes, SZREG
> > +     sub     valid_bytes, valid_bytes, offset
> > +     slli    offset_bits, offset, RISCV_LGPTR
>
> The shift immediate should be 3, even for rv32, since we want
> bits-per-byte, not bytes-per-word. The spec example uses PTRLOG, which
> does imply bytes-per-word to me as well, so that seems like a spec bug.

Correct, that's an issue in the Appendix.
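A quick C illustration of why the immediate must be 3 rather than LGPTR (macro names here are hypothetical stand-ins for the kernel's RISCV_LGPTR values): converting a byte offset to a bit shift needs log2(bits per byte), which only coincides with log2(bytes per word) on rv64.

```c
#define LGPTR_RV32 2	/* log2(4-byte registers) */
#define LGPTR_RV64 3	/* log2(8-byte registers) */

/* Bit-shift amount for a byte offset within a word: always offset * 8,
 * i.e. a shift by 3, independent of XLEN. */
static int offset_bits(int byte_offset)
{
	return byte_offset << 3;
}
```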

>
>
> > +
> > +     /* Get the first word.  */
> > +     REG_L   data, 0(addr)
> > +
> > +     /*
> > +      * Shift away the partial data we loaded to remove the irrelevant bytes
> > +      * preceding the string with the effect of adding NUL bytes at the
> > +      * end of the string.
>
> the end of the string's first word.
>
> > +      */
> > +     SHIFT   data, data, offset_bits
> > +
> > +     /* Convert non-NUL into 0xff and NUL into 0x00. */
> > +     orc.b   data, data
> > +
> > +     /* Convert non-NUL into 0x00 and NUL into 0xff. */
> > +     not     data, data
> > +
> > +     /*
> > +      * Search for the first set bit (corresponding to a NUL byte in the
> > +      * original chunk).
> > +      */
> > +     CZ      data, data
> > +
> > +     /*
> > +      * The first chunk is special: commpare against the number
>
> compare
>
> > +      * of valid bytes in this chunk.
> > +      */
> > +     srli    result, data, 3
> > +     bgtu    valid_bytes, result, 3f
> > +
> > +     /* Prepare for the word comparison loop. */
> > +     addi    offset, addr, SZREG
> > +     li      m1, -1
> > +
> > +     /*
> > +      * Our critical loop is 4 instructions and processes data in
> > +      * 4 byte or 8 byte chunks.
> > +      */
> > +     .p2align 3
> > +1:
> > +     REG_L   data, SZREG(addr)
> > +     addi    addr, addr, SZREG
> > +     orc.b   data, data
> > +     beq     data, m1, 1b
> > +2:
> > +     not     data, data
> > +     CZ      data, data
> > +
> > +     /* Get number of processed words.  */
> > +     sub     offset, addr, offset
> > +
> > +     /* Add number of characters in the first word.  */
> > +     add     result, result, offset
> > +     srli    data, data, 3
> > +
> > +     /* Add number of characters in the last word.  */
> > +     add     result, result, data
> > +3:
> > +     ret
> > +
> > +.option pop
> > +#endif
> >  SYM_FUNC_END(strlen)
> > diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> > index 493ab6febcb2..851428b439dc 100644
> > --- a/arch/riscv/lib/strncmp.S
> > +++ b/arch/riscv/lib/strncmp.S
> > @@ -3,9 +3,14 @@
> >  #include <linux/linkage.h>
> >  #include <asm/asm.h>
> >  #include <asm-generic/export.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >
> >  /* int strncmp(const char *cs, const char *ct, size_t count) */
> >  SYM_FUNC_START(strncmp)
> > +
> > +     ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> > +
> >       /*
> >        * Returns
> >        *   a0 - comparison result, value like strncmp
> > @@ -37,4 +42,110 @@ SYM_FUNC_START(strncmp)
> >  4:
> >       li      a0, 0
> >       j       2b
> > +
> > +/*
> > + * Variant of strncmp using the ZBB extension if available
> > + */
> > +#ifdef CONFIG_RISCV_ISA_ZBB
> > +variant_zbb:
> > +
> > +#define src1         a0
> > +#define result               a0
> > +#define src2         t6
> > +#define len          a2
> > +#define data1                t0
> > +#define data2                t1
> > +#define align                t2
> > +#define data1_orcb   t3
> > +#define limit                t4
> > +#define m1           t5
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +     /*
> > +      * Returns
> > +      *   a0 - comparison result, like strncmp
> > +      *
> > +      * Parameters
> > +      *   a0 - string1
> > +      *   a1 - string2
> > +      *   a2 - number of characters to compare
> > +      *
> > +      * Clobbers
> > +      *   t0, t1, t2, t3, t4, t5, t6
> > +      */
> > +     mv      src2, a1

Why is this needed (i.e. why is src2 not a1, but t6)?

> > +
> > +     or      align, src1, src2
> > +     li      m1, -1
> > +     and     align, align, SZREG-1
> > +     add     limit, src1, len
> > +     bnez    align, 4f
> > +
> > +     /* Adjust limit for fast-path.  */
> > +     addi    limit, limit, -SZREG
>
> To avoid unnecessarily writing the last SZREG bytes one at a time when
> 'len' is SZREG aligned, we should be using the explicitly aligned limit
> here (limit & ~(SZREG - 1)) rather than just subtracting SZREG. Then, down
> in the slow-path we use the original limit to finish off if len wasn't
> aligned. When it was aligned, we'll just immediately branch to the ret.
> (I suspect many times 'len' will be SZREG aligned, since string buffers
> are typically allocated with power-of-two sizes).

Yes, that's a good idea!

So I propose to keep the (original) limit and introduce a new register here:
    andi fast_limit, limit, -SZREG // ~(SZREG - 1) == -SZREG
...and use that in the loop:
    bgt     src1, fast_limit, 3f
...and remove the limit restore ("addi   limit, limit, SZREG").
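The proposed fast-path bound can be sketched in C (SZREG = 8 for rv64 is an assumption, and `fast_limit` is just the register name from the proposal above): rounding the end pointer down to a word boundary lets the word loop cover every full word, so the byte loop only handles the unaligned tail, which is empty when 'len' is SZREG-aligned.

```c
#include <stdint.h>

#define SZREG 8	/* bytes per register on rv64; 4 on rv32 */

/* Round the end pointer down to a SZREG boundary:
 * models "andi fast_limit, limit, -SZREG" (~(SZREG - 1) == -SZREG). */
static uintptr_t fast_limit(uintptr_t limit)
{
	return limit & ~(uintptr_t)(SZREG - 1);
}
```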


>
> > +
> > +     /* Main loop for aligned string.  */
> > +     .p2align 3
> > +1:
> > +     bgt     src1, limit, 3f
> > +     REG_L   data1, 0(src1)
> > +     REG_L   data2, 0(src2)
> > +     orc.b   data1_orcb, data1
> > +     bne     data1_orcb, m1, 2f
> > +     addi    src1, src1, SZREG
> > +     addi    src2, src2, SZREG
> > +     beq     data1, data2, 1b
> > +
> > +     /*
> > +      * Words don't match, and no null byte in the first
> > +      * word. Get bytes in big-endian order and compare.
> > +      */
> > +#ifndef CONFIG_CPU_BIG_ENDIAN
> > +     rev8    data1, data1
> > +     rev8    data2, data2
> > +#endif
> > +
> > +     /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> > +     sltu    result, data1, data2
> > +     neg     result, result
> > +     ori     result, result, 1
> > +     ret
> > +
> > +2:
> > +     /*
> > +      * Found a null byte.
> > +      * If words don't match, fall back to simple loop.
> > +      */
> > +     bne     data1, data2, 3f
> > +
> > +     /* Otherwise, strings are equal.  */
> > +     li      result, 0
> > +     ret
> > +
> > +     /* Simple loop for misaligned strings.  */
> > +3:
> > +     /* Restore limit for slow-path.  */
> > +     addi    limit, limit, SZREG
> > +     .p2align 3
> > +4:
> > +     bge     src1, limit, 6f
> > +     lbu     data1, 0(src1)
> > +     lbu     data2, 0(src2)
> > +     addi    src1, src1, 1
> > +     addi    src2, src2, 1
> > +     bne     data1, data2, 5f
> > +     bnez    data1, 4b
> > +
> > +5:
> > +     sub     result, data1, data2
> > +     ret
> > +
> > +6:
> > +     li      result, 0
> > +     ret
>
> As strcmp and strncmp have so much in common it's tempting to try and
> merge them somehow. Maybe with a handful of macros shared between them?
>
> > +
> > +.option pop
> > +#endif
> >  SYM_FUNC_END(strncmp)
> > --
> > 2.35.1
> >
>
> Thanks,
> drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling
  2023-01-10  8:54     ` Conor Dooley
@ 2023-01-11 14:43       ` Jisheng Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Jisheng Zhang @ 2023-01-11 14:43 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Andrew Jones, Heiko Stuebner, linux-riscv, palmer,
	christoph.muellner, conor, philipp.tomsich, Heiko Stuebner

On Tue, Jan 10, 2023 at 08:54:33AM +0000, Conor Dooley wrote:
> On Tue, Jan 10, 2023 at 09:44:25AM +0100, Andrew Jones wrote:
> > > +	*insn |= (((imm & (RV_J_IMM_10_1_MASK << RV_J_IMM_10_1_OFF)) << RV_I_IMM_11_0_OPOFF) |
> >                                                                         ^ RV_J_IMM_10_1_OPOFF
> > 									(as pointed out by Conor)
> 
> > > +		  ((imm & (RV_J_IMM_11_MASK << RV_J_IMM_11_OFF)) << RV_J_IMM_11_OPOFF) |
> > > +		  ((imm & (RV_J_IMM_19_12_OPOFF << RV_J_IMM_19_12_OFF)) << RV_J_IMM_19_12_OPOFF) |
> >                            ^ RV_J_IMM_19_12_MASK
> 
> You'd think that having seen one I'd have picked up on the other one
> being wrong too...
> 

we also need to use RV_X() or do something similar ;) 

[1] https://lore.kernel.org/linux-riscv/20221204174632.3677-2-jszhang@kernel.org/
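For reference, the J-type immediate layout under discussion is fixed by the RISC-V unprivileged spec, independent of which kernel macros end up being used. A hedged encode/decode round-trip in plain C (helper names are illustrative, not the kernel's):

```c
#include <stdint.h>

/* J-type immediate layout (RISC-V unprivileged spec):
 *   inst[31]    = imm[20]
 *   inst[30:21] = imm[10:1]
 *   inst[20]    = imm[11]
 *   inst[19:12] = imm[19:12]
 * imm[0] is always zero. */
static int32_t jtype_imm_decode(uint32_t insn)
{
	uint32_t imm = ((insn >> 31) & 0x1) << 20 |
		       ((insn >> 21) & 0x3ff) << 1 |
		       ((insn >> 20) & 0x1) << 11 |
		       ((insn >> 12) & 0xff) << 12;

	return (int32_t)(imm << 11) >> 11;	/* sign-extend from bit 20 */
}

static uint32_t jtype_imm_encode(uint32_t insn, int32_t imm)
{
	uint32_t u = (uint32_t)imm;

	insn &= 0xfff;				/* keep opcode + rd */
	insn |= ((u >> 20) & 0x1) << 31 |
		((u >> 1) & 0x3ff) << 21 |
		((u >> 11) & 0x1) << 20 |
		((u >> 12) & 0xff) << 12;
	return insn;
}
```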



* Re: [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives
  2023-01-11 14:15     ` Andrew Jones
@ 2023-01-11 14:44       ` Jisheng Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Jisheng Zhang @ 2023-01-11 14:44 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stuebner, linux-riscv, palmer, christoph.muellner, conor,
	philipp.tomsich, Heiko Stuebner

On Wed, Jan 11, 2023 at 03:15:19PM +0100, Andrew Jones wrote:
> On Wed, Jan 11, 2023 at 09:18:14PM +0800, Jisheng Zhang wrote:
> > On Mon, Jan 09, 2023 at 07:17:53PM +0100, Heiko Stuebner wrote:
> > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > 
> > > Alternatives live in a different section, so addresses used by jal
> > > instructions will point to wrong locations after the patch got applied.
> > > 
> > > Similar to arm64, adjust the location to consider that offset.
> > 
> > I already submitted this in a month ago.
> > https://lore.kernel.org/linux-riscv/20221204174632.3677-2-jszhang@kernel.org/
> >
> 
> Hi Jisheng,
> 
> It'd be great to see a v3 of your series on the list, since I'd like to
> base zicboz work on it.

Hi Andrew,

Sorry for being late. I'm cooking the patches, will send out today.

Best Regards


* Re: [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
  2023-01-09 20:53   ` Conor Dooley
@ 2023-01-11 15:14     ` Heiko Stübner
  0 siblings, 0 replies; 42+ messages in thread
From: Heiko Stübner @ 2023-01-11 15:14 UTC (permalink / raw)
  To: Conor Dooley
  Cc: linux-riscv, palmer, christoph.muellner, philipp.tomsich, ajones,
	jszhang

Am Montag, 9. Januar 2023, 21:53:27 CET schrieb Conor Dooley:
> On Mon, Jan 09, 2023 at 07:17:51PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside
> > the kprobes implementation. This got moved into a central header in
> > commit ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header").
> > 
> > Though it looks like I overlooked two of them, so fix that. FENCE itself is
> > an instruction defined directly by its own opcode, while the created
> > riscv_isn_is_system function covers all instructions defined under the SYSTEM
> > opcode.
> > 
> > Fixes: ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header")
> 
> Not quite sure why it needs a fixes tag, nothing was actually broken
> previously, right?

I was going with the thought of "this should've been in the original patch
already" / "the original patch does not fully do what it says it's doing"
hence the Fixes, but you're right - it's just cosmetics at this point and
would just incur more work for the stable people.

So I've dropped the Fixes

Heiko




* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-11 14:27     ` Christoph Müllner
@ 2023-01-11 15:16       ` Andrew Jones
  2023-01-11 15:22       ` Jeff Law
  1 sibling, 0 replies; 42+ messages in thread
From: Andrew Jones @ 2023-01-11 15:16 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: Heiko Stuebner, linux-riscv, palmer, conor, philipp.tomsich,
	jszhang, Heiko Stuebner

On Wed, Jan 11, 2023 at 03:27:28PM +0100, Christoph Müllner wrote:
> On Wed, Jan 11, 2023 at 1:24 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:
...
> > > +     li      valid_bytes, SZREG
> > > +     sub     valid_bytes, valid_bytes, offset
> > > +     slli    offset_bits, offset, RISCV_LGPTR
> >
> > The shift immediate should be 3, even for rv32, since we want
> > bits-per-byte, not bytes-per-word. The spec example uses PTRLOG, which
> > does imply bytes-per-word to me as well, so that seems like a spec bug.
> 
> Correct, that's an issue in the Appendix.
>

Thanks for the confirmation. I've sent a PR,
https://github.com/riscv/riscv-bitmanip/pull/182

drew


* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-11 14:27     ` Christoph Müllner
  2023-01-11 15:16       ` Andrew Jones
@ 2023-01-11 15:22       ` Jeff Law
  1 sibling, 0 replies; 42+ messages in thread
From: Jeff Law @ 2023-01-11 15:22 UTC (permalink / raw)
  To: linux-riscv



On 1/11/23 07:27, Christoph Müllner wrote:

>>> +
>>> +.option push
>>> +.option arch,+zbb
>>> +
>>> +     /*
>>> +      * Returns
>>> +      *   a0 - comparison result, value like strcmp
>>> +      *
>>> +      * Parameters
>>> +      *   a0 - string1
>>> +      *   a1 - string2
>>> +      *
>>> +      * Clobbers
>>> +      *   t0, t1, t2, t3, t4, t5
>>> +      */
>>> +     mv      src2, a1
>>> +
>>> +     or      align, src1, src2
>>> +     li      m1, -1
>>> +     and     align, align, SZREG-1
>>> +     bnez    align, 3f
>>> +
>>> +     /* Main loop for aligned string.  */
>>> +     .p2align 3
>>
>> Why are the starts of the loops aligned to 8 byte boundaries?
> 
> Loop alignment seems to be a valid optimization method for other architectures.
> I don't see anything specific in RISC-V that would make loop alignment
> irrelevant.
Right.  It's pretty common to see small but observable performance 
improvements on many uarchs due to code alignment.  There are typically 
three knobs to tune in this space.  First is function alignment, second 
is loop alignment, then finally other jump-target alignments.

Drew, happy to sync on this if you want -- I've got a low priority TODO 
to come up with some sensible guidelines for our v1 chip.   I'm not too 
inclined to push hard on it yet as it'll be a lot easier to find the 
sweet spot once real hardware is available.

jeff



* Re: [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives
  2023-01-10  9:28   ` Andrew Jones
@ 2023-01-11 17:15     ` Jisheng Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Jisheng Zhang @ 2023-01-11 17:15 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Heiko Stuebner, linux-riscv, palmer, christoph.muellner, conor,
	philipp.tomsich, Heiko Stuebner

On Tue, Jan 10, 2023 at 10:28:09AM +0100, Andrew Jones wrote:
> On Mon, Jan 09, 2023 at 07:17:53PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Alternatives live in a different section, so addresses used by jal
> > instructions will point to wrong locations after the patch got applied.
> > 
> > Similar to arm64, adjust the location to consider that offset.
> > 
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/kernel/alternative.c | 27 +++++++++++++++++++++++++++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/arch/riscv/kernel/alternative.c b/arch/riscv/kernel/alternative.c
> > index 6212ea0eed72..985c284fe9f4 100644
> > --- a/arch/riscv/kernel/alternative.c
> > +++ b/arch/riscv/kernel/alternative.c
> > @@ -79,6 +79,21 @@ static void riscv_alternative_fix_auipc_jalr(void *ptr, u32 auipc_insn,
> >  	patch_text_nosync(ptr, call, sizeof(u32) * 2);
> >  }
> >  
> > +static void riscv_alternative_fix_jal(void *ptr, u32 jal_insn, int patch_offset)
> > +{
> > +	s32 imm;
> > +
> > +	/* get and adjust new target address */
> > +	imm = riscv_insn_extract_jtype_imm(jal_insn);
> > +	imm -= patch_offset;
> > +
> > +	/* update instructions */
> 
> instruction
> 
> > +	riscv_insn_insert_jtype_imm(&jal_insn, imm);
> > +
> > +	/* patch the call place again */
> > +	patch_text_nosync(ptr, &jal_insn, sizeof(u32));
> > +}
> > +
> >  void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
> >  				      int patch_offset)
> >  {
> > @@ -106,6 +121,18 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
> >  			riscv_alternative_fix_auipc_jalr(alt_ptr + i * sizeof(u32),
> >  							 insn, insn2, patch_offset);
> 
> I think we should add an i++ here. There's not a problem now, since we're
> only adding a fixup for jal, not jalr, but we should future-proof this and
> there's no reason to revisit an already fixed-up instruction anyway.

Hi Andrew,

IMHO, adding the i++ here belongs in the auipc/jalr imm patching
commit. Since that patch has already been merged to riscv-next, I think
it's better to do this improvement in a separate patch, which is why
I didn't make this change in my v3 series.

Thanks

> 
> >  		}
> > +
> > +		if (riscv_insn_is_jal(insn)) {
> > +			s32 imm = riscv_insn_extract_jtype_imm(insn);
> > +
> > +			/* don't modify jumps inside the alternative block */
> 
> Don't
> 
> > +			if ((alt_ptr + i * sizeof(u32) + imm) >= alt_ptr &&
> > +			    (alt_ptr + i * sizeof(u32) + imm) < (alt_ptr + len))
> > +				continue;
> > +
> > +			riscv_alternative_fix_jal(alt_ptr + i * sizeof(u32),
> > +						  insn, patch_offset);
> > +		}
> >  	}
> >  }
> >  
> > -- 
> > 2.35.1
> >
> 
> Otherwise
> 
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> 
> Thanks,
> drew
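As a side note, the arithmetic the quoted fixup performs can be modeled in a few lines of C (a sketch; names are illustrative): a jal at 'pc' targets pc + imm, so after the alternative is copied to pc + patch_offset, keeping the same absolute target requires subtracting patch_offset from the immediate.

```c
#include <stdint.h>

/* A jal at 'pc' jumps to pc + imm. */
static intptr_t jal_target(intptr_t pc, int32_t imm)
{
	return pc + imm;
}

/* Models riscv_alternative_fix_jal's adjustment: the relocated jal
 * must still reach the original absolute target. */
static int32_t fix_jal_imm(int32_t imm, intptr_t patch_offset)
{
	return imm - (int32_t)patch_offset;
}
```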


* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-10 10:14     ` Conor Dooley
@ 2023-01-12 11:21       ` Heiko Stübner
  2023-01-12 12:06         ` Conor Dooley
  0 siblings, 1 reply; 42+ messages in thread
From: Heiko Stübner @ 2023-01-12 11:21 UTC (permalink / raw)
  To: Andrew Jones, Conor Dooley
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich, jszhang

Am Dienstag, 10. Januar 2023, 11:14:37 CET schrieb Conor Dooley:
> On Tue, Jan 10, 2023 at 10:57:20AM +0100, Andrew Jones wrote:
> > On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:
> 
> > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > > index 1b9a5a66e55a..c4d1aa166f8b 100644
> > > --- a/arch/riscv/kernel/cpu.c
> > > +++ b/arch/riscv/kernel/cpu.c
> > > @@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
> > >   *    extensions by an underscore.
> > >   */
> > >  static struct riscv_isa_ext_data isa_ext_arr[] = {
> > > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> > >  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
> > >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> > >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > 
> > Huh, this array still doesn't appear to be in order... Zbb should
> > be getting inserted between the Zi* extensions (which should be first)
> > and the S* extensions and each of those three categories should be
> > in alphabetical order.
> 
> Correct. The new entry was at least added in the right place, reordering
> existing entries aside.
> 
> > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > > index 205bbd6b1fce..bf3a791d7110 100644
> > > --- a/arch/riscv/kernel/cpufeature.c
> > > +++ b/arch/riscv/kernel/cpufeature.c
> > > @@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
> > >  					set_bit(nr, this_isa);
> > >  				}
> > >  			} else {
> > > +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> > >  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> > >  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> > >  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> > 
> > I think we wanted this one in alphabetical order...
> 
> Correct again.
> 
> I've been avoiding mentioning this stuff in reviews though as Palmer has
> not yet picked up the patches [0] putting these arrays into those orders
> in the first place.
> 
> I tried to harass him about them last night, but he didn't get around to
> them. Perhaps worth mentioning tomorrow if we're gonna keep having to
> discussing these lists in reviews?
> 
> Thanks,
> Conor.
> 
> 0 - https://lore.kernel.org/all/20221205144525.2148448-1-conor.dooley@microchip.com/
> (I tried applying it yesterday, worked with `git am -3`)

I've gone ahead and imported that series as a dependency in my tree so that
the ordering becomes correct :-) .

Though of course I got a conflict with Andrew's
commit e923f4625ed3 ("riscv: Apply a static assert to riscv_isa_ext_id")
but fixed that up - though maybe a rebased v3 might be in order?


Heiko





* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-12 11:21       ` Heiko Stübner
@ 2023-01-12 12:06         ` Conor Dooley
  2023-01-12 12:28           ` Heiko Stübner
  0 siblings, 1 reply; 42+ messages in thread
From: Conor Dooley @ 2023-01-12 12:06 UTC (permalink / raw)
  To: Heiko Stübner, palmer
  Cc: Andrew Jones, linux-riscv, palmer, christoph.muellner, conor,
	philipp.tomsich, jszhang



On Thu, Jan 12, 2023 at 12:21:38PM +0100, Heiko Stübner wrote:
> 
> I've gone forward and imported that series as a dependency in my tree so that the
> ordering becomes correct :-) .
> 
> Though of course I got a conflict with Andrew's
> commit e923f4625ed3 ("riscv: Apply a static assert to riscv_isa_ext_id")
> but fixed that up - though maybe a rebased v3 may be in order?

To be honest, I "strategically" didn't resend a v3 despite that trivial
conflict since it's at the bottom of the list on patchwork.
I was hoping that it'd get merged earlier in the window so that any of the
various bits of extension support stuff could go on top of the re-order,
given it conflicts with, give or take, every other extension-related
patch. That has not come to pass (yet?) though.

I can easily respin, but if I do & some other extension bit gets merged
before it, it'll conflict again, so I didn't in the hopes that the
conflict could be resolved...

What do you think?



* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-12 12:06         ` Conor Dooley
@ 2023-01-12 12:28           ` Heiko Stübner
  0 siblings, 0 replies; 42+ messages in thread
From: Heiko Stübner @ 2023-01-12 12:28 UTC (permalink / raw)
  To: palmer, Conor Dooley
  Cc: Andrew Jones, linux-riscv, palmer, christoph.muellner, conor,
	philipp.tomsich, jszhang

Am Donnerstag, 12. Januar 2023, 13:06:01 CET schrieb Conor Dooley:
> On Thu, Jan 12, 2023 at 12:21:38PM +0100, Heiko Stübner wrote:
> > 
> > I've gone forward and imported that series as a dependency in my tree so that the
> > ordering becomes correct :-) .
> > 
> > Though of course I got a conflict with Andrew's
> > commit e923f4625ed3 ("riscv: Apply a static assert to riscv_isa_ext_id")
> > but fixed that up - though maybe a rebased v3 may be in order?
> 
> To be honest, I "strategically" didn't resend a v3 despite that trivial
> conflict since it's at the bottom of the list on patchwork.
> I was hoping that it'd get merged earlier in the window so any of the
> various bits of extension support stuff could go on top of the re-order,
> given it conflicts with, give or take, every other extension related
> patch. That's not come to pass (yet?) though.
> 
> I can easily respin, but if I do & some other extension bit gets merged
> before it, it'll conflict again, so I didn't in the hopes that the
> conflict could be resolved...

I guess it mainly comes down to what is easier for Palmer.

From personal experience, the less you have to think before applying a
patch, the easier it is to just do that in a free minute. Though on
the other hand, the conflict in your series is pretty minimal.

So both ways have their advantages somehow.




* Re: [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations
  2023-01-10 12:13   ` Andrew Jones
  2023-01-11 12:30     ` Andrew Jones
@ 2023-01-12 16:05     ` Heiko Stübner
  1 sibling, 0 replies; 42+ messages in thread
From: Heiko Stübner @ 2023-01-12 16:05 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich, jszhang

Hi Andrew,

thanks a lot for taking the time to provide these extensive and really
helpful comments.

Am Dienstag, 10. Januar 2023, 13:13:20 CET schrieb Andrew Jones:
> On Mon, Jan 09, 2023 at 07:17:54PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Depending on supported extensions on specific RISC-V cores,
> > optimized str* functions might make sense.
> > 
> > This adds basic infrastructure to allow patching the function calls
> > via alternatives later on.
> > 
> > The Linux kernel provides standard implementations for string functions
> > but when architectures want to extend them, they need to provide their
> > own.
> > 
> > The added generic string functions are done in assembler (taken from
> > disassembling the main-kernel functions for now) to allow us to control
> > the used registers and extend them with optimized variants.
> > 
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/include/asm/string.h | 10 +++++++++
> >  arch/riscv/kernel/riscv_ksyms.c |  3 +++
> >  arch/riscv/lib/Makefile         |  3 +++
> >  arch/riscv/lib/strcmp.S         | 37 ++++++++++++++++++++++++++++++
> >  arch/riscv/lib/strlen.S         | 28 +++++++++++++++++++++++
> >  arch/riscv/lib/strncmp.S        | 40 +++++++++++++++++++++++++++++++++
> >  arch/riscv/purgatory/Makefile   | 13 +++++++++++
> >  7 files changed, 134 insertions(+)
> >  create mode 100644 arch/riscv/lib/strcmp.S
> >  create mode 100644 arch/riscv/lib/strlen.S
> >  create mode 100644 arch/riscv/lib/strncmp.S
> > 
> > diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> > index 909049366555..a96b1fea24fe 100644
> > --- a/arch/riscv/include/asm/string.h
> > +++ b/arch/riscv/include/asm/string.h
> > @@ -18,6 +18,16 @@ extern asmlinkage void *__memcpy(void *, const void *, size_t);
> >  #define __HAVE_ARCH_MEMMOVE
> >  extern asmlinkage void *memmove(void *, const void *, size_t);
> >  extern asmlinkage void *__memmove(void *, const void *, size_t);
> > +
> > +#define __HAVE_ARCH_STRCMP
> > +extern asmlinkage int strcmp(const char *cs, const char *ct);
> > +
> > +#define __HAVE_ARCH_STRLEN
> > +extern asmlinkage __kernel_size_t strlen(const char *);
> > +
> > +#define __HAVE_ARCH_STRNCMP
> > +extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
> > +
> >  /* For those files which don't want to check by kasan. */
> >  #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
> >  #define memcpy(dst, src, len) __memcpy(dst, src, len)
> > diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c
> > index 5ab1c7e1a6ed..a72879b4249a 100644
> > --- a/arch/riscv/kernel/riscv_ksyms.c
> > +++ b/arch/riscv/kernel/riscv_ksyms.c
> > @@ -12,6 +12,9 @@
> >  EXPORT_SYMBOL(memset);
> >  EXPORT_SYMBOL(memcpy);
> >  EXPORT_SYMBOL(memmove);
> > +EXPORT_SYMBOL(strcmp);
> > +EXPORT_SYMBOL(strlen);
> > +EXPORT_SYMBOL(strncmp);
> >  EXPORT_SYMBOL(__memset);
> >  EXPORT_SYMBOL(__memcpy);
> >  EXPORT_SYMBOL(__memmove);
> > diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
> > index 25d5c9664e57..6c74b0bedd60 100644
> > --- a/arch/riscv/lib/Makefile
> > +++ b/arch/riscv/lib/Makefile
> > @@ -3,6 +3,9 @@ lib-y			+= delay.o
> >  lib-y			+= memcpy.o
> >  lib-y			+= memset.o
> >  lib-y			+= memmove.o
> > +lib-y			+= strcmp.o
> > +lib-y			+= strlen.o
> > +lib-y			+= strncmp.o
> >  lib-$(CONFIG_MMU)	+= uaccess.o
> >  lib-$(CONFIG_64BIT)	+= tishift.o
> >  
> > diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> > new file mode 100644
> > index 000000000000..94440fb8390c
> > --- /dev/null
> > +++ b/arch/riscv/lib/strcmp.S
> > @@ -0,0 +1,37 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +/* int strcmp(const char *cs, const char *ct) */
> > +SYM_FUNC_START(strcmp)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, value like strcmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2
> > +	 */
> > +	mv	t2, a1
> 
> The above instruction and the 'mv a1, t2' below appear to be attempting
> to preserve a1, but that shouldn't be necessary.

correct and gone now

> > +1:
> > +	lbu	t1, 0(a0)
> > +	lbu	t0, 0(a1)
> 
> I'd rather have t0 be 0(a0) and t1 be 0(a1)

ok

> > +	addi	a0, a0, 1
> > +	addi	a1, a1, 1
> > +	beq	t1, t0, 3f
> > +	li	a0, 1
> > +	bgeu	t1, t0, 2f
> > +	li	a0, -1
> > +2:
> > +	mv	a1, t2
> > +	ret
> > +3:
> > +	bnez	t1, 1b
> > +	li	a0, 0
> > +	j	2b
> 
> For fun I removed one conditional and one unconditional branch (untested)
> 
> 1:
>      lbu     t0, 0(a0)
>      lbu     t1, 0(a1)
>      addi    a0, a0, 1
>      addi    a1, a1, 1
>      bne     t0, t1, 2f
>      bnez    t0, 1b
>      li      a0, 0
>      ret
> 2:
>      slt     a1, t1, t0
>      slli    a1, a1, 1
>      li      a0, -1
>      add     a0, a0, a1
>      ret

yep that
- looks correct
- and also seems to produce correct results

I'm also including your
	sub     a0, t0, t1
suggestion from the follow-up, which then produces the same result as the
zbb-variant.

And I've also verified the return-value calling convention against the
documentation:
- 0 if s1 and s2 are equal;
- a negative value if s1 is less than s2;
- a positive value if s1 is greater than s2.

And I've added a comment above it, pointing out this fact for the next
person stumbling over this :-)
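As a side note for anyone following along: both return-value computations can be cross-checked with a small Python model (the names are just stand-ins for the t0/t1 byte values; this is only an illustrative sketch, not the kernel code itself):

```python
def branchless_sign(t0: int, t1: int) -> int:
    """Model of the quoted branchless tail (reached only when t0 != t1):
         slt  a1, t1, t0    # a1 = 1 if t1 < t0 else 0
         slli a1, a1, 1     # a1 = 2 * a1
         li   a0, -1
         add  a0, a0, a1    # a0 = 2*(t1 < t0) - 1  ->  +1 or -1
    """
    return 2 * (1 if t1 < t0 else 0) - 1

def sub_sign(t0: int, t1: int) -> int:
    """Model of the 'sub a0, t0, t1' replacement: any negative/zero/
    positive value satisfies the documented strcmp convention."""
    return t0 - t1

# Both variants agree in sign for all unequal byte pairs.
for a in range(256):
    for b in range(256):
        if a != b:
            assert (branchless_sign(a, b) > 0) == (a > b)
            assert (sub_sign(a, b) > 0) == (a > b)
```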


> > +SYM_FUNC_END(strcmp)
> > diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> > new file mode 100644
> > index 000000000000..09a7aaff26c8
> > --- /dev/null
> > +++ b/arch/riscv/lib/strlen.S
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +/* int strlen(const char *s) */
> > +SYM_FUNC_START(strlen)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - string length
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - String to measure
> > +	 *
> > +	 * Clobbers:
> > +	 *   t0, t1
> > +	 */
> > +	mv	t1, a0
> > +1:
> > +	lbu	t0, 0(t1)
> > +	bnez	t0, 2f
> > +	sub	a0, t1, a0
> > +	ret
> > +2:
> > +	addi	t1, t1, 1
> > +	j	1b
> 
> Slightly reorganizing looks better (to me)
> 
>    mv    t1, a0
> 1:
>    lbu   t0, 0(t1)
>    beqz  t0, 2f
>    addi  t1, t1, 1
>    j     1b
> 2:
>    sub a0, t1, a0
>    ret

ok


> > +SYM_FUNC_END(strlen)
> > diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> > new file mode 100644
> > index 000000000000..493ab6febcb2
> > --- /dev/null
> > +++ b/arch/riscv/lib/strncmp.S
> > @@ -0,0 +1,40 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm.h>
> > +#include <asm-generic/export.h>
> > +
> > +/* int strncmp(const char *cs, const char *ct, size_t count) */
> > +SYM_FUNC_START(strncmp)
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, value like strncmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *   a2 - number of characters to compare
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2
> > +	 */
> > +	li	t0, 0
> > +1:
> > +	beq	a2, t0, 4f
> > +	add	t1, a0, t0
> > +	add	t2, a1, t0
> > +	lbu	t1, 0(t1)
> > +	lbu	t2, 0(t2)
> > +	beq	t1, t2, 3f
> > +	li	a0, 1
> > +	bgeu	t1, t2, 2f
> > +	li	a0, -1
> > +2:
> > +	ret
> > +3:
> > +	addi	t0, t0, 1
> > +	bnez	t1, 1b
> > +4:
> > +	li	a0, 0
> > +	j	2b
> 
> (untested)
> 
>      li      t2, 0
> 1:
>      beq     a2, t2, 2f
>      lbu     t0, 0(a0)
>      lbu     t1, 0(a1)
>      addi    a0, a0, 1
>      addi    a1, a1, 1
>      bne     t0, t1, 3f
>      addi    t2, t2, 1
>      bnez    t0, 1b
> 2:
>      li      a0, 0
>      ret
> 3:
>      slt     a1, t1, t0
>      slli    a1, a1, 1
>      li      a0, -1
>      add     a0, a0, a1
>      ret

Same here: I went over the changed assembly and verified that
it produces the same results as the original, and then did a second
pass to also add the
	sub     a0, t0, t1
replacement.
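For completeness, the loop semantics I checked against look like this as a pure Python model (only a sketch of the byte loop's behaviour, not the actual assembly):

```python
def strncmp_model(cs: bytes, ct: bytes, count: int) -> int:
    """Byte-at-a-time model of the revised strncmp loop: compare up to
    'count' bytes, stopping at the first difference or NUL terminator.
    Inputs are NUL-terminated byte strings."""
    i = 0
    while i < count:
        t0, t1 = cs[i], ct[i]
        if t0 != t1:
            return t0 - t1   # the 'sub a0, t0, t1' replacement
        if t0 == 0:
            return 0         # NUL reached in both strings
        i += 1
    return 0                 # 'count' equal bytes compared

assert strncmp_model(b"abc\0", b"abd\0", 3) < 0
assert strncmp_model(b"abc\0", b"abc\0", 10) == 0
assert strncmp_model(b"abcX\0", b"abcY\0", 3) == 0  # diff beyond count
```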


> > +SYM_FUNC_END(strncmp)
> > diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
> > index dd58e1d99397..d16bf715a586 100644
> > --- a/arch/riscv/purgatory/Makefile
> > +++ b/arch/riscv/purgatory/Makefile
> > @@ -2,6 +2,7 @@
> >  OBJECT_FILES_NON_STANDARD := y
> >  
> >  purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
> > +purgatory-y += strcmp.o strlen.o strncmp.o
> >  
> >  targets += $(purgatory-y)
> >  PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
> > @@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
> >  $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
> >  	$(call if_changed_rule,as_o_S)
> >  
> > +$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
> > +	$(call if_changed_rule,as_o_S)
> > +
> > +$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
> > +	$(call if_changed_rule,as_o_S)
> > +
> > +$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
> > +	$(call if_changed_rule,as_o_S)
> > +
> >  $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
> >  	$(call if_changed_rule,cc_o_c)
> >  
> > @@ -77,6 +87,9 @@ CFLAGS_ctype.o			+= $(PURGATORY_CFLAGS)
> >  AFLAGS_REMOVE_entry.o		+= -Wa,-gdwarf-2
> >  AFLAGS_REMOVE_memcpy.o		+= -Wa,-gdwarf-2
> >  AFLAGS_REMOVE_memset.o		+= -Wa,-gdwarf-2
> > +AFLAGS_REMOVE_strcmp.o		+= -Wa,-gdwarf-2
> > +AFLAGS_REMOVE_strlen.o		+= -Wa,-gdwarf-2
> > +AFLAGS_REMOVE_strncmp.o		+= -Wa,-gdwarf-2
> >  
> >  $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> >  		$(call if_changed,ld)
> >
> 
> With at least the removal of the unnecessary preserving of a1 in strcmp,
> then it looks correct to me, so
> 
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> 
> But I think there's room for making it more readable, and maybe even
> optimized, as I've tried to do.


Thanks a lot
Heiko



_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 5/5] RISC-V: add zbb support to string functions
  2023-01-11 12:24   ` Andrew Jones
  2023-01-11 14:27     ` Christoph Müllner
@ 2023-01-12 22:05     ` Heiko Stübner
  1 sibling, 0 replies; 42+ messages in thread
From: Heiko Stübner @ 2023-01-12 22:05 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, palmer, christoph.muellner, conor, philipp.tomsich, jszhang

Hi Andrew,

Am Mittwoch, 11. Januar 2023, 13:24:52 CET schrieb Andrew Jones:
> On Mon, Jan 09, 2023 at 07:17:55PM +0100, Heiko Stuebner wrote:
> > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > 
> > Add handling for ZBB extension and add support for using it as a
> > variant for optimized string functions.
> > 
> > Support for the Zbb-str-variants is limited to the GNU-assembler
> > for now, as LLVM has not yet acquired the functionality to
> > selectively change the arch option in assembler code.
> > This is still under review at
> >     https://reviews.llvm.org/D123515
> > 
> > Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
> > Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
> > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > ---
> >  arch/riscv/Kconfig                   |  24 ++++++
> >  arch/riscv/include/asm/errata_list.h |   3 +-
> >  arch/riscv/include/asm/hwcap.h       |   1 +
> >  arch/riscv/include/asm/string.h      |   2 +
> >  arch/riscv/kernel/cpu.c              |   1 +
> >  arch/riscv/kernel/cpufeature.c       |  18 +++++
> >  arch/riscv/lib/strcmp.S              |  94 ++++++++++++++++++++++
> >  arch/riscv/lib/strlen.S              | 114 +++++++++++++++++++++++++++
> >  arch/riscv/lib/strncmp.S             | 111 ++++++++++++++++++++++++++
> >  9 files changed, 367 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..7c814fbf9527 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -416,6 +416,30 @@ config RISCV_ISA_SVPBMT
> >  
> >  	   If you don't know what to do here, say Y.
> >  
> > +config TOOLCHAIN_HAS_ZBB
> > +	bool
> > +	default y
> > +	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
> > +	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
> > +	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
> > +	depends on AS_IS_GNU
> > +
> > +config RISCV_ISA_ZBB
> > +	bool "Zbb extension support for bit manipulation instructions"
> > +	depends on TOOLCHAIN_HAS_ZBB
> > +	depends on !XIP_KERNEL && MMU
> > +	select RISCV_ALTERNATIVE
> > +	default y
> > +	help
> > +	   Adds support to dynamically detect the presence of the ZBB
> > +	   extension (basic bit manipulation) and enable its usage.
> > +
> > +	   The Zbb extension provides instructions to accelerate a number
> > +	   of bit-specific operations (count bit population, sign extending,
> > +	   bitrotation, etc).
> > +
> > +	   If you don't know what to do here, say Y.
> > +
> >  config TOOLCHAIN_HAS_ZICBOM
> >  	bool
> >  	default y
> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> > index 4180312d2a70..95e626b7281e 100644
> > --- a/arch/riscv/include/asm/errata_list.h
> > +++ b/arch/riscv/include/asm/errata_list.h
> > @@ -24,7 +24,8 @@
> >  
> >  #define	CPUFEATURE_SVPBMT 0
> >  #define	CPUFEATURE_ZICBOM 1
> > -#define	CPUFEATURE_NUMBER 2
> > +#define	CPUFEATURE_ZBB 2
> > +#define	CPUFEATURE_NUMBER 3
> >  
> >  #ifdef __ASSEMBLY__
> >  
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index 86328e3acb02..b727491fb100 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
> >  	RISCV_ISA_EXT_ZIHINTPAUSE,
> >  	RISCV_ISA_EXT_SSTC,
> >  	RISCV_ISA_EXT_SVINVAL,
> > +	RISCV_ISA_EXT_ZBB,
> >  	RISCV_ISA_EXT_ID_MAX
> >  };
> >  static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
> > diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
> > index a96b1fea24fe..17dfc4ab4c80 100644
> > --- a/arch/riscv/include/asm/string.h
> > +++ b/arch/riscv/include/asm/string.h
> > @@ -6,6 +6,8 @@
> >  #ifndef _ASM_RISCV_STRING_H
> >  #define _ASM_RISCV_STRING_H
> >  
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >  #include <linux/types.h>
> >  #include <linux/linkage.h>
> >  
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index 1b9a5a66e55a..c4d1aa166f8b 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -162,6 +162,7 @@ arch_initcall(riscv_cpuinfo_init);
> >   *    extensions by an underscore.
> >   */
> >  static struct riscv_isa_ext_data isa_ext_arr[] = {
> > +	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
> >  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
> >  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
> >  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 205bbd6b1fce..bf3a791d7110 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -222,6 +222,7 @@ void __init riscv_fill_hwcap(void)
> >  					set_bit(nr, this_isa);
> >  				}
> >  			} else {
> > +				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
> >  				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
> >  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> >  				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> > @@ -301,6 +302,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
> >  	return true;
> >  }
> >  
> > +static bool __init_or_module cpufeature_probe_zbb(unsigned int stage)
> > +{
> > +	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB))
> > +		return false;
> > +
> > +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > +		return false;
> > +
> > +	if (!riscv_isa_extension_available(NULL, ZBB))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> >  /*
> >   * Probe presence of individual extensions.
> >   *
> > @@ -318,6 +333,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
> >  	if (cpufeature_probe_zicbom(stage))
> >  		cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
> >  
> > +	if (cpufeature_probe_zbb(stage))
> > +		cpu_req_feature |= BIT(CPUFEATURE_ZBB);
> > +
> >  	return cpu_req_feature;
> >  }
> >  
> > diff --git a/arch/riscv/lib/strcmp.S b/arch/riscv/lib/strcmp.S
> > index 94440fb8390c..5428a8f2eb84 100644
> > --- a/arch/riscv/lib/strcmp.S
> > +++ b/arch/riscv/lib/strcmp.S
> > @@ -3,9 +3,14 @@
> >  #include <linux/linkage.h>
> >  #include <asm/asm.h>
> >  #include <asm-generic/export.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >  
> >  /* int strcmp(const char *cs, const char *ct) */
> >  SYM_FUNC_START(strcmp)
> > +
> > +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> 
> How about strcmp_zbb and similar for the other functions for the labels?

agreed, that is a much more sensible naming scheme

> > +
> >  	/*
> >  	 * Returns
> >  	 *   a0 - comparison result, value like strcmp
> > @@ -34,4 +39,93 @@ SYM_FUNC_START(strcmp)
> >  	bnez	t1, 1b
> >  	li	a0, 0
> >  	j	2b
> > +
> > +/*
> > + * Variant of strcmp using the ZBB extension if available
> > + */
> > +#ifdef CONFIG_RISCV_ISA_ZBB
> > +variant_zbb:
> > +#define src1		a0
> > +#define result		a0
> > +#define src2		t5
> > +#define data1		t0
> > +#define data2		t1
> > +#define align		t2
> > +#define data1_orcb	t3
> > +#define m1		t4
> 
> nit: It's probably just me, but I'm not a big fan of creating defines
> for registers, particularly when the meaning of the registers' contents
> gets changed, because then the name, at least temporarily, no longer
> accurately describes the content.

I just had to scroll up and down a lot between these defines and the
parts using them, because I couldn't really remember which register was which :-)

Also, these somewhat generic names then produce conflicts in my
zbb+fast-unaligned variant, so they would need more prefixes.

And I actually find the assembly harder to read when the names have
these varying lengths instead of the usual 2 characters, with everything
neatly aligned.


So in sum, I decided to follow that suggestion and dropped the defines.
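(Side note for anyone following the quoted fast path below: the orc.b / rev8 pair is easier to reason about with a small Python model of the instruction semantics. XLEN=64 is assumed here, and this is purely illustrative.)

```python
XLEN_BYTES = 8  # SZREG on rv64

def orc_b(word: int) -> int:
    """orc.b: each byte becomes 0xff if non-zero, 0x00 if zero.
    A word with no NUL byte therefore orc.b's to all-ones, which is
    what the 'bne data1_orcb, m1, 2f' check in the aligned loop uses."""
    out = 0
    for i in range(XLEN_BYTES):
        if (word >> (8 * i)) & 0xff:
            out |= 0xff << (8 * i)
    return out

def rev8(word: int) -> int:
    """rev8: reverse the byte order within the register, so an unsigned
    compare of little-endian loaded words matches string byte order."""
    return int.from_bytes(word.to_bytes(XLEN_BYTES, "little"), "big")

assert orc_b(0x0102030405060708) == 0xffffffffffffffff  # no NUL byte
assert orc_b(0x0102030400060708) != 0xffffffffffffffff  # NUL byte present

# After rev8, integer comparison matches lexicographic byte comparison.
x = int.from_bytes(b"abcdefgh", "little")
y = int.from_bytes(b"abcdefgi", "little")
assert (rev8(x) < rev8(y)) == (b"abcdefgh" < b"abcdefgi")
```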


> And, in this case, it looks like the code matches Appendix A, except the
> register and label naming. I'd prefer to have those match too so I could
> "review" with the diff tool.
> 
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, value like strcmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2, t3, t4, t5
> > +	 */
> > +	mv	src2, a1
> > +
> > +	or	align, src1, src2
> > +	li	m1, -1
> > +	and	align, align, SZREG-1
> > +	bnez	align, 3f
> > +
> > +	/* Main loop for aligned string.  */
> > +	.p2align 3
> 
> Why are the starts of the loops aligned to 8 byte boundaries?
> 
> > +1:
> > +	REG_L	data1, 0(src1)
> > +	REG_L	data2, 0(src2)
> > +	orc.b	data1_orcb, data1
> > +	bne	data1_orcb, m1, 2f
> > +	addi	src1, src1, SZREG
> > +	addi	src2, src2, SZREG
> > +	beq	data1, data2, 1b
> > +
> > +	/*
> > +	 * Words don't match, and no null byte in the first
> > +	 * word. Get bytes in big-endian order and compare.
> > +	 */
> > +#ifndef CONFIG_CPU_BIG_ENDIAN
> > +	rev8	data1, data1
> > +	rev8	data2, data2
> > +#endif
> > +
> > +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence. */
> > +	sltu	result, data1, data2
> > +	neg	result, result
> > +	ori	result, result, 1
> > +	ret
> > +
> > +2:
> > +	/*
> > +	 * Found a null byte.
> > +	 * If words don't match, fall back to simple loop.
> > +	 */
> > +	bne	data1, data2, 3f
> > +
> > +	/* Otherwise, strings are equal. */
> > +	li	result, 0
> > +	ret
> > +
> > +	/* Simple loop for misaligned strings. */
> > +	.p2align 3
> > +3:
> > +	lbu	data1, 0(src1)
> > +	lbu	data2, 0(src2)
> > +	addi	src1, src1, 1
> > +	addi	src2, src2, 1
> > +	bne	data1, data2, 4f
> > +	bnez	data1, 3b
> > +
> > +4:
> > +	sub	result, data1, data2
> 
> The non-optimized version returns explicitly -1, 0, 1, whereas this
> returns < 0, 0, > 0. Assuming we don't need the explicit -1 / 1 then we
> can change the non-optimized version to do this subtraction for its
> return value too, saving several instructions.
> 
> > +	ret
> > +
> > +.option pop
> > +#endif
> >  SYM_FUNC_END(strcmp)
> > diff --git a/arch/riscv/lib/strlen.S b/arch/riscv/lib/strlen.S
> > index 09a7aaff26c8..738efb04307d 100644
> > --- a/arch/riscv/lib/strlen.S
> > +++ b/arch/riscv/lib/strlen.S
> > @@ -3,9 +3,14 @@
> >  #include <linux/linkage.h>
> >  #include <asm/asm.h>
> >  #include <asm-generic/export.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >  
> >  /* int strlen(const char *s) */
> >  SYM_FUNC_START(strlen)
> > +
> > +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> > +
> >  	/*
> >  	 * Returns
> >  	 *   a0 - string length
> > @@ -25,4 +30,113 @@ SYM_FUNC_START(strlen)
> >  2:
> >  	addi	t1, t1, 1
> >  	j	1b
> > +
> > +/*
> > + * Variant of strlen using the ZBB extension if available
> > + */
> > +#ifdef CONFIG_RISCV_ISA_ZBB
> > +variant_zbb:
> > +
> > +#define src		a0
> > +#define result		a0
> > +#define addr		t0
> > +#define data		t1
> > +#define offset		t2
> > +#define offset_bits	t2
> > +#define valid_bytes	t3
> > +#define m1		t3
> > +
> > +#ifdef CONFIG_CPU_BIG_ENDIAN
> > +# define CZ	clz
> > +# define SHIFT	sll
> > +#else
> > +# define CZ	ctz
> > +# define SHIFT	srl
> > +#endif
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +	/*
> > +	 * Returns
> > +	 *   a0 - string length
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - String to measure
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2, t3
> > +	 */
> > +
> > +	/* Number of irrelevant bytes in the first word. */
> 
> I see that this comment doesn't just apply to the next instruction,
> but to several instructions. It threw me off at first. Again, I
> think I'd prefer we just copy+paste the Appendix, comments an all.

Hmm, somehow for me, as a non-native speaker of assembly [ ;-) ],
using temporary registers for temporary variables feels
more intuitive.


> > +	andi	offset, src, SZREG-1
> > +
> > +	/* Align pointer. */
> > +	andi	addr, src, -SZREG
> > +
> > +	li	valid_bytes, SZREG
> > +	sub	valid_bytes, valid_bytes, offset
> > +	slli	offset_bits, offset, RISCV_LGPTR
> 
> The shift immediate should be 3, even for rv32, since we want
> bits-per-byte, not bytes-per-word. The spec example uses PTRLOG, which
> does imply bytes-per-word to me as well, so that seems like a spec bug.

fixed this according to your and Christoph's comments
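For reference, the fixed prologue behaves like this in a Python model (the shift amount is offset * 8 bits, i.e. a constant slli by 3, independent of XLEN; rv64 is assumed and this is only an illustrative sketch):

```python
SZREG = 8  # bytes per register on rv64

def first_word_nul_scan(s: bytes, src: int) -> int:
    """Model of the Zbb strlen prologue: load the aligned word containing
    'src', shift away the preceding bytes (the zero-fill of srl provides
    the 'added NUL bytes' mentioned in the comment), then orc.b / not /
    ctz to locate the first NUL.  Returns the byte index of the first NUL
    relative to 'src', or the number of valid bytes if none was found
    (the real code checks the result against valid_bytes with bgtu)."""
    offset = src % SZREG                   # andi offset, src, SZREG-1
    addr = src - offset                    # andi addr, src, -SZREG
    word = int.from_bytes(s[addr:addr + SZREG], "little")
    word >>= offset * 8                    # shift by offset << 3 bits
    mask = 0                               # orc.b + not combined:
    for i in range(SZREG):                 # NUL bytes -> 0xff, rest 0x00
        if (word >> (8 * i)) & 0xff == 0:
            mask |= 0xff << (8 * i)
    if mask == 0:                          # aligned word without any NUL
        return SZREG - offset
    ctz = (mask & -mask).bit_length() - 1  # count trailing zero bits
    return ctz // 8                        # srli result, data, 3

buf = b"ab\0xxxxx" + b"\0" * 8   # padded so aligned loads stay in range
assert first_word_nul_scan(buf, 0) == 2      # NUL two bytes past src
assert first_word_nul_scan(buf, 1) == 1      # misaligned src
assert first_word_nul_scan(b"abcdefgh" + b"\0" * 8, 0) == 8  # none found
```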


> > +
> > +	/* Get the first word.  */
> > +	REG_L	data, 0(addr)
> > +
> > +	/*
> > +	 * Shift away the partial data we loaded to remove the irrelevant bytes
> > +	 * preceding the string with the effect of adding NUL bytes at the
> > +	 * end of the string.
> 
> the end of the string's first word.

ok


> > +	 */
> > +	SHIFT	data, data, offset_bits
> > +
> > +	/* Convert non-NUL into 0xff and NUL into 0x00. */
> > +	orc.b	data, data
> > +
> > +	/* Convert non-NUL into 0x00 and NUL into 0xff. */
> > +	not	data, data
> > +
> > +	/*
> > +	 * Search for the first set bit (corresponding to a NUL byte in the
> > +	 * original chunk).
> > +	 */
> > +	CZ	data, data
> > +
> > +	/*
> > +	 * The first chunk is special: commpare against the number
> 
> compare

ok


> > +	 * of valid bytes in this chunk.
> > +	 */
> > +	srli	result, data, 3
> > +	bgtu	valid_bytes, result, 3f
> > +
> > +	/* Prepare for the word comparison loop. */
> > +	addi	offset, addr, SZREG
> > +	li	m1, -1
> > +
> > +	/*
> > +	 * Our critical loop is 4 instructions and processes data in
> > +	 * 4 byte or 8 byte chunks.
> > +	 */
> > +	.p2align 3
> > +1:
> > +	REG_L	data, SZREG(addr)
> > +	addi	addr, addr, SZREG
> > +	orc.b	data, data
> > +	beq	data, m1, 1b
> > +2:
> > +	not	data, data
> > +	CZ	data, data
> > +
> > +	/* Get number of processed words.  */
> > +	sub	offset, addr, offset
> > +
> > +	/* Add number of characters in the first word.  */
> > +	add	result, result, offset
> > +	srli	data, data, 3
> > +
> > +	/* Add number of characters in the last word.  */
> > +	add	result, result, data
> > +3:
> > +	ret
> > +
> > +.option pop
> > +#endif
> >  SYM_FUNC_END(strlen)
> > diff --git a/arch/riscv/lib/strncmp.S b/arch/riscv/lib/strncmp.S
> > index 493ab6febcb2..851428b439dc 100644
> > --- a/arch/riscv/lib/strncmp.S
> > +++ b/arch/riscv/lib/strncmp.S
> > @@ -3,9 +3,14 @@
> >  #include <linux/linkage.h>
> >  #include <asm/asm.h>
> >  #include <asm-generic/export.h>
> > +#include <asm/alternative-macros.h>
> > +#include <asm/errata_list.h>
> >  
> >  /* int strncmp(const char *cs, const char *ct, size_t count) */
> >  SYM_FUNC_START(strncmp)
> > +
> > +	ALTERNATIVE("nop", "j variant_zbb", 0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
> > +
> >  	/*
> >  	 * Returns
> >  	 *   a0 - comparison result, value like strncmp
> > @@ -37,4 +42,110 @@ SYM_FUNC_START(strncmp)
> >  4:
> >  	li	a0, 0
> >  	j	2b
> > +
> > +/*
> > + * Variant of strncmp using the ZBB extension if available
> > + */
> > +#ifdef CONFIG_RISCV_ISA_ZBB
> > +variant_zbb:
> > +
> > +#define src1		a0
> > +#define result		a0
> > +#define src2		t6
> > +#define len		a2
> > +#define data1		t0
> > +#define data2		t1
> > +#define align		t2
> > +#define data1_orcb	t3
> > +#define limit		t4
> > +#define m1		t5
> > +
> > +.option push
> > +.option arch,+zbb
> > +
> > +	/*
> > +	 * Returns
> > +	 *   a0 - comparison result, like strncmp
> > +	 *
> > +	 * Parameters
> > +	 *   a0 - string1
> > +	 *   a1 - string2
> > +	 *   a2 - number of characters to compare
> > +	 *
> > +	 * Clobbers
> > +	 *   t0, t1, t2, t3, t4, t5, t6
> > +	 */
> > +	mv	src2, a1
> > +
> > +	or	align, src1, src2
> > +	li	m1, -1
> > +	and	align, align, SZREG-1
> > +	add	limit, src1, len
> > +	bnez	align, 4f
> > +
> > +	/* Adjust limit for fast-path.  */
> > +	addi	limit, limit, -SZREG
> 
> To avoid unnecessarily writing the last SZREG bytes one at a time when
> 'len' is SZREG aligned, we should be using the explicitly aligned limit
> here (limit & ~(SZREG - 1)) rather than just subtracting SZREG. Then, down
> in the slow-path we use the original limit to finish off if len wasn't
> aligned. When it was aligned, we'll just immediately branch to the ret.
> (I suspect many times 'len' will be SZREG aligned, since string buffers
> are typically allocated with power-of-two sizes).

I followed Christoph's suggestion from his follow-up to your mail to
address this.


> > +
> > +	/* Main loop for aligned string.  */
> > +	.p2align 3
> > +1:
> > +	bgt	src1, limit, 3f
> > +	REG_L	data1, 0(src1)
> > +	REG_L	data2, 0(src2)
> > +	orc.b	data1_orcb, data1
> > +	bne	data1_orcb, m1, 2f
> > +	addi	src1, src1, SZREG
> > +	addi	src2, src2, SZREG
> > +	beq	data1, data2, 1b
> > +
> > +	/*
> > +	 * Words don't match, and no null byte in the first
> > +	 * word. Get bytes in big-endian order and compare.
> > +	 */
> > +#ifndef CONFIG_CPU_BIG_ENDIAN
> > +	rev8	data1, data1
> > +	rev8	data2, data2
> > +#endif
> > +
> > +	/* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
> > +	sltu	result, data1, data2
> > +	neg	result, result
> > +	ori	result, result, 1
> > +	ret
> > +
> > +2:
> > +	/*
> > +	 * Found a null byte.
> > +	 * If words don't match, fall back to simple loop.
> > +	 */
> > +	bne	data1, data2, 3f
> > +
> > +	/* Otherwise, strings are equal.  */
> > +	li	result, 0
> > +	ret
> > +
> > +	/* Simple loop for misaligned strings.  */
> > +3:
> > +	/* Restore limit for slow-path.  */
> > +	addi	limit, limit, SZREG
> > +	.p2align 3
> > +4:
> > +	bge	src1, limit, 6f
> > +	lbu	data1, 0(src1)
> > +	lbu	data2, 0(src2)
> > +	addi	src1, src1, 1
> > +	addi	src2, src2, 1
> > +	bne	data1, data2, 5f
> > +	bnez	data1, 4b
> > +
> > +5:
> > +	sub	result, data1, data2
> > +	ret
> > +
> > +6:
> > +	li	result, 0
> > +	ret
> 
> As strcmp and strncmp have so much in common it's tempting to try and
> merge them somehow. Maybe with a handful of macros shared between them?

I guess the question is whether this makes things easier to read or harder.

As mentioned above, going between the register defines and the code was
hard already. With at least a strcmp-zbb-unaligned variant in my follow-up
and possibly more future variants, my feeling is to leave strcmp and
strncmp in their own .S files for readability (which is also how everyone
else does it), so sharing macros between them would require a shared
header.

Without comments, strncmp_zbb is about 46 lines long, so right now it feels
like splitting things up would make it harder to follow, but I could be
wrong :-).


Heiko




Thread overview: 42+ messages
2023-01-09 18:17 [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Heiko Stuebner
2023-01-09 18:17 ` [PATCH v4 1/5] RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes Heiko Stuebner
2023-01-09 20:53   ` Conor Dooley
2023-01-11 15:14     ` Heiko Stübner
2023-01-10  8:32   ` Andrew Jones
2023-01-09 18:17 ` [PATCH v4 2/5] RISC-V: add helpers for J-type immediate handling Heiko Stuebner
2023-01-09 22:22   ` Conor Dooley
2023-01-10  8:44   ` Andrew Jones
2023-01-10  8:54     ` Conor Dooley
2023-01-11 14:43       ` Jisheng Zhang
2023-01-09 18:17 ` [PATCH v4 3/5] RISC-V: fix jal addresses in patched alternatives Heiko Stuebner
2023-01-10  9:28   ` Andrew Jones
2023-01-11 17:15     ` Jisheng Zhang
2023-01-11 13:18   ` Jisheng Zhang
2023-01-11 13:53     ` Heiko Stübner
2023-01-11 14:15     ` Andrew Jones
2023-01-11 14:44       ` Jisheng Zhang
2023-01-09 18:17 ` [PATCH v4 4/5] RISC-V: add infrastructure to allow different str* implementations Heiko Stuebner
2023-01-09 22:37   ` Conor Dooley
2023-01-09 23:31     ` Heiko Stübner
2023-01-10  9:39   ` Andrew Jones
2023-01-10 10:46     ` Heiko Stübner
2023-01-10 11:16       ` Andrew Jones
2023-01-11 12:34         ` Andrew Jones
     [not found]           ` <CAEg0e7gJgpoiGjfLeedba0-r=dCE1Z_qkU53w_+-cVjsuqaC3A@mail.gmail.com>
2023-01-11 13:42             ` Philipp Tomsich
2023-01-11 13:47             ` Andrew Jones
2023-01-10 12:13   ` Andrew Jones
2023-01-11 12:30     ` Andrew Jones
2023-01-12 16:05     ` Heiko Stübner
2023-01-09 18:17 ` [PATCH v4 5/5] RISC-V: add zbb support to string functions Heiko Stuebner
2023-01-09 20:39   ` Conor Dooley
2023-01-10  9:57   ` Andrew Jones
2023-01-10 10:14     ` Conor Dooley
2023-01-12 11:21       ` Heiko Stübner
2023-01-12 12:06         ` Conor Dooley
2023-01-12 12:28           ` Heiko Stübner
2023-01-11 12:24   ` Andrew Jones
2023-01-11 14:27     ` Christoph Müllner
2023-01-11 15:16       ` Andrew Jones
2023-01-11 15:22       ` Jeff Law
2023-01-12 22:05     ` Heiko Stübner
2023-01-11 13:24 ` [PATCH v4 0/5] Zbb string optimizations and call support in alternatives Jisheng Zhang
